991108.11 | B | A | R. Brender | Fortran | Fortran90 arrays |
Issues 991026.1, 991108.11 (previous
text), and 991108.12 are replaced by
the following proposal.
1) New stack machine operator DW_OP_push_object_address
Push the address of the "current object" on the Dwarf stack. This
object may correspond to an independent variable or be a component
of an array or record/struct/class [whose address is known only to
the debugger as a result of earlier expression evaluation steps].
Has no operand.
Note: the availability of this operation makes it *unnecessary* to
introduce any notion of implicitly pushing the base of an array
on the Dwarf stack (by analogy with records).
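As a rough illustration only (not part of the proposal), the effect of the new operator can be sketched with a toy evaluator. The operator names follow this proposal; the memory model (a dict from address to word), the use of a plain integer to stand in for DW_OP_lit<n>, and the function name evaluate are assumptions made for the sketch.

```python
# Hypothetical mini evaluator for a Dwarf stack machine.
# Assumption: memory is modelled as a dict mapping address -> stored word.
def evaluate(ops, object_address, memory):
    stack = []
    for op in ops:
        if op == "DW_OP_push_object_address":
            # The debugger supplies the address of the "current object".
            stack.append(object_address)
        elif op == "DW_OP_deref":
            # Replace the address on top of the stack by its contents.
            stack.append(memory[stack.pop()])
        elif isinstance(op, int):
            # A plain integer stands in for DW_OP_lit<n>.
            stack.append(op)
        elif op == "DW_OP_add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return stack[-1]

# The word stored at the object address happens to hold a pointer.
memory = {0x1000: 0x2000}
result = evaluate(["DW_OP_push_object_address", "DW_OP_deref"], 0x1000, memory)
print(hex(result))
```

The point of the sketch is that the object address comes from the debugger's own evaluation context and is not encoded in the expression itself.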
2) New attribute DW_AT_data_location
Optional attribute for use with any DIE that describes a type. Takes a
block FORM of operand, which is interpreted as a stack machine that
computes the data address of the storage for an object of that type.
If this attribute is not present, then the data address is the same as
the object address.
This attribute will typically begin with a DW_OP_push_object_address
operation.
This attribute caters to implementation techniques that use a
descriptor in combination with array or other type objects that typically
involve some kind of explicit allocation/deallocation. In this model, the
"object" corresponds to the descriptor and the "data" corresponds to the
dynamically managed storage. The descriptor may be adjacent to the data
or may include components that point to the data (directly or indirectly).
Note: Use of this attribute is not limited just to "two part" types.
It is useful in other cases as well.
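A minimal sketch of the rule stated above, under assumptions: the function name data_address is hypothetical, a Python callable stands in for the attribute's stack machine, and memory is a dict from address to word.

```python
def data_address(object_address, data_location, memory):
    """Return the data address for an object of some type.

    data_location is None when the type DIE has no DW_AT_data_location;
    otherwise it is a callable standing in for the attribute's stack machine.
    """
    if data_location is None:
        # No attribute: the data address is the same as the object address.
        return object_address
    return data_location(object_address, memory)

# Assumed descriptor layout: the first word of the descriptor points to the data.
def first_word_points_to_data(object_address, memory):
    return memory[object_address]

memory = {0x1000: 0x8000}
with_descriptor = data_address(0x1000, first_word_points_to_data, memory)
without_descriptor = data_address(0x1000, None, memory)
print(hex(with_descriptor), hex(without_descriptor))
```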
3) New attribute DW_AT_stride
This attribute is optionally allowed on either a DW_TAG_subrange_type
or DW_TAG_enumeration_type that is a bound for an array type.
If present, it specifies the number of bytes of memory between
successive elements of the given dimension (which supersedes the
stride that might otherwise be implied from, for example,
DW_AT_stride_size, the ordering, and so on).
It has one operand. Interpretation of the operand depends on its FORM,
as follows:
   FORM        Interpretation
   ----        --------------
   constant    The value of the constant is the value of the stride.
   reference   The value "points to" the DIE for an entity whose
               contents is the value of the stride.
   block       A Dwarf stack machine that computes the *value* of the
               stride.
Note: A Dwarf stack machine may well include use of the
DW_OP_push_object_address operation.
Note: The stride can be negative.
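The three interpretations can be sketched as a small dispatch on the FORM. This is an illustration under assumptions: the function name operand_value is hypothetical, a reference is simplified to the address of the referenced entity, and a callable stands in for a Dwarf stack machine.

```python
def operand_value(form, operand, object_address, memory):
    """Resolve an attribute operand to a value according to its FORM."""
    if form == "constant":
        # The constant is the value itself.
        return operand
    if form == "reference":
        # Simplification: operand is the address of the referenced entity,
        # whose contents supply the value.
        return memory[operand]
    if form == "block":
        # operand is a callable standing in for a Dwarf stack machine.
        return operand(object_address, memory)
    raise ValueError(f"unexpected FORM: {form}")

# Hypothetical memory contents: an entity at 0x500 and a descriptor field
# 0x28 bytes into the object at 0x1000 that holds a negative stride.
memory = {0x500: 8, 0x1028: -4}

from_constant = operand_value("constant", 16, 0x1000, memory)
from_reference = operand_value("reference", 0x500, 0x1000, memory)
from_block = operand_value("block", lambda obj, mem: mem[obj + 0x28], 0x1000, memory)
print(from_constant, from_reference, from_block)
```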
4) New attributes DW_AT_allocated and DW_AT_associated
These attributes are optionally allowed for any DIE that describes a
type. The presence of the DW_AT_allocated attribute implies that the
object of that type has the F90 ALLOCATABLE property (or its analog if
used for other languages). The presence of the DW_AT_associated attribute
implies that the object of that type has the F90 POINTER property (or
its analog if used for other languages). If both are present, then
the object is assumed to have the F90 POINTER property (rather than
ALLOCATABLE), but the DW_AT_allocated may optionally be used by the
implementation to indicate the allocation status of the object (that
is, whether the currently associated storage resulted from execution
of an ALLOCATE statement rather than pointer assignment). If neither
is present, then the object has neither of these properties.
These attributes have one operand, whose value indicates whether the
data is allocated or not. This operand is interpreted in the same way
as for DW_AT_stride (see above).
These attributes result in a boolean value as follows: non-zero => true,
0 => false.
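The rules above for which property applies, and for turning the operand value into a boolean, can be summarized in a short sketch. The function names f90_property and status are hypothetical and only restate the text.

```python
def f90_property(has_allocated, has_associated):
    """Which F90 property the type's objects have, given which attributes
    are present on the type DIE."""
    if has_associated:
        # Also covers the case where both attributes are present.
        return "POINTER"
    if has_allocated:
        return "ALLOCATABLE"
    return None  # neither property

def status(operand_value):
    """Boolean result of either attribute: non-zero => true, 0 => false."""
    return operand_value != 0

print(f90_property(True, True), f90_property(True, False), f90_property(False, False))
print(status(0x2000), status(0))
```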
[I see no reason for separate DW_AT_allocatable or DW_AT_pointer
"flag" (no operand) attributes in addition.]
5) New operand class block (stack machine) allowed for DW_AT_lower_bound,
DW_AT_upper_bound and DW_AT_count attributes
The Dwarf2 spec currently only allows/defines the use of operands of
class constant and reference. Add class block (for a stack machine)
as an additional alternative; in this case, the result of executing
the stack machine (the value on the top of the stack) is the *value*
of the attribute [not its address].
Examples
-------------
Note: as indicated in my mail of 2 March re "constant class operand for
DW_AT_data_member_location", my use of the likes of
DW_AT_data_member_location(constant 4)
in the following examples is currently not legal Dwarf2. My proposal in that
same mail would make it legal, so I am leaving this usage in the hope that that
separate proposal will also be approved.
Main Example
------------
The key example posed by Dave Anderson and Jim Crownie is this:
type array_ptr
real :: myvar
real, dimension (:), pointer :: ap
end type array_ptr
type (array_ptr), allocatable, dimension (:) :: arrays
allocate (arrays(20))
do i = 1,20
allocate (arrays(i)%ap(i+10))
end do
For allocatable and pointer arrays, it is essentially required by the F90
semantics that each array consist of two parts: let me call them 1) the
descriptor and 2) the raw data. A descriptor has often been called a dope
vector in other contexts (although it is not always a vector, more likely
a structure/record, and the origin of "dope" is probably lost in the mists
of time). Because there are two parts, and the lifetime of the descriptor
is necessarily longer than (includes) that of the raw data, there must be an
address somewhere in the descriptor that points to the raw data -- when
there is some, that is, when the "variable" is allocated or associated.
For concreteness, let me posit that a descriptor looks something like the
following C struct -- however, it is a goal of the proposed design that 1)
a debugger needs no builtin knowledge of this structure and 2) there doesn't
even need to be an explicit representation of this structure in the
DWARF2 input to the debugger.
struct desc {
    void * base;            // pointer to raw data
    long   el_len;
    int    assoc : 1;
    int    ptr_alloc : 1;
    int    num_dims : 6;
    struct dims_str {       // For each dimension...
        long lower_bound;
        long upper_bound;
        long stride;
    } dims[63];
};
In practice, of course, only test systems have arrays with as many as 63
dimensions, so "real" descriptors have dimension substructures only for as
many dimensions as are specified in the num_dims component. (Imagine that the second to
last line was instead written as
} dims[num_dims];
C does not allow this sort of thing, but other languages do -- and it doesn't
really matter because we are not going to describe this structure to the
debugger in any case.)
Because these arrays come in two parts, we have to be very careful about
how we talk about them. In particular, the "address of the variable" or
equivalently, the "base address of the object" *always* refers to the
descriptor! Always!!
For arrays that do not come in two parts (non-allocatable, non-pointer arrays),
an implementation has a choice: it can provide a descriptor anyways, thereby
giving it two parts, thereby making it just like the others -- which may be
very convenient for general runtime support (I/O or the like) unrelated to
debugging -- in which case the above vocabulary applies as stated. Or, it can
do without a descriptor, in which case the "address of the variable" or
equivalently the "base address of the object" refers to the "raw data" (the
real data, the only thing around that can be the object!).
Forgive me if I sound pedantic -- keep this vocabulary straight and I think
most of the rest of the presentation follows pretty clearly (I hope).
The F90 derived type array_ptr can now be redescribed in C-like terms that
expose some of the representation, as in
struct array_ptr {
float myvar;
desc<1> ap;
};
Similarly for arrays:
desc<1> arrays;
I wrote "desc<1>" to indicate the 1-dimension version of desc. Since the
number of dimensions is known at compile time and constant, the exact version
with a fixed, compile-time known size can be used.
Finally, I will use this notation:
    sizeof(type)         size in bytes of entities of the given type
    offset(type, comp)   offset in bytes of the comp component within
                         entities of the given type
The Dwarf2 description is now
1$: DW_TAG_array_type
        ! No name, default (F90) ordering, default stride_size
        DW_AT_type(reference to basetype REAL)
        DW_AT_associated(machine=
            ! Test raw data address for non-zero
            DW_OP_push_object_address
            DW_OP_deref
            DW_OP_lit0
            DW_OP_ne)
        DW_AT_data_location(machine=
            ! Get raw data address
            DW_OP_push_object_address
            DW_OP_deref)
2$:     DW_TAG_subrange_type
            ! No name, default stride
            DW_AT_type(reference to basetype INTEGER)
            DW_AT_lower_bound(machine=
                DW_OP_push_object_address
                DW_OP_lit<n>
                    ! where n ==
                    !   offset(desc, dims) +
                    !   offset(dims_str, lower_bound)
                DW_OP_add
                DW_OP_deref)
            DW_AT_upper_bound(machine=
                DW_OP_push_object_address
                DW_OP_lit<n>
                    ! where n ==
                    !   offset(desc, dims) +
                    !   offset(dims_str, upper_bound)
                DW_OP_add
                DW_OP_deref)
            !
            ! Note: for the m'th dimension, the second operator becomes
            !   DW_OP_lit<x> where
            !   x == offset(desc, dims) +
            !        (m-1)*sizeof(dims_str) +
            !        offset(dims_str, [lower|upper]_bound)
            ! That is, the stack machine does not get longer and longer
            ! for each successive dimension (other than to express the
            ! larger offsets involved).
3$: DW_TAG_structure_type
        DW_AT_name("array_ptr")
        DW_AT_byte_size(constant 4 + sizeof(desc<1>))
4$:     DW_TAG_member
            DW_AT_name("myvar")
            DW_AT_type(reference to basetype REAL)
            DW_AT_data_member_location(constant 0)
5$:     DW_TAG_member
            DW_AT_name("ap")
            DW_AT_type(reference to 1$)
            DW_AT_data_member_location(constant 4)   ! Assume sizeof(REAL)==4
6$: DW_TAG_array_type
        ! No name, default (F90) ordering, default stride_size
        DW_AT_type(reference to 3$)
        DW_AT_allocated(machine=
            ! Test raw data address for non-zero
            DW_OP_push_object_address
            DW_OP_deref
            DW_OP_lit0
            DW_OP_ne)
        DW_AT_data_location(machine=
            ! Get raw data address
            DW_OP_push_object_address
            DW_OP_deref)
7$:     DW_TAG_subrange_type
            ! No name, default stride
            DW_AT_type(reference to basetype INTEGER)
            DW_AT_lower_bound(machine=
                DW_OP_push_object_address
                DW_OP_lit<n>
                    ! where n == ...
                DW_OP_add
                DW_OP_deref)
            DW_AT_upper_bound(machine=
                DW_OP_push_object_address
                DW_OP_lit<n>
                    ! where n == ...
                DW_OP_add
                DW_OP_deref)
8$: DW_TAG_variable
        DW_AT_name("arrays")
        DW_AT_type(reference to 6$)
        DW_AT_location(machine=
            ...as appropriate...)
            ! Assume static allocation
That covers the Dwarf2 description.
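For the curious, the offsets written as DW_OP_lit<n> in the description can be computed mechanically from the illustrative descriptor. The sketch below mirrors that descriptor with Python's ctypes; the class names, the helper bound_offset, and the field name lower_bound (chosen to match the offset() notation used in the description) are assumptions, and the actual numbers depend on the platform's sizes and alignment.

```python
import ctypes

# Mirror of the illustrative descriptor (an assumption; real descriptors
# are implementation specific and need not be described to the debugger).
class DimsStr(ctypes.Structure):
    _fields_ = [("lower_bound", ctypes.c_long),
                ("upper_bound", ctypes.c_long),
                ("stride", ctypes.c_long)]

class Desc(ctypes.Structure):
    _fields_ = [("base", ctypes.c_void_p),
                ("el_len", ctypes.c_long),
                ("assoc", ctypes.c_int, 1),
                ("ptr_alloc", ctypes.c_int, 1),
                ("num_dims", ctypes.c_int, 6),
                ("dims", DimsStr * 63)]

def bound_offset(m, which):
    """Value of x in DW_OP_lit<x> for the m'th dimension (m counts from 1):
    offset(desc, dims) + (m-1)*sizeof(dims_str) + offset(dims_str, which)."""
    field = getattr(DimsStr, which)
    return Desc.dims.offset + (m - 1) * ctypes.sizeof(DimsStr) + field.offset

for m in (1, 2, 3):
    print(m, bound_offset(m, "lower_bound"), bound_offset(m, "upper_bound"))
```

Only the literal changes from one dimension to the next; the shape of the stack machine stays the same.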
Now, suppose the program has executed and we are stopped immediately
following completion of the do loop. Suppose the user enters the
following debug command:
dbg> print arrays(5)%ap(2)
Interpretation of this expression is now straightforward (he says with a
smile).
1) Lookup name arrays. We find that it is a variable, whose type is given by
the unnamed type at 6$. Notice that it has an array type.
2) Find the 5th element of that array object. To do array indexing we
need several pieces of information:
   a) the address of the array storage
   b) the lower bounds of the array
      [If we wanted to check that 5 is within bounds we would need the
      upper bound too, but we'll skip that for this example]
   c) the stride size
For a), check for a DW_AT_data_location attribute. Since there is one, go
execute the stack machine, whose result is the address we need. The object
address used in this case is the object we are working on, namely the
variable named "arrays", whose address we found in step 1).
[Had there been no DW_AT_data_location attribute, the desired address
would be the same as the address from step 1.]
For b), for each dimension of the array (only one in this case), go interpret
the usual lower bound attribute. Again this is a stack machine, which again
begins with DW_OP_push_object_address. This object is *still* arrays, from
step 1). [We haven't begun to actually perform any indexing yet.]
For c), the default stride size applies. Since there is no DW_AT_stride
attribute, use the size of the array element type, which is the size of type
array_ptr (at 3$).
Having acquired all the necessary data, we perform the indexing operation
in the usual manner -- which has nothing to do with any of the attributes
involved up to now. Those just helped provide the actual parameters to the
indexing step.
The result, of course, is an object within the memory that was dynamically
allocated for arrays.
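The indexing operation "in the usual manner" for one dimension amounts to the arithmetic below. This is a sketch; the function name element_address and the sample numbers are made up for illustration.

```python
def element_address(data_address, index, lower_bound, stride):
    """Address of element 'index' of a one-dimensional array.

    data_address : result of a) (the DW_AT_data_location machine, or the
                   object address if the attribute is absent)
    lower_bound  : result of b)
    stride       : result of c), in bytes; may be negative
    """
    return data_address + (index - lower_bound) * stride

# Hypothetical numbers: data at 0x2000, lower bound 1, 52-byte elements.
fifth = element_address(0x2000, 5, 1, 52)
# A negative stride walks downward through memory.
backwards = element_address(0x3000, 3, 1, -4)
print(hex(fifth), hex(backwards))
```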
3) Find the ap component of the object just identified, whose type is
array_ptr.
This is a conventional record component lookup and interpretation. It happens
that the ap component in this case begins at offset 4 from the beginning of
the containing object. ap has the unnamed array type defined at 1$ in the
Dwarf symbol table.
4) Find the 2nd element of the array object found in step 3. To do array
indexing we need several pieces of information:
   a) the address of the array storage
   b) the lower bounds of the array
      [If we wanted to check that 2 is within bounds we would need the
      upper bound too, but we'll skip that for this example]
   c) the stride size
This is all just like what we did in step 2), so I won't write out the
details. Suffice it to note that the object address of interest here is
the address that resulted from step 3).
Note: we happen to be accessing a pointer array here instead of an allocatable
array; but because we chose a common underlying representation, the
mechanics are the same. We could have chosen a completely different
descriptor arrangement and the mechanics would still be the same -- only
the stack machines would be different to reflect the different arrangement
of fields.
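The four steps can be replayed on a simulated memory image. Everything concrete in the sketch below is an assumption made for illustration: the byte offsets within desc<1>, the addresses, the fill pattern for the REAL data, and the helper names. Only the order of operations follows the walk-through.

```python
# Assumed layout of desc<1> (8-byte fields): base at +0, lower bound at +24,
# upper bound at +32; total size 48. ap sits at offset 4 within array_ptr.
DESC_BASE, DESC_LOWER, DESC_UPPER, DESC_SIZE = 0, 24, 32, 48
AP_OFFSET, REAL_SIZE = 4, 4
ARRAY_PTR_SIZE = AP_OFFSET + DESC_SIZE

memory = {}  # simulated memory: address -> value

def make_descriptor(desc_addr, data_addr, lower, upper):
    memory[desc_addr + DESC_BASE] = data_addr
    memory[desc_addr + DESC_LOWER] = lower
    memory[desc_addr + DESC_UPPER] = upper

def data_location(obj):   # DW_OP_push_object_address, DW_OP_deref
    return memory[obj + DESC_BASE]

def lower_bound(obj):     # DW_OP_push_object_address, DW_OP_lit<n>, DW_OP_add, DW_OP_deref
    return memory[obj + DESC_LOWER]

def index(obj, i, elem_size):
    # The object address is that of the descriptor, not of the raw data.
    return data_location(obj) + (i - lower_bound(obj)) * elem_size

# State after the do loop: arrays(1:20), each arrays(i)%ap(1:i+10).
ARRAYS = 0x1000                      # static address of the variable
make_descriptor(ARRAYS, 0x2000, 1, 20)
next_free = 0x10000
for i in range(1, 21):
    elem = 0x2000 + (i - 1) * ARRAY_PTR_SIZE
    make_descriptor(elem + AP_OFFSET, next_free, 1, i + 10)
    for j in range(1, i + 11):
        memory[next_free + (j - 1) * REAL_SIZE] = float(100 * i + j)
    next_free += (i + 10) * REAL_SIZE

# dbg> print arrays(5)%ap(2)
step1 = ARRAYS                               # 1) look up the variable
step2 = index(step1, 5, ARRAY_PTR_SIZE)      # 2) 5th element of arrays
step3 = step2 + AP_OFFSET                    # 3) the ap component
step4 = index(step3, 2, REAL_SIZE)           # 4) 2nd element of ap
value = memory[step4]
print(value)   # 502.0 with the fill pattern used above
```

Note that index is called with a different object address each time (step1, then step3); the stack machines themselves never change.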
Example 2
---------
To show the flexibility of these new attributes and operators, let me also
present the Ada example used in my mail of 11 Feb.
M : INTEGER := <exp>;
type REC1 is record
VEC : array (1..M) of INTEGER;
end record;
type REC2(N : INTEGER range 1..100) is record
VEC : array (1..N) of INTEGER;
end record;
OBJ2B : REC2;
The Dwarf2 representation should be about like so
1$: DW_TAG_variable
        DW_AT_name("M")
        DW_AT_type(reference to basic type INTEGER)
2$: DW_TAG_array_type
        ! No name, default (Ada) order, default stride
        DW_AT_type(reference to basic type INTEGER)
3$:     DW_TAG_subrange_type
            DW_AT_type(reference to basic type INTEGER)
            DW_AT_lower_bound(constant 1)
            DW_AT_upper_bound(reference to variable M, at 1$)
4$: DW_TAG_structure_type
        DW_AT_name("REC1")
        DW_TAG_member
            DW_AT_name("VEC")
            DW_AT_type(reference to unnamed array type at 2$)
The first part above is straightforward and needs no new mechanism.
5$: DW_TAG_subrange_type
        DW_AT_type(reference to basic type INTEGER)
        DW_AT_lower_bound(constant 1)
        DW_AT_upper_bound(constant 100)
6$: DW_TAG_structure_type
        DW_AT_name("REC2")
        DW_TAG_member
            DW_AT_name("N")
            DW_AT_type(reference to unnamed subtype at 5$)
            DW_AT_data_member_location(machine=   ! Possibly omitted?
                DW_OP_nop)
7$:     DW_TAG_array_type
            ! No name, default (Ada) order, default stride
            ! Default data location
            DW_AT_type(reference to basic type INTEGER)
8$:         DW_TAG_subrange_type
                DW_AT_type(reference to subrange type at 5$)
                DW_AT_lower_bound(constant 1)
                DW_AT_upper_bound(machine=
                    DW_OP_push_object_address
                    DW_OP_lit<n>
                        ! where n ==
                        !   offset(REC2, VEC) - offset(REC2, N)
                    DW_OP_neg
                    DW_OP_add
                        ! computes address of N given VEC
                    DW_OP_deref)
                        ! fetches the value of N
9$:     DW_TAG_member
            DW_AT_name("VEC")
            DW_AT_type(reference to unnamed array type at 7$)
            DW_AT_data_member_location(machine=
                DW_OP_lit<n>)
                    ! where n == offset(REC2, VEC)
10$: DW_TAG_variable
        DW_AT_name("OBJ2B")
        DW_AT_type(reference to type REC2 at 6$)
        DW_AT_location(...as appropriate...)
The interesting aspects about this example are
1) The array VEC is "immediately" contained within structure REC2 (there
is no intermediate descriptor or indirection), which is reflected in
the absence of a DW_AT_data_location attribute on the array type at 7$.
2) One of the bounds of VEC is nonetheless dynamic and part of the same
containing record, so must be gotten to using an address calculation
relative to the VEC object (component of REC2).
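The address calculation in point 2) can be sketched as follows. The record layout (N at offset 0, VEC at offset 4, 4-byte INTEGER), the addresses, and the function name upper_bound_of_vec are assumptions for illustration; the sketch also assumes that the debugger reads the value of N once its address has been computed.

```python
# Assumed layout of REC2: discriminant N first, then VEC, 4-byte INTEGERs.
OFFSET_N, OFFSET_VEC, INT_SIZE = 0, 4, 4

def upper_bound_of_vec(vec_address, memory):
    """Upper bound of VEC, found relative to the address of VEC itself."""
    n = OFFSET_VEC - OFFSET_N          # the literal in DW_OP_lit<n>
    address_of_n = vec_address + (-n)  # DW_OP_neg, DW_OP_add
    return memory[address_of_n]        # read the value of N

# A hypothetical OBJ2B with N = 3 and VEC = (10, 20, 30).
OBJ2B = 0x3000
memory = {OBJ2B + OFFSET_N: 3}
for k in range(1, 4):
    memory[OBJ2B + OFFSET_VEC + (k - 1) * INT_SIZE] = 10 * k

vec = OBJ2B + OFFSET_VEC               # no descriptor: object address == data address
upper = upper_bound_of_vec(vec, memory)
third = memory[vec + (3 - 1) * INT_SIZE]   # OBJ2B.VEC(3), lower bound is 1
print(upper, third)
```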
Hopefully there is no need to walk thru the interpretation of a debugger
command such as
debug> print OBJ2B.VEC(3)
but I will if requested.
Hopefully I have not made many typos or other distracting errors in preparing
these examples and all is now completely clear.