Issue 991108.11

991108.11

R. Brender

Fortran

Fortran90 arrays

Issues 991026.1, 991108.11 (previous text), and 991108.12 are replaced by
the following proposal.

1) New stack machine operator DW_OP_push_object_address

   Push the address of the "current object" on the Dwarf stack. This
   object may correspond to an independent variable or be a component
   of an array or record/struct/class [whose address is known only to
   the debugger as a result of earlier expression evaluation steps].

   Has no operand.

   Note: the availability of this operation makes it *unnecessary* to
   introduce any notion of implicitly pushing the base of an array
   on the Dwarf stack (by analogy with records).
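
   As a rough illustration of what a consumer would do with this operator,
   here is a minimal sketch in Python. The evaluate function, the
   (name, operand) tuple encoding and the dict standing in for target memory
   are all inventions of this sketch, not anything the proposal defines.
   Operator names follow the spelling used in this proposal (DW_OP_add is
   the addition operator that the Dwarf2 document spells DW_OP_plus).

```python
def evaluate(ops, object_address, memory):
    """Run a tiny subset of the Dwarf stack machine; return the top of stack.

    ops            -- list of (name, operand) tuples (hypothetical encoding)
    object_address -- address of the "current object", supplied by the debugger
    memory         -- dict mapping address -> word, standing in for the target
    """
    stack = []
    for name, operand in ops:
        if name == "DW_OP_push_object_address":
            stack.append(object_address)
        elif name == "DW_OP_lit":
            stack.append(operand)
        elif name == "DW_OP_add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif name == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        else:
            raise ValueError("unsupported operation: " + name)
    return stack[-1]


# The object lives at 0x1000 and holds a pointer in the word at offset 8.
memory = {0x1000: 0, 0x1008: 0x2000}
ops = [("DW_OP_push_object_address", None),
       ("DW_OP_lit", 8),
       ("DW_OP_add", None),
       ("DW_OP_deref", None)]
result = evaluate(ops, 0x1000, memory)   # the pointer stored in the object
```

   The point to notice is that the object address is an input handed to the
   evaluator by the debugger; nothing is pushed implicitly.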

2) New attribute DW_AT_data_location

   Optional attribute for use with any DIE that describes a type. Takes a
   block FORM of operand, which is interpreted as a stack machine that
   computes the data address of the storage for an object of that type.
   If this attribute is not present, then the data address is the same as
   the object address.

   This attribute will typically begin with a DW_OP_push_object_address
   operation.

   This attribute caters to implementation techniques that use a
   descriptor in combination with array or other type objects that typically
   involve some kind of explicit allocation/deallocation. In this model, the
   "object" corresponds to the descriptor and the "data" corresponds to the
   dynamically managed storage. The descriptor may be adjacent to the data
   or may include components that point to the data (directly or indirectly).

   Note: Use of this attribute is not limited just to "two part" types.
   It is useful in other cases as well.
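
   The rule "data address defaults to object address" can be sketched as
   follows. This is only an illustration: the data_address function is
   hypothetical, the stack machine is modelled as a plain Python callable,
   and the addresses are made up.

```python
def data_address(object_address, data_location, memory):
    """Return the address of the data for an object of some type.

    data_location is None when the type has no DW_AT_data_location attribute;
    otherwise it is a callable (object_address, memory) -> address that
    models the attribute's stack machine (a simplification for this sketch).
    """
    if data_location is None:
        return object_address          # data address == object address
    return data_location(object_address, memory)


# A descriptor at 0x1000 whose first word points to the raw data at 0x8000.
memory = {0x1000: 0x8000}


def via_descriptor(obj, mem):
    # Models: DW_OP_push_object_address; DW_OP_deref
    return mem[obj]


with_attr = data_address(0x1000, via_descriptor, memory)
without_attr = data_address(0x1000, None, memory)
```

   With the attribute the debugger ends up at the dynamically managed
   storage; without it, the object and its data coincide.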

3) New attribute DW_AT_stride

   This attribute is optionally allowed on either a DW_TAG_subrange_type
   or DW_TAG_enumeration_type that is a bound for an array type.
   If present, it specifies the number of bytes of memory between
   successive elements of the given dimension (which supersedes the
   stride that might otherwise be implied from, for example,
   DW_AT_stride_size, the ordering, and so on).

   It has one operand. Interpretation of the operand depends on its FORM,
   as follows:

        FORM          Interpretation
        ----          --------------

        constant      The value of the constant is the value of the stride.

        reference     The value "points to" the DIE for an entity whose
                      contents is the value of the stride.

        block         A Dwarf stack machine that computes the *value* of the
                      stride.

                     Note: A Dwarf stack machine may well include use of the
                     DW_OP_push_object_address operation.

   Note: The stride can be negative.
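
   The three interpretations in the table might be dispatched like this.
   It is a sketch under stated assumptions: stride_value is a hypothetical
   helper, a referenced DIE is modelled as a dict whose storage address has
   already been resolved, and a block is modelled as a Python callable.

```python
def stride_value(form, operand, object_address, memory):
    """Return the stride in bytes, interpreting the operand by its FORM."""
    if form == "constant":
        return operand
    if form == "reference":
        # operand models the referenced DIE; the address of the entity it
        # describes is assumed to be stored under "address".
        return memory[operand["address"]]
    if form == "block":
        # operand models the stack machine; its result is the stride *value*.
        return operand(object_address, memory)
    raise ValueError("unexpected FORM: " + form)


memory = {0x0500: 16, 0x1014: -8}
descriptor = 0x1000

by_constant = stride_value("constant", 4, descriptor, memory)
by_reference = stride_value("reference", {"name": "S", "address": 0x0500},
                            descriptor, memory)
by_block = stride_value("block", lambda obj, mem: mem[obj + 0x14],
                        descriptor, memory)   # a negative stride
```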

4) New attributes DW_AT_allocated and DW_AT_associated

   These attributes are optionally allowed for any DIE that describes a
   type. The presence of the DW_AT_allocated attribute implies that the
   object of that type has the F90 ALLOCATABLE property (or its analog if
   used for other languages). The presence of the DW_AT_associated attribute
   implies that the object of that type has the F90 POINTER property (or
   its analog if used for other languages). If both are present, then
   the object is assumed to have the F90 POINTER property (rather than
   ALLOCATABLE), but the DW_AT_allocated may optionally be used by the
   implementation to indicate the allocation status of the object (that
   is, whether the currently associated storage resulted from execution
   of an ALLOCATE statement rather than pointer assignment). If neither
   is present, then the object has neither of these properties.

   These attributes have one operand, whose value indicates whether the
   data is allocated or not. This operand is interpreted in the same way
   as for DW_AT_stride (see above).

   These attributes result in a boolean value as follows: non-zero => true,
   0 => false.
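
   One possible reading of the presence and value rules above, written out
   as a sketch. The describe function and its wording of the results are
   hypothetical; each argument is None when the attribute is absent and
   otherwise the integer that the attribute evaluated to.

```python
def describe(allocated, associated):
    """Classify an object from its evaluated DW_AT_allocated and
    DW_AT_associated values (None means the attribute is not present)."""
    if allocated is None and associated is None:
        return "neither ALLOCATABLE nor POINTER"
    if associated is not None:
        # POINTER wins when both attributes are present.
        text = "POINTER, " + ("associated" if associated != 0
                              else "not associated")
        if allocated is not None:
            text += (", storage from ALLOCATE" if allocated != 0
                     else ", storage not from ALLOCATE")
        return text
    return "ALLOCATABLE, " + ("allocated" if allocated != 0
                              else "not allocated")
```

   For example, describe(1, None) yields "ALLOCATABLE, allocated" and
   describe(0, 1) yields "POINTER, associated, storage not from ALLOCATE".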

   [I see no reason for separate DW_AT_allocatable or DW_AT_pointer
   "flag" (no operand) attributes in addition.]

5) New operand class block (stack machine) allowed for DW_AT_lower_bound,
   DW_AT_upper_bound and DW_AT_count attributes

   The Dwarf2 spec currently only allows/defines the use of operands of
   class constant and reference. Add class block (for a stack machine)
   as an additional alternative; in this case, the result of executing
   the stack machine (the value on the top of the stack) is the *value*
   of the attribute [not its address].

Examples
-------------

Note: as indicated in my mail of 2 March re "constant class operand for
DW_AT_data_member_location", my use of the likes of

DW_AT_data_member_location(constant 4)

in the following examples is currently not legal Dwarf2. My proposal in that
same mail would make it legal, so I am leaving this usage in the hope that
that separate proposal will also be approved.

Main Example
------------

The key example posed by Dave Anderson and Jim Crownie is this:

    type array_ptr
    real :: myvar
    real, dimension (:), pointer :: ap
    end type array_ptr

    type (array_ptr), allocatable, dimension (:) :: arrays

    allocate (arrays(20))

    do i = 1,20
        allocate (arrays(i)%ap(i+10))
    end do

For allocatable and pointer arrays, it is essentially required by the F90
semantics that each array consist of two parts: let me call them 1) the
descriptor and 2) the raw data. A descriptor has often been called a dope
vector in other contexts (although it is not always a vector, more likely
a structure/record, and the origin of "dope" is probably lost in the mists
of time). Because there are two parts, and the lifetime of the descriptor
is necessarily longer than (that is, it includes) that of the raw data, there must be an
address somewhere in the descriptor that points to the raw data -- when
there is some, that is, when the "variable" is allocated or associated.

For concreteness, let me posit that a descriptor looks something like the
following C struct -- however, it is a goal of the proposed design that 1)
a debugger needs no builtin knowledge of this structure and 2) there doesn't
even need to be an explicit representation of this structure in the
DWARF2 input to the debugger.

    struct desc {
        void *      base;               // pointer to raw data
        long        el_len;
        int         assoc : 1;
        int         ptr_alloc : 1;
        int         num_dims : 6;
        struct      dims_str {          // For each dimension...
            long lower_bound;
            long upper_bound;
            long stride;
        } dims[63];
    };

In practice, of course, only test systems have arrays with as many as 63
dimensions, so "real" descriptors have dimension substructures only for as
many dimensions as are specified in the num_dims component. (Imagine that
the second to last line was instead written as

        } dims[num_dims];

C does not allow this sort of thing, but other languages do -- and it doesn't
really matter because we are not going to describe this structure to the
debugger in any case.)

Because these arrays come in two parts, we have to be very careful about
how we talk about them. In particular, the "address of the variable" or
equivalently, the "base address of the object" *always* refers to the
descriptor! Always!!

For arrays that do not come in two parts (non-allocatable, non-pointer arrays),
an implementation has a choice: it can provide a descriptor anyways, thereby
giving it two parts, thereby making it just like the others -- which may be
very convenient for general runtime support (I/O or the like) unrelated to
debugging -- in which case the above vocabulary applies as stated. Or, it can
do without a descriptor, in which case the "address of the variable" or
equivalently the "base address of the object" refers to the "raw data" (the
real data, the only thing around that can be the object!).

Forgive me if I sound pedantic -- keep this vocabulary straight and I think
most of the rest of the presentation follows pretty clearly (I hope).

The F90 derived type array_ptr can now be redescribed in C-like terms that
expose some of the representation, as in

    struct array_ptr {
        float   myvar;
        desc<1> ap;
    };

Similarly for arrays:

    desc<1> arrays;

I wrote "desc<1>" to indicate the 1-dimension version of desc. Since the
number of dimensions is known at compile time and constant, the exact
version with a fixed, compile-time-known size can be used.

Finally, I will use this notation:

    sizeof(type)         size in bytes of entities of the given type

    offset(type, comp)   offset in bytes of the comp component within
                         entities of the given type

The Dwarf2 description is now

1$: DW_TAG_array_type
        ! No name, default (F90) ordering, default stride_size
        DW_AT_type(reference to basetype REAL)
        DW_AT_associated(machine=          ! Test raw data address for non-zero
            DW_OP_push_object_address
            DW_OP_deref
            DW_OP_lit0
            DW_OP_ne)
        DW_AT_data_location(machine=      ! Get raw data address
            DW_OP_push_object_address
            DW_OP_deref)
2$: DW_TAG_subrange_type
        ! No name, default stride
        DW_AT_type(reference to basetype INTEGER)
        DW_AT_lower_bound(machine=
            DW_OP_push_object_address
            DW_OP_lit<n>                  ! where n ==
                                          ! offset(desc, dims) +
                                          ! offset(dims_str, lower_bound)
            DW_OP_add
            DW_OP_deref)
        DW_AT_upper_bound(machine=
            DW_OP_push_object_address
            DW_OP_lit<n>                   ! where n ==
                                          ! offset(desc, dims) +
                                          ! offset(dims_str, upper_bound)
            DW_OP_add
            DW_OP_deref)
        !
        ! Note: for the m'th dimension, the second operator becomes
        ! DW_OP_lit<x> where
        ! x == offset(desc, dims) +
        ! (m-1)*sizeof(dims_str) +
        ! offset(dims_str, [lower|upper]_bound)
        ! That is, the stack machine does not get longer and longer
        ! for each successive dimension (other than to express the
        ! larger offsets involved).

3$: DW_TAG_structure_type
        DW_AT_name("array_ptr")
        DW_AT_byte_size(constant 4 + sizeof(desc<1>))
4$:     DW_TAG_member
            DW_AT_name("myvar")
            DW_AT_type(reference to basetype REAL)
            DW_AT_data_member_location(constant 0)
5$:     DW_TAG_member
            DW_AT_name("ap")
            DW_AT_type(reference to 1$)
            DW_AT_data_member_location(constant 4) ! Assume sizeof(REAL)==4

6$: DW_TAG_array_type
        ! No name, default (F90) ordering, default stride_size
        DW_AT_type(reference to 3$)
        DW_AT_allocated(machine=           ! Test raw data address for non-zero
            DW_OP_push_object_address
            DW_OP_deref
            DW_OP_lit0
            DW_OP_ne)
        DW_AT_data_location(machine=       ! Get raw data address
            DW_OP_push_object_address
            DW_OP_deref)
7$:     DW_TAG_subrange_type
            ! No name, default stride
            DW_AT_type(reference to basetype INTEGER)
            DW_AT_lower_bound(machine=
                DW_OP_push_object_address
                DW_OP_lit<n>               ! where n == ...
                DW_OP_add
                DW_OP_deref)
            DW_AT_upper_bound(machine=
                DW_OP_push_object_address
                DW_OP_lit<n>               ! where n == ...
                DW_OP_add
                DW_OP_deref)

8$: DW_TAG_variable
        DW_AT_name("arrays")
        DW_AT_type(reference to 6$)
        DW_AT_location(machine=
            ...as appropriate...)          ! Assume static allocation

That covers the Dwarf2 description.
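
The note under 2$ about the m'th dimension amounts to a small piece of
arithmetic, sketched below. The concrete numbers are assumptions made only
for this illustration (4-byte words, so offset(desc, dims) == 12 and
sizeof(dims_str) == 12); a real compiler chooses its own layout. Also note
that in an actual encoding DW_OP_lit<n> exists only for n from 0 to 31, so
larger offsets would need one of the DW_OP_const forms.

```python
# Assumed descriptor layout (hypothetical, 4-byte words).
OFFSET_DESC_DIMS = 12
SIZEOF_DIMS_STR = 12
OFFSET_IN_DIMS_STR = {"lower_bound": 0, "upper_bound": 4, "stride": 8}


def bound_literal(m, field):
    """Operand x of DW_OP_lit<x> for the m'th dimension (m counts from 1)."""
    return (OFFSET_DESC_DIMS
            + (m - 1) * SIZEOF_DIMS_STR
            + OFFSET_IN_DIMS_STR[field])


def read_bound(object_address, m, field, memory):
    # Same effect as:
    #   DW_OP_push_object_address; DW_OP_lit<x>; DW_OP_add; DW_OP_deref
    return memory[object_address + bound_literal(m, field)]


# A 1-dimensional descriptor at 0x1000 with bounds 1..20.
memory = {0x1000 + 12: 1, 0x1000 + 16: 20}
lower = read_bound(0x1000, 1, "lower_bound", memory)
upper = read_bound(0x1000, 1, "upper_bound", memory)
```

Only the literal changes from one dimension to the next; the shape of the
stack machine stays the same.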

Now, suppose the program has executed and we are stopped immediately
following completion of the do loop. Suppose the user enters the
following debug command:

    dbg> print arrays(5)%ap(2)

Interpretation of this expression is now straightforward (he says with a
smile).

1) Lookup name arrays. We find that it is a variable, whose type is given by
the unnamed type at 6$. Notice that it has an array type.

2) Find the 5th element of that array object. To do array indexing we
need several pieces of information:

    a) the address of the array storage
    b) the lower bounds of the array
       [If we wanted to check that 5 is within bounds we would need the
       upper bound too, but we'll skip that for this example]
    c) the stride size

For a), check for a DW_AT_data_location attribute. Since there is one, go
execute the stack machine, whose result is the address we need. The object
address used in this case is the object we are working on, namely the
variable named "arrays", whose address we found in step 1).

    [Had there been no DW_AT_data_location attribute, the desired address
    would be the same as the address from step 1.]

For b), for each dimension of the array (only one in this case), go interpret
the usual lower bound attribute. Again this is a stack machine, which again
begins with DW_OP_push_object_address. This object is *still* arrays, from
step 1). [We haven't begun to actually perform any indexing yet.]

For c), the default stride size applies. Since there is no DW_AT_stride
attribute, use the size of the array element type, which is the size of type
array_ptr (at 3$).

Having acquired all the necessary data, we perform the indexing operation
in the usual manner -- which has nothing to do with any of the attributes
involved up to now. Those just helped provide the actual parameters to the
indexing step.

The result, of course, is an object within the memory that was dynamically
allocated for arrays.

3) Find the ap component of the object just identified, whose type is
array_ptr.

This is a conventional record component lookup and interpretation. It happens
that the ap component in this case begins at offset 4 from the beginning of
the containing object. ap has the unnamed array type defined at 1$ in the
Dwarf symbol table.

4) Find the 2nd element of the array object found in step 3. To do array
indexing we need several pieces of information:

    a) the address of the array storage
    b) the lower bounds of the array
       [If we wanted to check that 2 is within bounds we would need the
       upper bound too, but we'll skip that for this example]
    c) the stride size

This is all just like what we did in step 2), so I won't write out the
details. Suffice it to note that the object address of interest here is
the address that resulted from step 3).

Note: we happen to be accessing a pointer array here instead of an allocatable
array; but because we chose a common underlying representation, the
mechanics are the same. We could have chosen a completely different
descriptor arrangement and the mechanics would still be the same -- only
the stack machines would be different to reflect the different arrangement
of fields.
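
Putting the four steps together, the whole evaluation of arrays(5)%ap(2)
can be simulated in a few lines of Python. Everything concrete here is an
assumption for illustration only: 4-byte words, offset(desc, dims) == 12
and offset(dims_str, lower_bound) == 0 (hence sizeof(desc<1>) == 24 and
sizeof(array_ptr) == 28), the addresses, the stored value 2.5, and the use
of Python functions in place of the stack machines.

```python
DIMS, LOWER = 12, 0       # assumed offset(desc, dims), offset(dims_str, lower_bound)
AP_OFFSET = 4             # DW_AT_data_member_location of ap
SIZEOF_ARRAY_PTR = 28     # 4 + sizeof(desc<1>) under the assumed layout
SIZEOF_REAL = 4


def data_location(obj, mem):
    # DW_OP_push_object_address; DW_OP_deref
    return mem[obj]


def lower_bound(obj, mem):
    # DW_OP_push_object_address; DW_OP_lit<n>; DW_OP_add; DW_OP_deref
    return mem[obj + DIMS + LOWER]


def index(obj, i, elem_size, mem):
    """Address of element i of the array whose descriptor is at obj."""
    return data_location(obj, mem) + (i - lower_bound(obj, mem)) * elem_size


mem = {
    0x1000: 0x8000,   # descriptor of arrays: base
    0x100C: 1,        # descriptor of arrays: dims[0].lower_bound
    0x8074: 0xA000,   # descriptor of arrays(5)%ap: base
    0x8080: 1,        # descriptor of arrays(5)%ap: dims[0].lower_bound
    0xA004: 2.5,      # arrays(5)%ap(2)
}

arrays_addr = 0x1000                                    # step 1: the variable
elem5 = index(arrays_addr, 5, SIZEOF_ARRAY_PTR, mem)    # step 2: arrays(5)
ap_addr = elem5 + AP_OFFSET                             # step 3: %ap
target = index(ap_addr, 2, SIZEOF_REAL, mem)            # step 4: ap(2)
value = mem[target]
```

The same index function serves both the allocatable array and the pointer
array; only the object address passed in differs between the two uses.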

Example 2
---------

To show the flexibility of these new attributes and operators, let me also
present the Ada example used in my mail of 11 Feb.

    M : INTEGER := <exp>;
    type REC1 is record
        VEC : array (1..M) of INTEGER;
    end record;

    type REC2(N : INTEGER range 1..100) is record
        VEC : array (1..N) of INTEGER;
    end record;

    OBJ2B : REC2;

The Dwarf2 representation should be about like so

1$: DW_TAG_variable
        DW_AT_name("M")
        DW_AT_type(reference to basic type INTEGER)

2$: DW_TAG_array_type
        ! No name, default (Ada) order, default stride
        DW_AT_type(reference to basic type INTEGER)
3$: DW_TAG_subrange_type
        DW_AT_type(reference to basic type INTEGER)
        DW_AT_lower_bound(constant 1)
        DW_AT_upper_bound(reference to variable M, at 1$)

4$: DW_TAG_structure_type
        DW_AT_name("REC1")
        DW_TAG_member
            DW_AT_name("VEC")
            DW_AT_type(reference to unnamed array type at 2$)

The first part above is straightforward and needs no new mechanism.

5$: DW_TAG_subrange_type
        DW_AT_type(reference to basic type INTEGER)
        DW_AT_lower_bound(constant 1)
        DW_AT_upper_bound(constant 100)

6$: DW_TAG_structure_type
        DW_AT_name("REC2")
        DW_TAG_member
            DW_AT_name("N")
            DW_AT_type(reference to unnamed subtype at 5$)
            DW_AT_data_member_location(machine=      ! Possibly omitted?
                DW_OP_nop)
7$: DW_TAG_array_type
        ! No name, default (Ada) order, default stride
        ! Default data location
        DW_AT_type(reference to basic type INTEGER)
8$:     DW_TAG_subrange_type
            DW_AT_type(reference to subrange type at 5$)
            DW_AT_lower_bound(constant 1)
            DW_AT_upper_bound(machine=
                DW_OP_push_object_address
                DW_OP_lit<n>               ! where n ==
                                           ! offset(REC2, VEC) - offset(REC2, N)
                DW_OP_neg
                DW_OP_add                  ! computes address of N given VEC
                DW_OP_deref)               ! fetches the value of N
9$:     DW_TAG_member
            DW_AT_name("VEC")
            DW_AT_type(reference to unnamed array type at 7$)
            DW_AT_data_member_location(machine=
                DW_OP_lit<n>               ! where n == offset(REC2, VEC)
                DW_OP_add)

10$: DW_TAG_variable
        DW_AT_name("OBJ2B")
        DW_AT_type(reference to type REC2 at 6$)
        DW_AT_location(...as appropriate...)

The interesting aspects about this example are

1) The array VEC is "immediately" contained within structure REC2 (there
   is no intermediate descriptor or indirection), which is reflected in
   the absence of a DW_AT_data_location attribute on the array type at 7$.

2) One of the bounds of VEC is nonetheless dynamic and part of the same
   containing record, so must be gotten to using an address calculation
   relative to the VEC object (component of REC2).
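
The address calculation in 2) can be sketched as follows, with the final
fetch of the value of N shown as a separate step. The layout of REC2
(N at offset 0, VEC at offset 4, 4-byte INTEGER), the address of OBJ2B and
the value of N are all assumptions made for this illustration.

```python
OFFSET_N, OFFSET_VEC, SIZEOF_INTEGER = 0, 4, 4   # assumed layout of REC2


def address_of_n(vec_address):
    # DW_OP_push_object_address; DW_OP_lit<n>; DW_OP_neg; DW_OP_add
    n = OFFSET_VEC - OFFSET_N
    return vec_address + (-n)


mem = {0x3000: 10}                   # OBJ2B.N == 10
obj2b = 0x3000
vec = obj2b + OFFSET_VEC             # object address used for the bound
upper = mem[address_of_n(vec)]       # value of the upper bound of VEC
elem3 = vec + (3 - 1) * SIZEOF_INTEGER   # address of OBJ2B.VEC(3)
```

Since VEC has no DW_AT_data_location attribute, the address of VEC itself
serves as the base for indexing.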

Hopefully there is no need to walk thru the interpretation of a debugger
command such as

    debug> print OBJ2B.VEC(3)

but I will if requested.

Hopefully I have not made many typos or other distracting errors in preparing
these examples and all is now completely clear.