000531.1 W R. Brender Representation Add DW_AT_pc_ranges attribute


This proposal provides a means for a compiler to describe discontiguous scopes.

DWARF V2 recognizes that scopes may not be a single contiguous
range of addresses, but makes no provision for how to describe such
a scope.


Add the attribute DW_AT_pc_ranges whose value is of class block. The
contents of the block are interpreted as a sequence of <SLEB128, ULEB128>
(signed LEB, unsigned LEB) pairs, where each pair describes one contiguous
segment of the scope. The scope is the union of all the component segments.

Note: there is no explicit count of the number of pairs present. That is
determined by the length of the block of data together with decoding the
contents of the block as alternating SLEB and ULEB values. (It is a bug if
the end of the block does not match the end of a ULEB value.)
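
The decoding rule in the note above can be sketched as follows. This is a
minimal illustration, not part of the proposal: the function names
(read_sleb128, read_uleb128, decode_pc_ranges) are hypothetical, and the
choice to raise ValueError for a malformed block is an assumption.

```python
def read_uleb128(data, pos):
    """Decode one unsigned LEB128 value at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("block ends in the middle of a LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):          # high bit clear: last byte of value
            return result, pos


def read_sleb128(data, pos):
    """Decode one signed LEB128 value at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("block ends in the middle of a LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if byte & 0x40:            # sign bit of the final 7-bit group
                result -= 1 << shift
            return result, pos


def decode_pc_ranges(block):
    """Split a block into (sleb, uleb) pairs; the pair count is implicit."""
    pairs, pos = [], 0
    while pos < len(block):
        delta, pos = read_sleb128(block, pos)
        count, pos = read_uleb128(block, pos)   # fails if the block ends early
        pairs.append((delta, count))
    return pairs
```

A block that stops after an SLEB value, or partway through any value, is
rejected rather than silently truncated.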

The SLEB value of the first pair specifies the beginning of the
first segment as a signed delta relative to the entry point (DW_AT_entry_pc)
of the scope, if present, otherwise relative to its DW_AT_low_pc. If
neither is present, then use the entry pc, or failing that the low pc, of the
containing scope (block), and so on outward, up through and including the
innermost containing routine. If no such entry pc or low pc is found, then the
attribute is ignored (the DWARF description is flawed).
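
The search for a base address described above might look like the following
sketch. The scope representation is purely an assumption made for
illustration: each scope is a dict with optional "entry_pc" and "low_pc"
values, a "parent" link, and an "is_routine" flag; none of these names come
from DWARF itself.

```python
def resolve_base(scope):
    """Return the base address for the first SLEB delta, or None.

    Prefers DW_AT_entry_pc over DW_AT_low_pc on each scope, and walks
    outward no further than the innermost containing routine.
    """
    while scope is not None:
        if scope.get("entry_pc") is not None:
            return scope["entry_pc"]
        if scope.get("low_pc") is not None:
            return scope["low_pc"]
        if scope.get("is_routine"):
            break                      # do not search past the routine
        scope = scope.get("parent")
    return None                        # caller ignores DW_AT_pc_ranges


# Hypothetical scopes: a routine with an entry pc and a block nested in it.
routine = {"entry_pc": 1000, "is_routine": True, "parent": None}
block = {"parent": routine}
base = resolve_base(block)             # taken from the containing routine
```

Returning None corresponds to the case where the attribute is ignored.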

The SLEB value of each pair after the first is a signed delta
relative to the ending address of the previous segment plus 64 units. (The
reason for the 64 is explained below.)

The ULEB (second) value of each pair specifies the number of units that make
up the range for a segment.

All SLEB and ULEB values are scaled by the value of minimum_instruction_size
(mis for short in the following) as found in the header of the statement
program section (.debug_line, see 6.2.4). That is, these values can be
interpreted as instruction counts. If there is no such section for the
containing compilation unit, then 1 is assumed.

Thus, given an entry or low pc of epc, the low (first) address of the first
segment is given by

    low-pc[1] = epc + sleb[1]*mis

and the high address (first byte past the last instruction of the segment)
is given by

    high-pc[1] = low-pc[1] + uleb[1]*mis

For segments after the first, the low and high addresses are given by

    low-pc[n] = (high-pc[n-1] + 64*mis) + sleb[n]*mis
    high-pc[n] = low-pc[n] + uleb[n]*mis
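
The four formulas above translate directly into a short loop. The sketch
below assumes the pairs have already been decoded into Python integers; the
name segments_from_pairs and the sample values are illustrative only.

```python
BIAS = 64  # bias, in units, added to the end of the previous segment


def segments_from_pairs(pairs, epc, mis=1):
    """Turn (sleb, uleb) pairs into (low_pc, high_pc) address ranges.

    epc is the entry or low pc used as the base for the first pair;
    mis is the scale factor applied to every SLEB and ULEB value.
    """
    segments = []
    prev_high = None
    for delta, count in pairs:
        if prev_high is None:
            low = epc + delta * mis                         # low-pc[1]
        else:
            low = (prev_high + BIAS * mis) + delta * mis    # low-pc[n]
        high = low + count * mis                            # high-pc[n]
        segments.append((low, high))
        prev_high = high
    return segments


# Illustrative values with 4-byte instructions: the second segment begins
# exactly where the first one ends, because -64 cancels the bias.
example = segments_from_pairs([(2, 3), (-64, 1)], epc=0x1000, mis=4)
```

Each high_pc here is the first byte past the segment, matching the
definition given for high-pc[1].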

The segments are not required to be sorted. However, the representation is
most space efficient when they are sorted so that the segments occur
in monotonically increasing order.
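
The effect of ordering on size can be checked with a small sketch that
derives the pairs from a list of segments and counts the LEB128 bytes they
need. The helper names are hypothetical, the segment addresses are
illustrative, and segment boundaries are assumed to be multiples of mis
away from the base.

```python
BIAS = 64  # bias, in units, added to the end of the previous segment


def sleb128_len(value):
    """Number of bytes needed to encode value as signed LEB128."""
    n = 1
    while not (-64 <= value <= 63):    # one byte holds 7 bits, sign included
        value >>= 7
        n += 1
    return n


def uleb128_len(value):
    """Number of bytes needed to encode value as unsigned LEB128."""
    n = 1
    while value > 0x7F:
        value >>= 7
        n += 1
    return n


def pairs_from_segments(segments, epc, mis=1):
    """Compute (sleb, uleb) pairs for (low_pc, high_pc) segments, in order."""
    pairs, base = [], epc
    for low, high in segments:
        pairs.append(((low - base) // mis, (high - low) // mis))
        base = high + BIAS * mis       # base for the next segment's delta
    return pairs


def encoded_size(pairs):
    return sum(sleb128_len(d) + uleb128_len(c) for d, c in pairs)


ordered = [(1100, 1110), (1120, 1150)]
size_sorted = encoded_size(pairs_from_segments(ordered, epc=1100))
size_reversed = encoded_size(pairs_from_segments(ordered[::-1], epc=1100))
```

With these segments the reversed order needs a delta of -114, which no
longer fits in a single SLEB byte.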


DWARF V2 is able to describe the addresses associated with a scope
if and only if the scope can be characterized using a single contiguous
range of addresses (using the DW_AT_low_pc and DW_AT_high_pc). Recognizing
that such a description may well not be adequate, DWARF goes on to state
(see V2, section 3.1, page 23):

    "The presence of low and high pc attributes in a compilation unit entry
    imply that the code generated for that compilation unit is contiguous
    and exists totally within the boundaries specified by those two
    attributes. If that is not the case, no low pc
    and high pc attributes should be produced."

    [Similar statements are made for other entities where the low and high
    pc attributes may be used.]

However, no specification is given for what should be done to describe
a non-contiguous range of addresses.

The 64 comes about as follows: This same scheme will be used for other
purposes (the fodder of future proposals) where the series of ranges involved
need not be disjoint. Thus the first value of each pair should always be
a signed value to allow negative deltas to the start of the next segment.
But, to take advantage of sorted ranges that more often than not are
in fact non-overlapping, it is desirable that the delta to the start of each
segment be expressible in one byte. A 1-byte SLEB value covers -64 through
+63, so we bias the base for computing the beginning of a segment by 64
relative to the end of the previous segment (for other than the first); a
1-byte delta then reaches any start from 0 through 127 units past that end.
Experience shows that this works well for both overlapping and
non-overlapping ranges when the ranges are small, and the choice doesn't
much matter when the ranges are large.

Notice that this representation is compact, completely position independent,
and requires no relocations even in a relocatable object file. It takes
advantage of a "starting address" that must otherwise already be present
for other reasons.

Example: Suppose we have a routine whose scope address ranges are
illustrated in the following (for simplicity, mis==1 is assumed):

    1000: entry/first instruction of routine scope

    1100: first instruction of inner scope 1, segment 1
    1109: last instruction of inner scope 1, segment 1

    1120: first instruction of inner scope 1, segment 2

    1130: first instruction of nested scope 2 (only segment)
    1139: last instruction of nested scope 2 (only segment)

    1149: last instruction of inner scope 1, segment 2
    1149: last instruction of routine scope

The address range for the scope of the routine as a whole can be represented
in either of two ways:

  - DW_AT_low_pc(1000), DW_AT_high_pc(1150)         [2 target addresses +
                                                    2 relocations]

  - DW_AT_entry_pc(1000), DW_AT_pc_ranges(0, 150)  [1 target address +
                                                    1 relocation + 3 bytes]

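The address ranges for inner scope 1 are given, relative to a base of 1100
(the values below are consistent only with inner scope 1 carrying its own
DW_AT_low_pc or DW_AT_entry_pc of 1100), by:

  - DW_AT_pc_ranges(0, 10, -54, 30)                 [4 bytes]

The address range for nested scope 2, relative to the same base of 1100
found on its containing scope, is given by:

  - DW_AT_pc_ranges(30, 10)                         [2 bytes]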
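
The byte counts quoted for the two inner scopes can be reproduced by
encoding the pairs. The following is a sketch using the standard LEB128
encoding rules; the function names are hypothetical and nothing here is
prescribed by the proposal.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)        # more bytes follow


def encode_sleb128(value):
    """Encode a signed integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7                    # arithmetic shift keeps the sign
        done = (value == 0 and not (byte & 0x40)) or \
               (value == -1 and (byte & 0x40))
        if done:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def encode_pc_ranges(values):
    """Encode a flat (sleb, uleb, sleb, uleb, ...) list as a block."""
    if len(values) % 2:
        raise ValueError("values must form <SLEB, ULEB> pairs")
    block = bytearray()
    for i in range(0, len(values), 2):
        block += encode_sleb128(values[i])
        block += encode_uleb128(values[i + 1])
    return bytes(block)


inner_scope_1 = encode_pc_ranges([0, 10, -54, 30])
nested_scope_2 = encode_pc_ranges([30, 10])
```

Every value in these two attributes fits in a single LEB128 byte, which is
where the 4-byte and 2-byte figures come from.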

This provides a mechanism for describing discontiguous scopes for blocks or functions. Several problems
were identified with the proposal as described. One is that it would require assemblers or linkers to
support LEB128 data types, since compilers are generally unable to determine offsets within object files.
In other places in DWARF 2 where an assembler or linker is expected to provide values, these values are
the size of a machine address. It was felt that while the functionality proposed was desirable, requiring
extensions to assemblers and linkers in order to obtain this functionality was undesirable.

Withdrawn pending revision.