010221.1 A R. Brender Representation Multisection Location and Range Lists

Location lists and range lists share a common problem when there
are multiple independently relocated text sections. Earlier proposals have
considered several alternative solutions.


PROBLEM STATEMENT
-----------------

The DWARF V2 design of location lists for representing the locations of
variables that change over time is not adequate to deal with variables
declared in functions whose code is split into multiple program (text)
sections. The same limitation applies to range lists, which describe
discontiguous scopes.

Recall that in DWARF V2, location lists are defined in Sections 2.4,
2.4.6 and 7.7.2. In DWARF V2.1 Draft 5DW they are defined in Figure 3
(see specifically loclistptr), Sections 2.5, 2.5.4, 7.5.4 (see
specifically loclistptr) and 7.7.3. In both cases they are referenced
from attributes, most notably DW_AT_location, whose value is interpreted
as an offset in the .debug_loc section where the beginning of a location
list is found.

Recall that in DWARF V2.1 Draft 5DW, discontiguous scopes are defined in
Figure 3 (see specifically rangelistptr), Sections 2.16.3, 7.5.4 (see
specifically rangelistptr) and 7.24. A range list is referenced from
a DW_AT_ranges attribute whose value is interpreted as an offset in the
.debug_ranges section where the beginning of a range list is found.

A location list consists of a sequence of location list entries, where
each entry consists of

  - A beginning address
  - An ending address
  - A location expression

The two addresses are specified to be "relative to the base address of
the compilation unit referencing this location list". (An attractive
consequence of this formulation is that the two addresses need not be
relocated at link time or load time, which has both size and system
performance benefits.)

In DWARF V2, there are two problems with this formulation. The first is
that in a compilation that is split into more than one program section,
it is ambiguous which program section establishes "the base address".
DWARF V2.1 Section 3.1 resolves this problem as follows:

    "The base address of a compilation is defined as follows: If a
    DW_AT_entry_pc is given for the unit, the base address is the
    value of that attribute. Otherwise, if a DW_AT_low_pc attribute
    is specified, the base address is the value of that attribute.
    If neither is present, then the base address is undefined and
    any DWARF [debugging information] entry or structure defined
    in terms of the base address of that compilation is not valid."

(Section 2.5.4 would benefit from a reference to this definition.)

The second problem is more subtle and is the problem that this proposal
seeks to correct.

The second problem derives from the definition of the beginning and
ending addresses as relative to the base address, which is to say
as the difference (offset) of the given address minus (relative to)
the base address. That is, for a location list entry with beginning
address BEG, ending address END, location register R0, and base address
BASE, the location list entry is represented as

    (BEG - BASE, END - BASE, <location expression for register R0>)

If there is only a single text program section, then there is no problem:
BEG and BASE are at known offsets relative to the beginning of that section
and the difference is easily computed by the compiler. However, if BEG and
END are defined in one program section and BASE is defined in another,
then there is a problem: Few object file representations in common use
today provide relocation commands that singly or in combination are able
to compute the required difference and use it to initialize storage at
link time (or possibly even load time).

    As a key example, ELF for the IA-32 architecture does not provide such
    relocation capability.

Without such relocation capability, optimizing compilers that split
modules and functions into more than one program section cannot implement
DWARF location lists.
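
The difficulty can be illustrated with a small sketch. The section names,
offsets and layouts below are invented for illustration, and the helper
functions are hypothetical; the point is only that BEG - BASE is fixed at
compile time when both labels lie in one section, but otherwise depends on
where the linker places each section.

```python
# Hypothetical model: the compiler knows each label only as (section, offset);
# the address of each section is chosen later by the linker.

def absolute(label, section_addresses):
    """Resolve a (section, offset) label once section addresses are known."""
    section, offset = label
    return section_addresses[section] + offset

def relative(label, base, section_addresses):
    """The value BEG - BASE (or END - BASE) that a list entry must hold."""
    return absolute(label, section_addresses) - absolute(base, section_addresses)

BASE = (".text", 0x00)
BEG_SAME = (".text", 0x40)         # same section as BASE
BEG_OTHER = (".text.cold", 0x10)   # different section from BASE

# Two possible link-time layouts (made-up addresses).
layout_a = {".text": 0x1000, ".text.cold": 0x8000}
layout_b = {".text": 0x2000, ".text.cold": 0x5000}

# Same section: the difference is identical under both layouts.
same = [relative(BEG_SAME, BASE, lay) for lay in (layout_a, layout_b)]

# Different sections: the difference changes with the layout, so it can
# only be produced by a relocation that computes a difference of symbols.
other = [relative(BEG_OTHER, BASE, lay) for lay in (layout_a, layout_b)]
```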

Similarly, a range list consists of a sequence of range list entries,
where each entry consists of

  - A beginning address
  - An ending address

The two addresses are specified to be "relative to the base address of
the compilation unit referencing this range list".

A range list is just like a location list in its use of relative
addresses to define the beginning and ending addresses of each entry of
the list. As a result, it shares all of the problems of location lists
when multiple sections containing generated machine code are involved.


PROPOSAL
--------

This proposal defines a new kind of entry used in both location lists
and range lists to specify the base address used in computing the
relative addresses for subsequent entries.

First, recall that the addresses BEG and END of a location or range
list entry must satisfy the relationship BEG < END. (The end address is
defined as one past the last address in the range, so even for a range
consisting of one address ADR we have ADR < ADR + 1.) Since the two
relative addresses must logically be relative to the same base address,
we can define the low relative address L = BEG - BASE and the high
relative address H = END - BASE, and it immediately follows that L < H.

As a consequence, any relative address pair (L, H) that does not satisfy
L < H can be used to encode something other than a "normal" location
list entry. DWARF already exploits this! The pair (0, 0), that is, a pair
of zeros, identifies the end of a list.

Define a "base address selection entry" to consist of the pair
(MAX_ADDRESS, B) where MAX_ADDRESS is the largest address possible
on the target (either 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF). Because no
address B can exceed MAX_ADDRESS, this pair never satisfies L < H and
so cannot be mistaken for a normal entry. This entry specifies that
subsequent entries
in the same range or location list are defined relative to the
address B (until the next selection entry is encountered, of course).
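
As a minimal sketch of how a consumer might tell the three kinds of pair
apart, consider the following. The function names and the string labels
they return are hypothetical, and the address size is assumed to be known
from the compilation unit.

```python
def max_address(address_size):
    """Largest address representable in address_size bytes (4 or 8)."""
    return (1 << (8 * address_size)) - 1

def classify(first, second, address_size):
    """Classify one (first, second) address pair of a location or range list."""
    if first == 0 and second == 0:
        return "end_of_list"
    if first == max_address(address_size):
        return "base_address_selection"   # second is the new base address B
    if first < second:
        return "normal"                   # relative pair with L < H
    return "invalid"                      # violates L < H and is not special
```

Note that the test for MAX_ADDRESS depends on the address size: the value
0xFFFFFFFF is a selection marker with 4-byte addresses but an ordinary
value with 8-byte addresses.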

There is no need to append a fake DWARF expression (e.g., a 2-byte count
of zero) in the case of a location list entry in a (misguided) attempt
to make the base address selection entry "look like" a location entry.
(The (0, 0) end of list entry does not have a pseudo-location either.)
The ability of dumpers to scan and interpret the .debug_loc section
independent of other information is not compromised by this omission.

Finally, adjust the definition of location list to be a sequence of
entries, each of which is either a base address selection entry or
a location list entry, followed by an end of list entry. Similarly
for range lists.
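
The adjusted definition implies a simple walk over the list. The sketch
below shows one way a consumer might turn the raw pairs of a range list
into absolute ranges; the function name and the representation of a list
as Python tuples are assumptions made for illustration only.

```python
def resolve_ranges(pairs, cu_base, address_size=4):
    """Turn raw (first, second) pairs into absolute [low, high) ranges.

    cu_base is the default base address of the compilation unit; it applies
    until the first base address selection entry is seen.
    """
    max_addr = (1 << (8 * address_size)) - 1
    base = cu_base
    result = []
    for first, second in pairs:
        if first == 0 and second == 0:      # end of list entry
            return result
        if first == max_addr:               # base address selection entry
            base = second
            continue
        if not first < second:
            raise ValueError("entry does not satisfy L < H")
        result.append((base + first, base + second))
    raise ValueError("list is missing its end of list entry")

# Example: one range relative to the unit's default base, then one range
# relative to a second section selected inside the list itself.
example = [(0x10, 0x20), (0xFFFFFFFF, 0x8000), (0x0, 0x8), (0, 0)]
resolved = resolve_ranges(example, cu_base=0x1000)
```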


DOCUMENT CHANGES
----------------

A) In Section 2.5.4, replace "Each entry in a location list consists
    of:" together with the following bullets with (note: | helps mark
    the actual changes):

|         "Each entry in a location list is either a location list entry,
|         a base address selection entry, or an end of list entry.

|         "A location list entry consists of:

|         1. A beginning address. This address is relative to the
             applicable base address of the compilation unit referencing
             this location list. It marks the beginning of the address
             range over which the location is valid.

|         2. An ending address, again relative to the applicable base
             address of the compilation unit referencing this location
             list. It marks the first address past the end of the address
|            range over which the location is valid. The ending address
|            must be greater than the beginning address.

          3. A location expression describing the location of the object
             over the range specified by the beginning and ending address.

|         "The applicable base address of a location list entry is
|         determined by the closest preceding base address selection entry
|         in the same location list (see below). If there is no such
|         selection entry, then the applicable base address defaults to
|         the base address of the compilation unit (see Section 3.1).

|         "<i>In the case of a compilation unit where all of the
|         machine code is contained in a single contiguous section, no base
|         address selection entry is ever needed.</i>"

B) In Section 2.5.4 (again), replace the last paragraph (which begins
    "The end of any given locations list...") with (note: | helps mark
    the actual changes):

|     "A base address selection entry consists of:
|
|         1. The value of the largest representable address.
|
|         <i>This value is 0xFFFFFFFF in the 32-bit DWARF format and
|         0xFFFFFFFFFFFFFFFF in the 64-bit DWARF format (see Section
|         7.5.4).</i>
|
|         2. An address, which defines the appropriate base address for
|         use in interpreting the beginning and ending relative
|         addresses of subsequent entries of the location list.
|
|     "<i>A base address selection entry affects only the list in which
|     it is contained.</i>
|
|     "The end of any given location list is marked by an end of list
|     entry, which consists of a 0 for the beginning address and a 0
|     for the ending address. A location list containing only an end of list
      entry describes an object that exists in the source code but not
      in the executable program.

|     "Neither a base address selection entry nor an end of list entry
|     includes a location expression.
|
|     "<i>A base address selection entry and an end of list entry for a
|     location list are identical to a base address selection entry and
|     end of list entry, respectively, for a range list
|     (see Section 2.16.3) in interpretation and representation.</i>"

C) In Section 2.16.3, replace "Each entry in a range list consists
    of:" together with the following bullets and the entire remainder
    of the section with (note: | helps mark the actual changes):

|         "Each entry in a range list is either a range list entry,
|         a base address selection entry, or an end of list entry.

|         "A range list entry consists of:

          1. A beginning address. This address is relative to the
|            applicable base address of the compilation unit referencing
             this range list. It marks the beginning of the address
             range.

|         2. An ending address, again relative to the applicable base
             address of the compilation unit referencing this range
             list. It marks the first address past the end of the address
             range. The ending address must be greater than the beginning
             address.

|         "The applicable base address of a range list entry is
|         determined by the closest preceding base address selection entry
|         in the same range list (see below). If there is no such
|         selection entry, then the applicable base address defaults to
|         the base address of the compilation unit (see Section 3.1).
|
|         "<i>In the case of a compilation unit where all of the
|         machine code is contained in a single contiguous section, no base
|         address selection entry is ever needed.</i>
|
|     "A base address selection entry consists of:
|
|         1. The value of the largest representable address.
|
|         <i>This value is 0xFFFFFFFF in the 32-bit DWARF format and
|         0xFFFFFFFFFFFFFFFF in the 64-bit DWARF format (see Section
|         7.5.4).</i>
|
|         2. An address, which defines the appropriate base address for
|            use in interpreting the beginning and ending relative
|            addresses of subsequent entries of the range list.
|
|         "<i>A base address selection entry affects only the list in which
|         it is contained.</i>
|
|     "The end of any given range list is marked by an end of list
|     entry, which consists of a 0 for the beginning address and a 0
|     for the ending address. A range list containing only an end
|     of list entry describes an empty scope (which contains no
|     instructions).
|
|     "<i>A base address selection entry and an end of list entry for a
|     range list are identical to a base address selection entry and
|     end of list entry, respectively, for a location list
|     (see Section 2.5.4) in interpretation and representation.</i>"

D) In Section 3.1:

    1)  Add the following to the first bullet (which describes the
        use of DW_AT_low_pc/_high_pc and DW_AT_ranges to specify the
        range(s) of a compilation unit):

        "A DW_AT_low_pc attribute may also be specified in combination
        with DW_AT_ranges to specify the default base address for use
        in location lists (see Section 2.5.4) and range lists (see
        Section 2.16.3)."

    2)  Replace the last paragraph (which defines the base address of a
        compilation unit) with the following:

        "The base address of a compilation unit is defined as
        the value of the DW_AT_low_pc attribute, if present; otherwise,
        it is undefined. If the base address is undefined, then any
        DWARF entry or structure defined in terms of the base address
        of that compilation unit is not valid."

E) In Section 7.7.3, replace the current first paragraph with (note:
    | helps mark the actual changes)

|         "Each entry in a location list is either a location list entry,
|         a base address selection entry, or an end of list entry.

|         "A location list entry consists of two relative addresses
          followed by a 2-byte length, followed by a block of contiguous
          bytes. The length specifies the number of bytes in the block
          that follows. The two addresses are the same size as used
          by DW_FORM_addr on the target machine.

|         "A base address selection entry and end of list entry each
|         consist of two (constant or relocated) addresses. The addresses
|         are the same size as used by DW_FORM_addr on the target machine."

F) In Section 7.24, replace the current first paragraph with (note:
    | helps mark the actual changes)

|         "Each entry in a range list is either a range list entry,
|         a base address selection entry, or an end of list entry.

|         "A range list entry consists of two relative addresses.
          The addresses are the same size as used by DW_FORM_addr on
          the target machine.

|         "A base address selection entry and end of list entry each
|         consist of two (constant or relocated) addresses. The addresses
|         are the same size as used by DW_FORM_addr on the target machine."

G) In Appendix A, Figure 38: remove DW_AT_entry_pc from the applicable
    attributes for DW_TAG_compile_unit.

        [Editor's note: DW_AT_entry_pc is unintentionally missing from
        the DW_TAG_compile_unit in Draft 5 -- so this is a "virtual"
        deletion!]
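
As an illustration of the encoding in part E), here is a minimal sketch of
a reader for one location list in raw .debug_loc bytes. It assumes a
little-endian target by default and 4- or 8-byte addresses; the function
name and the tuples it returns are hypothetical and do not belong to any
existing DWARF library.

```python
import struct

def parse_location_list(data, offset, address_size=4, byteorder="<"):
    """Decode one location list from raw .debug_loc bytes starting at offset.

    Returns a list of tuples:
      ("base", address)                base address selection entry
      ("loc", begin, end, expr_bytes)  location list entry (relative addresses)
    Decoding stops at the (0, 0) end of list entry.
    """
    addr_fmt = byteorder + ("II" if address_size == 4 else "QQ")
    max_addr = (1 << (8 * address_size)) - 1
    entries = []
    while True:
        first, second = struct.unpack_from(addr_fmt, data, offset)
        offset += 2 * address_size
        if first == 0 and second == 0:
            return entries
        if first == max_addr:
            # No length or expression follows a base address selection entry.
            entries.append(("base", second))
            continue
        # A location list entry carries a 2-byte length and an expression block.
        (length,) = struct.unpack_from(byteorder + "H", data, offset)
        offset += 2
        entries.append(("loc", first, second, bytes(data[offset:offset + length])))
        offset += length
```

Because a selection entry and an end of list entry are both recognizable
from the address pair alone, the reader never needs outside information to
decide whether a length field follows.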


DISCUSSION
----------

There are a variety of ways to address the problem described, but an
important constraint is that the solution be upward compatible with
the DWARF V2 specification of location lists. This proposal satisfies
that constraint.

In this proposal, a base address selection entry can be viewed as
analogous to a DW_LNE_set_address command in the line number table
section, and similarly requires an associated relocation in the
underlying object language.

A key advantage of this proposal compared to others is the lack of
a fixed central or common set of base addresses that must be used
throughout all location lists or range lists of the compilation unit;
each function, or scope, or whatever can employ whatever base addresses
are convenient, and new ones can be introduced incrementally.
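
To make the producer side concrete, here is a rough sketch of how a
compiler might emit a range list whose ranges fall in several sections.
The input representation, the function name and the form of the relocation
records are all assumptions for illustration; a real object format would
express the relocation in its own terms. The default base address is
assumed to be the start of the section named by default_base_section.

```python
def emit_range_list(ranges, default_base_section, address_size=4):
    """Emit a range list for ranges given as (section, begin_offset, end_offset).

    Returns (entries, relocations). entries is a list of address pairs;
    relocations lists (entry_index, section) for each pair whose second
    address must be relocated to the start address of that section.
    """
    max_addr = (1 << (8 * address_size)) - 1
    entries, relocations = [], []
    current = default_base_section
    for section, begin, end in ranges:
        if section != current:
            # Base address selection entry (MAX_ADDRESS, B); B is filled in
            # by one relocation, much like DW_LNE_set_address.
            relocations.append((len(entries), section))
            entries.append((max_addr, 0))
            current = section
        entries.append((begin, end))   # offsets within the current section
    entries.append((0, 0))             # end of list entry
    return entries, relocations

entries, relocations = emit_range_list(
    [(".text", 0x10, 0x40), (".text.cold", 0x0, 0x18), (".text", 0x40, 0x60)],
    default_base_section=".text",
)
```

Each list introduces only the base addresses it needs, and the number of
relocations equals the number of selection entries emitted.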

The DW_AT_entry_pc attribute was introduced for a compilation unit back
in October 2000 as part of the discussion re 000914.1 as a means to
specify a base address for a compilation unit which was discontiguous
because then neither DW_AT_low_pc nor DW_AT_high_pc would be present.
Strictly speaking, this attribute is not needed in combination with
this proposal because an appropriate base address can always be
introduced as needed in a location or range list itself.

However, there is still a space advantage to having a "default" base
address so that even for a discontiguous unit some, perhaps many, location
lists and range lists can avoid starting with a base address selection
entry. We could leave DW_AT_entry_pc as the way to do this by omitting
part D) in the above proposal, although the "entry pc" connotation of the
name is a bit strange. Or we could use DW_AT_low_pc to do this as proposed
here, which seems natural because DW_AT_low_pc already defines the base
address when used in a contiguous unit. The change in this case compared
to V2 is allowing DW_AT_low_pc without DW_AT_high_pc on a
DW_TAG_compile_unit DIE.

For comparison, here is the current text (which would be retained if
part D were not adopted):

    "The base address of a compilation is defined as follows: If a
    DW_AT_entry_pc is given for the unit, the base address is the
    value of that attribute. Otherwise, if a DW_AT_low_pc attribute
    is specified, the base address is the value of that attribute.
    If neither is present, then the base address is undefined and
    any DWARF [debugging information] entry or structure defined
    in terms of the base address of that compilation is not valid."

This proposal recommends using DW_AT_low_pc as the only way to specify
the base address of a compilation unit (whether used with DW_AT_high_pc
or DW_AT_ranges).
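
The two rules can be contrasted in a short sketch. Modeling the attributes
of a DW_TAG_compile_unit DIE as a Python dictionary keyed by attribute
name is an assumption made only for this illustration, as are the function
names; None stands for an undefined base address.

```python
def cu_base_address_current(attrs):
    """Current text: DW_AT_entry_pc first, then DW_AT_low_pc, else undefined."""
    if "DW_AT_entry_pc" in attrs:
        return attrs["DW_AT_entry_pc"]
    return attrs.get("DW_AT_low_pc")

def cu_base_address_proposed(attrs):
    """Part D of this proposal: DW_AT_low_pc if present, else undefined."""
    return attrs.get("DW_AT_low_pc")

# A discontiguous unit under the proposal: DW_AT_low_pc supplies the default
# base address and DW_AT_ranges (an offset, made up here) gives the ranges.
discontiguous_cu = {"DW_AT_low_pc": 0x1000, "DW_AT_ranges": 0x40}
default_base = cu_base_address_proposed(discontiguous_cu)
```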

Note: none of this affects the use of DW_AT_entry_pc for modules or
subroutines--only its use specifically for DW_TAG_compile_unit is at
issue here.

Editorial Note: As proposed, the description of base address selection
entry is duplicated in both 2.5.4 and 2.16.3. This is because I could
not find a reasonably natural third place in the document to introduce
such an entry (without any context) and then reference that place from
2.5.4 and 2.16.3. If you have a suggestion, I would love to hear it.
Same goes for the end of list entry description, actually, but that was
already duplicated in any case.


ALTERNATIVES CONSIDERED
-----------------------

Here is how this proposal compares to the earlier proposals 001101.1,
001130.1 and 010205.1:

Compared to 001101.1:
    - This proposal has no "central" specification of a fixed set of
      base addresses; 001101.1 uses the DWARF V2 defined .debug_aranges
      section to provide that information.
    - This proposal and 001101.1 both exploit the requirement that
      L < H in a valid location list entry, so that other pairs of
      values can be interpreted in a different way. This proposal
      uses the pair (M, B) where M is the maximum possible address
      (either 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF) and B is the base
      address to be used for subsequent entries; 001101.1 uses the
      pair (N, 0) to encode an index that identifies the applicable
      base address.
    - This proposal uses the DW_AT_ranges attribute on a compilation
      unit to specify the set of ranges for a discontiguous compilation
      unit; 001101.1 did not need (nor allow) DW_AT_ranges on a
      compilation unit because the full range information is already
      available in the .debug_aranges section.
    - This proposal does not change the implementer optional status of
      the .debug_aranges section; 001101.1 requires the use of the
      .debug_aranges section for those (and only those) compilation
      units that generate code in multiple independently relocated
      sections.

Compared to 001130.1: This proposal is a redraft of 001130.1 designed
to be self contained, to more clearly cover both location lists
and range lists, and to detail the textual impact of the proposal.

Compared to 010205.1:
    - This proposal has no "central" specification of a fixed set of
      base addresses; 010205.1 has a single "central" attribute that
      specifies the set of base addresses for a compilation unit.
    - This proposal and 010205.1 both exploit the requirement that
      L < H in a valid location list entry, so that other pairs of
      values can be interpreted in a different way. This proposal
      uses the pair (M, B) where M is the maximum possible address
      (either 0xFFFFFFFF or 0xFFFFFFFFFFFFFFFF) and B is the base
      address to be used for subsequent entries; 010205.1 uses the
      pair (N, 0) to encode an index that identifies the applicable
      base address.
    - This proposal requires a relocation for each occurrence of a base
      address (one for each base address selection entry); 010205.1
      requires no relocations in the .debug_loc section.

The 001101.1 and 010205.1 proposals drop the use of DW_AT_entry_pc to
define "the" base address of a compilation unit, because the limitation
to a single base address is in fact part of the problem. Strictly
speaking, 001130.1 and this proposal don't require DW_AT_entry_pc either
because a new base address can be specified locally. However, the notion
of a default base address is still useful as a space optimization;
whether this is best expressed using DW_AT_entry_pc or DW_AT_low_pc
is mostly a matter of taste.

One could also explore possible schemes where the control information
that associates a base address with a location list entry is external
to the .debug_loc section. While I played with some such ideas, they
all seemed to be less intuitive and more complicated than any of the
ones mentioned here.


Proposal accepted.