001101.1 W R. Brender Representation Discontiguous scopes

This PROPOSAL corrects an oversight in 000914.1 regarding discontiguous
scopes.


Problem Statement
-----------------

In the discussion of 000914.1 on 3 October, we noted that when a
compilation unit consists of discontiguous parts there was no unambiguous
"base address" to use in determining the location from which the
"relative addresses" used for range lists and location lists are defined.
We resolved this by saying that in such cases, a DW_AT_entry_pc attribute
must be provided to define that base.

However, there is a deeper problem that went unnoticed and undiscussed.
Suppose that the code of a (single) compilation unit is split across
multiple program sections, which can be independently relocated at link
and even again at load time. In principle, any executable entity within
the compilation unit can also be split across those same sections so
that a single location or range list has entries in any or all of those
sections.

The problem is how to create the location or range list entries using
the underlying object language. In effect, our specification requires
that each relative address consist of the difference of two addresses
*which may happen to be defined in separate program sections.* To the
best of my knowledge, few object file representations provide appropriate
relocation directives that make this possible. (For one widely used
example, the relocations available in ELF on the IA-32 architecture do
not support this.)

While future versions of ELF or other object file representations are
likely to include such support, it seems undesirable for DWARF to be
dependent on such futures if in fact there is a reasonable alternative
that makes it unnecessary.

This proposal defines such an alternative.


PROPOSAL
--------

There are several parts:

1) Changes to 000914.1 as approved:

    a) Remove the applicability of attribute DW_AT_ranges to
       DW_TAG_compile_unit DIEs.

    b) Remove the applicability of attribute DW_AT_entry_pc to
       DW_TAG_compile_unit DIEs.

2) Specify that if a compilation unit has a discontiguous scope,
   then there must be a corresponding address ranges contribution
   to the .debug_aranges section. That is, the .debug_aranges
   section is no longer an option whose purpose is solely to
   facilitate accelerated access; for discontiguous compilation
   units it becomes a requirement, while for all other cases it
   remains an option.

   Note that addresses in the .debug_aranges section are (real)
   relocated addresses, hence not subject to the problems engendered
   by relative addresses.

   Note that the low address of each address range entry in the
   .debug_aranges section defines a natural "base address" for that
   part of the compilation unit. Further, the list provides a natural
   ordering of those base addresses (1st, 2nd, 3rd, etc) independent
   of and unrelated to the values of the addresses as such.

   For a contiguous compilation unit, the 1st (and only) base
   address is defined to be the same as the value of DW_AT_low_pc
   (which provides the obvious definition of base address which
   happens to be missing in DWARF V2).

3) Define a means to use the low-address/high-address pair that
   forms each range list entry, or is included in each location
   list entry, to indicate which base address applies. To do this,
   note that every valid address range (L, H) satisfies the
   relation L < H. Thus, any pair for which this relation does not
   hold can be defined to have some other interpretation. One such
   interpretation is already defined: the pair (0, 0) indicates the
   end of list.

   Define a "base address selection entry" to consist of the pair
   (N, 0) where N > 0; it specifies that subsequent entries in the
   same range or location list are defined relative to the N'th base
   address (as specified in the .debug_aranges section), until the
   next selection entry is encountered (of course). The first entry
   is assumed to be relative to the 1st base address (so an initial
   (1,0) is redundant and need not be present).

   For a location list, there is no need to append a fake DWARF
   expression (eg, a 2-byte count of zero) in a (misguided) attempt
   to make the base address selection entry "look like" a location
   entry. The ability of dumpers to scan and interpret the location
   list section independent of other information is not compromised
   by this omission.
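
The rules in part 3 can be illustrated from the consumer's side with a
minimal Python sketch. It is not part of the proposal. The function name
resolve_range_list, the representation of a range list as a sequence of
(low, high) integer pairs, and the representation of the base addresses
as a plain list in .debug_aranges order are all assumptions made for
illustration.

```python
def resolve_range_list(entries, base_addresses):
    """Turn the relative (low, high) pairs of one range list into
    absolute address ranges.

    entries        -- hypothetical pre-decoded pairs, in list order
    base_addresses -- low addresses of the compilation unit's
                      .debug_aranges entries, in list order (1st, 2nd, ...)
    """
    base = base_addresses[0]       # the 1st base address applies initially
    resolved = []
    for low, high in entries:
        if (low, high) == (0, 0):  # end of list
            break
        if high == 0:              # (N, 0) with N > 0: base address selection
            base = base_addresses[low - 1]
            continue
        if low >= high:            # no interpretation is defined for these pairs
            raise ValueError("pair (%d, %d) has no defined meaning" % (low, high))
        resolved.append((base + low, base + high))
    return resolved
```

For example, with base addresses [0x1000, 0x8000], the list
(0x10, 0x20), (2, 0), (0, 0x40), (0, 0) describes the absolute ranges
0x1010..0x1020 and 0x8000..0x8040. A location list would be walked the
same way, except that each ordinary entry also carries its DWARF
expression while a selection entry does not.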


Discussion
----------

There are numerous advantages to this proposal, including:

  1) It solves the relative address relocation problem.

  2) It is upward compatible with DWARF V2.

  3) It eliminates potential redundant representation of information
     in cases where DW_AT_ranges was needed even though appropriate
     .debug_aranges information was also present.

  4) If the problem were specific to range lists and not location
     lists, we might rethink 000914.1 from the ground up and consider
     other approaches that did not involve "tricks" based on the
     nature of address ranges. However, the problem exists in DWARF V2
     (even before the 000914.1 proposal) so it is critical to formulate
     a solution that works for location lists as well. This proposal
     does.

  5) This representation is easy for a compiler to generate. Note that
     the base addresses in the .debug_aranges section are nothing more
     than the 0'th address in each respective "text" section and the
     relative addresses used in location and range list address pairs
     are nothing more than the (local) section offsets within those
     respective sections.
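
Point 5 can be illustrated from the producer's side with a short Python
sketch, under stated assumptions. The function name build_range_list and
the description of an entity as a list of (ordinal, start, end) pieces
are hypothetical: ordinal is the 1-based position of the piece's section
among the compilation unit's .debug_aranges entries, and start and end
are plain section offsets.

```python
def build_range_list(pieces):
    """Emit the (low, high) pairs of a range list for one entity.

    pieces -- hypothetical list of (ordinal, start, end) tuples, where
              start and end are offsets within the ordinal'th section
    """
    entries = []
    current = 1                          # the 1st base address is implicit
    for ordinal, start, end in pieces:
        if ordinal < 1 or start >= end:
            raise ValueError("invalid piece (%d, %d, %d)" % (ordinal, start, end))
        if ordinal != current:
            entries.append((ordinal, 0))  # base address selection entry
            current = ordinal
        entries.append((start, end))      # section offsets used as-is
    entries.append((0, 0))                # end of list
    return entries
```

No subtraction of addresses in different sections is ever needed: each
emitted value is either a section offset or a small ordinal.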

Note that no attribute is proposed to point from a compilation
unit DIE to the corresponding .debug_aranges contribution; none
is needed because the .debug_aranges header points back to the
corresponding compilation unit. This does imply one of two
things. Either the debugger scans the .debug_aranges section at
start up to acquire the appropriate discontiguous ranges
information (a debugger that supports/exploits the .debug_aranges
section will be making this scan in any case, so this is no
additional work), or it waits until it encounters or needs to
access a compilation unit that lacks DW_AT_low_pc/DW_AT_high_pc
information and then scans the .debug_aranges section looking for
the matching information. (Such "lazy" acquisition may be more
expensive in the long run if most or all compilation units are
eventually read, but this is generally true of most lazy
techniques and is certainly viable in any case.)
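
The two acquisition strategies can be sketched in Python as follows.
This is an illustration only. The class ArangesIndex and its input
format are assumptions: contributions stands for an already-decoded
.debug_aranges section, given as (cu_offset, [(address, length), ...])
tuples, where cu_offset is the compilation unit offset recorded in each
contribution's header.

```python
class ArangesIndex:
    """Maps a compilation unit offset to its ordered base addresses."""

    def __init__(self, contributions, eager=False):
        self._contributions = contributions  # hypothetical decoded section
        self._by_cu = None
        self.scans = 0                       # counts full scans, for illustration
        if eager:
            self._scan()                     # scan once at debugger start up

    def _scan(self):
        self.scans += 1
        self._by_cu = {}
        for cu_offset, ranges in self._contributions:
            bases = self._by_cu.setdefault(cu_offset, [])
            bases.extend(address for address, _length in ranges)

    def base_addresses(self, cu_offset):
        if self._by_cu is None:              # lazy: scan on first need
            self._scan()
        return self._by_cu.get(cu_offset, [])
```

In this sketch the lazy variant still scans the whole section once, on
first use; a debugger could instead stop at the first matching
contribution, at the cost of rescanning for later compilation units.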

[FWIW, note that the pairs (M, N) where M > N > 0 remain "available"
should we ever need to embed other control information in location
and/or range lists. Quite frankly I hope we never feel motivated to
explore this "opportunity". :-]

Thanks go to Mark Schimmel for suggesting the use of the .debug_aranges
section for the compilation level ranges information. And, thanks go to
Dave Anderson for discussion that helped clarify some of the ideas and
presentation.


This proposal is replaced by 010119.1.