000914.1 | A | Brender | Representation | Discontinuous Ranges |
This proposal replaces 000531.1, which was discussed by the Committee on
27 June 2000. In reviewing that proposal, the group requested an alternative
approach that
- Did not require use of LEB representation
- Would work for subprograms whose code might be split among
multiple program sections
I originally intended to offer two proposals, with the intent that both be
adopted. The first was to be kept as simple as possible yet still be
sufficient to cover all cases of discontiguous scopes. The second
was to be more complicated (more like the earlier 000531.1), but designed to
be as compact as possible. The reason for two was to allow a compiler
vendor to choose whichever better suited its design goals and constraints.
Because the committee is trying to wrap up its work on this revision,
and because of my own time constraints, I am presenting just the simpler
of the two at this time.
META CONTEXT/ASIDE:
I think the following four presentational capabilities provide a core
set that allows sophisticated compilers and debuggers to support a
significant and highly useful level of debugging for optimized code:
- Represent split lifetimes for variables
- Represent subprogram inlining
- Represent discontiguous scopes
- Represent semantic breakpoint locations
The first two are already included in DWARF V2. The third is the
subject of this proposal. The fourth will (hopefully) be the subject
of a future (real soon now) proposal.
If we can complete this set, then I think we can claim a significant
improvement in the ability of DWARF to allow support for debugging
optimized code.
END ASIDE
GENERAL DISCUSSION
------------------
In the presence of optimization (and sometimes even for non-optimized code),
it is desirable to be able to describe a scope that consists of a set of
address ranges rather than just a single range.
Let us consider possible representation approaches from the inside out.
1) Address range description as such
For consistency and simplicity, I looked for precedents that could serve
as models. There appear to be (just) two precedents for representing a
sequence of address ranges:
- a location list, found in the .debug_loc section
- the tuples for the addresses of a compilation unit, found in the
.debug_aranges section
A location list entry is
- a starting address offset (relative to start of compilation unit)
- an ending address offset (relative to start of compilation unit)
- a block containing a location expression
Since the two addresses are offsets, they really are just constants
and require no associated relocation. (In effect, relocation needs to be
performed by the debugger when it uses this information.) Both addresses
occupy the size of an address on the target system.
A tuple for the addresses of a compilation unit is
- an address
- a length
Since the address is really an address, there must be an associated
relocation. (No relocation is required by a debugger prior to use of the
information.) The address and length both occupy the size of an address
on the target system.
Both kinds of lists are terminated by a pair of zero values.
Since no relocation is involved, the location list model is more
attractive. However, a location list also includes a location expression,
which is not needed for scopes.
So, let us define a "scope list entry" as
- a starting address offset (relative to start of compilation unit)
- an ending address offset (relative to start of compilation unit)
and a "scope list" as
a sequence of scope list entries terminated by a pair of
(address-sized) zero values.
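A minimal Python sketch of reading a scope list as just defined. The function
name parse_scope_list, its parameters, and the default of 4-byte little-endian
addresses are assumptions made for illustration; they are not part of the
proposal.

```python
import struct

def parse_scope_list(data, offset=0, addr_size=4, little_endian=True):
    """Read (start, end) offset pairs until the (0, 0) terminator.

    Returns the list of pairs and the offset just past the terminator.
    """
    code = {4: "I", 8: "Q"}[addr_size]          # address-sized unsigned value
    fmt = ("<" if little_endian else ">") + code * 2
    entry_size = 2 * addr_size
    entries = []
    while True:
        if offset + entry_size > len(data):
            raise ValueError("scope list is missing its (0, 0) terminator")
        start, end = struct.unpack_from(fmt, data, offset)
        offset += entry_size
        if start == 0 and end == 0:
            return entries, offset
        entries.append((start, end))

# Two ranges followed by the terminating pair of zeros.
raw = struct.pack("<6I", 0x10, 0x40, 0x80, 0xA0, 0, 0)
ranges, next_offset = parse_scope_list(raw)
```

Because the values are offsets rather than addresses, the sketch needs no
relocation information to read the list.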
Also observe that there is no need for a scope list, just like a location
list, to "point back" to the entity that references it. Conversely,
tuples for the addresses of a compilation unit do point to the .debug_info
section and there is in fact no pointer from .debug_info to .debug_aranges.
This reinforces the affinity between scope lists and location lists.
2) Location of scope lists
Since scope lists are conceptually much like location lists and since
there is no precedent for including similar kinds of lists immediately
within the .debug_info section, it seems reasonable to store such
lists in a separate section.
There are two choices:
- define a new section specifically for scope lists, named, oh say,
.debug_scope or the like
- use the existing .debug_loc section
3) Some way to reference a scope list from a DIE that represents
a scope (DW_TAG_lexical_block, DW_TAG_subprogram, etc.)
Depending on where the information is located, there are these choices:
- for scope lists stored in a new section, define a new attribute,
named say DW_AT_ranges, whose single operand is new class rangeptr,
which can use either DW_FORM_data4 or DW_FORM_data8 as appropriate.
(This is strictly analogous to DW_AT_location and class locptr.)
- for scope lists stored in the .debug_loc section, define a new
attribute named say DW_AT_ranges, whose single operand is the
(existing) class locptr (which can use either DW_FORM_data4 or
DW_FORM_data8 as appropriate).
[- A less attractive variation would be to re-use the existing attribute
DW_AT_location on the scope DIEs. This seems unnecessarily obtuse
and perhaps confusing.]
Of the possibilities, reusing the existing .debug_loc section seems
attractive, in which case the new attribute DW_AT_ranges with a locptr
operand completes the needs.
There is one and only one downside to combining both location lists and
scope lists in a single section: since neither is self-describing, it
becomes impossible to make a simple linear scan of the section to parse
and interpret location/scope list data. There is no reason for a debugger
to do this as far as I can imagine. But it might be convenient for a
debugger or compiler implementor that is trying to debug DWARF2 related
tools. If it really were important to retain parsability, a new section
should be used or a scope list could be made to look like a location list
by including two bytes of zeros in every entry; I don't think either is
warranted but I solicit other input.
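To illustrate the padding variation only (not the proposed encoding), here is
a hedged Python sketch. It assumes the DWARF V2 location list entry layout of
two address-sized offsets, a 2-byte block length, and the block itself, with
4-byte little-endian addresses. The helper names are hypothetical. With two
bytes of zeros appended to each scope list entry, one location-list reader can
scan a mixed section linearly.

```python
import struct

def parse_location_list(data, offset=0):
    """Parse entries of (start, end, u16 length, block) up to the (0, 0) pair."""
    entries = []
    while True:
        start, end = struct.unpack_from("<II", data, offset)
        offset += 8
        if start == 0 and end == 0:
            return entries, offset
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        entries.append((start, end, data[offset:offset + length]))
        offset += length

def encode_padded_scope_list(ranges):
    """Encode each range with a zero block length so it mimics a location entry."""
    body = b"".join(struct.pack("<IIH", s, e, 0) for s, e in ranges)
    return body + struct.pack("<II", 0, 0)

def scan_section(section):
    """Linear scan: every list in the section parses with the same reader."""
    lists, offset = [], 0
    while offset < len(section):
        entries, offset = parse_location_list(section, offset)
        lists.append(entries)
    return lists

# One real location list (a one-byte expression) followed by a padded scope list.
loc_list = struct.pack("<IIH", 0x4, 0x20, 1) + b"\x50" + struct.pack("<II", 0, 0)
section = loc_list + encode_padded_scope_list([(0x10, 0x40), (0x80, 0xA0)])
lists = scan_section(section)
```

The cost is two extra bytes per scope list entry, which is the trade-off the
paragraph above weighs against simply using a separate section.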
4) More regarding scope lists
Without loss of generality, we can restrict the set of address ranges to be
- a sequence of address ranges (scope list entries), such that
- the address ranges occur sorted in increasing beginning address order,
- all adjacent pairs of ranges have a gap between them (that is, they
are not only disjoint but also cannot be combined into a single range
without also including an address that should not be included).
This provides a canonical representation for the discontiguous range of
addresses.
Requiring a canonical representation creates some additional work
for producers but may have advantages for consumers. However, neither
location lists nor the tuples in the address range table are required
to be sorted, so no such requirement is proposed here.
It does seem worthwhile to require a modicum of minimality/well-formedness
in the following sense:
- all pairs of ranges are disjoint (there are no overlaps)
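The difference between the canonical form and the weaker disjointness
requirement can be sketched in Python as follows. The function names are
illustrative assumptions, and each range is taken to be a (start, end) pair
whose end is the first address past the range.

```python
def is_disjoint(ranges):
    """True if no two ranges overlap; the input need not be sorted."""
    ordered = sorted(ranges)
    return all(prev_end <= start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))

def canonicalize(ranges):
    """Sort by start address and merge ranges that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # No gap before this range: fold it into the previous one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

unsorted_ranges = [(0x80, 0xA0), (0x10, 0x40), (0x40, 0x60)]
canonical = canonicalize(unsorted_ranges)
```

Here unsorted_ranges already satisfies the disjointness requirement, yet it is
not canonical: it is unsorted, and the two ranges that meet at 0x40 have no gap
between them.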
Bringing the pieces all together, we get the following proposal.
PROPOSAL (with one open choice)
-------------------------------
Add the ability to describe discontiguous scopes as follows:
1) Add new attribute DW_AT_ranges, which takes a single argument of
class locptr (DW_FORM_data4 or DW_FORM_data8 as appropriate).
2) This attribute can be used with the following DIEs (all of which describe
scopes of one form or another [essentially any DIE that allows DW_AT_low_pc
and DW_AT_high_pc]):
- DW_TAG_catch_block
- DW_TAG_compile_unit
- DW_TAG_inlined_subroutine
- DW_TAG_lexical_block
- DW_TAG_module
- DW_TAG_subprogram
- DW_TAG_try_block
- DW_TAG_with_stmt
3) DW_AT_ranges and DW_AT_low_pc/DW_AT_high_pc cannot both be used on the
same DIE.
4) If DW_AT_ranges is used and DW_AT_entry_pc is absent, then the entry
point for the scope defaults to be:
WE NEED TO CHOOSE ONE:
a) The lowest PC of the scopelist
b) The low PC of the first range of the scopelist
Note that if a) is chosen, then there is an advantage to requiring that
the ranges of a scope are sorted by address. b) has the advantage that
an entrypoint other than the lowest PC can sometimes be specified without
using DW_AT_entry_pc merely by putting the appropriate range first in
the scope list. (In the likely case that the entry is at the lowest PC,
both choices can be used to avoid needing a DW_AT_entry_pc attribute.)
END CHOICE:
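The two candidate defaults differ only when the scope list is not sorted. A
small Python sketch, with hypothetical function names and a scope list given as
(start, end) offset pairs:

```python
def entry_pc_lowest(scope_list):
    """Choice a): the lowest starting PC found anywhere in the scope list."""
    return min(start for start, _ in scope_list)

def entry_pc_first(scope_list):
    """Choice b): the low PC of whichever range is listed first."""
    return scope_list[0][0]

# The producer lists the range containing the entry point first.
scope_list = [(0x80, 0xA0), (0x10, 0x40)]
default_a = entry_pc_lowest(scope_list)
default_b = entry_pc_first(scope_list)
```

Under a) a consumer must examine every entry unless the list is known to be
sorted, while under b) it reads only the first entry.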
5) The argument of DW_AT_ranges is an offset in the .debug_loc section (by
virtue of being of class locptr) that begins a scope list.
6) A scope list consists of a sequence of scope list entries, where each
entry consists of a beginning address offset and an ending address offset
(the first address past the last address of that range). The list
is terminated by a pair of zero address offsets. [This encoding is
identical to location lists except that there is no location description
in a scope list entry.]
7) Location lists and scope lists may be freely intermixed in the .debug_loc
section.
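As an illustration of how a consumer might use such a scope list, the
following Python sketch tests whether a PC lies within a scope. The function
name and the cu_base parameter (the relocated start address of the compilation
unit, which the debugger is assumed to know already) are assumptions for this
example.

```python
def pc_in_scope(pc, cu_base, scope_list):
    """True if pc falls in any range of the scope list.

    Entries hold offsets relative to the start of the compilation unit,
    and each ending offset is the first address past its range.
    """
    offset = pc - cu_base
    return any(start <= offset < end for start, end in scope_list)

cu_base = 0x400000
scope_list = [(0x10, 0x40), (0x80, 0xA0)]
inside = pc_in_scope(0x400010, cu_base, scope_list)
in_gap = pc_in_scope(0x400050, cu_base, scope_list)
```

The subtraction of cu_base is the only relocation step, and it happens in the
debugger at the time the list is used.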
EDITORIAL CHANGES
-----------------
The following summarizes where changes will be made and indicates the kind
of change:
- Section 1.5: list "discontiguous scopes"
- Section 2.2, Figure 2: Add DW_AT_ranges attribute
- Section 2.16: Add discussion of DW_AT_ranges
- Section 3.1: Add discussion of DW_AT_ranges (new item 12) [or see below]
- Section 3.3.3: Add discussion of DW_AT_ranges
- Section 3.3.8.1: Add DW_AT_ranges in list of non-occurring attributes
- Section 3.3.8.2: Add discussion of DW_AT_ranges
- Section 3.4: Add discussion of DW_AT_ranges
- Section 3.6: Add discussion of DW_AT_ranges
- Section 3.7: Add discussion of DW_AT_ranges
- Section 7.5.4, Figure 18: Add DW_AT_ranges
- Appendix 1: Add DW_AT_ranges to appropriate DIEs
- Appendix 7, (f) and figure: mention scope list
EDITORIAL SUGGESTION
--------------------
The general and complete description of DW_AT_low_pc and DW_AT_high_pc
attributes now occurs in Section 2.16. Most DIEs that have an address range
include the following sort of description, which is replicated many places
(with the obvious substitution for each kind of entry):
"The <xyz> entry has a DW_AT_low_pc attribute whose value is the relocated
address of the first machine address generated for the <xyz>. It also has
a DW_AT_high_pc attribute whose value is the relocated address of the first
location past the last instruction generated for the <xyz>."
One editorial approach would be to add a third sentence following each
occurrence of the above, something like the following:
"Alternatively, the <xyz> may instead have a DW_AT_ranges attribute whose
value describes the several ranges of instructions generated for the
<xyz>."
What I suggest instead is to delete the existing sentences and replace them
with something like:
"The <xyz> entry has either DW_AT_low_pc and DW_AT_high_pc attributes or
alternatively a DW_AT_ranges attribute, whose value(s) describe the one or
more ranges of instructions generated for the <xyz> (see Section 2.16
for the description of these attributes)."
In at least one case (notably Section 3.1), this means combining two
bullets into one (and avoiding addition of a third).
The proposal was accepted with a few modifications: the range entries are
unsorted and contained in a new section: .debug_ranges. Any object with a
discontinuous range must specify DW_AT_entry_pc.