010322.2 A T. Allen Ada DW_FORM_ref_addr crossing shared lib boundaries

PROBLEM:

(This is just a reiteration of an earlier mail, which I'm including in the
proposal for context.)

Let me start with a little background for those unfamiliar with
Ada:

    The Ada language has a very strong model for modularization. At the top
    level, the entire application is divided into library units, and they are
    further divided into other units. A unit is either a package (akin to
    module or namespace), subprogram, task, or protected unit.

    There are no stand-alone types or objects in Ada. They are always nested
    within some sort of unit. In order for the type or object entity to be
    referenced, the containing library unit must be visible. Usually that's
    achieved with a "with" clause.

    Units are composed of a declaration/specification, which provides the
    interfaces, and a body, which provides the implementation.

In our implementation, specifications and bodies are compiled separately, and
have separate object files. One of our very early design decisions was to
use this fact to reduce the size of the DWARF. In our implementation, an
entity is described _once_, in the library unit in which it is declared. And
there is always only one such library unit, in contrast to C where a global
object may be declared in a header file and thus may be described repeatedly.
In our Ada implementation, it is only described in the object associated with
the library unit containing its declaration, and never in any other object.
This eliminates the need for any coalescing code in the static linker, or in
a post-link tool. And that's a good thing, because describing, in every
object, the entire unit hierarchy of each library unit mentioned in its with
clauses would get really huge really fast.

This means that the DWARF entries for any externally visible entities had to
have externally visible symbols, so that they could be referenced from other
objects. And the references were DW_FORM_ref_addr with relocations
associated with their values. The static linker resolved those references,
and you couldn't tell from the resultant program that anything unusual had
been done.
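
To make the static-link case concrete, here is a minimal sketch in Python of
what resolving such references amounts to. It is not our linker's code: the
object layout (a dict with 'data', 'symbols', and 'relocs'), the symbol name,
and the 4-byte little-endian reference slot (32-bit DWARF) are all assumptions
made for illustration.

```python
import struct


def link_debug_info(objects):
    """Merge .debug_info contributions and patch DW_FORM_ref_addr slots.

    Each object is a hypothetical dict:
      'data'    -> bytes of its .debug_info contribution
      'symbols' -> {symbol name: offset of the described entry in 'data'}
      'relocs'  -> [(offset of a 4-byte reference slot, symbol name)]
    """
    # Pass 1: lay out the contributions back to back and record the final
    # section offset of every externally visible entry.
    bases = []
    symtab = {}
    base = 0
    for obj in objects:
        bases.append(base)
        for name, offset in obj["symbols"].items():
            symtab[name] = base + offset
        base += len(obj["data"])

    # Pass 2: concatenate, then overwrite each reference slot with the
    # final offset of its target (assumed 32-bit, little-endian).
    merged = bytearray()
    for obj in objects:
        merged += obj["data"]
    for obj, obj_base in zip(objects, bases):
        for slot, name in obj["relocs"]:
            struct.pack_into("<I", merged, obj_base + slot, symtab[name])
    return bytes(merged)
```

After this step every reference is an ordinary offset from the start of the
merged .debug_info section, which is why the result looks unremarkable to a
consumer.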

When shared libraries were implemented shortly thereafter, we had an
interesting consequence. A library unit that defined an entity could be
placed in one shared library, while other entities that referenced it could
be in another one. And at the time a library unit was compiled, there was no
way to know in which shared libraries its objects might be placed. In fact,
they might be placed in multiple shared libraries in any number of
combinations.

The references between the loadable sections of one shared library and
another are resolved by the dynamic linker at run-time, but the references
between their DWARF sections aren't, since those sections aren't even loaded.

We considered a number of solutions to this, and decided that the goal should
be to keep the size of the DWARF down. Replicating whole hierarchies of
units and their nested entities didn't seem appropriate. So, we placed the
burden for dealing with this problem on the debugger. It had to resolve the
relocations between DWARF sections of shared libraries in a "DWARF dynamic
linker" pass that mirrored the work of the dynamic linker for the loadable
sections.
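
The consumer-side pass can be sketched roughly as follows, again in Python.
This is an illustration under stated assumptions, not a description of any
particular debugger: the per-library dicts, the symbol names, and the rule
that the first definition in search order wins (chosen here only because it
mirrors common dynamic-linker behavior) are all hypothetical.

```python
def resolve_cross_library_refs(libraries):
    """Resolve DWARF references that cross shared-library boundaries.

    'libraries' is a list, in search order, of hypothetical dicts:
      'name'        -> library name (the static executable counts as one)
      'die_symbols' -> {symbol name: offset in that library's .debug_info}
      'unresolved'  -> [(offset of the reference slot, symbol name)]
    Returns {(library name, slot offset): (target library, target offset)}.
    """
    # Build one global table of entry definitions. A unit may appear in
    # several libraries; this sketch assumes the first one found wins.
    definitions = {}
    for lib in libraries:
        for symbol, offset in lib["die_symbols"].items():
            definitions.setdefault(symbol, (lib["name"], offset))

    # Bind every outstanding reference to a (library, offset) pair, since
    # an offset alone no longer identifies the target section.
    resolved = {}
    for lib in libraries:
        for slot, symbol in lib["unresolved"]:
            if symbol not in definitions:
                raise LookupError(f"no DWARF definition for {symbol!r}")
            resolved[(lib["name"], slot)] = definitions[symbol]
    return resolved
```

The point of returning a pair rather than a bare offset is that each shared
library carries its own .debug_info section, so the consumer has to identify
the target object as well as the position within it.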

This deviates from the text in 7.5.4 discussing references of form
DW_FORM_ref_addr, where it says "within the same executable or shared
object".

Afterword:

    I've oversimplified this discussion a little bit. The static part of the
    executable can be considered an extra "shared library" for this
    discussion.

    Ada also has subunits, which are textually separated from their containing
    library units, but logically nested within them. In our implementation,
    these subunits have their own objects, and they can be placed in distinct
    shared objects from their parents, too. Also, our implementation may
    artificially separate out parts of a unit into separate objects for its
    own purposes (such as generic instantiation), and these objects also can
    be placed in separate shared libraries. You can consider subunits and
    artificially separated nested units as library units for this discussion.

WORDING CHANGE:

  | 7.5.4 Attribute Encodings
  |
  | ...
  |
  | reference [on page 116]
  |
  | ...
  |
  | The second type of reference can identify any debugging information entry
  | within the same executable; it may refer to an entry in a different
  | compilation unit from the unit containing the reference, and may refer
  | to an entry in a different shared object. This type of reference
  | (DW_FORM_ref_addr) is an offset from the beginning of the .debug_info
  | section; it is relocatable in a relocatable object file and frequently
  | relocated in an executable file or shared object. For references from
  | one shared object or static executable file to another, the relocation
  | must be performed by the consumer. In the 32-bit DWARF format, ...


Adopted.