001016.1 | A | R. Brender | Representation | Interludes (aka trampolines) |
Motivation
C++ implementations sometimes use small compiler generated functions,
here called "interludes", that serve as surrogate method functions when a
class is inherited into another class. The sole purpose of the interlude
function is to adjust the value of the implicit 'this' pointer parameter and
then pass control to the real method function in the inherited class.
(Interludes are sometimes called "trampolines" or "thunks".)
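To make the 'this' adjustment concrete, here is a minimal sketch in which the interlude is written by hand. A real interlude is generated by the compiler; the names C_get_interlude and this_adjustment are hypothetical and exist only for illustration. The layout described in the comments is what common ABIs produce, not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a = 1; };

struct B {
    int b = 2;
    int get() { return b; }  // the real method; expects 'this' to point at a B
};

// B is inherited into C. With a non-empty A placed first, the B subobject
// typically sits at a nonzero offset inside C.
struct C : A, B {};

// Hand-written stand-in for a compiler-generated interlude (hypothetical).
// It adjusts the incoming C* to the B subobject, then passes control to
// the real method B::get.
int C_get_interlude(C* self) {
    B* adjusted = static_cast<B*>(self);  // the 'this' pointer adjustment
    return adjusted->get();
}

// Byte distance from the start of the C object to its B subobject.
std::ptrdiff_t this_adjustment(C* self) {
    return reinterpret_cast<char*>(static_cast<B*>(self)) -
           reinterpret_cast<char*>(self);
}
```

A user who asks to stop in "C::get" almost certainly wants to stop in the body of B::get, not in the few instructions that perform the adjustment.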
When a debugger is asked to set a breakpoint on a method function that happens
to be implemented by an interlude, the breakpoint name may resolve to the
entry point of that derived function (interlude). Similarly, if the user is
single stepping into a derived method, control may well step into the interlude.
In either case, since the interlude is an artifact of the inheritance rather
than a distinct user visible member function in the inheriting class, it is
desirable practice for the debugger to set the breakpoint for stopping in, or
step into, the ultimate real method instead of the interlude as such. For
breakpoints, this ensures that the breakpoint will trigger regardless of
whether the original or the derived function is called.
The following provides a means to accomplish this.
PROPOSAL
--------
Add the following to DWARF in a new Section 3.3.9:
"3.3.9 Interludes
"<i> An interlude is a compiler generated member function of
a class whose purpose is to adjust the implicit 'this' pointer and then call a
corresponding member function from another inherited class.</i>
"An interlude is represented by a debugging information entry with
the tag DW_TAG_subprogram or DW_TAG_inlined_subroutine that has a
DW_AT_interlude attribute. The value of that attribute indicates the
corresponding member function of the inherited class. (An interlude
entry may but need not also have a DW_AT_artificial attribute.) The
value may be either of class reference or class address. If class
reference is used, it refers to the debugging information entry for the
declaration, if available, otherwise the definition, of the inherited
member function. If class address is used, it specifies the value of the
entry PC for the generated code of the inherited member function. In
either case, the inherited member function may itself be an interlude.
(Such a sequence of interlude functions necessarily ends with a
non-interlude function.)
"<i> A reference can always be used if the inherited member
function is defined as part of the current compilation unit. An address can
always be used if the inherited member function is defined outside of the
current compilation unit. (An address can even be used when the inherited
member function is defined in a compilation unit that does not have
DWARF debugging information.) </i>"
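The way a debugger might consume the proposed attribute can be sketched as follows. This is only an illustration under stated assumptions: Die, InterludeValue, resolve_interlude and the by_pc map are hypothetical names for an in-memory model, not part of DWARF or of any real DWARF reader, and the model simplifies by giving every entry an entry PC (a declaration entry would not normally carry one). It requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Hypothetical model of a subprogram entry (not a real DWARF reader API).
struct Die;
using InterludeValue = std::variant<std::monostate,   // no DW_AT_interlude
                                    const Die*,       // value of class reference
                                    std::uint64_t>;   // value of class address

struct Die {
    std::string name;
    std::uint64_t entry_pc;
    InterludeValue interlude;
};

// Follows DW_AT_interlude values until a non-interlude function is reached
// and returns that function's entry PC. by_pc maps entry PCs to the entries
// that describe them. An address with no matching entry (for example, code
// compiled without DWARF) ends the walk at that address.
std::uint64_t resolve_interlude(const Die& start,
                                const std::map<std::uint64_t, const Die*>& by_pc) {
    const Die* die = &start;
    for (;;) {
        if (const Die* const* ref = std::get_if<const Die*>(&die->interlude)) {
            die = *ref;                        // reference: follow the entry
        } else if (const std::uint64_t* addr =
                       std::get_if<std::uint64_t>(&die->interlude)) {
            auto it = by_pc.find(*addr);       // address: find its entry, if any
            if (it == by_pc.end()) return *addr;
            die = it->second;
        } else {
            return die->entry_pc;              // not an interlude: done
        }
    }
}
```

The loop relies on the property stated in the proposed text, namely that a sequence of interludes necessarily ends with a non-interlude function. A production debugger would probably also guard against malformed input that forms a cycle.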
Note: For the purposes of this proposal, any of the terms "interlude",
"trampoline" and "thunk" may be considered equivalent. In some email
exchanges, it appears that "trampoline" may be preferable to many. I have
no problem with such a change of name.
DISCUSSION
----------
There are two parts to achieving the goals mentioned above:
1) Identify that a method function is an interlude
2) Given an interlude, determine the corresponding member function in
the inherited class from which it is derived
Identification
There seem to be several ways to make this identification:
1) The interlude-ness may be reflected in the mangled name
2) The generated code for the method might be examined to determine whether
it "looks" like an interlude.
Is it possible for an optimizing compiler to transform an explicit
member function so that it cannot be distinguished from an interlude
based only on the generated code? I think the answer is yes. Consider,
for example, a member function that does nothing more than call the
"corresponding" member function of a class that it inherits. As a
result, depending on just examination of the generated code can lead to
false positive identification.
3) The DW_AT_artificial attribute might be used as a hint that a function
is an interlude. Since artificial functions might be generated for
various purposes, this hint needs some kind of confirming action
such as checking the generated code to see if it "looks" like that
of an interlude.
Is it possible for a compiler generated member function that is not
an interlude to look like one? This seems pretty unlikely but I am
reluctant to claim it is impossible. If it is possible, then even the
combination of the DW_AT_artificial attribute and generated code
examination could lead to a false positive.
4) We might define a more explicit DW_AT_interlude attribute that would
make this identification simple and unambiguous.
If interludes can be inlined by a compiler, so that the 'this' pointer
adjustment occurs directly as part of the calling function, then no
technique that depends even in part on examination of generated code is
likely to be both reliable and simple enough to be practical. That appears
to leave only 1) and 4) as viable approaches.
Note: I assume a debugger that does have good support for inlining.
That is, it is not the mere occurrence of inlining of itself that is
significant but rather that even with good inlining support examination
of the generated code is untenable.
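The false positive described under approach 2) can be illustrated with a small sketch. The class names are hypothetical. The point is that C::get below is an explicit, user-written member function, yet an optimizing compiler may well reduce it to a 'this' adjustment followed by a jump to B::get, which is the same shape as an interlude.

```cpp
#include <cassert>

struct A { int a = 1; };

struct B {
    int b = 2;
    int get() { return b; }
};

// C::get is written by the user, not generated by the compiler. Its body
// only forwards to the inherited B::get, so the code generated for it may
// be indistinguishable from that of an interlude.
struct C : A, B {
    int get() { return B::get(); }
};
```

A debugger that classified C::get as an interlude from its code alone would skip a function in which the user can legitimately ask to stop.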
"Un-derivation"
Suppose that the appropriate interlude has been identified and confirmed
by some mechanism, and next consider how best to work back to find the
member function from which it derives.
1) If the interlude is identified on the basis of its mangled name, could
the function from which it derives also be determined from the name?
This is possible, but probably not attractive.
- Such names will tend to be long (perhaps double the length
they would otherwise have?)
- The extra information is only relevant to debugging. Implementations
are likely hesitant to modify the mangling rules for debugging
purposes if there is a viable alternative that is available as
part of the debugging information itself (here, DWARF).
2) If the original method function is defined in the same compilation unit
as the derived one, then a debugger can probably start at the interlude
and, using a combination of knowledge of the name mangling scheme and
the DWARF representation, work backward to identify the original
method function. (If the method is overloaded then the algorithm may be
non-trivial but is still quite doable.)
If the original method function is defined in some other compilation
unit and the DWARF information for the class declaration is less than
complete (for example, because the implementation is using a space
optimization technique which attempts to describe a complete class only
once) this becomes rather harder.
3) If there is a non-inlined (closed form) version of the interlude, then
it is probably possible to interpret that code to identify the address
of the target member function that it invokes.
In simple cases, this is viable. If the interlude can be
inlined into the caller, this starts to become hard if not impossible.
And if it is possible for the original member function to be inlined
into the interlude (does that make it not an interlude? it is surely
still artificial...), then the mind boggles.
4) We might define a more explicit DW_AT_interlude attribute that would
make this relationship simple and unambiguous.
Here, perhaps even more than in the earlier step, we see that inlining and
other compiler optimizations either complicate or eliminate approaches that
involve debugger interrogation of the generated code.
The DW_AT_interlude proposal
Since all of the other approaches have problems of one kind or another,
I was led to offer the proposal given above.
With this formulation, both the identification of a member function as
being an interlude as well as the member function that is inherited are
explicit and simple to determine. There is no need to analyze or derive
any information from the generated code.
The only remaining question should be whether the DWARF and/or
debugger support for inlining is sufficient to handle the full complexity
that might result. While I cannot speak from experience, I do suggest
that any weakness in this regard can and should be considered an inlining
problem as such rather than a problem with interludes or a reason to not
define/use interludes.
Aside: besides the inherited function, the other key property of an interlude
is the amount of the 'this' pointer adjustment. While this could be included in
the interlude representation, I don't know of any particular purpose to which
this information could be usefully put by a debugger...
The proposal was accepted with the stipulation that the new attribute
be named DW_AT_trampoline. Additionally, it was stipulated that the
FORM of the attribute value can be any of at least the following:
- A string: an actual function name. The function name lookup is
implementation-dependent and could reference an
ABI/implementation/object specific table, such as an ELF symbol table.
- An address: the address of the function that will be called.
- A reference to a DIE: the function definition DIE of the function
that is being called.
- A flag (when there is no way to know the function address or name).
When it is just a flag, the debugger must step (or equivalent) to get
to the target function.
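
The four accepted forms suggest a simple dispatch on the debugger side. The following is a rough sketch only: the types ByName, ByAddress, ByReference and FlagOnly, the Action enumeration, and the plan function are all hypothetical names introduced for illustration, and the sketch says nothing about how any actual debugger implements this. It requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical model of a target entry (not a real DWARF reader API).
struct Die { std::string name; std::uint64_t entry_pc; };

// One type per accepted form of the DW_AT_trampoline value.
struct ByName      { std::string symbol; };   // string: function name
struct ByAddress   { std::uint64_t pc; };     // address of the target
struct ByReference { const Die* target; };    // reference to the target's DIE
struct FlagOnly    {};                        // flag: target not known

using Trampoline = std::variant<ByName, ByAddress, ByReference, FlagOnly>;

// What the debugger has to do in order to reach the target function.
enum class Action { LookUpSymbol, BreakAtAddress, BreakAtDie, StepThrough };

Action plan(const Trampoline& t) {
    struct Visitor {
        // Name lookup is implementation-dependent (e.g. an ELF symbol table).
        Action operator()(const ByName&) const { return Action::LookUpSymbol; }
        Action operator()(const ByAddress&) const { return Action::BreakAtAddress; }
        Action operator()(const ByReference&) const { return Action::BreakAtDie; }
        // With only a flag, the debugger must step to find the target.
        Action operator()(const FlagOnly&) const { return Action::StepThrough; }
    };
    return std::visit(Visitor{}, t);
}
```

Only the flag form forces the debugger to execute the trampoline; the other three let it locate the target without running any code.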