000131.1 B A D. Anderson Processor Prolog identification

The following proposal replaces the original proposal, which
is retained at the bottom for reference.

Proposal 000131.1 prologue identification

===Summary

Debuggers (often) want a 'function entry' breakpoint placed after the
function prologue has executed (thus setting up the frame and (usually)
space for local variables, and 'homing' arguments on to the stack where
applicable). But before any user-coded statements are executed.

But it has been difficult and compiler-dependent to find the end of a
function prologue and the beginning of the user's statements.

A closely related problem is finding the beginning of a function epilogue.
This has been even more difficult and compiler dependent.

===Current Specification

Nothing in the current spec addresses this problem.

===What debuggers do now
gdb uses compiler-dependent heuristics mostly, and do (or can) depend
on the line table, I believe.

SGI's debuggers use a heuristic depending on how line numbers are assigned
to find the beginning of the first line (see the sgi mips-extension document
in the libdwarf source distribution or on my web pages for the ugly details).

For a function epilogue, a debugger might want to set a breakpoint here to
allow stopping to show values before the frame is torn down (the "finish the
current function" notion [Ref 2]). Or to print a 'Slime Trail' report [Ref 2].

As an alternate, one can imagine disassembling functions to try to
find return code (to identify the function epilogues).

SGI's debuggers don't attempt to find the function epilogue at all. Without
compiler support it was judged not worth the effort.


===Proposed modification to the document

Sec 6.2.2, page 51

prologue_end        A boolean indicating that the current address
                    is that of the first user statement in the
                    function. Normally after the local frame
                    is set up.

epilogue_begin      A boolean indicating that the current address
                    is after the last user statement in the
                    function. Before the local frame
                    is torn down.

and for the register state setup of the spec at the end of 6.2.2:

prologue_end        ``false''
epilogue_begin      ``false''


Sec 6.2.5.1 page 56, add
        5. set the epilogue-start register to false.
        6. set the prologue-end register to false.

Sec 6.2.5.2, page 56, add
        10. DW_LNS_set_prologue_end
            set the prologue-end register to true.

in italics:
" This exists so a debugger can know where to set breakpoints
after the frame is completely set up (but no user
code has been executed).
Only a single DW_LNS_set_prologue_end should appear per
function entry path. So if a single function has a fast-path
prologue and a 'normal' prologue, that means two
DW_LNS_set_prologue_end must be set on different addresses and
in any given entry sequence only one of the addresses may
be executed. In the presence of optimization,
choosing the address to set this becomes more difficult."

        11. DW_LNS_set_epilogue_begin
            set the epilogue-begin register to true.

in italics:
" This exists so a debugger can know where to set breakpoints
after the function has finished its work
but the local frame is still set up (but no user
code is left to be executed).
Only a single DW_LNS_set_epilogue_begin should appear per
function return path. So if a single function has multiple
return code sequences,
multiple
DW_LNS_set_prologue_end must be set on different addresses and
in any given return sequence only one of the addresses may
be executed. In the presence of optimization,
choosing the address to set this becomes more difficult."

===Why multiple prologue/epilogue points per subprogram

Should the following be an italicized group of paragraphs in the revised
dwarf2 document?  Or is it over-specifying?
---
Traditionally subprograms had a single entry sequence and a single exit.

Compilers began emitting multiple return sequences where that was
profitable-in-cpu-time years ago.

Some compilers now use a "fast path" or "shrink wrapping" to determine
(in the initial instructions) that a subprogram has essentially nothing to
do so the fast path returns without every really finishing frame setup.
And another path or paths does the real work.

So, today, the notion of a 'prologue' and an 'epilogue' have gotten more
complicated.

The contract (compiler to debugger) recommended here is that the compiler
will mark only one instruction in any given execution sequence (of a
subprogram) to be the prologue end.

And the compiler will mark only one instruction in any given execution
sequence (of a subprogram) to be the epilogue start.

No matter how many prologue or epilogue sets there are in a subprogram.

Consider the function
    int A(int v,double y, char *p)
    {
        if(v == 0)
            return v;
        v = y*3.0 + *p;
        return v;
    }

In rough pseudo code as an example (not C, lower level than that to try
to make this clear).
        Subprogram A
        if (arg0 == 0) {
            mark prologue end
            mark epilogue start
            return 0
        }
        set up frame
        mark prologue end
        calculate v
        mark epilogue start
        tear down frame
        return v

===The Fortran aspect of this

Some Fortran compilers emit part of the prologue at the end of the function.

Low level pseudo code:
        Subprogram B
        go to 3;
        4:
        // subprogram does something
        return
        3:
        set up stack frame
        go to 4:

In this case, the prologue-end is at 4:

While setting a prologue-length can work (if interpreted as length to th
prologue-end instruction) it is a non-intuitive way to think of 'prologue
length'.

===Using Tags and Attributes instead of the line table

Various people proposed:
        DW_AT_prologue_end (with address)
        DW_AT_prologue_length (uleb)
        DW_AT_prologue_ends (uleb count, count sleb offsets)

The 'length' or 'offset' here would mean bytes (or, some prefer,
instructions) from the function entry point.

An advantage of this (as compared to the line table) is that these
attributes (and epilogue attributes) are available on simply reading
the subprogram DIE.  Efficient if the debugger actually reads the
DIEs before reading the line table.

A disadvantage is that it takes more space than using the line table.

Consider a simple single-entry single exit function.  Because it must
be a list, it takes 2 leb numbers (a count and an offset) for the entry
end-prologue and the same for the epilogue. So two bytes per marker.
(Plus an abbreviation). And if the distance grows beyond 127 bytes (as
it usually would, for the epilogue) that is 3 bytes (two for the leb
offset of the epilogue number).  (Providing a means of counting 'by
instructions' where that is practical would require a multiplier from
some other attribute. Such is not considered further here).

The line table approach takes 1 byte per marker (the standard op code)
plus a 'special' opcode to signify a table row (necessary to get the
address emitted specifically).

This is a rather small difference in size...

===Using .debug_frame information instead of the line table

One might imagine
        DW_CFA_set_prologue_end
        DW_CFA_restore_prologue_end
        DW_CFA_set_epilogue_begin
        DW_CFA_restore_epilogue_begin
as frame settings of flags.

The implementation might define a pseudo register, say DW_FRAME_FLAGS
where the value of the register is a flag value, where bit one on means
it is the prologue end, and bit two on means it is the epilogue begin.
Or separate registers.

One problem here is that most of the pseudo-registers are implementation
defined.

Another problem is that such flags would increase the size of .debug_frame,
and for implementations using .debug_frame for stack unwinding (c++
exception handling etc) the extra performance hit in unwinding (reading
more frame instructions) is not something that should be welcomed.
So I suggest we abandon this approach.

===The Key Instruction idea and the is_stmt boolean

For more information on the Key Instruction idea, see Ref 1.

Recall that the prologue-end is to be marked before the first instruction
of user code.

But for user code the idea is that what one wants to mark as the breakpoint
for instructions is not necessarily the _first_ instruction of those for
the statement. In optimized code that is a bad choice. Instead, find the
statement that does something user-visible. The thing that has what the
user thinks of as the effect of the statement. Call that the key instruction.

In the case of a single prologue the prologue marker would be before the
key instruction of the first user code in the subprogram and after the frame
was set up (to the extent a frame is set up at all).

If there *are* no key instructions of user code, this is the offset of the
'key' instruction of the return (before epilogue code).

If the entire subprogram is a single instruction there are no two instructions
to place separate prologue and epilogue breakpoints on. In that unique case
the prologue and epilogue addresses would have to be on the same instruction.

For simple traditional non-optimized-code debugging, instructions and
prologue are not mixed so the key instruction could simply be the first
instruction of user code for a particular user statement. This corresponds
to the line table 'is_stmt' flag.

For instruction scheduling and a more subtle definition, the 'key'
instruction could be defined as the first instruction causing a user-variable side-effect. This might correspond to an interpretation of the 'is_stmt'
in the line table, though it might be stretching the line table definition
a bit.

In case no side effect exists, (a useless statement) some instruction
must be chosen by the compilation system as the key instruction (only
if any instructions generated).

In any case, the definition adopted by a compilation system must be
defined if it is to be generated in the dwarf2.  Debuggers will simply
assume that the is_stmt boolean is the place to set a breakpoint (if
there is such a boolean set).

===libdwarf
This need take no more space than current libdwarf for the
line-table-per-compilation-unit data areas (the interface data). A
separate call interface to return just the relevant prologue/epilogue
info could be provided.


===Acknowledgements.
Several people on the dwarf2 committee contributed heavily to this topic.
Their help is much appreciated.

===Ref 1. Key Instructions
http://www.cs.berkeley.edu/~cmtice/ has, in postscript:
OPTVIEW: A New Approach for Examining Optimized Code
Caroline Tice and Susan L. Graham, Proceedings of the 1998 ACM SIGPLAN
Workshop on Program Analysis for Software Tools and Engineering,
ACM SIGPLAN Notices (7)33, July 1998


===Ref 2. Slime Trail Debugging
"How Debuggers Work"
by Jonathan B. Rosenberg.
Pub by Wiley.
Pages 130,131 discuss the "finish the current function" idea. And the
"Slime Trail" idea. The "Slime Trail" is the debugger not stopping on
function entry exit, but printing useful information on every entry and
exit. Occasionally useful (and if available, saves lots of debugging
time when it is needed). The typical situation when this is valuable is
when the application trashes its stack so a return winds up at address
0 or some other bogus address, resulting in ad immediate, or near-immediate,
seg fault (or other error). And the programmer has no idea where it was
before the crash (so no idea what might have trashed the stack).


Proposal for DW_AT_prologue_end.

One thing that has been difficult and compiler-dependent
has been a way to find the end of a function prologue
and the beginning of the user's statements.

gdb uses compiler-dependent heuristics mostly.

SGI's debuggers use a heuristic depending on
how line numbers are assigned to find the beginning of the first
line (see the sgi mips-extension document in the libdwarf
source distribution or on my web pages for the ugly details).

Old f77 compilers did things like
insert a branch to end-area of a routine in the prologue and
at the end of setting up, a branch back to the
beginning (back to to the label following the branch to
end-area). The code at the end-area was prologue code too.

The reason this matters:
a debugger often wants to stop after the prologue has been
run so that arguments have been assigned to their home
in the function.

This is a proposal that
DW_AT_prologue_end
be added to dwarf2. It is, like almost all attributes,
optional. When present it improves the ability of the
debugger to 'do the right thing' without machine
specific heuristics.

It is defined to have a value of
the unsigned offset from the beginning
of the function to the first key user-level
instruction in the function.
Normally a form uleb number, but other forms achieving
the same thing are ok (DW_FORM_data1, etc), as in normal dwarf2.

If there *are* no key instructions of user code, this is the
offset of the 'key' instruction of the return
(epilogue code).

A 'key' instruction depends on how your debuggers work.

For simple traditional (-g) debugging
instructions and prologue are not mixed so the key instruction
could simply be the first instruction of user code for a
particular user statement.
This corresponds to the line table 'is_stmt' flag.

For instruction scheduling and a more
subtle definition, the 'key' instruction could be
defined as the first instruction causing a user-variable
side-effect. This might
correspond to an interpretation of the 'is_stmt'
in the line table, though it might be streching
the line table definition a bit.

In case no side effect exists, (a useless statement)
some instruction must be choseen by the compilation system
as the key instruction (only if any instructions generated).

In any case, the definition adopted by a compilation
system must be defined if it is to be generated in the dwarf2.

[For the justification of the key instruction idea,
see
http://www.cs.berkeley.edu/~cmtice/
which has
OPTVIEW: A New Approach for Examining Optimized Code Caroline Tice and Susan
L.
Graham, Proceedings of the 1998 ACM SIGPLAN Workshop on
Program Analysis for Software Tools and Engineering,
ACM SIGPLAN Notices (7)33, July 1998, available in Postscript.

a short presentation of the ideas in Caroline Tice's PhD Thesis.
(when I last checked, Berkeley did not have her Thesis on line,
but perhaps it is there now somewhere)

]

An alternative definition of the attribute as having
an address (not an offset) would be perfectly ok, but
the offset is smaller, normally. And the offset
does not require later linker update (no relocation
record required). So I suggest we define it as an offset.

At an initial dwarf definition meeting (in the early 1990's),
it was suggested that there be a lexical block with
low pc and high pc corresponding to this 'prologue end
and eplilogue begin'. This was rejected, which
was a good thing, since it is often true that modern
compilers generate multiple epilogues
where it is profitable-at-run-time so a simple
range would not have worked well anyway.