000517.2 W B. Nettleton Representation Line Number Table Is_Stmt

Proposal for DWARF 2.1 change:
------------------------------

This change modifies the is_stmt register of the state
machine to have an initialized value of "true", and clarifies the
responsibility of a pipeline scheduling code generator to
identify some instruction as the "beginning" of a source line.

 

Here's what the DWARF 2.0.0 spec says about the Is_Stmt
boolean.

6.2 Line Number Information

...
If space were not a consideration, the information provided in
the .debug_line section could be represented as a large matrix,
with one row for each instruction in the emitted object code.
The matrix would have columns for:
...
- whether this instruction is the beginning of a source statement
...

6.2.2 State Machine Registers
...
is_stmt A boolean indicating that the current instruction is the
beginning of a statement.
...
At the beginning of each sequence within a statement program, the
state of the registers is:
...
is_stmt determined by default_is_stmt in the statement program
prologue
...

6.2.4 The Statement Program Prologue
...
5. default_is_stmt (ubyte) The initial value of the is_stmt
register.

A simple code generator that emits machine instructions in the order
implied by the source program would set this to "true," and every entry
in the matrix would represent a statement boundary. A pipeline
scheduling code generator would set this to "false" and emit a specific
statement program opcode for each instruction that represented a
statement boundary.

6.2.5.2 Standard Opcodes
...
6. DW_LNS_negate_stmt
Takes no arguments. Set the is_stmt register of the state machine to the
logical negation of its current value.
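
As a rough illustration of the 2.0.0 behavior quoted above, the
following sketch models only the is_stmt register. It is not a real
line-program decoder: the tuple "opcodes" and the function name
run_sequence are hypothetical stand-ins for the encoded
DW_LNS_copy, DW_LNS_advance_pc, DW_LNS_advance_line and
DW_LNS_negate_stmt opcodes.

```python
# Toy model of the is_stmt register only; not a real DWARF decoder.
# The tuple-based "opcodes" are hypothetical stand-ins for encoded opcodes.

def run_sequence(ops, default_is_stmt):
    """Return the (address, line, is_stmt) rows emitted by one sequence."""
    address, line = 0, 1
    is_stmt = default_is_stmt          # 2.0.0: initialized from the prologue
    rows = []
    for op in ops:
        kind = op[0]
        if kind == "advance_pc":
            address += op[1]
        elif kind == "advance_line":
            line += op[1]
        elif kind == "negate_stmt":    # models DW_LNS_negate_stmt
            is_stmt = not is_stmt
        elif kind == "copy":           # models DW_LNS_copy: append a row
            rows.append((address, line, is_stmt))
        else:
            raise ValueError(f"unknown op {kind!r}")
    return rows


ops = [
    ("copy",),             # row at address 0, line 1
    ("advance_pc", 4),
    ("advance_line", 1),
    ("negate_stmt",),
    ("copy",),             # row at address 4, line 2
]

# Same opcode stream, two different prologue defaults.
simple_rows = run_sequence(ops, default_is_stmt=True)
scheduled_rows = run_sequence(ops, default_is_stmt=False)
```

The same opcode stream yields opposite is_stmt values depending on
default_is_stmt, so a consumer cannot interpret DW_LNS_negate_stmt
without first reading the prologue.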


This seems straightforward enough, except for the part in
section 6.2.4 about a pipeline scheduling code generator. This is
where the problem gets interesting (without this case the boolean
would be unnecessary anyway).  A pipeline optimizing compiler
writer could potentially argue that no instruction is a
statement boundary!

While I understand the meaning of the theoretical boolean
in section 6.2, it seems less clear in the context of the state
machine and the actual is_stmt boolean. What would one expect a
debugger to do with entries where the is_stmt boolean is false?
This note has more discussion, ad nauseam, of the issue after a
proposal for change.



Textual changes to the specification:


6.2 Line Number Information
...
Such a matrix, however, would be impractically large. We
shrink it with two techniques. First, we delete from the matrix
each row whose file, line and source column information is
identical with that of its predecessors. [new text] Any deleted
rows would never be the beginning of a source statement.
[end new text]
...

6.2.2 State Machine Registers
...
is_stmt A boolean indicating that the current instruction is
the beginning of a statement.

[new text] Every distinct line number should always have
one and only one instruction for which this boolean is true.
The exception is inlining or template expansion, where a
line number is semantically repeated in a source file; in that
case each expansion of a line number should always have one and
only one instruction for which this boolean is true.

A simple code generator that emits machine instructions in the order
implied by the source program would never modify this register and every
entry in the matrix would represent a statement boundary. A pipeline
scheduling code generator might mark some instructions as false when
instructions from several source statements are intermixed.[end new text]
...
At the beginning of each sequence within a statement
program, the state of the registers is:
...
is_stmt [modified text] "true" [end modified text]
basic_block ...

6.2.4 The Statement Program Prologue
...
5. [modified text] unused (ubyte) This byte is currently unused.
[end modified text]

6. line_base (sbyte)
...
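
The "one and only one" rule proposed for section 6.2.2 could be
checked mechanically by a producer or a validation tool. The sketch
below is one possible reading of that rule, not part of the proposal.
The row layout and the integer "expansion" tag (0 for ordinary code,
a distinct value per inline or template expansion) are assumptions
made for illustration; the line number table itself carries no such
tag.

```python
from collections import Counter


def lines_violating_rule(rows):
    """Return the (line, expansion) keys without exactly one is_stmt row.

    rows: iterable of (address, line, is_stmt, expansion) tuples.
    'expansion' is a hypothetical integer tag: 0 for ordinary code, and a
    distinct value for each inline or template expansion of a line.
    """
    counts = Counter()
    for _address, line, is_stmt, expansion in rows:
        # Adding 0 still registers the key, so unmarked lines are reported.
        counts[(line, expansion)] += 1 if is_stmt else 0
    return sorted(key for key, n in counts.items() if n != 1)


rows = [
    (0x00, 10, True, 0),
    (0x04, 11, True, 0),
    (0x08, 10, False, 0),   # instruction scheduled back into line 10
    (0x0C, 12, False, 0),   # line 12 is never marked as a statement start
    (0x10, 5, True, 1),     # first inline expansion of line 5
    (0x14, 5, True, 2),     # second inline expansion of line 5
]

violations = lines_violating_rule(rows)
```

In this example only line 12 is reported: it has instructions but no
row with is_stmt set. The two expansions of line 5 are counted
separately, so each satisfies the rule on its own.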

Further Discussion:
-------------------

The current spec allows for, and in fact says, that a pipeline
scheduling code generator should default the is_stmt boolean
to "false".  This is wrong in that the first instruction of
any sequence would seem by definition to be the beginning of
a source line! A compiler is allowed to generate
instructions which aren't associated with any line number,
in which case the line number is identified as 0. A debugger
would largely ignore these instructions anyway (especially
the is_stmt boolean for these). So even if an optimizing
compiler generated instructions which aren't associated with a
line number, the first instruction eventually generated
for an actual source line would still seem to be the first
instruction for that source line.

So what might a debugger do with entries in the table where
is_stmt is false? Debuggers use the line number tables for
basically four things:

1 - To set a breakpoint at the beginning of a source line.

2 - When stepping at the source level to identify when a
new source line has been encountered.

3 - When displaying interspersed disassembled machine code
with source code, the line number tables are used to identify
where to insert source code into the disassembly listing.

4 - When a hardware exception occurs, or when displaying a
stack traceback, the tables are used to identify the particular
source line associated with an instruction address.

Number 4 is probably the main situation where instructions
with both "true" and "false" is_stmt's are useful. Certainly
for number 1 only the "true" is_stmt instructions are interesting.
It isn't clear whether the "false" is_stmt instructions would
or should be used for items 2 and 3 (while using them might
be more technically accurate, it would also add significantly to
the "noise" when debugging; stepping back and forth over
several lines is distracting).
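
Uses 1 and 4 above can be contrasted in a short sketch. It assumes a
decoded table of (address, line, is_stmt) rows sorted by address, and
it ignores the end of the sequence; the function names are
hypothetical and do not come from any particular debugger.

```python
import bisect


def breakpoint_addresses(rows, line):
    """Use 1: only rows with is_stmt set are candidate breakpoint sites."""
    return [addr for addr, row_line, is_stmt in rows
            if row_line == line and is_stmt]


def line_for_address(rows, pc):
    """Use 4: every row counts, whatever its is_stmt value.

    rows must be sorted by address. Each row is taken to cover the
    addresses from its own address up to the next row's address.
    """
    addresses = [addr for addr, _line, _is_stmt in rows]
    i = bisect.bisect_right(addresses, pc) - 1
    if i < 0:
        return None          # pc lies before the first row
    return rows[i][1]


rows = [
    (0, 10, True),
    (4, 11, True),
    (8, 10, False),    # line 10 resumes after scheduling; not a new start
    (12, 11, False),   # line 11 resumes; not a new start
    (16, 12, True),
]

sites = breakpoint_addresses(rows, 10)
faulting_line = line_for_address(rows, 9)
```

A breakpoint on line 10 lands only at address 0, while a fault at
address 9 is still attributed to line 10 through the row whose
is_stmt is false.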

One might ask "Do we need an is_stmt boolean anyway? Can't a
debugger simply identify the first instruction associated with a
line number and use this for setting breakpoints and then deal
with the other situations as needed?" The answer is yes,
we do need the is_stmt boolean to handle situations where a
source line is expanded multiple times in a file. For
example, an inline subroutine which was called twice would
have its source lines "begin" twice in the instruction
sequence. It's not clear that this is why the DWARF 2 spec
originally included this boolean, but this probably does
justify its existence.
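
The inline case can be made concrete with a small sketch. The table
below is invented for illustration: line 5 belongs to a hypothetical
inline subroutine that is expanded at two call sites. The "first
instruction per line" heuristic finds one address for line 5, while
is_stmt identifies both beginnings.

```python
def first_address_per_line(rows):
    """Heuristic without is_stmt: first address seen for each line."""
    first = {}
    for addr, line, _is_stmt in rows:
        first.setdefault(line, addr)
    return first


def stmt_addresses_per_line(rows):
    """With is_stmt: every address marked as the beginning of a line."""
    starts = {}
    for addr, line, is_stmt in rows:
        if is_stmt:
            starts.setdefault(line, []).append(addr)
    return starts


# Invented (address, line, is_stmt) rows, in address order.
rows = [
    (0x00, 20, True),   # caller, line 20
    (0x04, 5, True),    # first inline expansion of line 5
    (0x08, 21, True),   # caller, line 21
    (0x0C, 5, True),    # second inline expansion of line 5
    (0x10, 22, True),   # caller, line 22
]

heuristic_sites = first_address_per_line(rows)[5]
is_stmt_sites = stmt_addresses_per_line(rows)[5]
```

With the heuristic alone, a breakpoint on line 5 would be planted
only in the first expansion and would be missed whenever execution
reaches the second one.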


Withdrawn pending rewording.  This proposal is only editorial and does not require committee
approval providing no substantive changes are made to the normative text.