Published by: Bevin R Brett
Based on the
UNIX International
Programming Languages SIG
Revision: 2.0.0 (July 27, 1993)
document that was
Copyright © 1992, 1993 UNIX International, Inc.
Permission to use, copy, modify, and distribute this documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name UNIX International not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. UNIX International makes no representations about the suitability of this documentation for any purpose. It is provided "as is" without express or implied warranty.
UNIX INTERNATIONAL DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS DOCUMENTATION, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL UNIX INTERNATIONAL BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS DOCUMENTATION.
Trademarks:
Intel386 is a trademark of Intel Corporation.
UNIX® is a registered trademark of UNIX System Laboratories in the United States and other countries.
FOREWORD
Figure 1. Tag names
This document specifies the second generation of symbolic debugging information based on the DWARF format that has been developed by the UNIX International Programming Languages Special Interest Group (SIG).
This version is based on the UNIX International DWARF Debugging Information Format Revision: 2.0.0 (July 27, 1993) but has had additional links inserted in a manner believed to be implied by, but not explicit in, the original.
This document defines the format for the information generated by compilers, assemblers and linkage editors that is necessary for symbolic, source-level debugging. The debugging information format does not favor the design of any compiler or debugger. Instead, the goal is to create a method of communicating an accurate picture of the source program to any debugger in a form that is economically extensible to different languages while retaining backward compatibility.
The design of the debugging information format is open-ended, allowing for the addition of new debugging information to accommodate new languages or debugger capabilities while remaining compatible with other languages or different debuggers.
The debugging information format described in this document is designed to meet the symbolic, source-level debugging needs of different languages in a unified fashion by requiring language independent debugging information whenever possible. Individual needs, such as C++ virtual functions or Fortran common blocks, are accommodated by creating attributes that are used only for those languages. The UNIX International Programming Languages SIG believes that this document sufficiently covers the debugging information needs of C, C++, FORTRAN77, Fortran90, Modula2 and Pascal.
This document describes DWARF Version 2, the second generation of debugging information based on the DWARF format. While DWARF Version 2 provides new debugging information not available in Version 1, the primary focus of the changes for Version 2 is the representation of the information, rather than the information content itself. The basic structure of the Version 2 format remains as in Version 1: the debugging information is represented as a series of debugging information entries, each containing one or more attributes (name/value pairs). The Version 2 representation, however, is much more compact than the Version 1 representation. In some cases, this greater density has been achieved at the expense of additional complexity or greater difficulty in producing and processing the DWARF information. We believe that the reduction in I/O and in memory paging should more than make up for any increase in processing time.
Because the representation of information has changed from Version 1 to Version 2, Version 2 DWARF information is not binary compatible with Version 1 information. To make it easier for consumers to support both Version 1 and Version 2 DWARF information, the Version 2 information has been moved to a different object file section, .debug_info.
The intended audience for this document is the developers of both producers and consumers of debugging information, typically language compilers, debuggers and other tools that need to interpret a binary program in terms of its original source.
There are two major pieces to the description of the DWARF format in this document. The first piece is the informational content of the debugging entries. The second piece is the way the debugging information is encoded and represented in an object file.
The informational content is described in sections two through six. Section two describes the overall structure of the information and attributes that are common to many or all of the different debugging information entries. Sections three, four and five describe the specific debugging information entries and how they communicate the necessary information about the source program to a debugger. Section six describes debugging information contained outside of the debugging information entries themselves. The encoding of the DWARF information is presented in section seven.
Section eight describes some future directions for the DWARF specification.
In the following sections, text in normal font describes required aspects of the DWARF format. Text in italics is explanatory or supplementary material, and not part of the format definition itself.
This document does not attempt to cover all interesting languages or even to cover all of the interesting debugging information needs for its primary target languages (C, C++, FORTRAN77, Fortran90, Modula2, Pascal). Therefore the document provides vendors a way to define their own debugging information tags, attributes, base type encodings, location operations, language names, calling conventions and call frame instructions by reserving a portion of the name space and valid values for these constructs for vendor specific additions. Future versions of this document will not use names or values reserved for vendor specific additions. All names and values not reserved for vendor additions, however, are reserved for future versions of this document. See section 7 for details.
The following is a list of the major changes made to the DWARF Debugging Information Format since Version 1 of the format was published (January 20, 1992). The list is not meant to be exhaustive.
DWARF uses a series of debugging information entries to define a low-level representation of a source program. Each debugging information entry is described by an identifying tag and contains a series of attributes. The tag specifies the class to which an entry belongs, and the attributes define the specific characteristics of the entry.
The set of required tag names is listed in Figure 1. The debugging information entries they identify are described in sections three, four and five.
The debugging information entries in DWARF Version 2 are intended to exist in the .debug_info section of an object file.
Bevin notes: Many of the following have very little documentation. The reader is left to guess their intended usage from their spelling.
Each attribute value is characterized by an attribute name. The set of attribute names is listed in Figure 2.
The permissible values for an attribute belong to one or more classes of attribute value forms. Each form class may be represented in one or more ways. For instance, some attribute values consist of a single piece of constant data. "Constant data" is the class of attribute value that those attributes may have. There are several representations of constant data, however (one, two, four, eight bytes and variable length data). The particular representation for any given instance of an attribute is encoded along with the attribute name as part of the information that guides the interpretation of a debugging information entry.
Bevin notes: Many of the following have very little documentation. The reader is left to guess their intended usage from their spelling.
Attribute value forms may belong to one of the following classes.
address | Refers to some location in the address space of the described program. |
block | An arbitrary number of uninterpreted bytes of data. |
constant | One, two, four or eight bytes of uninterpreted data, or data encoded in the variable length format known as LEB128 (see section 7.6). |
flag | A small constant that indicates the presence or absence of an attribute. |
reference | Refers to some member of the set of debugging information entries that describe the program. There are two types of reference. The first is an offset relative to the beginning of the compilation unit in which the reference occurs and must refer to an entry within that same compilation unit. The second type of reference is the address of any debugging information entry within the same executable or shared object; it may refer to an entry in a different compilation unit from the unit containing the reference. |
string | A null-terminated sequence of zero or more (non-null) bytes. Data in this form are generally printable strings. Strings may be represented directly in the debugging information entry or as an offset in a separate string table. |
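The constant class above refers to LEB128, which section 7.6 defines. As a rough illustration only, the unsigned variant can be sketched as follows, assuming the usual scheme in which each byte carries seven value bits, least significant group first, with the high bit set on every byte except the last. The function names are hypothetical and are not part of the format.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128 bytes."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot represent negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low seven bits of what remains
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte: high bit clear
            return bytes(out)


def decode_uleb128(data, offset=0):
    """Decode one unsigned LEB128 number; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7
```

Small values occupy a single byte, which is where the density of the variable length form comes from.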
There are no limitations on the ordering of attributes within a debugging information entry, but to prevent ambiguity, no more than one attribute with a given name may appear in any debugging information entry.
A variety of needs can be met by permitting a single debugging information entry to "own" an arbitrary number of other debugging entries and by permitting the same debugging information entry to be one of many owned by another debugging information entry. This makes it possible to describe, for example, the static block structure within a source file, show the members of a structure, union, or class, and associate declarations with source files or source files with shared objects.
The ownership relation of debugging information entries is achieved naturally because the debugging information is represented as a tree. The nodes of the tree are the debugging information entries themselves. The child entries of any node are exactly those debugging information entries owned by that node.
Note: While the ownership relation of the debugging information entries is represented as a tree, other relations among the entries exist, for example, a pointer from an entry representing a variable to another entry representing the type of that variable. If all such relations are taken into account, the debugging entries form a graph, not a tree.
The tree itself is represented by flattening it in prefix order. Each debugging information entry is defined either to have child entries or not to have child entries (see section 7.5.3). If an entry is defined not to have children, the next physically succeeding entry is the sibling of the prior entry. If an entry is defined to have children, the next physically succeeding entry is the first child of the prior entry. Additional children of the parent entry are represented as siblings of the first child. A chain of sibling entries is terminated by a null entry.
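The prefix-order flattening described above can be sketched as follows. This is an illustration under simplifying assumptions, not the binary encoding of section 7: each entry is reduced to a hypothetical (tag, has_children) record, the null entry is modelled as None, and the top-level entry is not followed by a terminator. The names Entry, flatten and parse are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    tag: str
    children: list = field(default_factory=list)


def flatten(entry):
    """Emit the entry, then its children in prefix order, then a null entry."""
    records = [(entry.tag, bool(entry.children))]
    if entry.children:
        for child in entry.children:
            records.extend(flatten(child))
        records.append(None)  # null entry terminates the sibling chain
    return records


def parse(records, pos=0):
    """Rebuild one entry starting at records[pos]; return (entry, next_pos)."""
    tag, has_children = records[pos]
    pos += 1
    entry = Entry(tag)
    if has_children:
        # The next record is the first child; later children are its siblings.
        while records[pos] is not None:
            child, pos = parse(records, pos)
            entry.children.append(child)
        pos += 1  # step over the null entry
    return entry, pos


unit = Entry("DW_TAG_compile_unit", [
    Entry("DW_TAG_subprogram", [
        Entry("DW_TAG_formal_parameter"),
        Entry("DW_TAG_variable"),
    ]),
    Entry("DW_TAG_base_type"),
])
flat = flatten(unit)
rebuilt, _ = parse(flat)
```

Because each record states whether children follow, a consumer can recover the tree in a single forward pass without any explicit parent pointers.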
In cases where a producer of debugging information expects that consumers of that information will need to scan chains of sibling entries quickly, ignoring the children of individual siblings, that producer may attach a DW_AT_sibling attribute to any debugging information entry. The value of this attribute is a reference to the sibling entry of the entry to which the attribute is attached.
The debugging information must provide consumers a way to find the location of program variables, determine the bounds of dynamic arrays and strings and possibly to find the base address of a subroutine's stack frame or the return address of a subroutine. Furthermore, to meet the needs of recent computer architectures and optimization techniques, the debugging information must be able to describe the location of an object whose location changes over the object's lifetime.
Information about the location of program objects is provided by location descriptions. Location descriptions can be either of two forms:
The two forms are distinguished in a context sensitive manner. As the value of an attribute, a location expression is encoded as a block and a location list is encoded as a constant offset into a location list table.
Note: The Version 1 concept of "location descriptions" was replaced in Version 2 with this new abstraction because it is denser and more descriptive.
A location expression consists of zero or more location operations. An expression with zero operations is used to denote an object that is present in the source code but not present in the object code (perhaps because of optimization). The location operations fall into two categories, register names and addressing operations. Register names always appear alone and indicate that the referred object is contained inside a particular register. Addressing operations are memory address computation rules. All location operations are encoded as a stream of opcodes that are each followed by zero or more literal operands. The number of operands is determined by the opcode.
The following operations can be used to name a register.
Note that the register number represents a DWARF specific mapping of numbers onto the actual registers of a given architecture. The mapping should be chosen to gain optimal density and should be shared by all users of a given architecture. The Programming Languages SIG recommends that this mapping be defined by the authoring committee of the ABI for each architecture (the System V Application Binary Interface, consisting of the generic interface and processor supplements for each target architecture).
Each addressing operation represents a postfix operation on a simple stack machine. Each element of the stack is the size of an address on the target machine. The value on the top of the stack after "executing" the location expression is taken to be the result (the address of the object, or the value of the array bound, or the length of a dynamic string). In the case of locations used for structure members, the computation assumes that the base address of the containing structure has been pushed on the stack before evaluation of the addressing operation.
The following operations all push a value onto the addressing stack.
The following operations push a value onto the stack that is the result of adding the contents of a register with a given signed offset.
The following operations manipulate the "location stack." Location operations that index the location stack assume that the top of the stack (most recently added entry) has index 0.
The following provide arithmetic and logical operations. The arithmetic operations perform "addressing arithmetic," that is, unsigned arithmetic that wraps on an address-sized boundary. The operations do not cause an exception on overflow.
The following operations provide simple control of the flow of a location expression.
There are two special operations currently defined:
DW_OP_piece takes a single argument which is an unsigned LEB128 number. The number describes the size in bytes of the piece of the object referenced by the addressing expression whose result is at the top of the stack.
The stack operations defined in section 2.4.3.3 are fairly conventional, but the following examples illustrate their behavior graphically.
Before | Operation | After |
---|---|---|
0: 17, 1: 29, 2: 1000 | DW_OP_dup | 0: 17, 1: 17, 2: 29, 3: 1000 |
0: 17, 1: 29, 2: 1000 | DW_OP_drop | 0: 29, 1: 1000 |
0: 17, 1: 29, 2: 1000 | DW_OP_pick 2 | 0: 1000, 1: 17, 2: 29, 3: 1000 |
0: 17, 1: 29, 2: 1000 | DW_OP_over | 0: 29, 1: 17, 2: 29, 3: 1000 |
0: 17, 1: 29, 2: 1000 | DW_OP_swap | 0: 29, 1: 17, 2: 1000 |
0: 17, 1: 29, 2: 1000 | DW_OP_rot | 0: 29, 1: 1000, 2: 17 |
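The stack examples above can be reproduced with a small model. This is a sketch only: the stack is a Python list whose element 0 is the top of the stack, matching the indexing used in the examples, and apply_stack_op is a hypothetical helper rather than anything defined by the format.

```python
def apply_stack_op(stack, op, operand=None):
    """Return a new stack (index 0 is the top) after applying one operation."""
    s = list(stack)
    if op == "DW_OP_dup":
        s.insert(0, s[0])            # duplicate the top entry
    elif op == "DW_OP_drop":
        s.pop(0)                     # discard the top entry
    elif op == "DW_OP_pick":
        s.insert(0, s[operand])      # copy the entry at the given index to the top
    elif op == "DW_OP_over":
        s.insert(0, s[1])            # copy the second entry to the top
    elif op == "DW_OP_swap":
        s[0], s[1] = s[1], s[0]      # exchange the top two entries
    elif op == "DW_OP_rot":
        s[0], s[1], s[2] = s[1], s[2], s[0]  # old top becomes the third entry
    else:
        raise ValueError("unsupported operation: " + op)
    return s


before = [17, 29, 1000]
after_rot = apply_stack_op(before, "DW_OP_rot")
```

Each call starts from the same "before" stack, so the results can be compared row by row with the examples.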
The addressing expression represented by a location expression, if evaluated, generates the runtime address of the value of a symbol except where the DW_OP_regn or DW_OP_regx operations are used.
Here are some examples of how location operations are used to form location expressions:
DW_OP_reg3
The value is in register 3.
DW_OP_regx 54
The value is in register 54.
DW_OP_addr 0x80d0045c
The value of a static variable is at machine address 0x80d0045c.
DW_OP_breg11 44
Add 44 to the value in register 11 to get the address of an automatic variable instance.
DW_OP_fbreg -50
Given a DW_AT_frame_base value of "DW_OP_breg31 64," this example computes the address of a local variable that is -50 bytes from a logical frame pointer that is computed by adding 64 to the current stack pointer (register 31).
DW_OP_bregx 54 32 DW_OP_deref
A call-by-reference parameter whose address is in the word 32 bytes from where register 54 points.
DW_OP_plus_uconst 4
A structure member is four bytes from the start of the structure instance. The base address is assumed to be already on the stack.
DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
A variable whose first four bytes reside in register 3 and whose next two bytes reside in register 10.
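Several of the addressing examples above can be traced with a toy evaluator. This is a sketch under stated assumptions, not the encoded form: operations are written as Python tuples rather than opcode bytes, addresses are assumed to be 32 bits wide, registers and memory are plain dictionaries with invented contents, and only the operations used in the examples are handled. Register name operations and DW_OP_piece are deliberately left out.

```python
ADDRESS_BITS = 32                      # assumed target address size
MASK = (1 << ADDRESS_BITS) - 1         # addressing arithmetic wraps at this width


def evaluate(ops, registers, memory=None, frame_base=None, initial_stack=()):
    """Evaluate a location expression and return the value on top of the stack."""
    stack = list(initial_stack)        # the last list element is the top
    for op, *args in ops:
        if op == "DW_OP_addr":
            stack.append(args[0] & MASK)
        elif op == "DW_OP_bregx":
            reg, offset = args
            stack.append((registers[reg] + offset) & MASK)
        elif op.startswith("DW_OP_breg"):
            reg = int(op[len("DW_OP_breg"):])
            stack.append((registers[reg] + args[0]) & MASK)
        elif op == "DW_OP_fbreg":
            stack.append((frame_base + args[0]) & MASK)
        elif op == "DW_OP_plus_uconst":
            stack.append((stack.pop() + args[0]) & MASK)
        elif op == "DW_OP_deref":
            stack.append(memory[stack.pop()])
        else:
            raise ValueError("unsupported operation: " + op)
    return stack[-1]


# Hypothetical machine state, chosen only to make the arithmetic visible.
registers = {11: 0x1000, 31: 0x7FFF0000, 54: 0x2000}
memory = {0x2020: 0x3000}

frame_base = evaluate([("DW_OP_breg31", 64)], registers)
local_addr = evaluate([("DW_OP_fbreg", -50)], registers, frame_base=frame_base)
```

The structure member case is modelled by passing the base address in initial_stack, reflecting the rule that the base address is already on the stack before the expression is evaluated.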
Location lists are used in place of location expressions whenever the object whose location is being described can change location during its lifetime. Location lists are contained in a separate object file section called .debug_loc. A location list is indicated by a location attribute whose value is represented as a constant offset from the beginning of the .debug_loc section to the first byte of the list for the object in question.
Each entry in a location list consists of:
Address ranges may overlap. When they do, they describe a situation in which an object exists simultaneously in more than one place. If all of the address ranges in a given location list do not collectively cover the entire range over which the object in question is defined, it is assumed that the object is not available for the portion of the range that is not covered.
The end of any given location list is marked by a 0 for the beginning address and a 0 for the end address; no location description is present. A location list containing only such a 0 entry describes an object that exists in the source code but not in the executable program.
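A consumer's use of a location list can be sketched as a lookup by program counter. The representation here is an assumption made for illustration: each entry is a (begin, end, expression) tuple, the range is treated as running from begin up to but not including end, addresses are taken to be directly comparable with the program counter, and the terminating entry is written as (0, 0, None). The function name is hypothetical.

```python
def locations_at(pc, location_list):
    """Return every location expression whose address range covers pc."""
    found = []
    for begin, end, expression in location_list:
        if begin == 0 and end == 0:
            break                      # end-of-list marker: no expression follows
        if begin <= pc < end:
            found.append(expression)   # overlapping ranges may all match
    return found


# Hypothetical list: the object moves from register 3 to a stack slot, and
# for a short overlap it is valid in both places at once.
example_list = [
    (0x100, 0x140, [("DW_OP_reg3",)]),
    (0x130, 0x180, [("DW_OP_fbreg", -8)]),
    (0, 0, None),
]
```

An empty result corresponds to the case where the object is not available at that address, and a list holding only the terminator always yields an empty result.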
Any debugging information entry describing a declaration that has a type has a DW_AT_type attribute, whose value is a reference to another debugging information entry. The entry referenced may describe a base type, that is, a type that is not defined in terms of other data types, or it may describe a user-defined type, such as an array, structure or enumeration. Alternatively, the entry referenced may describe a type modifier: constant, packed, pointer, reference or volatile, which in turn will reference another entry describing a type or type modifier (using a DW_AT_type attribute of its own). See section 5 for descriptions of the entries describing base types, user-defined types and type modifiers.
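The chain of DW_AT_type references described above can be followed with a short loop. This is a sketch only: entries are modelled as dictionaries keyed by a hypothetical offset, the English wording of each modifier is invented for display, and the walk stops at the first entry that is not a type modifier.

```python
MODIFIER_WORDS = {
    "DW_TAG_const_type": "const",
    "DW_TAG_packed_type": "packed",
    "DW_TAG_pointer_type": "pointer to",
    "DW_TAG_reference_type": "reference to",
    "DW_TAG_volatile_type": "volatile",
}


def describe_type(entries, ref):
    """Follow DW_AT_type references from ref and describe the resulting type."""
    words = []
    while ref is not None:
        entry = entries[ref]
        modifier = MODIFIER_WORDS.get(entry["tag"])
        if modifier is None:
            # A base type or user-defined type ends the chain of modifiers.
            words.append(entry.get("DW_AT_name", "<unnamed type>"))
            break
        words.append(modifier)
        ref = entry.get("DW_AT_type")
    else:
        # The chain ran out without reaching a base or user-defined type.
        words.append("<unspecified type>")
    return " ".join(words)


# Hypothetical entries, keyed by made-up offsets.
entries = {
    0x20: {"tag": "DW_TAG_base_type", "DW_AT_name": "char"},
    0x30: {"tag": "DW_TAG_const_type", "DW_AT_type": 0x20},
    0x38: {"tag": "DW_TAG_pointer_type", "DW_AT_type": 0x30},
    0x40: {"tag": "DW_TAG_pointer_type"},
}
```

Here the entry at 0x38 describes what a C programmer would write as a pointer to const char.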
Some languages, notably C++ and Ada, have the concept of the accessibility of an object or of some other program entity. The accessibility specifies which classes of other program objects are permitted access to the object in question.
The accessibility of a declaration is represented by a DW_AT_accessibility attribute, whose value is a constant drawn from the set of codes listed in Figure 3.
Modula2 has the concept of the visibility of a declaration. The visibility specifies which declarations are to be visible outside of the module in which they are declared.
The visibility of a declaration is represented by a DW_AT_visibility attribute, whose value is a constant drawn from the set of codes listed in Figure 4.
C++ provides for virtual and pure virtual structure or class member functions and for virtual base classes.
The virtuality of a declaration is represented by a DW_AT_virtuality attribute, whose value is a constant drawn from the set of codes listed in Figure 5.
A compiler may wish to generate debugging information entries for objects or types that were not actually declared in the source of the application. An example is a formal parameter entry to represent the hidden this parameter that most C++ implementations pass as the first argument to non-static member functions.
Any debugging information entry representing the declaration of an object or type artificially generated by a compiler and not explicitly declared by the source program may have a DW_AT_artificial attribute. The value of this attribute is a flag.
In some systems, addresses are specified as offsets within a given segment rather than as locations within a single flat address space.
Any debugging information entry that contains a description of the location of an object or subroutine may have a DW_AT_segment attribute, whose value is a location description. The description evaluates to the segment value of the item being described. If the entry containing the DW_AT_segment attribute has a DW_AT_low_pc or DW_AT_high_pc attribute, or a location description that evaluates to an address, then those values represent the offset portion of the address within the segment specified by DW_AT_segment.
If an entry has no DW_AT_segment attribute, it inherits the segment value from its parent entry.
If none of the entries in the chain of parents for this entry back to its containing compilation unit entry have DW_AT_segment attributes, then the entry is assumed to exist within a flat address space. Similarly, if the entry has a DW_AT_segment attribute containing an empty location description, that entry is assumed to exist within a flat address space.
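The inheritance rule for DW_AT_segment can be sketched as a walk up the chain of parents. The Entry class, its parent link, and the use of an opaque list to stand for a location description are all assumptions of this example; None is used to mean a flat address space.

```python
class Entry:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = attrs or {}
        self.parent = parent


def effective_segment(entry):
    """Return the location description giving the segment, or None if flat."""
    node = entry
    while node is not None:
        if "DW_AT_segment" in node.attrs:
            description = node.attrs["DW_AT_segment"]
            # An empty location description also denotes a flat address space.
            return description if description else None
        node = node.parent             # inherit from the owning entry
    return None                        # no DW_AT_segment up to the compilation unit


# Hypothetical entries; the segment description is an opaque placeholder.
unit = Entry("DW_TAG_compile_unit")
routine = Entry("DW_TAG_subprogram", {"DW_AT_segment": ["<segment expression>"]}, parent=unit)
local = Entry("DW_TAG_variable", parent=routine)
flat_local = Entry("DW_TAG_variable", {"DW_AT_segment": []}, parent=routine)
global_var = Entry("DW_TAG_variable", parent=unit)
```

In this sketch the nearest enclosing DW_AT_segment wins, so an entry with an empty description is treated as flat even when its parent names a segment.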
Some systems support different classes of addresses. The address class may affect the way a pointer is dereferenced or the way a subroutine is called.
Any debugging information entry representing a pointer or reference type or a subroutine or subroutine type may have a DW_AT_address_class attribute, whose value is a constant. The set of permissible values is specific to each target architecture. The value DW_ADDR_none, however, is common to all encodings, and means that no address class has been specified.
For example, the Intel386 (tm) processor might use the following values:
Name | Value | Meaning |
---|---|---|
DW_ADDR_none | 0 | no class specified |
DW_ADDR_near16 | 1 | 16-bit offset, no segment |
DW_ADDR_far16 | 2 | 16-bit offset, 16-bit segment |
DW_ADDR_huge16 | 3 | 16-bit offset, 16-bit segment |
DW_ADDR_near32 | 4 | 32-bit offset, no segment |
DW_ADDR_far32 | 5 | 32-bit offset, 16-bit segment |
A debugging information entry representing a program object or type typically represents the defining declaration of that object or type. In certain contexts, however, a debugger might need information about a declaration of a subroutine, object or type that is not also a definition to evaluate an expression correctly.
As an example, consider the following fragment of C code:
    void myfunc()
    {
        int x;
        {
            extern float x;
            g(x);
        }
    }
ANSI-C scoping rules require that the value of the variable x passed to the function g is the value of the global variable x rather than of the local version.
Debugging information entries that represent non-defining declarations of a program object or type have a DW_AT_declaration attribute, whose value is a flag.
It is sometimes useful in a debugger to be able to associate a declaration with its occurrence in the program source.
Any debugging information entry representing the declaration of an object, module, subprogram or type may have DW_AT_decl_file, DW_AT_decl_line and DW_AT_decl_column attributes, each of whose value is a constant.
The value of the DW_AT_decl_file attribute corresponds to a file number from the statement information table for the compilation unit containing this debugging information entry and represents the source file in which the declaration appeared (see section 6.2). The value 0 indicates that no source file has been specified.
The value of the DW_AT_decl_line attribute represents the source line number at which the first character of the identifier of the declared object appears. The value 0 indicates that no source line has been specified.
The value of the DW_AT_decl_column attribute represents the source column number at which the first character of the identifier of the declared object appears. The value 0 indicates that no column has been specified.
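As an illustration of how a debugger might present these three attributes, the sketch below formats a declaration site while honouring the rule that 0 means "not specified". The dictionary of file names is a hypothetical stand-in for the statement information table of section 6.2, and the output format is invented for the example.

```python
def format_declaration_site(attrs, file_names):
    """Format file, line and column of a declaration; 0 means unspecified."""
    file_number = attrs.get("DW_AT_decl_file", 0)
    line = attrs.get("DW_AT_decl_line", 0)
    column = attrs.get("DW_AT_decl_column", 0)

    parts = [file_names[file_number] if file_number else "<unknown file>"]
    if line:
        parts.append(str(line))
        if column:                     # a column is only shown alongside a line
            parts.append(str(column))
    return ":".join(parts)


# Hypothetical file table for one compilation unit.
file_names = {1: "main.c", 2: "defs.h"}
site = format_declaration_site(
    {"DW_AT_decl_file": 1, "DW_AT_decl_line": 12, "DW_AT_decl_column": 5},
    file_names,
)
```

Treating a missing attribute the same as the value 0 is a simplification made here; both are reported as unspecified.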
Any debugging information entry representing a program entity that has been given a name may have a DW_AT_name attribute, whose value is a string representing the name as it appears in the source program. A debugging information entry containing no name attribute, or containing a name attribute whose value consists of a name containing a single null byte, represents a program entity for which no name was given in the source.
Note that since the names of program objects described by DWARF are the names as they appear in the source program, implementations of language translators that use some form of mangled name (as do many implementations of C++) should use the unmangled form of the name in the DWARF DW_AT_name attribute, including the keyword operator, if present. Sequences of multiple whitespace characters may be compressed.
This section describes debugging information entries that relate to different levels of program scope: compilation unit, module, subprogram, and so on. These entries may be thought of as bounded by ranges of text addresses within the program.
An object file may be derived from one or more compilation units. Each such compilation unit will be described by a debugging information entry with the tag DW_TAG_compile_unit.
A compilation unit typically represents the text and data contributed to an executable by a single relocatable object file. It may be derived from several source files, including pre-processed "include files."
The compilation unit entry may have the following attributes:
The address may be beyond the last valid instruction in the executable, of course, for this and other similar attributes.
The presence of low and high pc attributes in a compilation unit entry implies that the code generated for that compilation unit is contiguous and exists totally within the boundaries specified by those two attributes. If that is not the case, no low and high pc attributes should be produced.
DW_LANG_C | Non-ANSI C, such as K&R |
DW_LANG_C89 | ISO/ANSI C |
DW_LANG_C_plus_plus | C++ |
DW_LANG_Fortran77 | FORTRAN77 |
DW_LANG_Fortran90 | Fortran90 |
DW_LANG_Modula2 | Modula2 |
DW_LANG_Pascal83 | ISO/ANSI Pascal |
This information is placed in a separate object file section from the debugging information entries themselves. The value of the statement list attribute is the offset in the .debug_line section of the first byte of the line number information for this compilation unit. See section 6.2.
This information is placed in a separate object file section from the debugging information entries themselves. The value of the macro information attribute is the offset in the .debug_macinfo section of the first byte of the macro information for this compilation unit. See section 6.3.
The suggested form for the value of the DW_AT_comp_dir attribute on UNIX systems is "hostname:pathname". If no hostname is available, the suggested form is ":pathname".
DW_ID_case_sensitive is the default for all compilation units that do not have this attribute. It indicates that names given as the values of DW_AT_name attributes in debugging information entries for the compilation unit reflect the names as they appear in the source program. The debugger should be sensitive to the case of identifier names when doing identifier lookups.
DW_ID_up_case means that the producer of the debugging information for this compilation unit converted all source names to upper case. The values of the name attributes may not reflect the names as they appear in the source program. The debugger should convert all names to upper case when doing lookups.
DW_ID_down_case means that the producer of the debugging information for this compilation unit converted all source names to lower case. The values of the name attributes may not reflect the names as they appear in the source program. The debugger should convert all names to lower case when doing lookups.
DW_ID_case_insensitive means that the values of the name attributes reflect the names as they appear in the source program but that a case insensitive lookup should be used to access those names.
This attribute points to a debugging information entry representing another compilation unit. It may be used to specify the compilation unit containing the base type entries used by entries in the current compilation unit (see section 5.1).
This attribute provides a consumer a way to find the definition of base types for a compilation unit that does not itself contain such definitions. This allows a consumer, for example, to interpret a type conversion to a base type correctly. A compilation unit entry owns debugging information entries that represent the declarations made in the corresponding compilation unit.
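The four identifier case values described above suggest the following lookup logic in a debugger. This is a sketch only: the symbol table is a plain dictionary from stored names to arbitrary values, the case code is passed as a string, and the lookup function is hypothetical.

```python
def lookup(name, symbols, identifier_case="DW_ID_case_sensitive"):
    """Find name in symbols, folding case as the compilation unit requires."""
    if identifier_case == "DW_ID_case_sensitive":
        return symbols.get(name)
    if identifier_case == "DW_ID_up_case":
        return symbols.get(name.upper())   # producer stored upper-case names
    if identifier_case == "DW_ID_down_case":
        return symbols.get(name.lower())   # producer stored lower-case names
    if identifier_case == "DW_ID_case_insensitive":
        wanted = name.lower()
        for stored, value in symbols.items():
            if stored.lower() == wanted:   # names keep their source spelling
                return value
        return None
    raise ValueError("unknown identifier case: " + identifier_case)
```

For example, with a hypothetical table `{"TOTAL": 3}` from a unit marked DW_ID_up_case, a user who types `total` still finds the entry because the query is folded to upper case before the lookup.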
Several languages have the concept of a "module."
A module is represented by a debugging information entry with the tag DW_TAG_module. Module entries may own other debugging information entries describing program entities whose declaration scopes end at the end of the module itself.
If the module has a name, the module entry has a DW_AT_name attribute whose value is a null-terminated string containing the module name as it appears in the source program.
If the module contains initialization code, the module entry has a DW_AT_low_pc attribute whose value is the relocated address of the first machine instruction generated for that initialization code. It also has a DW_AT_high_pc attribute whose value is the relocated address of the first location past the last machine instruction generated for the initialization code.
If the module has been assigned a priority, it may have a DW_AT_priority attribute. The value of this attribute is a reference to another debugging information entry describing a variable with a constant value. The value of this variable is the actual constant value of the module's priority, represented as it would be on the target architecture.
A Modula2 definition module may be represented by a module entry containing a DW_AT_declaration attribute.
The following tags exist to describe debugging information entries for subroutines and entry points:
3.3.1 General Subroutine and Entry Point Information
The subroutine or entry point entry has a DW_AT_name attribute whose value is a null-terminated string containing the subroutine or entry point name as it appears in the source program.
If the name of the subroutine described by an entry with the tag DW_TAG_subprogram is visible outside of its containing compilation unit, that entry has a DW_AT_external attribute, whose value is a flag.
Additional attributes for functions that are members of a class or structure are described in section 5.5.5.
A common debugger feature is to allow the debugger user to call a subroutine within the subject program. In certain cases, however, the generated code for a subroutine will not obey the standard calling conventions for the target architecture and will therefore not be safe to call from within a debugger.
A subroutine entry may contain a DW_AT_calling_convention attribute, whose value is a constant. If this attribute is not present, or its value is the constant DW_CC_normal, then the subroutine may be safely called by obeying the "standard" calling conventions of the target architecture. If the value of the calling convention attribute is the constant DW_CC_nocall, the subroutine does not obey standard calling conventions, and it may not be safe for the debugger to call this subroutine.
If the semantics of the language of the compilation unit containing the subroutine entry distinguishes between ordinary subroutines and subroutines that can serve as the "main program," that is, subroutines that cannot be called directly following the ordinary calling conventions, then the debugging information entry for such a subroutine may have a calling convention attribute whose value is the constant DW_CC_program.
The DW_CC_program value is intended to support Fortran main programs. It is not intended as a way of finding the entry address for the program.
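The calling convention rules above reduce to a simple check before a debugger offers to call a subroutine. The sketch assumes the attributes of a subroutine entry are available as a dictionary and that the constant is given by name; the function is hypothetical and deliberately conservative.

```python
def safe_to_call(subroutine_attrs):
    """Return True only when the standard calling conventions are known to apply."""
    convention = subroutine_attrs.get("DW_AT_calling_convention", "DW_CC_normal")
    # DW_CC_nocall and DW_CC_program both indicate that an ordinary call
    # may not work, so only DW_CC_normal (or no attribute) is accepted.
    return convention == "DW_CC_normal"
```

Any value other than DW_CC_normal, including values this sketch does not know about, is treated as unsafe.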
3.3.2 Subroutine and Entry Point Return Types
If the subroutine or entry point is a function that returns a value, then its debugging information entry has a DW_AT_type attribute to denote the type returned by that function.
Debugging information entries for C void functions should not have an attribute for the return type.
In ANSI-C there is a difference between the types of functions declared using function prototype style declarations and those declared using non-prototype declarations.
A subroutine entry declared with a function prototype style declaration may have a DW_AT_prototyped attribute, whose value is a flag.
3.3.3 Subroutine and Entry Point Locations
A subroutine entry has a DW_AT_low_pc attribute whose value is the relocated address of the first machine instruction generated for the subroutine. It also has a DW_AT_high_pc attribute whose value is the relocated address of the first location past the last machine instruction generated for the subroutine.
Note that for the low and high pc attributes to have meaning, DWARF makes the assumption that the code for a single subroutine is allocated in a single contiguous block of memory.
An entry point has a DW_AT_low_pc attribute whose value is the relocated address of the first machine instruction generated for the entry point.
Subroutines and entry points may also have DW_AT_segment and DW_AT_address_class attributes, as appropriate, to specify which segments the code for the subroutine resides in and the addressing mode to be used in calling that subroutine.
A subroutine entry representing a subroutine declaration that is not also a definition does not have low and high pc attributes.
3.3.4 Declarations Owned by Subroutines and Entry Points
The declarations enclosed by a subroutine or entry point are represented by debugging information entries that are owned by the subroutine or entry point entry.
Entries representing the formal parameters of the subroutine or entry point appear in the same order as the corresponding declarations in the source program.
There is no ordering requirement on entries for declarations that are children of subroutine or entry point entries but that do not represent formal parameters. The formal parameter entries may be interspersed with other entries used by formal parameter entries, such as type entries.
The unspecified parameters of a variable parameter list are represented by a debugging information entry with the tag DW_TAG_unspecified_parameters.
The entry for a subroutine or entry point that includes a Fortran common block has a child entry with the tag DW_TAG_common_inclusion. The common inclusion entry has a DW_AT_common_reference attribute whose value is a reference to the debugging entry for the common block being included (see section 4.2).
3.3.5 Low-Level Information
A subroutine or entry point entry may have a DW_AT_return_addr attribute, whose value is a location description. The location calculated is the place where the return address for the subroutine or entry point is stored.
A subroutine or entry point entry may also have a DW_AT_frame_base attribute, whose value is a location description that computes the "frame base" for the subroutine or entry point.
The frame base for a procedure is typically an address fixed relative to the first unit of storage allocated for the procedure's stack frame. The DW_AT_frame_base attribute can be used in several ways:
Some languages support nested subroutines. In such languages, it is possible to reference the local variables of an outer subroutine from within an inner subroutine. The DW_AT_static_link and DW_AT_frame_base attributes allow debuggers to support this same kind of referencing.
If a subroutine or entry point is nested, it may have a DW_AT_static_link attribute, whose value is a location description that computes the frame base of the relevant instance of the subroutine that immediately encloses the subroutine or entry point.
In the context of supporting nested subroutines, the DW_AT_frame_base attribute value should obey the following constraints:
If a debugger is attempting to resolve an up-level reference to a variable, it uses the nesting structure of DWARF to determine which subroutine is the lexical parent and the DW_AT_static_link value to identify the appropriate active frame of the parent. It can then attempt to find the reference within the context of the parent.
3.3.6 Types Thrown by Exceptions
In C++ a subroutine may declare a set of types for which that subroutine may generate or "throw" an exception.
If a subroutine explicitly declares that it may throw an exception for one or more types, each such type is represented by a debugging information entry with the tag DW_TAG_thrown_type. Each such entry is a child of the entry representing the subroutine that may throw this type. All thrown type entries should follow all entries representing the formal parameters of the subroutine and precede all entries representing the local variables or lexical blocks contained in the subroutine. Each thrown type entry contains a DW_AT_type attribute, whose value is a reference to an entry describing the type of the exception that may be thrown.
3.3.7 Function Template Instantiations
In C++ a function template is a generic definition of a function that is instantiated differently when called with values of different types. DWARF does not represent the generic template definition, but does represent each instantiation.
A template instantiation is represented by a debugging information entry with the tag DW_TAG_subprogram. With three exceptions, such an entry will contain the same attributes and have the same types of child entries as would an entry for a subroutine defined explicitly using the instantiation types. The exceptions are:
3.3.8 Inline Subroutines
A declaration or a definition of an inlinable subroutine is represented by a debugging information entry with the tag DW_TAG_subprogram. The entry for a subroutine that is explicitly declared to be available for inline expansion or that was expanded inline implicitly by the compiler has a DW_AT_inline attribute whose value is a constant. The set of values for the DW_AT_inline attribute is given in Figure 9.
Figure 9. Inline codes
3.3.8.1 Abstract Instances
For the remainder of this discussion, any debugging information entry that is owned (either directly or indirectly) by a debugging information entry that contains the DW_AT_inline attribute will be referred to as an "abstract instance entry." Any subroutine entry that contains a DW_AT_inline attribute will be known as an "abstract instance root." Any set of abstract instance entries that are all children (either directly or indirectly) of some abstract instance root, together with the root itself, will be known as an "abstract instance tree."
A debugging information entry that is a member of an abstract instance tree should not contain a DW_AT_high_pc, DW_AT_low_pc, DW_AT_location, DW_AT_return_addr, DW_AT_start_scope, or DW_AT_segment attribute.
It would not make sense to put these attributes into abstract instance entries since such entries do not represent actual (concrete) instances and thus do not actually exist at run-time.
The rules for the relative location of entries belonging to abstract instance trees are exactly the same as for other similar types of entries that are not abstract. Specifically, the rule that requires that an entry representing a declaration be a direct child of the entry representing the scope of the declaration applies equally to both abstract and non-abstract entries. Also, the ordering rules for formal parameter entries, member entries, and so on, all apply regardless of whether or not a given entry is abstract.
3.3.8.2 Concrete Inlined Instances
Each inline expansion of an inlinable subroutine is represented by a debugging information entry with the tag DW_TAG_inlined_subroutine. Each such entry should be a direct child of the entry that represents the scope within which the inlining occurs.
Each inlined subroutine entry contains a DW_AT_low_pc attribute, representing the address of the first instruction associated with the given inline expansion.
Each inlined subroutine entry also contains a DW_AT_high_pc attribute, representing the address of the first location past the last instruction associated with the inline expansion. For the remainder of this discussion, any debugging information entry that is owned (either directly or indirectly) by a debugging information entry with the tag DW_TAG_inlined_subroutine will be referred to as a "concrete inlined instance entry." Any entry that has the tag DW_TAG_inlined_subroutine will be known as a "concrete inlined instance root." Any set of concrete inlined instance entries that are all children (either directly or indirectly) of some concrete inlined instance root, together with the root itself, will be known as a "concrete inlined instance tree." Each concrete inlined instance tree is uniquely associated with one (and only one) abstract instance tree. Note, however, that the reverse is not true. Any given abstract instance tree may be associated with several different concrete inlined instance trees, or may even be associated with zero concrete inlined instance trees. Also, each separate entry within a given concrete inlined instance tree is uniquely associated with one particular entry in the associated abstract instance tree. In other words, there is a one-to-one mapping from entries in a given concrete inlined instance tree to the entries in the associated abstract instance tree. Note, however, that the reverse is not true. A given abstract instance tree that is associated with a given concrete inlined instance tree may (and quite probably will) contain more entries than the associated concrete inlined instance tree (see below). Concrete inlined instance entries do not have most of the attributes (except for DW_AT_low_pc, DW_AT_high_pc, DW_AT_location, DW_AT_return_addr, DW_AT_start_scope and DW_AT_segment) that such entries would otherwise normally have. 
In place of these omitted attributes, each concrete inlined instance entry has a DW_AT_abstract_origin attribute that may be used to obtain the missing information (indirectly) from the associated abstract instance entry. The value of the abstract origin attribute is a reference to the associated abstract instance entry.
For each pair of entries that are associated via a DW_AT_abstract_origin attribute, both members of the pair will have the same tag. So, for example, an entry with the tag DW_TAG_variable can only be associated with another entry that also has the tag DW_TAG_variable. The only exception to this rule is that the root of a concrete instance tree (which must always have the tag DW_TAG_inlined_subroutine) can only be associated with the root of its associated abstract instance tree (which must have the tag DW_TAG_subprogram).
In general, the structure and content of any given concrete instance tree will be directly analogous to the structure and content of its associated abstract instance tree. There are two exceptions to this general rule, however.
Entries that represent members and anonymous types are omitted from concrete inlined instance trees because they would simply be redundant duplicates of the corresponding entries in the associated abstract instance trees. If any entry within a concrete inlined instance tree needs to refer to an anonymous type that was declared within the scope of the relevant inline function, the reference should simply refer to the abstract instance entry for the given anonymous type.
If an entry within a concrete inlined instance tree contains attributes describing the declaration coordinates of that entry, then those attributes should refer to the file, line and column of the original declaration of the subroutine, not to the point at which it was inlined.
3.3.8.3 Out-of-Line Instances of Inline Subroutines
Under some conditions, compilers may need to generate concrete executable instances of inline subroutines other than at points where those subroutines are actually called. For the remainder of this discussion, such concrete instances of inline subroutines will be referred to as "concrete out-of-line instances."
In C++, for example, taking the address of a function declared to be inline can necessitate the generation of a concrete out-of-line instance of the given function.
The DWARF representation of a concrete out-of-line instance of an inline subroutine is essentially the same as for a concrete inlined instance of that subroutine (as described in the preceding section). The representation of such a concrete out-of-line instance makes use of DW_AT_abstract_origin attributes in exactly the same way as they are used for a concrete inlined instance (that is, as references to corresponding entries within the associated abstract instance tree) and, as for concrete instance trees, the entries for anonymous types and for all members are omitted.
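The role of DW_AT_abstract_origin for both concrete inlined and concrete out-of-line instances can be illustrated with a minimal sketch. It assumes each entry has been decoded into a dictionary with a "tag" and an "attrs" mapping, and that a reference-valued attribute is stored directly as the referenced Python object; this representation and the helper name get_attribute are hypothetical.

```python
def get_attribute(entry, name):
    """Look up an attribute on a concrete instance entry, following
    DW_AT_abstract_origin to the abstract instance entry when the
    attribute has been omitted from the concrete entry."""
    while entry is not None:
        if name in entry["attrs"]:
            return entry["attrs"][name]
        # Fall back to the associated abstract instance entry, if any.
        entry = entry["attrs"].get("DW_AT_abstract_origin")
    return None


# Abstract instance root: carries the name but no address attributes.
abstract_root = {
    "tag": "DW_TAG_subprogram",
    "attrs": {"DW_AT_name": "square", "DW_AT_inline": "inlined"},
}

# Concrete inlined instance root: carries addresses and the origin reference.
concrete_root = {
    "tag": "DW_TAG_inlined_subroutine",
    "attrs": {
        "DW_AT_low_pc": 0x1000,
        "DW_AT_high_pc": 0x1010,
        "DW_AT_abstract_origin": abstract_root,
    },
}

print(get_attribute(concrete_root, "DW_AT_name"))
print(hex(get_attribute(concrete_root, "DW_AT_low_pc")))
```

The lookup stops at the abstract instance entry, so an attribute that neither entry carries is reported as absent rather than guessed.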
The differences between the DWARF representation of a concrete out-of-line instance of a given subroutine and the representation of a concrete inlined instance of that same subroutine are as follows:
3.4 Lexical Block Entries
A lexical block is a bracketed sequence of source statements that may contain any number of declarations. In some languages (C and C++) blocks can be nested within other blocks to any depth.
A lexical block is represented by a debugging information entry with the tag DW_TAG_lexical_block.
The lexical block entry has a DW_AT_low_pc attribute whose value is the relocated address of the first machine instruction generated for the lexical block. The lexical block entry also has a DW_AT_high_pc attribute whose value is the relocated address of the first location past the last machine instruction generated for the lexical block.
If a name has been given to the lexical block in the source program, then the corresponding lexical block entry has a DW_AT_name attribute whose value is a null-terminated string containing the name of the lexical block as it appears in the source program.
This is not the same as a C or C++ label (see below).
The lexical block entry owns debugging information entries that describe the declarations within that lexical block. There is one such debugging information entry for each local declaration of an identifier or inner lexical block.
3.5 Label Entries
A label is a way of identifying a source statement. A labeled statement is usually the target of one or more "go to" statements.
A label is represented by a debugging information entry with the tag DW_TAG_label. The entry for a label should be owned by the debugging information entry representing the scope within which the name of the label could be legally referenced within the source program.
The label entry has a DW_AT_low_pc attribute whose value is the relocated address of the first machine instruction generated for the statement identified by the label in the source program. The label entry also has a DW_AT_name attribute whose value is a null-terminated string containing the name of the label as it appears in the source program.
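The low and high pc convention shared by subroutine entries (section 3.3.3) and lexical block entries (section 3.4) can be illustrated with a minimal sketch that finds the innermost scope containing a program counter value. The dictionary layout, the "label" key used to tell the example scopes apart, and the function names are assumptions made for illustration.

```python
def contains(entry, pc):
    """True if pc lies in [low_pc, high_pc). The high pc value is the first
    location past the last instruction, so it is excluded."""
    low = entry.get("low_pc")
    high = entry.get("high_pc")
    return low is not None and high is not None and low <= pc < high


def innermost_scope(entry, pc):
    """Return the most deeply nested entry whose address range contains pc,
    or None if pc is outside the given entry."""
    if not contains(entry, pc):
        return None
    for child in entry.get("children", []):
        found = innermost_scope(child, pc)
        if found is not None:
            return found
    return entry


# A subroutine owning a lexical block that owns a further nested block.
inner = {"label": "inner block", "low_pc": 0x1028, "high_pc": 0x1030}
outer = {"label": "outer block", "low_pc": 0x1020, "high_pc": 0x1040,
         "children": [inner]}
subroutine = {"label": "subroutine", "low_pc": 0x1000, "high_pc": 0x1080,
              "children": [outer]}

print(innermost_scope(subroutine, 0x102C)["label"])
```

The sketch relies on the contiguity assumption stated in section 3.3.3; it does not attempt to handle code that is split across several address ranges.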
3.6 With Statement Entries
Both Pascal and Modula support the concept of a "with" statement. The with statement specifies a sequence of executable statements within which the fields of a record variable may be referenced, unqualified by the name of the record variable.
A with statement is represented by a debugging information entry with the tag DW_TAG_with_stmt. A with statement entry has a DW_AT_low_pc attribute whose value is the relocated address of the first machine instruction generated for the body of the with statement. A with statement entry also has a DW_AT_high_pc attribute whose value is the relocated address of the first location after the last machine instruction generated for the body of the statement.
The with statement entry has a DW_AT_type attribute, denoting the type of record whose fields may be referenced without full qualification within the body of the statement. It also has a DW_AT_location attribute, describing how to find the base address of the record object referenced within the body of the with statement.
3.7 Try and Catch Block Entries
In C++ a lexical block may be designated as a "catch block." A catch block is an exception handler that handles exceptions thrown by an immediately preceding "try block." A catch block designates the type of the exception that it can handle.
A try block is represented by a debugging information entry with the tag DW_TAG_try_block. A catch block is represented by a debugging information entry with the tag DW_TAG_catch_block. Both try and catch block entries contain a DW_AT_low_pc attribute whose value is the relocated address of the first machine instruction generated for that block. These entries also contain a DW_AT_high_pc attribute whose value is the relocated address of the first location past the last machine instruction generated for that block.
Catch block entries have at least one child entry, an entry representing the type of exception accepted by that catch block.
This child entry will have one of the tags DW_TAG_formal_parameter or DW_TAG_unspecified_parameters, and will have the same form as other parameter entries.
The first sibling of each try block entry will be a catch block entry.
4. DATA OBJECT AND OBJECT LIST ENTRIES
This section presents the debugging information entries that describe individual data objects: variables, parameters and constants, and lists of those objects that may be grouped in a single declaration, such as a common block.
4.1 Data Object Entries
Program variables, formal parameters and constants are represented by debugging information entries with the tags DW_TAG_variable, DW_TAG_formal_parameter and DW_TAG_constant, respectively.
The tag DW_TAG_constant is used for languages that distinguish between variables that may have constant value and true named constants.
The debugging information entry for a program variable, formal parameter or constant may have the following attributes:
4.2 Common Block Entries
A Fortran common block may be described by a debugging information entry with the tag DW_TAG_common_block. The common block entry has a DW_AT_name attribute whose value is a null-terminated string containing the common block name as it appears in the source program. It also has a DW_AT_location attribute whose value describes the location of the beginning of the common block. The common block entry owns debugging information entries describing the variables contained within the common block.
4.3 Imported Declaration Entries
Some languages support the concept of importing into a given module declarations made in a different module.
An imported declaration is represented by a debugging information entry with the tag DW_TAG_imported_declaration. The entry for the imported declaration has a DW_AT_name attribute whose value is a null-terminated string containing the name of the entity whose declaration is being imported as it appears in the source program. The imported declaration entry also has a DW_AT_import attribute, whose value is a reference to the debugging information entry representing the declaration that is being imported.
4.4 Namelist Entries
At least one language, Fortran90, has the concept of a namelist. A namelist is an ordered list of the names of some set of declared objects. The namelist object itself may be used as a replacement for the list of names in various contexts.
A namelist is represented by a debugging information entry with the tag DW_TAG_namelist. If the namelist itself has a name, the namelist entry has a DW_AT_name attribute, whose value is a null-terminated string containing the namelist's name as it appears in the source program.
Each name that is part of the namelist is represented by a debugging information entry with the tag DW_TAG_namelist_item. Each such entry is a child of the namelist entry, and all of the namelist item entries for a given namelist appear in the same order as the names they correspond to in the source program. Each namelist item entry contains a DW_AT_namelist_item attribute whose value is a reference to the debugging information entry representing the declaration of the item whose name appears in the namelist.
5. TYPE ENTRIES
This section presents the debugging information entries that describe program types: base types, modified types and user-defined types.
If the scope of the declaration of a named type begins sometime after the low pc value for the scope most closely enclosing the declaration, the declaration may have a DW_AT_start_scope attribute. The value of this attribute is the offset in bytes of the beginning of the scope for the declaration from the low pc value of the debugging information entry that defines its scope.
5.1 Base Type Entries
A base type is a data type that is not defined in terms of other data types. Each programming language has a set of base types that are considered to be built into that language.
A base type is represented by a debugging information entry with the tag DW_TAG_base_type. A base type entry has a DW_AT_name attribute whose value is a null-terminated string describing the name of the base type as recognized by the programming language of the compilation unit containing the base type entry.
A base type entry also has a DW_AT_encoding attribute describing how the base type is encoded and is to be interpreted. The value of this attribute is a constant. The set of values and their meanings for the DW_AT_encoding attribute is given in Figure 10.
Figure 10. Encoding attribute values
All encodings assume the representation that is "normal" for the target architecture.
A base type entry has a DW_AT_byte_size attribute, whose value is a constant, describing the size in bytes of the storage unit used to represent an object of the given type.
If the value of an object of the given type does not fully occupy the storage unit described by the byte size attribute, the base type entry may have a DW_AT_bit_size attribute and a DW_AT_bit_offset attribute, both of whose values are constants. The bit size attribute describes the actual size in bits used to represent a value of the given type. The bit offset attribute describes the offset in bits of the high order bit of a value of the given type from the high order bit of the storage unit used to contain that value.
For example, the C type int on a machine that uses 32-bit integers would be represented by a base type entry with a name attribute whose value was "int", an encoding attribute whose value was DW_ATE_signed and a byte size attribute whose value was 4.
5.2 Type Modifier Entries
A base or user-defined type may be modified in different ways in different languages. A type modifier is represented in DWARF by a debugging information entry with one of the tags given in Figure 11.
Figure 11. Type modifier tags
Each of the type modifier entries has a DW_AT_type attribute, whose value is a reference to a debugging information entry describing a base type, a user-defined type or another type modifier.
A modified type entry describing a pointer or reference type may have a DW_AT_address_class attribute to describe how objects having the given pointer or reference type ought to be dereferenced.
When multiple type modifiers are chained together to modify a base or user-defined type, they are ordered as if part of a right-associative expression involving the base or user-defined type.
As examples of how type modifiers are ordered, take the following C declarations:
    const char * volatile p;
which represents a volatile pointer to a constant character. This is encoded in DWARF as:
    DW_TAG_volatile_type -> DW_TAG_pointer_type -> DW_TAG_const_type -> DW_TAG_base_type
On the other hand,
    volatile char * const p;
represents a constant pointer to a volatile character. This is encoded as:
    DW_TAG_const_type -> DW_TAG_pointer_type -> DW_TAG_volatile_type -> DW_TAG_base_type
5.3 Typedef Entries
Any arbitrary type named via a typedef is represented by a debugging information entry with the tag DW_TAG_typedef. The typedef entry has a DW_AT_name attribute whose value is a null-terminated string containing the name of the typedef as it appears in the source program. The typedef entry also contains a DW_AT_type attribute.
If the debugging information entry for a typedef represents a declaration of the type that is not also a definition, it does not contain a type attribute.
5.4 Array Type Entries
Many languages share the concept of an "array," which is a table of components of identical type.
An array type is represented by a debugging information entry with the tag DW_TAG_array_type.
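The ordering of chained type modifiers described in section 5.2 can be illustrated with a minimal sketch that walks a modifier chain from the outermost entry down to the base type and produces an English description. The dictionary layout (a "tag" plus a "type" reference), the helper names, and the wording table are assumptions made for illustration; only three modifier tags are handled.

```python
# Hypothetical wording for the three modifier tags used in the examples.
MODIFIER_WORDS = {
    "DW_TAG_const_type": "constant",
    "DW_TAG_volatile_type": "volatile",
    "DW_TAG_pointer_type": "pointer to",
}


def modifier(tag, target):
    """Build a type modifier entry whose type attribute refers to target."""
    return {"tag": tag, "type": target}


def describe(entry):
    """Follow the chain of type references and describe the resulting type."""
    words = []
    while entry["tag"] in MODIFIER_WORDS:
        words.append(MODIFIER_WORDS[entry["tag"]])
        entry = entry["type"]
    words.append(entry["name"])  # the base type ends the chain
    return " ".join(words)


char = {"tag": "DW_TAG_base_type", "name": "char"}

# const char * volatile p;
p1 = modifier("DW_TAG_volatile_type",
              modifier("DW_TAG_pointer_type",
                       modifier("DW_TAG_const_type", char)))

# volatile char * const p;
p2 = modifier("DW_TAG_const_type",
              modifier("DW_TAG_pointer_type",
                       modifier("DW_TAG_volatile_type", char)))

print(describe(p1))  # volatile pointer to constant char
print(describe(p2))  # constant pointer to volatile char
```

Reading the chain from its first entry yields the same phrases used in the text for the two C declarations.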
If a name has been given to the array type in the source program, then the corresponding array type entry has a DW_AT_name attribute whose value is a null-terminated string containing the array type name as it appears in the source program.
The array type entry describing a multidimensional array may have a DW_AT_ordering attribute whose constant value is interpreted to mean either row-major or column-major ordering of array elements. The set of values and their meanings for the ordering attribute are listed in Figure 12. If no ordering attribute is present, the default ordering for the source language (which is indicated by the DW_AT_language attribute of the enclosing compilation unit entry) is assumed.
The ordering attribute may optionally appear on one-dimensional arrays; it will be ignored.
An array type entry has a DW_AT_type attribute describing the type of each element of the array.
If the amount of storage allocated to hold each element of an object of the given array type is different from the amount of storage that is normally allocated to hold an individual object of the indicated element type, then the array type entry has a DW_AT_stride_size attribute, whose constant value represents the size in bits of each element of the array.
If the size of the entire array can be determined statically at compile time, the array type entry may have a DW_AT_byte_size attribute, whose constant value represents the total size in bytes of an instance of the array type.
Note that if the size of the array can be determined statically at compile time, this value can usually be computed by multiplying the number of array elements by the size of each element.
Each array dimension is described by a debugging information entry with either the tag DW_TAG_subrange_type or the tag DW_TAG_enumeration_type. These entries are children of the array type entry and are ordered to reflect the appearance of the dimensions in the source program (i.e. leftmost dimension first, next to leftmost second, and so on).
In languages, such as ANSI-C, in which there is no concept of a "multidimensional array", an array of arrays may be represented by a debugging information entry for a multidimensional array.
5.5 Structure, Union, and Class Type Entries
The languages C, C++, and Pascal, among others, allow the programmer to define types that are collections of related components. In C and C++, these collections are called "structures." In Pascal, they are called "records." The components may be of different types. The components are called "members" in C and C++, and "fields" in Pascal.
The components of these collections each exist in their own space in computer memory. The components of a C or C++ "union" all coexist in the same memory.
Pascal and other languages have a "discriminated union," also called a "variant record." Here, selection of a number of alternative substructures ("variants") is based on the value of a component that is not part of any of those substructures (the "discriminant").
Among the languages discussed in this document, the "class" concept is unique to C++. A class is similar to a structure. A C++ class or structure may have "member functions" which are subroutines that are within the scope of a class or structure.
5.5.1 General Structure Description
Structure, union, and class types are represented by debugging information entries with the tags DW_TAG_structure_type, DW_TAG_union_type and DW_TAG_class_type, respectively. If a name has been given to the structure, union, or class in the source program, then the corresponding structure type, union type, or class type entry has a DW_AT_name attribute whose value is a null-terminated string containing the type name as it appears in the source program.
If the size of an instance of the structure type, union type, or class type entry can be determined statically at compile time, the entry has a DW_AT_byte_size attribute whose constant value is the number of bytes required to hold an instance of the structure, union, or class, and any padding bytes. For C and C++, an incomplete structure, union or class type is represented by a structure, union or class entry that does not have a byte size attribute and that has a DW_AT_declaration attribute. The members of a structure, union, or class are represented by debugging information entries that are owned by the corresponding structure type, union type, or class type entry and appear in the same order as the corresponding declarations in the source program. Data member declarations occurring within the declaration of a structure, union or class type are considered to be "definitions" of those members, with the exception of C++ "static" data members, whose definitions appear outside of the declaration of the enclosing structure, union or class type. Function member declarations appearing within a structure, union or class type declaration are definitions only if the body of the function also appears within the type declaration. If the definition for a given member of the structure, union or class does not appear within the body of the declaration, that member also has a debugging information entry describing its definition. That entry will have a DW_AT_specification attribute referencing the debugging entry owned by the body of the structure, union or class debugging entry and representing a non-defining declaration of the data or function member. The referenced entry will not have information about the location of that member (low and high pc attributes for function members, location descriptions for data members) and will have a DW_AT_declaration attribute. 
5.5.2 Derived Classes and Structures
The class type or structure type entry that describes a derived class or structure owns debugging information entries describing each of the classes or structures it is derived from, ordered as they were in the source program. Each such entry has the tag DW_TAG_inheritance.
An inheritance entry has a DW_AT_type attribute whose value is a reference to the debugging information entry describing the structure or class from which the parent structure or class of the inheritance entry is derived. It also has a DW_AT_data_member_location attribute, whose value is a location description describing the location of the beginning of the data members contributed to the entire class by this subobject relative to the beginning address of the data members of the entire class.
An inheritance entry may have a DW_AT_accessibility attribute. If no accessibility attribute is present, private access is assumed. If the structure or class referenced by the inheritance entry serves as a virtual base class, the inheritance entry has a DW_AT_virtuality attribute.
In C++, a derived class may contain access declarations that change the accessibility of individual class members from the overall accessibility specified by the inheritance declaration. A single access declaration may refer to a set of overloaded names.
If a derived class or structure contains access declarations, each such declaration may be represented by a debugging information entry with the tag DW_TAG_access_declaration. Each such entry is a child of the structure or class type entry.
An access declaration entry has a DW_AT_name attribute, whose value is a null-terminated string representing the name used in the declaration in the source program, including any class or structure qualifiers.
An access declaration entry also has a DW_AT_accessibility attribute describing the declared accessibility of the named entities.
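The defaulting rule for inheritance entries in section 5.5.2 can be illustrated with a minimal sketch that lists the base classes of a derived class. The dictionary layout, the function name base_classes, and the use of strings in place of the encoded accessibility and virtuality constants are assumptions made for illustration.

```python
def base_classes(class_entry):
    """Return (base name, accessibility, is_virtual) for each inheritance
    entry owned by a class or structure entry, in source order."""
    result = []
    for child in class_entry.get("children", []):
        if child["tag"] != "DW_TAG_inheritance":
            continue
        # Private access is assumed when no accessibility attribute is present.
        access = child.get("DW_AT_accessibility", "DW_ACCESS_private")
        # A virtuality attribute marks a virtual base class.
        is_virtual = "DW_AT_virtuality" in child
        result.append((child["DW_AT_type"]["DW_AT_name"], access, is_virtual))
    return result


# Illustrative class names, invented for the example.
shape = {"tag": "DW_TAG_class_type", "DW_AT_name": "Shape"}
logger = {"tag": "DW_TAG_class_type", "DW_AT_name": "Logger"}

circle = {
    "tag": "DW_TAG_class_type",
    "DW_AT_name": "Circle",
    "children": [
        {"tag": "DW_TAG_inheritance", "DW_AT_type": shape,
         "DW_AT_accessibility": "DW_ACCESS_public",
         "DW_AT_virtuality": "DW_VIRTUALITY_virtual"},
        {"tag": "DW_TAG_inheritance", "DW_AT_type": logger},
        {"tag": "DW_TAG_member", "DW_AT_name": "radius"},
    ],
}

for base in base_classes(circle):
    print(base)
```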
5.5.3 Friends
Each "friend" declared by a structure, union or class type may be represented by a debugging information entry that is a child of the structure, union or class type entry; the friend entry has the tag DW_TAG_friend. A friend entry has a DW_AT_friend attribute, whose value is a reference to the debugging information entry describing the declaration of the friend.
5.5.4 Structure Data Member Entries
A data member (as opposed to a member function) is represented by a debugging information entry with the tag DW_TAG_member. The member entry for a named member has a DW_AT_name attribute whose value is a null-terminated string containing the member name as it appears in the source program. If the member entry describes a C++ anonymous union, the name attribute is omitted or consists of a single zero byte.
The structure data member entry has a DW_AT_type attribute to denote the type of that member.
If the member entry is defined in the structure or class body, it has a DW_AT_data_member_location attribute whose value is a location description that describes the location of that member relative to the base address of the structure, union, or class that most closely encloses the corresponding member declaration. The addressing expression represented by the location description for a structure data member expects the base address of the structure data member to be on the expression stack before being evaluated.
The location description for a data member of a union may be omitted, since all data members of a union begin at the same address.
If the member entry describes a bit field, then that entry has the following attributes:
The location description for a bit field calculates the address of an anonymous object containing the bit field. The address is relative to the structure, union, or class that most closely encloses the bit field declaration. The number of bytes in this anonymous object is the value of the byte size attribute of the bit field. The offset (in bits) from the most significant bit of the anonymous object to the most significant bit of the bit field is the value of the bit offset attribute.
For example, take one possible representation of the following structure definition in both big and little endian byte orders:
    struct S {
        int j:5;
        int k:6;
        int m:5;
        int n:8;
    };
In both cases, the location descriptions for the debugging information entries for j, k, m and n describe the address of the same 32-bit word that contains all four members. (In the big-endian case, the location description addresses the most significant byte; in the little-endian case, the least significant.) The following diagram shows the structure layout and lists the bit offsets for each case. The offsets are from the most significant bit of the object addressed by the location description.
Big-Endian
    Field          j       k       m       n      pad
    Bits         31-27   26-21   20-16   15-8    7-0
    Bit offset     0       5      11      16
Little-Endian
    Field         pad      n       m       k       j
    Bits         31-24   23-16   15-11   10-5    4-0
    Bit offset             8      16      21      27
5.5.5 Structure Member Function Entries
A member function is represented in the debugging information by a debugging information entry with the tag DW_TAG_subprogram. The member function entry may contain the same attributes and follows the same rules as non-member global subroutine entries (see section 3.3).
If the member function entry describes a virtual function, then that entry has a DW_AT_virtuality attribute. An entry for a virtual function also has a DW_AT_vtable_elem_location attribute whose value contains a location description yielding the address of the slot for the function within the virtual function table for the enclosing class or structure.
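As a cross-check on the bit-field example in section 5.5.4, the following Python sketch recomputes the bit offsets from the layout described there. The function name bit_offsets and its parameters are hypothetical and are not part of DWARF. The sketch assumes, as in the example, that fields are allocated starting from the most significant bit on the big-endian target and from the least significant bit on the little-endian target; other compilers may lay out bit fields differently.

```python
def bit_offsets(fields, storage_bits=32, big_endian=True):
    """Map each field name to the offset, in bits, from the most significant
    bit of the storage unit to the most significant bit of the field."""
    offsets = {}
    used = 0  # number of bits already allocated to earlier fields
    for name, width in fields:
        if big_endian:
            # Allocation starts at the MSB, so the offset is simply the
            # number of bits consumed by the preceding fields.
            offsets[name] = used
        else:
            # Allocation starts at the LSB: the field occupies bit positions
            # used .. used + width - 1, counted from the LSB.
            msb_position = used + width - 1
            offsets[name] = (storage_bits - 1) - msb_position
        used += width
    return offsets


# struct S { int j:5; int k:6; int m:5; int n:8; };
S = [("j", 5), ("k", 6), ("m", 5), ("n", 8)]
big = bit_offsets(S, big_endian=True)
little = bit_offsets(S, big_endian=False)
```

Under these assumptions, big and little hold the same values as the two rows of bit offsets listed for the example structure.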
If a subroutine entry represents the defining declaration of a member function and that definition appears outside of the body of the enclosing class or structure declaration, the subroutine entry has a DW_AT_specification attribute, whose value is a reference to the debugging information entry representing the declaration of this function member. The referenced entry will be a child of some class or structure type entry.
Subroutine entries containing the DW_AT_specification attribute do not need to duplicate information provided by the declaration entry referenced by the specification attribute. In particular, such entries do not need to contain attributes for the name or return type of the function member whose definition they represent.
5.5.6 Class Template Instantiations
In C++ a class template is a generic definition of a class type that is instantiated differently when an instance of the class is declared or defined. The generic description of the class may include both parameterized types and parameterized constant values. DWARF does not represent the generic template definition, but does represent each instantiation.
A class template instantiation is represented by a debugging information entry with the tag DW_TAG_class_type. With four exceptions, such an entry will contain the same attributes and have the same types of child entries as would an entry for a class type defined explicitly using the instantiation types and values. The exceptions are:
5.5.7 Variant Entries
A variant part of a structure is represented by a debugging information entry with the tag DW_TAG_variant_part and is owned by the corresponding structure type entry.
If the variant part has a discriminant, the discriminant is represented by a separate debugging information entry which is a child of the variant part entry. This entry has the form of a structure data member entry. The variant part entry will have a DW_AT_discr attribute whose value is a reference to the member entry for the discriminant.
If the variant part does not have a discriminant (tag field), the variant part entry has a DW_AT_type attribute to represent the tag type.
Each variant of a particular variant part is represented by a debugging information entry with the tag DW_TAG_variant and is a child of the variant part entry. The value that selects a given variant may be represented in one of three ways. The variant entry may have a DW_AT_discr_value attribute whose value represents a single case label. The value of this attribute is encoded as an LEB128 number. The number is signed if the tag type for the variant part containing this variant is a signed type. The number is unsigned if the tag type is an unsigned type.
Alternatively, the variant entry may contain a DW_AT_discr_list attribute, whose value represents a list of discriminant values. This list is represented by any of the block forms and may contain a mixture of case labels and label ranges. Each item on the list is prefixed with a discriminant value descriptor that determines whether the list item represents a single label or a label range. A single case label is represented as an LEB128 number as defined above for the DW_AT_discr_value attribute. A label range is represented by two LEB128 numbers, the low value of the range followed by the high value. Both values follow the rules for signedness just described. The discriminant value descriptor is a constant that may have one of the values given in Figure 13.
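The DW_AT_discr_list layout described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference decoder: the function names are hypothetical, the descriptor is assumed to occupy one byte, and the descriptor values 0 (DW_DSC_label) and 1 (DW_DSC_range) are assumed to match Figure 13, which should be consulted for the authoritative encoding.

```python
DW_DSC_label = 0x00  # assumed value; see Figure 13
DW_DSC_range = 0x01  # assumed value; see Figure 13


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 number; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos


def read_sleb128(data, pos):
    """Decode a signed LEB128 number; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if byte & 0x40:  # sign bit of the final byte is set
                result -= 1 << shift
            return result, pos


def decode_discr_list(block, signed):
    """Return a list of (low, high) pairs; a single label has low == high.

    An empty block yields an empty list, which marks a default variant.
    """
    read = read_sleb128 if signed else read_uleb128
    items = []
    pos = 0
    while pos < len(block):
        descriptor = block[pos]
        pos += 1
        if descriptor == DW_DSC_label:
            value, pos = read(block, pos)
            items.append((value, value))
        elif descriptor == DW_DSC_range:
            low, pos = read(block, pos)
            high, pos = read(block, pos)
            items.append((low, high))
        else:
            raise ValueError("unknown discriminant value descriptor")
    return items


# A label 3 followed by the range -1..2, for a signed tag type.
example = bytes([0x00, 0x03, 0x01, 0x7F, 0x02])
ranges = decode_discr_list(example, signed=True)
```

The signed flag corresponds to the signedness of the tag type of the enclosing variant part, which the caller is assumed to have determined already.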
If a variant entry has neither a DW_AT_discr_value attribute nor a DW_AT_discr_list attribute, or if it has a DW_AT_discr_list attribute with 0 size, the variant is a default variant.
The components selected by a particular variant are represented by debugging information entries owned by the corresponding variant entry and appear in the same order as the corresponding declarations in the source program.
5.6 Enumeration Type Entries
An "enumeration type" is a scalar that can assume one of a fixed number of symbolic values.
An enumeration type is represented by a debugging information entry with the tag DW_TAG_enumeration_type.
If a name has been given to the enumeration type in the source program, then the corresponding enumeration type entry has a DW_AT_name attribute whose value is a null-terminated string containing the enumeration type name as it appears in the source program. These entries also have a DW_AT_byte_size attribute whose constant value is the number of bytes required to hold an instance of the enumeration.
Each enumeration literal is represented by a debugging information entry with the tag DW_TAG_enumerator. Each such entry is a child of the enumeration type entry, and the enumerator entries appear in the same order as the declarations of the enumeration literals in the source program.
Each enumerator entry has a DW_AT_name attribute, whose value is a null-terminated string containing the name of the enumeration literal as it appears in the source program. Each enumerator entry also has a DW_AT_const_value attribute, whose value is the actual numeric value of the enumerator as represented on the target system.
5.7 Subroutine Type Entries
It is possible in C to declare pointers to subroutines that return a value of a specific type. In both ANSI C and C++, it is possible to declare pointers to subroutines that not only return a value of a specific type, but accept only arguments of specific types.
The type of such pointers would be described with a "pointer to" modifier applied to a user-defined type. A subroutine type is represented by a debugging information entry with the tag DW_TAG_subroutine_type. If a name has been given to the subroutine type in the source program, then the corresponding subroutine type entry has a DW_AT_name attribute whose value is a null-terminated string containing the subroutine type name as it appears in the source program. If the subroutine type describes a function that returns a value, then the subroutine type entry has a DW_AT_type attribute to denote the type returned by the subroutine. If the types of the arguments are necessary to describe the subroutine type, then the corresponding subroutine type entry owns debugging information entries that describe the arguments. These debugging information entries appear in the order that the corresponding argument types appear in the source program. In ANSI-C there is a difference between the types of functions declared using function prototype style declarations and those declared using non-prototype declarations. A subroutine entry declared with a function prototype style declaration may have a DW_AT_prototyped attribute, whose value is a flag. Each debugging information entry owned by a subroutine type entry has a tag whose value has one of two possible interpretations.
5.8 String Type Entries
A "string" is a sequence of characters that have specific semantics and operations that separate them from arrays of characters. Fortran is one of the languages that has a string type.
A string type is represented by a debugging information entry with the tag DW_TAG_string_type. If a name has been given to the string type in the source program, then the corresponding string type entry has a DW_AT_name attribute whose value is a null-terminated string containing the string type name as it appears in the source program.
The string type entry may have a DW_AT_string_length attribute whose value is a location description yielding the location where the length of the string is stored in the program. The string type entry may also have a DW_AT_byte_size attribute, whose constant value is the size in bytes of the data to be retrieved from the location referenced by the string length attribute. If no byte size attribute is present, the size of the data to be retrieved is the same as the size of an address on the target machine.
If no string length attribute is present, the string type entry may have a DW_AT_byte_size attribute, whose constant value is the length in bytes of the string.
5.9 Set Entries
Pascal provides the concept of a "set," which represents a group of values of ordinal type.
A set is represented by a debugging information entry with the tag DW_TAG_set_type. If a name has been given to the set type, then the set type entry has a DW_AT_name attribute whose value is a null-terminated string containing the set type name as it appears in the source program.
The set type entry has a DW_AT_type attribute to denote the type of an element of the set.
If the amount of storage allocated to hold each element of an object of the given set type is different from the amount of storage that is normally allocated to hold an individual object of the indicated element type, then the set type entry has a DW_AT_byte_size attribute, whose constant value represents the size in bytes of an instance of the set type.
5.10 Subrange Type Entries
Several languages support the concept of a "subrange" type object. These objects can represent a subset of the values that an object of the basis type for the subrange can represent. Subrange type entries may also be used to represent the bounds of array dimensions.
A subrange type is represented by a debugging information entry with the tag DW_TAG_subrange_type.
If a name has been given to the subrange type, then the subrange type entry has a DW_AT_name attribute whose value is a null-terminated string containing the subrange type name as it appears in the source program.
The subrange entry may have a DW_AT_type attribute to describe the type of object of whose values this subrange is a subset.
If the amount of storage allocated to hold each element of an object of the given subrange type is different from the amount of storage that is normally allocated to hold an individual object of the indicated element type, then the subrange type entry has a DW_AT_byte_size attribute, whose constant value represents the size in bytes of each element of the subrange type.
The subrange entry may have the attributes DW_AT_lower_bound and DW_AT_upper_bound to describe, respectively, the lower and upper bound values of the subrange. The DW_AT_upper_bound attribute may be replaced by a DW_AT_count attribute, whose value describes the number of elements in the subrange rather than the value of the last element.
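The relationship between DW_AT_count and DW_AT_upper_bound can be illustrated with a small Python sketch. The function name and argument names are hypothetical, and the sketch assumes that the count covers consecutive values starting at the lower bound, so that the last element is lower bound + count - 1.

```python
def subrange_bounds(lower_bound, upper_bound=None, count=None):
    """Return (lower, upper) for a subrange described either by an upper
    bound or by an element count. The upper bound stays None if neither
    attribute was supplied."""
    if upper_bound is None and count is not None:
        # Assumption: count elements, starting at lower_bound, consecutive.
        upper_bound = lower_bound + count - 1
    return lower_bound, upper_bound


# Ten elements starting at 0, and ten elements starting at 1.
zero_based = subrange_bounds(0, count=10)
one_based = subrange_bounds(1, count=10)
```

A consumer would normally pass the constant values it has already extracted from the subrange entry's attributes; resolving bounds that are references to other entries is outside the scope of this sketch.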
If a bound or count value is described by a constant not represented in the program's address space and can be represented by one of the constant attribute forms, then the value of the lower or upper bound or count attribute may be one of the constant types. Otherwise, the value of the lower or upper bound or count attribute is a reference to a debugging information entry describing an object containing the bound value or itself describing a constant value.
If the lower bound value, or the upper bound or count value, is missing, the missing bound is assumed to be a language-dependent default constant. The default lower bound value for C or C++ is 0. For Fortran, it is 1. No other default values are currently defined by DWARF.
If the subrange entry has no type attribute describing the basis type, the basis type is assumed to be the same as the object described by the lower bound attribute (if it references an object). If there is no lower bound attribute, or it does not reference an object, the basis type is the type of the upper bound or count attribute (if it references an object). If there is no upper bound or count attribute or it does not reference an object, the type is assumed to be the same type, in the source language of the compilation unit containing the subrange entry, as a signed integer with the same size as an address on the target machine.
5.11 Pointer to Member Type Entries
In C++, a pointer to a data or function member of a class or structure is a unique type.
A debugging information entry representing the type of an object that is a pointer to a structure or class member has the tag DW_TAG_ptr_to_member_type.
If the pointer to member type has a name, the pointer to member entry has a DW_AT_name attribute, whose value is a null-terminated string containing the type name as it appears in the source program.
The pointer to member entry has a DW_AT_type attribute to describe the type of the class or structure member to which objects of this type may point.
The pointer to member entry also has a DW_AT_containing_type attribute, whose value is a reference to a debugging information entry for the class or structure to whose members objects of this type may point.
Finally, the pointer to member entry has a DW_AT_use_location attribute whose value is a location description that computes the address of the member of the class or structure to which the pointer to member type entry can point.
The method used to find the address of a given member of a class or structure is common to any instance of that class or structure and to any instance of the pointer to member type. The method is thus associated with the type entry, rather than with each instance of the type.
The DW_AT_use_location expression, however, cannot be used on its own, but must be used in conjunction with the location expressions for a particular object of the given pointer to member type and for a particular structure or class instance. The DW_AT_use_location attribute expects two values to be pushed onto the location expression stack before the DW_AT_use_location expression is evaluated. The first value pushed should be the value of the pointer to member object itself. The second value pushed should be the base address of the entire structure or union instance containing the member whose address is being calculated.
So, for an expression like object.*mbr_ptr, where mbr_ptr has some pointer to member type, a debugger should:
5.12 File Type Entries
Some languages, such as Pascal, provide a first class data type to represent files.
A file type is represented by a debugging information entry with the tag DW_TAG_file_type. If the file type has a name, the file type entry has a DW_AT_name attribute, whose value is a null-terminated string containing the type name as it appears in the source program.
The file type entry has a DW_AT_type attribute describing the type of the objects contained in the file.
The file type entry also has a DW_AT_byte_size attribute, whose value is a constant representing the size in bytes of an instance of this file type.
6. OTHER DEBUGGING INFORMATION
This section describes debugging information that is not represented in the form of debugging information entries and is not contained within the .debug_info section.
6.1 Accelerated Access
A debugger frequently needs to find the debugging information for a program object defined outside of the compilation unit where the debugged program is currently stopped. Sometimes it will know only the name of the object; sometimes only the address. To find the debugging information associated with a global object by name, using the DWARF debugging information entries alone, a debugger would need to run through all entries at the highest scope within each compilation unit. For lookup by address, for a subroutine, a debugger can use the low and high pc attributes of the compilation unit entries to quickly narrow down the search, but these attributes only cover the range of addresses for the text associated with a compilation unit entry. To find the debugging information associated with a data object, an exhaustive search would be needed. Furthermore, any search through debugging information entries for different compilation units within a large program would potentially require the access of many memory pages, probably hurting debugger performance.
To make lookups of program objects by name or by address faster, a producer of DWARF information may provide two different types of tables containing information about the debugging information entries owned by a particular compilation unit entry in a more condensed format.
6.1.1 Lookup by Name
For lookup by name, a table is maintained in a separate object file section called .debug_pubnames. The table consists of sets of variable length entries, each set describing the names of global objects whose definitions or declarations are represented by debugging information entries owned by a single compilation unit. Each set begins with a header containing four values: the total length of the entries for that set, not including the length field itself; a version number; the offset from the beginning of the .debug_info section of the compilation unit entry referenced by the set; and the size in bytes of the contents of the .debug_info section generated to represent that compilation unit. This header is followed by a variable number of offset/name pairs. Each pair consists of the offset from the beginning of the compilation unit entry corresponding to the current set to the debugging information entry for the given object, followed by a null-terminated character string representing the name of the object as given by the DW_AT_name attribute of the referenced debugging entry. Each set of names is terminated by zero.
In the case of the name of a static data member or function member of a C++ structure, class or union, the name presented in the .debug_pubnames section is not the simple name given by the DW_AT_name attribute of the referenced debugging entry, but rather the fully class qualified name of the data or function member.
6.1.2 Lookup by Address
For lookup by address, a table is maintained in a separate object file section called .debug_aranges.
The table consists of sets of variable length entries, each set describing the portion of the program's address space that is covered by a single compilation unit. Each set begins with a header containing five values:
This header is followed by a variable number of address range descriptors. Each descriptor is a pair consisting of the beginning address of a range of text or data covered by some entry owned by the corresponding compilation unit entry, followed by the length of that range. A particular set is terminated by an entry consisting of two zeroes. By scanning the table, a debugger can quickly decide which compilation unit to look in to find the debugging information for an object that has a given address.
6.2 Line Number Information
A source-level debugger will need to know how to associate statements in the source files with the corresponding machine instruction addresses in the executable object or the shared objects used by that executable object. Such an association would make it possible for the debugger user to specify machine instruction addresses in terms of source statements. This would be done by specifying the line number and the source file containing the statement. The debugger can also use this information to display locations in terms of the source files and to single step from statement to statement.
As mentioned in section 3.1, above, the line number information generated for a compilation unit is represented in the .debug_line section of an object file and is referenced by a corresponding compilation unit debugging information entry in the .debug_info section.
If space were not a consideration, the information provided in the .debug_line section could be represented as a large matrix, with one row for each instruction in the emitted object code. The matrix would have columns for:
Such a matrix, however, would be impractically large. We shrink it with two techniques. First, we delete from the matrix each row whose file, line and source column information is identical with that of its predecessors. Second, we design a byte-coded language for a state machine and store a stream of bytes in the object file instead of the matrix. This language can be much more compact than the matrix. When a consumer of the statement information executes, it must "run" the state machine to generate the matrix for each compilation unit it is interested in. The concept of an encoded matrix also leaves room for expansion. In the future, columns can be added to the matrix to encode other things that are related to individual instruction addresses.
6.2.1 Definitions
The following terms are used in the description of the line number information format:
state machine
    The hypothetical machine used by a consumer of the line number information to expand the byte-coded instruction stream into a matrix of line number information.
6.2.2 State Machine Registers
The statement information state machine has the following registers:
At the beginning of each sequence within a statement program, the state of the registers is:
    address        0
    file           1
    line           1
    column         0
    is_stmt        determined by default_is_stmt in the statement program prologue
    basic_block    "false"
    end_sequence   "false"
6.2.3 Statement Program Instructions
The state machine instructions in a statement program belong to one of three categories:
6.2.4 The Statement Program Prologue
The optimal encoding of line number information depends to a certain degree upon the architecture of the target machine. The statement program prologue provides information used by consumers in decoding the statement program instructions for a particular compilation unit and also provides information used throughout the rest of the statement program. The statement program for each compilation unit begins with a prologue containing the following fields in order:
6.2.5 The Statement Program
As stated before, the goal of a statement program is to build a matrix representing one compilation unit, which may have produced multiple sequences of target-machine instructions. Within a sequence, addresses may only increase. (Line numbers may decrease in cases of pipeline scheduling.)
6.2.5.1 Special Opcodes
Each 1-byte special opcode has the following effect on the state machine:
All of the special opcodes do those same four things; they differ from one another only in what values they add to the line and address registers. Instead of assigning a fixed meaning to each special opcode, the statement program uses several parameters in the prologue to configure the instruction set. There are two reasons for this. First, although the opcode space available for special opcodes now ranges from 10 through 255, the lower bound may increase if one adds new standard opcodes. Thus, the opcode_base field of the statement program prologue gives the value of the first special opcode. Second, the best choice of special-opcode meanings depends on the target architecture. For example, for a RISC machine where the compiler-generated code interleaves instructions from different lines to schedule the pipeline, it is important to be able to add a negative value to the line register to express the fact that a later instruction may have been emitted for an earlier source line. For a machine where pipeline scheduling never occurs, it is advantageous to trade away the ability to decrease the line register (a standard opcode provides an alternate way to decrease the line number) in return for the ability to add larger positive values to the address register. To permit this variety of strategies, the statement program prologue defines a line_base field that specifies the minimum value which a special opcode can add to the line register and a line_range field that defines the range of values it can add to the line register. A special opcode value is chosen based on the amount that needs to be added to the line and address registers. The maximum line increment for a special opcode is the value of the line_base field in the prologue, plus the value of the line_range field, minus 1 (line base + line range - 1). If the desired line increment is greater than the maximum line increment, a standard opcode must be used instead of a special opcode. 
The "address advance" is calculated by dividing the desired address increment by the minimum_instruction_length field from the prologue. The special opcode is then calculated using the following formula:
    opcode = (desired line increment - line_base) + (line_range * address advance) + opcode_base
If the resulting opcode is greater than 255, a standard opcode must be used instead.
To decode a special opcode, subtract the opcode_base from the opcode itself. The amount to increment the address register is the adjusted opcode divided by the line_range. The amount to increment the line register is the line_base plus the result of the adjusted opcode modulo the line_range. That is,
    line increment = line_base + (adjusted opcode % line_range)
As an example, suppose that the opcode_base is 16, line_base is -1 and line_range is 4. This means that we can use a special opcode whenever two successive rows in the matrix have source line numbers differing by any value within the range [-1, 2] (and, because of the limited number of opcodes available, when the difference between addresses is within the range [0, 59]).
The opcode mapping would be:
    Opcode    Line advance    Address advance
    16        -1              0
    17         0              0
    18         1              0
    19         2              0
    20        -1              1
    21         0              1
    22         1              1
    23         2              1
    ...
    253        0              59
    254        1              59
    255        2              59
There is no requirement that the expression 255 - line_base + 1 be an integral multiple of line_range.
6.2.5.2 Standard Opcodes
There are currently 9 standard ubyte opcodes. In the future additional ubyte opcodes may be defined by setting the opcode_base field in the statement program prologue to a value greater than 10.
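The special opcode formulas of section 6.2.5.1 can be sketched in Python as follows. The function names are hypothetical; the parameter names follow the prologue fields used in the text. The address advance is expressed in units of minimum_instruction_length, as described above, and returning None is merely this sketch's way of signalling that a standard opcode must be used instead.

```python
def encode_special_opcode(line_increment, address_advance,
                          opcode_base, line_base, line_range):
    """Return the special opcode for the given increments, or None when no
    special opcode can express them."""
    max_line_increment = line_base + line_range - 1
    if not (line_base <= line_increment <= max_line_increment):
        return None
    if address_advance < 0:
        return None  # within a sequence, addresses may only increase
    opcode = ((line_increment - line_base)
              + (line_range * address_advance)
              + opcode_base)
    if opcode > 255:
        return None
    return opcode


def decode_special_opcode(opcode, opcode_base, line_base, line_range):
    """Return (line increment, address advance) for a special opcode."""
    adjusted = opcode - opcode_base
    address_advance = adjusted // line_range
    line_increment = line_base + (adjusted % line_range)
    return line_increment, address_advance


# Parameters from the worked example: opcode_base 16, line_base -1, line_range 4.
params = dict(opcode_base=16, line_base=-1, line_range=4)
first = encode_special_opcode(-1, 0, **params)
last = encode_special_opcode(2, 59, **params)
too_far = encode_special_opcode(0, 60, **params)
decoded = decode_special_opcode(254, **params)
```

With the example parameters, the results agree with the first and last rows of the opcode mapping table, and an address advance of 60 falls outside the special opcode range.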
6.2.5.3 Extended Opcodes
There are three extended opcodes currently defined. The first byte following the length field of the encoding for each contains a sub-opcode.
6.4.1 Structure of Call Frame Information
DWARF supports virtual unwinding by defining an architecture independent basis for recording how procedures save and restore registers throughout their lifetimes. This basis must be augmented on some machines with specific information that is defined by either an architecture specific ABI authoring committee, a hardware vendor, or a compiler producer. The body defining a specific augmentation is referred to below as the "augmenter."
Abstractly, this mechanism describes a very large table that has the following structure:
    LOC CFA R0 R1 ... RN
    L0
    L1
    ...
    LN
The first column indicates an address for every location that contains code in a program. (In shared objects, this is an object-relative offset.) The remaining columns contain virtual unwinding rules that are associated with the indicated location. The first column of the rules defines the CFA rule, which is a register and a signed offset that are added together to compute the CFA value.
The remaining columns are labeled by register number. This includes some registers that have special designation on some architectures such as the PC and the stack pointer register. (The actual mapping of registers for a particular architecture is performed by the augmenter.) The register columns contain rules that describe whether a given register has been saved and the rule to find the value for the register in the previous frame.
The register rules are:
This table would be extremely large if actually constructed as described. Most of the entries at any point in the table are identical to the ones above them. The whole table can be represented quite compactly by recording just the differences starting at the beginning address of each subroutine in the program.
The virtual unwind information is encoded in a self-contained section called .debug_frame.
Entries in a .debug_frame section are aligned on an addressing unit boundary and come in two forms: A Common Information Entry (CIE) and a Frame Description Entry (FDE). Sizes of data objects used in the encoding of the .debug_frame section are described in terms of the same data definitions used for the line number information (see section 6.2.1).
A Common Information Entry holds information that is shared among many Frame Descriptors. There is at least one CIE in every non-empty .debug_frame section. A CIE contains the following fields, in order:
An FDE contains the following fields, in order:
6.4.2 Call Frame Instructions
Each call frame instruction is defined to take 0 or more operands. Some of the operands may be encoded as part of the opcode (see section 7.23). The instructions are as follows:
6.4.3 Call Frame Instruction Usage
To determine the virtual unwind rule set for a given location (L1), one searches through the FDE headers looking at the initial_location and address_range values to see if L1 is contained in the FDE. If so, then:
The rules in the register set now apply to location L1.
For an example, see Appendix 5.
7. DATA REPRESENTATION
This section describes the binary representation of the debugging information entry itself, of the attribute types and of other fundamental elements described above.
7.1 Vendor Extensibility
To reserve a portion of the DWARF name space and ranges of enumeration values for use for vendor specific extensions, special labels are reserved for tag names, attribute names, base type encodings, location operations, language names, calling conventions and call frame instructions. The labels denoting the beginning and end of the reserved value range for vendor specific extensions consist of the appropriate prefix (DW_TAG, DW_AT, DW_ATE, DW_OP, DW_LANG, DW_CC, or DW_CFA respectively) followed by _lo_user or _hi_user. For example, for entry tags, the special labels are DW_TAG_lo_user and DW_TAG_hi_user. Values in the range between prefix_lo_user and prefix_hi_user inclusive, are reserved for vendor specific extensions. Vendors may use values in this range without conflicting with current or future system-defined values. All other values are reserved for use by the system.
Vendor defined tags, attributes, base type encodings, location atoms, language names, calling conventions and call frame instructions, conventionally use the form prefix_vendor_id_name, where vendor_id is some identifying character sequence chosen so as to avoid conflicts with other vendors.
To ensure that extensions added by one vendor may be safely ignored by consumers that do not understand those extensions, the following rules should be followed:
7.2 Reserved Error Values
As a convenience for consumers of DWARF information, the value 0 is reserved in the encodings for attribute names, attribute forms, base type encodings, location operations, languages, statement program opcodes, macro information entries and tag names to represent an error condition or unknown value. DWARF does not specify names for these reserved values, since they do not represent valid encodings for the given type and should not appear in DWARF debugging information.
7.3 Executable Objects and Shared Objects
The relocated addresses in the debugging information for an executable object are virtual addresses and the relocated addresses in the debugging information for a shared object are offsets relative to the start of the lowest segment used by that shared object.
This requirement makes the debugging information for shared objects position independent. Virtual addresses in a shared object may be calculated by adding the offset to the base address at which the object was attached. This offset is available in the run-time linker's data structures.
7.4 File Constraints
All debugging information entries in a relocatable object file, executable object or shared object are required to be physically contiguous.
7.5 Format of Debugging Information
For each compilation unit compiled with a DWARF Version 2 producer, a contribution is made to the .debug_info section of the object file. Each such contribution consists of a compilation unit header followed by a series of debugging information entries. Unlike the information encoding for DWARF Version 1, Version 2 debugging information entries do not themselves contain the debugging information entry tag or the attribute name and form encodings for each attribute. Instead, each debugging information entry begins with a code that represents an entry in a separate abbreviations table. This code is followed directly by a series of attribute values. The appropriate entry in the abbreviations table guides the interpretation of the information contained directly in the .debug_info section. Each compilation unit is associated with a particular abbreviation table, but multiple compilation units may share the same table.
This encoding was based on the observation that typical DWARF producers produce a very limited number of different types of debugging information entries. By extracting the common information from those entries into a separate table, we are able to compress the generated information.
7.5.1 Compilation Unit Header
The header for the series of debugging information entries contributed by a single compilation unit consists of the following information:
The compilation unit header does not replace the DW_TAG_compile_unit debugging information entry. It is additional information that is represented outside the standard DWARF tag/attributes format.
7.5.2 Debugging Information Entry
Each debugging information entry begins with an unsigned LEB128 number containing the abbreviation code for the entry. This code represents an entry within the abbreviation table associated with the compilation unit containing this entry. The abbreviation code is followed by a series of attribute values.
On some architectures, there are alignment constraints on section boundaries. To make it easier to pad debugging information sections to satisfy such constraints, the abbreviation code 0 is reserved. Debugging information entries consisting of only the 0 abbreviation code are considered null entries.
7.5.3 Abbreviation Tables
The abbreviation tables for all compilation units are contained in a separate object file section called .debug_abbrev. As mentioned before, multiple compilation units may share the same abbreviation table.
The abbreviation table for a single compilation unit consists of a series of abbreviation declarations. Each declaration specifies the tag and attributes for a particular form of debugging information entry. Each declaration begins with an unsigned LEB128 number representing the abbreviation code itself. It is this code that appears at the beginning of a debugging information entry in the .debug_info section. As described above, the abbreviation code 0 is reserved for null debugging information entries. The abbreviation code is followed by another unsigned LEB128 number that encodes the entry's tag. The encodings for the tag names are given in Figures 14 and 15.
Following the tag encoding is a 1-byte value that determines whether a debugging information entry using this abbreviation has child entries or not. If the value is DW_CHILDREN_yes, the next physically succeeding entry of any debugging information entry using this abbreviation is the first child of the prior entry. If the 1-byte value following the abbreviation's tag encoding is DW_CHILDREN_no, the next physically succeeding entry of any debugging information entry using this abbreviation is a sibling of the prior entry. (Either the first child or sibling entries may be null entries). The encodings for the child determination byte are given in Figure 16. (As mentioned in section 2.3, each chain of sibling entries is terminated by a null entry).
Finally, the child encoding is followed by a series of attribute specifications. Each attribute specification consists of two parts. The first part is an unsigned LEB128 number representing the attribute's name. The second part is an unsigned LEB128 number representing the attribute's form. The series of attribute specifications ends with an entry containing 0 for the name and 0 for the form.
The attribute form DW_FORM_indirect is a special case. For attributes with this form, the attribute value itself in the .debug_info section begins with an unsigned LEB128 number that represents its form. This allows producers to choose forms for particular attributes dynamically, without having to add a new entry to the abbreviation table.
The abbreviations for a given compilation unit end with an entry consisting of a 0 byte for the abbreviation code.
See Appendix 2 for a depiction of the organization of the debugging information.
7.5.4 Attribute Encodings
The encodings for the attribute names are given in Figures 17 and 18.
The attribute form governs how the value of the attribute is encoded. The possible forms may belong to one of the following form classes:
Figure 14 & 15. Tag encodings (part 1 & part 2)
Figure 16. Child determination encodings
Figure 17 & 18. Attribute encodings, part 1 & part 2
Figure 19. Attribute form encodings
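The layout of a single abbreviation declaration described in section 7.5.3 (code, tag, child determination byte, then name/form pairs ending in 0, 0) can be sketched in C as follows. This is only an illustration under stated assumptions: the function names read_uleb128 and read_abbrev_decl are hypothetical, the input is assumed to be well-formed, and no bounds checking is performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 number at buf[*pos] and advance *pos. */
uint64_t read_uleb128(const uint8_t *buf, size_t *pos)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;

    assert(buf != NULL && pos != NULL);
    do {
        byte = buf[(*pos)++];
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);      /* high bit set: another byte follows */
    return result;
}

/*
 * Hypothetical helper: walk one abbreviation declaration.
 * Returns the abbreviation code, or 0 when the terminating
 * 0 entry of the table is reached.  On success *tag holds the
 * tag encoding, *has_children the child determination byte and
 * *n_attrs the number of attribute specifications skipped.
 */
uint64_t read_abbrev_decl(const uint8_t *buf, size_t *pos,
                          uint64_t *tag, int *has_children,
                          unsigned *n_attrs)
{
    uint64_t code = read_uleb128(buf, pos);

    if (code == 0)
        return 0;               /* end of this abbreviation table */
    *tag = read_uleb128(buf, pos);
    *has_children = buf[(*pos)++];
    *n_attrs = 0;
    for (;;) {
        uint64_t name = read_uleb128(buf, pos);
        uint64_t form = read_uleb128(buf, pos);
        if (name == 0 && form == 0)
            break;              /* 0, 0 ends the attribute specifications */
        (*n_attrs)++;
    }
    return code;
}
```

A consumer would call read_abbrev_decl repeatedly from the compilation unit's offset into .debug_abbrev until it returns 0, recording each declaration under its code.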
7.6 Variable Length Data
The special constant data forms DW_FORM_sdata and DW_FORM_udata are encoded using "Little Endian Base 128" (LEB128) numbers. LEB128 is a scheme for encoding integers densely that exploits the assumption that most integers are small in magnitude. (This encoding is equally suitable whether the target machine architecture represents data in big-endian or little-endian order. It is "little endian" only in the sense that it avoids using space to represent the "big" end of an unsigned integer, when the big end is all zeroes or sign extension bits).
DW_FORM_udata (unsigned LEB128) numbers are encoded as follows: start at the low order end of an unsigned integer and chop it into 7-bit chunks. Place each chunk into the low order 7 bits of a byte. Typically, several of the high order bytes will be zero; discard them. Emit the remaining bytes in a stream, starting with the low order byte; set the high order bit on each byte except the last emitted byte. The high bit of zero on the last byte indicates to the decoder that it has encountered the last byte.
The integer zero is a special case, consisting of a single zero byte.
Figure 20 gives some examples of DW_FORM_udata numbers. The 0x80 in each case is the high order bit of the byte, indicating that an additional byte follows:
The encoding for DW_FORM_sdata (signed, 2's complement LEB128) numbers is similar, except that the criterion for discarding high order bytes is not whether they are zero, but whether they consist entirely of sign extension bits. Consider the 32-bit integer -2. The three high order bytes of the number are sign extension, thus LEB128 would represent it as a single byte containing the low order 7 bits, with the high order bit cleared to indicate the end of the byte stream. Note that there is nothing within the LEB128 representation that indicates whether an encoded number is signed or unsigned. The decoder must know what type of number to expect.
Figure 20. Examples of unsigned LEB128 encodings
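The unsigned encoding described above can be sketched in C as follows. The function names encode_uleb128 and decode_uleb128 are illustrative and not part of DWARF; a 64-bit value is assumed, so the caller must supply at least 10 bytes of output space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as unsigned LEB128 into out; returns the number of bytes written. */
size_t encode_uleb128(uint64_t value, uint8_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);   /* low order 7 bits */
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                         /* more bytes to come */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decode an unsigned LEB128 number from in; stores the byte count in *len. */
uint64_t decode_uleb128(const uint8_t *in, size_t *len)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    assert(in != NULL && len != NULL);
    do {
        byte = in[n++];
        if (shift < 64)                           /* ignore bits beyond 64 */
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    *len = n;
    return result;
}
```

With this sketch, 2 encodes as the single byte 0x02, 128 as the two bytes 0x80 0x01, and 12857 as 0xb9 0x64.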
Figure 21. Examples of signed LEB128 encodings
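The signed encoding can be sketched in C in the same way. The names encode_sleb128 and decode_sleb128 are illustrative. The sketch assumes 64-bit values and avoids relying on the behavior of >> for negative operands, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode value as signed LEB128 into out (at least 10 bytes); returns bytes written. */
size_t encode_sleb128(int64_t value, uint8_t *out)
{
    size_t n = 0;
    int more = 1;

    assert(out != NULL);
    while (more) {
        uint8_t byte = (uint8_t)(value & 0x7f);
        /* Portable arithmetic shift right by 7 (floor division by 128). */
        value = (value >= 0) ? (value >> 7) : ~(~value >> 7);
        /* The sign bit of the chunk is its second high order bit (0x40). */
        if ((value == 0 && !(byte & 0x40)) ||
            (value == -1 && (byte & 0x40)))
            more = 0;
        else
            byte |= 0x80;                         /* more bytes to come */
        out[n++] = byte;
    }
    return n;
}

/* Decode a signed LEB128 number from in; stores the byte count in *len. */
int64_t decode_sleb128(const uint8_t *in, size_t *len)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t byte;

    assert(in != NULL && len != NULL);
    do {
        byte = in[n++];
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;          /* sign extend */
    *len = n;
    /* Assumes the usual two's complement conversion to int64_t. */
    return (int64_t)result;
}
```

For example, -2 encodes as the single byte 0x7e, while 127 needs two bytes (0xff 0x00) because bit 0x40 of its only chunk would otherwise be read as a sign bit.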
Appendix 4 gives algorithms for encoding and decoding these forms.
7.7 Location Descriptions
7.7.1 Location Expressions
A location expression is stored in a block of contiguous bytes. The bytes form a set of operations. Each location operation has a 1-byte code that identifies that operation. Operations can be followed by one or more bytes of additional data. All operations in a location expression are concatenated from left to right. The encodings for the operations in a location expression are described in Figures 22 and 23.
Figure 22 & 23. Location operation encodings, part 1 & 2
7.7.2 Location Lists
Each entry in a location list consists of two relative addresses followed by a 2-byte length, followed by a block of contiguous bytes. The length specifies the number of bytes in the block that follows. The two addresses are the same size as used by DW_FORM_addr on the target machine.
7.8 Base Type Encodings
The values of the constants used in the DW_AT_encoding attribute are given in Figure 24.
Figure 24. Base type encoding values
7.9 Accessibility Codes
The encodings of the constants used in the DW_AT_accessibility attribute are given in Figure 25.
Figure 25. Accessibility encodings
7.10 Visibility Codes
The encodings of the constants used in the DW_AT_visibility attribute are given in Figure 26.
Figure 26. Visibility encodings
7.11 Virtuality Codes
The encodings of the constants used in the DW_AT_virtuality attribute are given in Figure 27.
Figure 27. Virtuality encodings
7.12 Source Languages
The encodings for source languages are given in Figure 28. Names marked with + and their associated values are reserved, but the languages they represent are not supported in DWARF Version 2.
Figure 28. Language encodings
7.13 Address Class Encodings
The value of the common address class encoding DW_ADDR_none is 0.
7.14 Identifier Case
The encodings of the constants used in the DW_AT_identifier_case attribute are given in Figure 29.
Figure 29. Identifier case encodings
7.15 Calling Convention Encodings
The encodings for the values of the DW_AT_calling_convention attribute are given in Figure 30.
Figure 30. Calling convention encodings
7.16 Inline Codes
The encodings of the constants used in the DW_AT_inline attribute are given in Figure 31.
Figure 31. Inline encodings
7.17 Array Ordering
The encodings for the values of the order attributes of arrays are given in Figure 32.
Figure 32. Ordering encodings
7.18 Discriminant Lists
The descriptors used in the DW_AT_discr_list attribute are encoded as 1-byte constants. The defined values are presented in Figure 33.
Figure 33. Discriminant descriptor encodings
7.19 Name Lookup Table
Each set of entries in the table of global names contained in the .debug_pubnames section begins with a header consisting of: a 4-byte length containing the length of the set of entries for this compilation unit, not including the length field itself; a 2-byte version identifier containing the value 2 for DWARF Version 2; a 4-byte offset into the .debug_info section; and a 4-byte length containing the size in bytes of the contents of the .debug_info section generated to represent this compilation unit. This header is followed by a series of tuples. Each tuple consists of a 4-byte offset followed by a string of non-null bytes terminated by one null byte. Each set is terminated by a 4-byte word containing the value 0.
7.20 Address Range Table
Each set of entries in the table of address ranges contained in the .debug_aranges section begins with a header consisting of: a 4-byte length containing the length of the set of entries for this compilation unit, not including the length field itself; a 2-byte version identifier containing the value 2 for DWARF Version 2; a 4-byte offset into the .debug_info section; a 1-byte unsigned integer containing the size in bytes of an address (or the offset portion of an address for segmented addressing) on the target system; and a 1-byte unsigned integer containing the size in bytes of a segment descriptor on the target system. This header is followed by a series of tuples. Each tuple consists of an address and a length, each in the size appropriate for an address on the target architecture. The first tuple following the header in each set begins at an offset that is a multiple of the size of a single tuple (that is, twice the size of an address). The header is padded, if necessary, to the appropriate boundary. Each set of tuples is terminated by a 0 for the address and 0 for the length.
7.21 Line Number Information
The sizes of the integers used in the line number and call frame information sections are as follows:
The version number in the statement program prologue is 2 for DWARF Version 2. The boolean values "true" and "false" used by the statement information program are encoded as a single byte containing the value 0 for "false," and a non-zero value for "true." The encodings for the pre-defined standard opcodes are given in Figure 34.
Figure 34. Standard Opcode Encodings
The encodings for the pre-defined extended opcodes are given in Figure 35.
Figure 35. Extended Opcode Encodings
7.22 Macro Information
The source line numbers and source file indices encoded in the macro information section are represented as unsigned LEB128 numbers, as are the constants in a DW_MACINFO_vendor_ext entry. The macinfo type is encoded as a single byte. The encodings are given in Figure 36.
Figure 36. Macinfo Type Encodings
7.23 Call Frame Information
The value of the CIE id in the CIE header is 0xffffffff. The initial value of the CIE version number is 1.
Call frame instructions are encoded in one or more bytes. The primary opcode is encoded in the high order two bits of the first byte (that is, opcode = byte >> 6). An operand or extended opcode may be encoded in the low order 6 bits. Additional operands are encoded in subsequent bytes. The instructions and their encodings are presented in Figure 37.
Figure 37. Call frame instruction encodings
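The split of the first byte of a call frame instruction into a primary opcode and a 6-bit field can be sketched in C as follows. The type and function names (cfa_first_byte, split_cfa_byte, make_cfa_byte) are hypothetical; how the low order 6 bits are interpreted depends on the instruction, as given in Figure 37.

```c
#include <assert.h>
#include <stdint.h>

/* The two fields packed into the first byte of a call frame instruction. */
struct cfa_first_byte {
    unsigned primary;   /* high order 2 bits: primary opcode */
    unsigned low6;      /* low order 6 bits: operand or extended opcode */
};

/* Split the first byte of an instruction into its two fields. */
struct cfa_first_byte split_cfa_byte(uint8_t byte)
{
    struct cfa_first_byte f;

    f.primary = byte >> 6;
    f.low6 = byte & 0x3f;
    return f;
}

/* Pack a primary opcode and a 6-bit field back into a single byte. */
uint8_t make_cfa_byte(unsigned primary, unsigned low6)
{
    assert(primary < 4 && low6 < 64);
    return (uint8_t)((primary << 6) | low6);
}
```

A decoder would typically switch on the primary field first and, when it is 0, treat the low order 6 bits as an extended opcode whose operands follow in subsequent bytes.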
7.24 Dependencies
The debugging information in this format is intended to exist in the .debug_abbrev, .debug_aranges, .debug_frame, .debug_info, .debug_line, .debug_loc, .debug_macinfo, .debug_pubnames, and .debug_str sections of an object file. The information is not word-aligned, so the assembler must provide a way for the compiler to produce 2-byte and 4-byte quantities without alignment restrictions, and the linker must be able to relocate a 4-byte reference at an arbitrary alignment. In target architectures with 64-bit addresses, the assembler and linker must similarly handle 8-byte references at arbitrary alignments.
8. FUTURE DIRECTIONS
The UNIX International Programming Languages SIG is working on a specification for a set of interfaces for reading DWARF information that will hide changes in the representation of that information from its consumers. It is hoped that using these interfaces will make the transition from DWARF Version 1 to Version 2 much simpler and will make it easier for a single consumer to support objects using either Version 1 or Version 2 DWARF.
A draft of this specification is available for review from UNIX International. The Programming Languages SIG wishes to stress, however, that the specification is still in flux.
Appendix 1 -- Current Attributes by Tag Value
The list below enumerates the attributes that are most applicable to each type of debugging information entry. DWARF does not in general require that a given debugging information entry contain a particular attribute or set of attributes. Instead, a DWARF producer is free to generate any, all, or none of the attributes described in the text as being applicable to a given entry. Other attributes (both those defined within this document but not explicitly associated with the entry in question, and new, vendor-defined ones) may also appear in a given debugging entry. Therefore, the list may be taken as instructive, but cannot be considered definitive.
In the following table, DECL means DW_AT_decl_column, DW_AT_decl_file, DW_AT_decl_line.
Appendix 2 -- Organization of Debugging Information
The following diagram depicts the relationship of the abbreviation tables contained in the .debug_abbrev section to the information contained in the .debug_info section. Values are given in symbolic form, where possible.
Abbreviation Table - .debug_abbrev
Compilation Unit 1 - .debug_info
Compilation Unit 2 - .debug_info
Appendix 3 -- Statement Program Examples
Consider this simple source file and the resulting machine code for the Intel 8086 processor:
    1: int
    2: main()
           0x239: push bp
           0x23a: mov bp,sp
    3: {
    4: printf("Omit needless words\n");
           0x23c: mov ax,0xaa
           0x23f: push ax
           0x240: call _printf
           0x243: pop cx
    5: exit(0);
           0x244: xor ax,ax
           0x246: push ax
           0x247: call _exit
           0x24a: pop cx
    6: }
           0x24b: pop bp
           0x24c: ret
    7:
           0x24d:
If the statement program prologue specifies the following:
    minimum_instruction_length 1
    opcode_base 10
    line_base 1
    line_range 15
Then one encoding of the statement program would occupy 12 bytes (the opcode SPECIAL(m,n) indicates the special opcode generated for a line increment of m and an address increment of n):
An alternate encoding of the same program using standard opcodes to advance the program counter would occupy 22 bytes:
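As an illustration of how a producer might arrive at the SPECIAL(m,n) values mentioned above, the following C sketch applies the special opcode formula from the line number information section (opcode = (line increment - line_base) + (line_range * address advance) + opcode_base). The struct and function names are hypothetical, and the return value -1 is this sketch's own convention for "no special opcode fits".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the statement program prologue fields. */
struct line_prologue {
    unsigned minimum_instruction_length;
    int      line_base;
    unsigned line_range;
    unsigned opcode_base;
};

/*
 * Return the special opcode for the given line and address increments,
 * or -1 if the pair cannot be expressed as a single special opcode.
 */
int special_opcode(const struct line_prologue *p, int line_inc,
                   unsigned addr_inc)
{
    long adjusted_line;
    unsigned long opcode;

    assert(p != NULL && p->line_range != 0 &&
           p->minimum_instruction_length != 0);
    adjusted_line = (long)line_inc - p->line_base;
    if (adjusted_line < 0 || adjusted_line >= (long)p->line_range)
        return -1;              /* line increment out of range */
    if (addr_inc % p->minimum_instruction_length != 0)
        return -1;              /* not a whole number of instructions */
    opcode = (unsigned long)adjusted_line
           + (unsigned long)p->line_range
             * (addr_inc / p->minimum_instruction_length)
           + p->opcode_base;
    return opcode <= 255 ? (int)opcode : -1;   /* must fit in one byte */
}
```

With the prologue values of this appendix (minimum_instruction_length 1, opcode_base 10, line_base 1, line_range 15), SPECIAL(2,3) works out to 0x38 and SPECIAL(1,8) to 0x82.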
Appendix 4 -- Encoding and decoding variable length data
Here are algorithms expressed in a C-like pseudo-code to encode and decode signed and unsigned numbers in LEB128:
Encode an unsigned integer:
    do {
        byte = low order 7 bits of value;
        value >>= 7;
        if (value != 0) /* more bytes to come */
            set high order bit of byte;
        emit byte;
    } while (value != 0);
Encode a signed integer:
    more = 1;
    negative = (value < 0);
    size = no. of bits in signed integer;
    while(more) {
        byte = low order 7 bits of value;
        value >>= 7;
        /* the following is unnecessary if the implementation of >>=
         * uses an arithmetic rather than logical shift for a signed
         * left operand */
        if (negative)
            /* sign extend */
            value |= - (1 << (size - 7));
        /* sign bit of byte is 2nd high order bit (0x40) */
        if ((value == 0 && sign bit of byte is clear) ||
            (value == -1 && sign bit of byte is set))
            more = 0;
        else
            set high order bit of byte;
        emit byte;
    }
Decode unsigned LEB128 number:
    result = 0;
    shift = 0;
    while(true) {
        byte = next byte in input;
        result |= (low order 7 bits of byte << shift);
        if (high order bit of byte == 0)
            break;
        shift += 7;
    }
Decode signed LEB128 number:
    result = 0;
    shift = 0;
    size = no. of bits in signed integer;
    while(true) {
        byte = next byte in input;
        result |= (low order 7 bits of byte << shift);
        shift += 7;
        /* sign bit of byte is 2nd high order bit (0x40) */
        if (high order bit of byte == 0)
            break;
    }
    if ((shift < size) && (sign bit of byte is set))
        /* sign extend */
        result |= - (1 << shift);
Appendix 5 -- Call Frame Information Examples
The following example uses a hypothetical RISC machine in the style of the Motorola 88000.
The following are two code fragments from a subroutine called foo that uses a frame pointer (in addition to the stack pointer.) The first column values are byte addresses.
    ;; start prologue
    foo sub R7, R7,
The table for the foo subroutine is as follows. It is followed by the corresponding fragments from the .debug_frame section.
    Loc     CFA         R0  R1  R2  R3  R4   R5  R6  R7  R8
    foo     [R7]+0      s   u   u   u   s    s   s   s   r1
    foo+4   [R7]+fsize  s   u   u   u   s    s   s   s   r1
    foo+8   [R7]+fsize  s   u   u   u   s    s   s   s   c4
    foo+12  [R7]+fsize  s   u   u   u   s    s   c8  s   c4
    foo+16  [R6]+fsize  s   u   u   u   s    s   c8  s   c4
    foo+20  [R6]+fsize  s   u   u   u   c12  s   c8  s   c4
    foo+64  [R6]+fsize  s   u   u   u   c12  s   c8  s   c4
    foo+68  [R6]+fsize  s   u   u   u   s    s   c8  s   c4
    foo+72  [R7]+fsize  s   u   u   u   s    s   s   s   c4
    foo+76  [R7]+fsize  s   u   u   u   s    s   s   s   r1
    foo+80  [R7]+0      s   u   u   u   s    s   s   s   r1
notes:
Common Information Entry (CIE):
Frame Description Entry (FDE):