991026.3 B I J. Merrill Compression Duplicate Dwarf data deletion

The scheme we sketched out yesterday will allow us to remove duplicates
from the .debug_info section, which is the most important one (modulo
.debug_abbrev handling, as discussed below). However, there can be
duplication in the other DWARF 2 sections as well:

.debug_abbrev: Some entries may only be used by discarded info.
.debug_str, .debug_loc: Likewise.
.debug_line: Line info for discarded COMDATs should also be discarded.
.debug_frame: Likewise for unwind info.
.debug_pubnames, .debug_aranges: Likewise.
.debug_macinfo: Also subject to duplication.

_abbrev is tricky because the abbrevs need to be numbered. This means that
  we must define a certain set of abbrevs ahead of time, and all the
  .debug_info bits to be commonized can only use those abbrevs. This
  significantly complicates the process of reducing .debug_info.
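
To illustrate why the numbering matters, here is a minimal Python
sketch. It assumes, purely for illustration, that a compiler numbers
abbrevs 1, 2, 3... in order of first use within each object file; the
names assign_codes, STRUCT, MEMBER, SUBPROGRAM and PREDEFINED are
hypothetical and not part of any real tool.

```python
def assign_codes(abbrevs_in_order_of_use):
    """Number abbrevs 1, 2, 3... in order of first use (assumed policy)."""
    codes = {}
    for abbrev in abbrevs_in_order_of_use:
        # setdefault leaves an already-numbered abbrev untouched
        codes.setdefault(abbrev, len(codes) + 1)
    return codes

# An abbrev is modelled here as (tag, attribute names); forms are omitted.
STRUCT = ("DW_TAG_structure_type", ("DW_AT_name", "DW_AT_byte_size"))
MEMBER = ("DW_TAG_member", ("DW_AT_name", "DW_AT_type"))
SUBPROGRAM = ("DW_TAG_subprogram", ("DW_AT_name",))

# Two object files describe the same struct but use abbrevs in a
# different order, so the same DIE starts with a different abbrev code.
file_a = assign_codes([STRUCT, MEMBER])
file_b = assign_codes([SUBPROGRAM, STRUCT, MEMBER])

# A table fixed ahead of time gives every file the same codes for the
# abbrevs that commonized .debug_info chunks are allowed to use.
PREDEFINED = {STRUCT: 1, MEMBER: 2}
```

With per-file numbering, file_a[STRUCT] is 1 while file_b[STRUCT] is 2,
so a chunk kept from one file would be decoded against the wrong entry
in the other file's table. The predefined table avoids this at the cost
of restricting which abbrevs a commonized chunk may use.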

_line is tricky since the header contains a list of filenames that
  will be referenced later. This also affects .debug_info, since
  DW_AT_decl_file (if used) refers to the same header.

_pubnames and _aranges are tricky because the header of each
  pubname/arange set records the length of that set, which would
  have to be calculated at link time. This is also true of the CU
  header in .debug_info.
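
The dependence of the length field on the surviving entries can be
sketched as follows. This is an illustrative Python model of a DWARF 2
.debug_pubnames set for a little-endian 32-bit target; the function
name pubnames_set and the sample offsets and names are assumptions, not
output from any real compiler or linker.

```python
import struct

def pubnames_set(info_offset, info_length, entries):
    """Build one pubnames set from (die_offset, name) pairs."""
    # Header after the length field: version (2 bytes),
    # .debug_info offset (4 bytes), .debug_info length (4 bytes).
    body = struct.pack("<HII", 2, info_offset, info_length)
    for die_offset, name in entries:
        # Each entry is a 4-byte DIE offset plus a NUL-terminated name.
        body += struct.pack("<I", die_offset) + name.encode() + b"\0"
    body += struct.pack("<I", 0)  # a zero offset terminates the set
    # The leading length field does not count its own 4 bytes.
    return struct.pack("<I", len(body)) + body

full = pubnames_set(0, 0x100, [(0x2A, "foo"), (0x50, "bar")])
# Same set after the entry for a discarded COMDAT has been dropped.
trimmed = pubnames_set(0, 0x100, [(0x2A, "foo")])
```

Here full is 34 bytes with a length field of 30, while trimmed is 26
bytes with a length field of 22. A length written by the compiler is
therefore wrong as soon as the linker drops an entry, unless the linker
recomputes it.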

On the other hand, it would be pretty straightforward to generate an
additional CU within the object file. It could use the _abbrev and _line
info from the main CU. The minimum overhead (for a 32-bit target) would be:

  11 bytes for the CU header
   1 byte for the CU DIE TAG
   1 byte for AT_language
   4 bytes for a pointer into .debug_str for AT_producer
   4 bytes for AT_stmt_list (maybe)
  --
  21 bytes

A possible extension would be AT_extension for TAG_compile_unit, so the
secondary CU DIE would be only 1+4 bytes (the TAG plus the AT_extension
reference), bringing the total with the 11-byte CU header to 16 bytes.
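
As a rough check of the arithmetic above, the following Python sketch
lays out both variants byte by byte for a little-endian 32-bit target.
The function names, the field values, and the choice of 1-byte and
4-byte encodings are assumptions made for illustration; only the sizes
come from the list above.

```python
import struct

def cu_header(die_len, abbrev_offset):
    # unit_length (4 bytes, not counting itself), version (2 bytes),
    # .debug_abbrev offset (4 bytes), address size (1 byte) = 11 bytes.
    # The 7 covers the header fields that follow the length field.
    return struct.pack("<IHIB", die_len + 7, 2, abbrev_offset, 4)

def minimal_cu(abbrev_offset, abbrev_code, language, producer_strp, stmt_list):
    # 1 byte selecting the CU DIE's abbrev (assumes a code below 128, so
    # it fits in one ULEB128 byte), 1 byte AT_language, 4 bytes offset
    # into .debug_str for AT_producer, 4 bytes AT_stmt_list.
    die = struct.pack("<BBII", abbrev_code, language, producer_strp, stmt_list)
    return cu_header(len(die), abbrev_offset) + die

def extension_cu(abbrev_offset, abbrev_code, main_cu_ref):
    # Hypothetical AT_extension variant: 1 byte for the abbrev code plus
    # an assumed 4-byte reference standing in for the other attributes.
    die = struct.pack("<BI", abbrev_code, main_cu_ref)
    return cu_header(len(die), abbrev_offset) + die
```

With these definitions, len(minimal_cu(0, 1, 4, 0, 0)) is 21 and
len(extension_cu(0, 2, 11)) is 16, matching the totals given above.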

_str, _loc, and _frame can all be broken up easily; the chunks we're
interested in don't need headers or depend on other information.

_macinfo is tricky because it is linear. Breaking it into chunks
  would require some sort of extension -- perhaps a symbolic reference to
  the macro information for a particular header, to be used instead of
  MACINFO_{start,end}_file.


This proposal has been revised and is replaced by 010219.1.