991118.1 | B | A | F. Burton | Language | Unicode support |
For Java and other languages in which identifiers can contain
multibyte/Unicode characters, we need to define how this should be
implemented.
With DWARF 1, we (DIAB) defined FORM_STRING to be a UTF-8 encoding of
the identifier. UTF-8 is specified by the Unicode Consortium and has
the following properties:
1. UTF-8 strings can be stored as null-terminated byte arrays, since a
zero byte never occurs inside a multibyte sequence.
2. UTF-8 strings use the most significant bit of each byte to indicate
multibyte characters.
3. UTF-8 strings that contain only 7-bit ASCII characters look exactly
like the corresponding ASCII string.
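The three properties above can be illustrated with a minimal sketch. It is
not part of the proposal; the helper names (encode_identifier,
decode_identifier, is_ascii_only) are hypothetical and only assume that an
identifier is stored as its UTF-8 bytes followed by a single zero byte.

```python
def is_ascii_only(data: bytes) -> bool:
    """True if no byte has its most significant bit set (property 2)."""
    return all(b < 0x80 for b in data)


def encode_identifier(name: str) -> bytes:
    """Hypothetical helper: UTF-8 bytes plus one terminating zero byte."""
    encoded = name.encode("utf-8")
    if b"\x00" in encoded:
        # Only U+0000 encodes to a zero byte, so it cannot appear in a name.
        raise ValueError("identifier must not contain U+0000")
    return encoded + b"\x00"


def decode_identifier(buf: bytes, offset: int = 0) -> str:
    """Hypothetical helper: read up to the first zero byte and decode it."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8")


# Property 3: a pure ASCII identifier is byte-for-byte the ASCII string.
ascii_form = encode_identifier("count")

# Properties 1 and 2: a name with two non-ASCII letters (o-umlaut, sharp s).
# Each becomes two bytes with the top bit set; the only zero byte is the
# terminator.
mixed_name = "gr\u00f6\u00dfe"
mixed_form = encode_identifier(mixed_name)

print(ascii_form)
print(mixed_form)
print(decode_identifier(mixed_form) == mixed_name)
```

A consumer that only looks for the terminating zero byte can therefore
handle such strings without knowing anything about UTF-8.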
Adopted. The description of "string" in Sections 2.2 and 7.5.4 should specify UTF-8. A reference should be given to ISO/IEC 10646 AMI.