GEDCOM Character Encodings

Decoding GEDCOM encodings: character sets and encodings

A text file must be encoded using some character set and encoding, and the FamilySearch GEDCOM specification allows several possible encodings.

The FamilySearch GEDCOM 5.5.1 specification discusses character sets and encodings in chapter 3, Using Character Sets in GEDCOM. It is a fairly short chapter, just three pages. A large part of these three pages is taken up by brief descriptions of the choices allowed.

Chapter 3 of the FamilySearch GEDCOM 5.5.1 specification, Using Character Sets in GEDCOM, is confused. The specification fails to distinguish between character sets and character encodings. It never refers to UTF-16 as UTF-16, but as unicode, consistently confusing UTF-16, a particular Unicode encoding, with Unicode itself. The specification even uses the phrase 8-bit ASCII. That is a contradictio in terminis; ASCII is a 7-bit character set.

The character encoding used by a GEDCOM file is specified in the GEDCOM header, through the line value of the HEAD.CHAR subrecord. The GEDCOM header consists of both mandatory and optional subrecords. The HEAD.CHAR record and its line value are mandatory.

The FamilySearch GEDCOM 5.5.1 specification allows three character sets in four encodings. The character sets are ASCII, ANSEL and Unicode, and the encodings are ASCII, ANSEL, UTF-8 and UTF-16. The ASCII and ANSEL character sets each have just one encoding. Unicode is a character set with multiple encodings, including UTF-8 and UTF-16. UTF-8 and UTF-16 are two different encodings of the same character set. To read a file, it is not enough to know the character set used; you need to know the encoding.

The FamilySearch GEDCOM 5.5.1 specification enumerates four possible values for the HEAD.CHAR line value and describes that value as follows:

A code value that represents the character set to be used to interpret this data. Currently, the preferred character set is ANSEL, which includes ASCII as a subset. UNICODE is not widely supported by most operating systems therefore, GEDCOM produced using the UNICODE character set will be limited in its interchangeability for a while but should eventually provide the international [...]

Note: The IBMPC character set is not allowed. This character set cannot be interpreted properly without knowing which code page the sender was using.

The FamilySearch GEDCOM 5.5.1 specification refers to the HEAD.CHAR line value as a character set, and the name used for that value in the GEDCOM syntax is CHARACTER_SET. The HEAD.CHAR line value actually specifies the character encoding, as it should, and the value should really be known as a character encoding value.

The FamilySearch GEDCOM 5.5.1 specification (late 1999) enumerates four possible HEAD.CHAR line values: ASCII, ANSEL, UTF-8 and UNICODE. The UNICODE value does not represent the Unicode character set, but the UTF-16 encoding.

Unicode 1.0 was introduced in October 1991, shortly after GEDCOM 5.0 (1991 Sep 25). Microsoft released Windows NT 3.1, the first Unicode-based version of Windows, on 1993 Jul 27, and the GEDCOM 5.3 specification (1993 Nov 4) allows Unicode as a legal character set.
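As a rough illustration of how a reader might act on the HEAD.CHAR line value, the following Python sketch finds that value in the header and maps it to a codec name. The function names `head_char` and `python_codec` and the `CHAR_TO_CODEC` table are hypothetical, not part of any GEDCOM library. The sketch assumes the header is the first record in the file and that a UTF-16 file starts with a byte order mark. ANSEL maps to `None` because the Python standard library has no ANSEL codec.

```python
import codecs
import re
from typing import Optional

# The four HEAD.CHAR line values listed by the specification, mapped to
# Python codec names. ANSEL has no codec in the standard library, so it
# maps to None here (a separate decoder would be needed).
CHAR_TO_CODEC = {
    "ASCII": "ascii",
    "ANSEL": None,
    "UTF-8": "utf-8",
    "UNICODE": "utf-16",  # the UNICODE value stands for the UTF-16 encoding
}


def head_char(data: bytes) -> Optional[str]:
    """Return the HEAD.CHAR line value, or None if the header has none."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # Assumption: UTF-16 files carry a byte order mark.
        text = data.decode("utf-16")
    else:
        if data.startswith(codecs.BOM_UTF8):
            data = data[len(codecs.BOM_UTF8):]
        # Latin-1 maps every byte to a character, which is enough to read
        # the ASCII-only level numbers and tags in the header.
        text = data.decode("latin-1")

    for line in re.split(r"\r\n|\r|\n", text):
        parts = line.strip().split(None, 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            if tag != "HEAD":
                break  # the header record has ended
            continue
        if level == "1" and tag == "CHAR":
            return parts[2].strip() if len(parts) > 2 else None
    return None


def python_codec(data: bytes) -> Optional[str]:
    """Map the HEAD.CHAR value to a Python codec name (None for ANSEL)."""
    value = head_char(data)
    if value is None:
        raise ValueError("no HEAD.CHAR line found in the header")
    if value not in CHAR_TO_CODEC:
        raise ValueError(f"unsupported HEAD.CHAR value: {value!r}")
    return CHAR_TO_CODEC[value]
```

A caller would pass the raw bytes of the file, for example `python_codec(open(path, "rb").read())`, and then decode the whole file with the returned codec. The table makes the article's point concrete: three character sets, four encodings, and a UNICODE value that really means UTF-16.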