XmStrings and Motif 1.2
=======================

1.0 Introduction
================

This document gives a cursory explanation of the way XmStrings are
encoded in Motif 1.2 (and LessTif).  This information is still being
discovered, so corrections are welcome.

Sometime last year on the LessTif mailing list, a well-known SGI
persona, Doug Rand, sent an email that described the things that
changed from 1.2 to 2.0.  One of the things he mentioned was that the
encoding rules for the external representation of XmStrings can no
longer be considered to be in ASN.1 format (ASN.1 means "Abstract
Syntax Notation One").

2.0 Description (Get ready for the acronyms)
============================================

If you don't care where the rules come from, or what they are for, you
can skip this section.

I happen to work in the telecommunications industry, and I have
experience with ASN.1 and related standards as defined by the ITU
(others know these standards either through the ISO or from RFCs).
ASN.1 is used by the GDMO (roughly, "Guidelines for the Definition of
Managed Objects" - there are several ways I know of to decompose that
acronym) to describe MIBs (Management Information Bases).

Basically, ASN.1 is a way to describe data types in a machine
independent way from a text description (sorta like XDR).  Vaguely
associated with ASN.1 are sets of encoding rules, such as BER (or Basic
Encoding Rules), which describe how to actually create external
representations of data.  There are other encoding rules (e.g., DER and
PER), but you've had enough acronyms for now.

ASN.1 is really a very powerful tool; you may want to learn more about
it on your own.

3.0 How It Works
================

Ok, enough of the background.  Let's see how it works in practice.

The basic idea is to describe data elements as a three piece
combination: tag/length/value, sometimes referred to as TLV.
You have a tag, which describes what type of data this is; a length,
which says how long the following value is; and a value, which is an
octet (or byte) sequence that holds the data itself.

The basic unit of information is the octet (or byte): 8 bits of
information.  You can see how a single octet might be a little small
for describing the length of a large string -- more on that later.

One thing that must be noted is that TLVs can be nested; that is, the
value part of a TLV tuple can contain TLVs.

I'm going to skip a full description of BER and just report the basics
of how it relates to XmStrings.

Let's take a trivial example:

    xmstr = XmStringCreateLtoR("Hello\nWorld", XmFONTLIST_DEFAULT_TAG);

The first thing to notice is the XmFONTLIST_DEFAULT_TAG; that's a clue
to Motif that the string passed in is represented in the current locale
(I'm not even going to try to talk about NLS; look elsewhere for what
locale means).  The second thing to notice is that we used *CreateLtoR,
which means the function should be aware of separators (normally, this
means "look for newlines").  So Motif would parse that as

    "Hello"     (locale text)
    "\n"        (separator in this locale)
    "World"     (locale text)

Let's look at what Motif does tell us about encodings; each XmString
component has a different identifier:

    XmSTRING_COMPONENT_UNKNOWN      ; 0
    XmSTRING_COMPONENT_CHARSET      ; 1
    XmSTRING_COMPONENT_TEXT         ; 2
    XmSTRING_COMPONENT_DIRECTION    ; 3
    XmSTRING_COMPONENT_SEPARATOR    ; 4
    XmSTRING_COMPONENT_LOCALE_TEXT  ; 5

Hmm, these could be the tag part of the TLVs!

Given that, the XmString that 1.2 generates is the following (in hex
and chars, with the 0x prefix removed from the hex):

    df 80 06 10 05 05 'H' 'e' 'l' 'l' 'o' 04 00 05 05 'W' 'o' 'r' 'l' 'd'

(**** verify this is really the right string)

which makes absolutely no sense when you look at it that way.
Try this:

    df 80       ; this is a Motif string (essentially)
    06 10       ; which contains a 16 byte XmString
    05 05       ; which contains 5 bytes of locale text
    "Hello"     ; which has the value "Hello"
    04 00       ; and a separator
                ; which has no data (never does)
    05 05       ; and 5 more bytes of locale text
    "World"     ; which has the value "World"

The first number (on lines that have one) is the tag; the second number
is the length.  You can see that this description shows how TLVs can be
nested.  Look at it this way; if I just describe the string above
structurally, it comes out as (using parentheses as an indicator of
nesting):

    TLV=(TLV=(TLV,TLV,TLV))

Ok, now you're scratching your head.  Where does the 0x80 (the first
length) fit in?  Remember how I said that 8 bits was a little small for
describing lengths?  Well, that's where BER kicks in.  There are really
three ways of describing lengths: short form, long form, and indefinite
form.  As far as I know, Motif cheats horribly on this (more on this
below).

Here's how you describe lengths in BER:

    if (length < 128 [0x80]), then the length is contained in one
        octet.

    if (length >= 128, but not indefinite), then the length octet is
        0x80 + the number of octets needed to describe the length (up
        to 126 additional octets, since BER reserves the value 0xff;
        this can describe lengths up to 2^(126 * 8), or 2^1008, which
        is HUGE).  The octets describing the length immediately follow
        the length octet and come before the value octets.  In
        practice (as far as I know), Motif limits this to two
        additional length octets, which implies a maximum value length
        of 65535.

    if (length >= 2^1008, or you are really lazy (like Motif is)),
        then the length octet contains 0x80, and you're to parse the
        value (which contains TLV tuples) until you come to a TLV
        whose tag and length are both 0.
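
The three length forms can be sketched in C roughly as follows.  This
is only an illustration: ber_read_length() is a made-up name, not a
Motif or LessTif function, and it assumes that long-form length octets
are stored most significant octet first, as BER specifies (whether
Motif 1.2 always follows that is not confirmed here).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of Motif or LessTif).
 *
 * Decodes the BER length field that starts at buf[0], where avail is
 * the number of octets available in buf.  Returns the number of octets
 * the length field occupies, or 0 if it cannot be decoded.  On success,
 * *len holds the decoded length; for the indefinite form (a lone 0x80),
 * *indefinite is set to 1 and *len is left at 0.
 */
size_t ber_read_length(const unsigned char *buf, size_t avail,
                       size_t *len, int *indefinite)
{
    size_t i, n;

    *len = 0;
    *indefinite = 0;
    if (avail == 0)
        return 0;

    if (buf[0] < 0x80) {            /* short form: the octet is the length */
        *len = buf[0];
        return 1;
    }
    if (buf[0] == 0x80) {           /* indefinite form: no length given */
        *indefinite = 1;
        return 1;
    }

    n = buf[0] & 0x7f;              /* long form: n length octets follow */
    if (n > sizeof(size_t) || n + 1 > avail)
        return 0;                   /* too large for size_t, or truncated */
    for (i = 0; i < n; i++)         /* assumed big-endian, as in BER */
        *len = (*len << 8) | buf[1 + i];
    return n + 1;
}
```

Applied to the "06 10" header in the example, this would report a
length of 16 held in a single octet; applied to the 0x80 that follows
0xdf, it would report the indefinite form.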

As I said before, Motif is really lazy.  The first header (0xdf 0x80)
should imply that an XmString parser should look for a tag and length
that are both 0; in practice, Motif strings contain exactly one element
in the value: the XmString.  I've parsed strings in Motif looking for
the (0 0) tag/length, and run off into space; therefore, LessTif stops
after finding the first XmString component.  In effect, a length of
0x80 in Motif means "I don't know how long my value is, but my value is
really a TLV, and there's only one of them".

Let's look at our example string again, in light of this information:

    df 80       ; XmSTRING_TAG, XmSTRING_LENGTH
    06 10       ; XmSTRING_COMPONENT_XMSTRING, 16 bytes
    05 05       ; XmSTRING_COMPONENT_LOCALE_TEXT, 5 bytes
    "Hello"     ; "Hello"
    04 00       ; XmSTRING_COMPONENT_SEPARATOR, 0 bytes
                ;
    05 05       ; XmSTRING_COMPONENT_LOCALE_TEXT, 5 bytes
    "World"     ; "World"

That should make more sense now.  Note that the tags 6 - 125 are said
to be reserved in the Motif header files; now you should understand why
the value 6 is XmSTRING_COMPONENT_XMSTRING (which doesn't appear in any
Motif header).

4.0 Structures
==============

[Need to explain here why order is important in the strings -- the
charsets MUST come before the strings that use them].
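
To tie the layout from section 3.0 together, here is a minimal sketch
of a walker for the external form described there.  Everything in it is
illustrative: xmstring_to_text(), read_len() and the TAG_* names are
made up for this example and are not Motif or LessTif APIs.  It assumes
the header is always 0xdf 0x80 followed by a single tag-6 element,
accepts at most two long-form length octets (the limit mentioned
above), and simply skips charset and direction components.

```c
#include <stddef.h>
#include <string.h>

/* Component tags as listed in section 3.0; 6 is the inferred value. */
enum {
    TAG_CHARSET = 1, TAG_TEXT = 2, TAG_DIRECTION = 3,
    TAG_SEPARATOR = 4, TAG_LOCALE_TEXT = 5, TAG_XMSTRING = 6
};

/* Read a short- or long-form length at buf[*pos] and advance *pos past
 * it.  n is the end of the readable data.  Returns 0 on success, -1 on
 * error. */
static int read_len(const unsigned char *buf, size_t n,
                    size_t *pos, size_t *len)
{
    unsigned char first;
    size_t k;

    if (*pos >= n)
        return -1;
    first = buf[(*pos)++];
    if (first < 0x80) {                 /* short form */
        *len = first;
        return 0;
    }
    k = first & 0x7f;                   /* long form: k octets follow */
    if (k == 0 || k > 2 || k > n - *pos)
        return -1;                      /* assumed Motif limit: two octets */
    *len = 0;
    while (k-- > 0)
        *len = (*len << 8) | buf[(*pos)++];
    return 0;
}

/* Flatten an external XmString into plain text, turning separators
 * into newlines.  Returns the number of characters written (excluding
 * the terminating NUL), or -1 if the input is malformed or out is too
 * small. */
int xmstring_to_text(const unsigned char *buf, size_t n,
                     char *out, size_t outsz)
{
    size_t pos, end, len, w = 0;

    if (outsz == 0 || n < 3)
        return -1;
    /* Outer header: 0xdf 0x80, then the one XmString element (tag 6). */
    if (buf[0] != 0xdf || buf[1] != 0x80 || buf[2] != TAG_XMSTRING)
        return -1;
    pos = 3;
    if (read_len(buf, n, &pos, &len) != 0 || len > n - pos)
        return -1;
    end = pos + len;                    /* stop after this one element */

    while (pos < end) {
        unsigned char tag = buf[pos++];

        if (read_len(buf, end, &pos, &len) != 0 || len > end - pos)
            return -1;
        if (tag == TAG_TEXT || tag == TAG_LOCALE_TEXT) {
            if (len >= outsz - w)       /* keep room for the NUL */
                return -1;
            memcpy(out + w, buf + pos, len);
            w += len;
        } else if (tag == TAG_SEPARATOR) {
            if (1 >= outsz - w)
                return -1;
            out[w++] = '\n';
        }
        /* Charset, direction and unknown components are skipped. */
        pos += len;
    }
    out[w] = '\0';
    return (int)w;
}
```

Fed the 20 bytes of the example string, this sketch yields the text
"Hello\nWorld".  A fuller parser would presumably also have to remember
the most recent charset component so it can apply it to the text that
follows, which is where the ordering rule noted above comes in.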