Revision of the keysym specification
In X11, keysym numbers are used to represent the symbols visible on the keycaps of a keyboard. They represent either characters or function keys. The existing keysym scheme predates the introduction of Unicode/ISO 10646 by several years. Just like Unicode, the keysym scheme attempted to define a unification of many of the older existing coded character set standards.
Why are changes to the standard necessary?
- The keysyms, as currently defined in the X11R6.7 protocol spec, cover only a tiny subset of Unicode's character repertoire. While they are believed to cover existing keyboard standards pretty well, users continually ask for support for new scripts and for new characters in unused positions of existing keyboard mappings. Practically all requests to add keysyms for additional characters concern characters from the Unicode standard. There is a clear need to extend the keysym scheme to cover many more characters from the Unicode standard.
- Unicode is now by far the most widely used ASCII extension. Many implementations therefore need to convert received keysyms to the corresponding Unicode characters, and authors of keyboard maps need a clear understanding of how keysyms relate to Unicode. While the corresponding Unicode character is obvious for most of the existing keysyms, there are many keysyms for which it is not. It would be useful if the X11 standard established a clear normative relationship between those keysyms that represent a character and the equivalent Unicode character. As there are now normative mapping tables between Unicode and most older coded character set standards, having a Unicode mapping means having a mapping to most other encodings as well.
Solutions
Main.MarkusKuhn suggested back in 1999 (and implemented long ago in XFree86 xterm) that any Unicode/ISO 10646 character in the range U+0100 to U+10FFFF be represented by a keysym value in the range 0x01000100 to 0x0110FFFF. The Latin-1 characters in the first row of ISO 10646 (U+0000 to U+00FF) are already represented by keysyms with the same value.
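The arithmetic behind this scheme is simple enough to sketch in C. The following is only an illustration, not code from Xlib or xterm: the names `ucs_to_keysym`, `keysym_to_ucs`, and `KEYSYM_UNICODE_OFFSET` are hypothetical, and restricting the identity mapping to the printable Latin-1 positions (0x20 to 0x7E and 0xA0 to 0xFF) is an assumption made here so that control codes are not treated as character keysyms.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of Xlib. */
#define KEYSYM_UNICODE_OFFSET 0x01000000UL

/* Map a Unicode code point to a keysym under the proposed scheme.
 * Returns 0 (the value of NoSymbol) if the code point has no mapping. */
uint32_t ucs_to_keysym(uint32_t ucs)
{
    /* Assumption: only printable Latin-1 positions map to themselves. */
    if ((ucs >= 0x0020 && ucs <= 0x007E) ||
        (ucs >= 0x00A0 && ucs <= 0x00FF))
        return ucs;

    /* Everything from U+0100 to U+10FFFF is offset by 0x01000000. */
    if (ucs >= 0x0100 && ucs <= 0x10FFFF)
        return ucs + KEYSYM_UNICODE_OFFSET;

    return 0;
}

/* Inverse direction for the directly mapped ranges only.
 * Returns -1 for any other keysym (for example a legacy keysym,
 * which would need a table lookup instead). */
long keysym_to_ucs(uint32_t keysym)
{
    if ((keysym >= 0x0020 && keysym <= 0x007E) ||
        (keysym >= 0x00A0 && keysym <= 0x00FF))
        return (long)keysym;

    if (keysym >= 0x01000100UL && keysym <= 0x0110FFFFUL)
        return (long)(keysym - KEYSYM_UNICODE_OFFSET);

    return -1;
}
```

Under these assumptions, U+20AC (EURO SIGN) maps to keysym 0x010020AC, while a legacy keysym such as 0x0ba3 (leftcaret) falls outside both ranges and is reported as unmapped by `keysym_to_ucs`.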
Suggested changes to the standard
1. Add to the keysym table in the X11 protocol specification an additional column that indicates the corresponding Unicode position.
2. Add to the X11 protocol specification a statement that reserves the keysym range 0x01000100 to 0x0110FFFF, such that any Unicode character above U+00FF can be represented by a keysym value obtained by adding 0x01000000 to the character's Unicode position.
3. Add to the X11 protocol specification a statement that newly added keysyms that have a clear correspondence to a Unicode character should use the Unicode value + 0x01000000 as their keysym value.
Problems
Some minor issues relating to mappings between Unicode and keysyms:
- The current keysym standard distinguishes between several characters that the Unicode standard has unified into a single character. For most of these, it could be argued that they were originally added to the keysym spec in error. Examples include:
- U+003c (LESS-THAN SIGN) = 0x3c (less) and 0xba3 (leftcaret)
- U+003e (GREATER-THAN SIGN) = 0x3e (greater) and 0xba6 (rightcaret)
- U+2228 (LOGICAL OR) = 0x8df (logicalor) and 0xba8 (downcaret)
- U+2227 (LOGICAL AND) = 0x8de (logicaland) and 0xba9 (upcaret)
- U+00af (MACRON) = 0xaf (macron) and 0xbc0 (overbar)
- U+2229 (INTERSECTION) = 0x8dc (intersection) and 0xbc3 (upshoe)
- U+005f (LOW LINE) = 0x5f (underscore) and 0xbc6 (underbar)
- U+25cb (WHITE CIRCLE) = 0xace (emopencircle) and 0xbcf (circle)
- U+222a (UNION) = 0x8dd (union) and 0xbd6 (downshoe)
- U+2283 (SUPERSET OF) = 0x8db (includes) and 0xbd8 (rightshoe)
- U+2282 (SUBSET OF) = 0x8da (includedin) and 0xbda (leftshoe)
- U+20a9 (WON SIGN) = 0xeff (Korean_Won) and 0x20a9 (WonSign)
- The current keysym standard lists a number of characters for which either the meaning is completely unclear, or there exists no equivalent Unicode character. A serious attempt was made to locate the original Digital Equipment Corporation source standards for some of the keysym sets (e.g., Publishing), but it appears that these have been lost in time. Examples of characters with now unclear semantics include:
- 0x08b1 topleftsummation
- 0x08b2 botleftsummation
- 0x08b3 topvertsummationconnector
- 0x08b4 botvertsummationconnector
- 0x08b5 toprightsummation
- 0x08b6 botrightsummation
- 0x08b7 rightmiddlesummation
- 0x09df blank
- 0x0aac signifblank
- 0x0abd decimalpoint
- 0x0abf marker
- 0x0acb trademarkincircle
- 0x0ada hexagram
- 0x0aff cursor
- 0x0dde Thai_maihanakat_maitho
All of these probably ought to be declared deprecated, and perhaps even be removed from the standard at some point.
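The first item above, where several legacy keysyms correspond to one unified Unicode character, means that the keysym-to-Unicode direction is a plain lookup while the reverse direction is ambiguous. A minimal C sketch of this, using only the pairs listed above, might look as follows. The table layout and the names `legacy_keysym_to_ucs` and `legacy_keysyms_for_ucs` are hypothetical and chosen for illustration; Latin-1 keysyms that equal their code point (such as 0x3c) are left out because they need no table.

```c
#include <stddef.h>
#include <stdint.h>

/* One legacy keysym and the Unicode character it corresponds to. */
struct legacy_map {
    uint16_t keysym;
    uint16_t ucs;
};

/* Entries taken from the list of unified characters above. */
static const struct legacy_map legacy_table[] = {
    { 0x0ba3, 0x003c }, { 0x0ba6, 0x003e },
    { 0x08df, 0x2228 }, { 0x0ba8, 0x2228 },
    { 0x08de, 0x2227 }, { 0x0ba9, 0x2227 },
    { 0x0bc0, 0x00af },
    { 0x08dc, 0x2229 }, { 0x0bc3, 0x2229 },
    { 0x0bc6, 0x005f },
    { 0x0ace, 0x25cb }, { 0x0bcf, 0x25cb },
    { 0x08dd, 0x222a }, { 0x0bd6, 0x222a },
    { 0x08db, 0x2283 }, { 0x0bd8, 0x2283 },
    { 0x08da, 0x2282 }, { 0x0bda, 0x2282 },
    { 0x0eff, 0x20a9 }, { 0x20a9, 0x20a9 },
};

#define LEGACY_COUNT (sizeof legacy_table / sizeof legacy_table[0])

/* Keysym to Unicode: unambiguous. Returns -1 if not in the table. */
long legacy_keysym_to_ucs(uint32_t keysym)
{
    for (size_t i = 0; i < LEGACY_COUNT; i++)
        if (legacy_table[i].keysym == keysym)
            return legacy_table[i].ucs;
    return -1;
}

/* Unicode to keysym: may yield several candidates. Stores up to max
 * matching keysyms in out and returns how many matches exist in total. */
size_t legacy_keysyms_for_ucs(uint32_t ucs, uint32_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < LEGACY_COUNT; i++) {
        if (legacy_table[i].ucs == ucs) {
            if (n < max)
                out[n] = legacy_table[i].keysym;
            n++;
        }
    }
    return n;
}
```

In this sketch both 0x08df (logicalor) and 0x0ba8 (downcaret) resolve to U+2228, so a converter going from Unicode back to keysyms has to pick one of them as the preferred value.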
Other issues that ought to be fixed
- Appendix A of the X11 Protocol spec, which defines the keysyms, could in general do with a major rewrite. For instance, separate sections should distinguish between Special keysyms, Latin-1 keysyms, Unicode keysyms, Function keysyms, Vendor keysyms, and Legacy keysyms.
- Some of the Legacy keysyms with now forgotten meaning should be deprecated eventually. Also the Currency set, which was copied out of Unicode directly and is now obsoleted by the Unicode keysym mapping, should be deprecated (0x20AC EURO SIGN is the only one of these actually used in any keymap).
- The keysymdef.h file contains a set of 0xFExx keysyms (Keyboard (XKB) Extension), which have so far been completely missing from the standard.
- Some archaic and obsolete material could be removed, for example the old ISO/ECMA 16/16 notation (everyone uses hexadecimal today), or the section sign vs. paragraph sign vs. pilcrow naming discussion, which is today the least of the problems with the keysyms.
- The character names should be updated to the latest version of ISO 8859, which now uses the names from the Unicode database.
- XFree86 has added a lot of symbols. Most of these should be moved into the Unicode range. Some of them can probably be removed as they are unnecessary (in particular, the precomposed Vietnamese characters cannot possibly all fit onto any real keyboard). Five keysyms from recent XFree86 extensions that seem widely used and therefore worth adding to the standard are:
- 0x06ad Ukrainian_ghe_with_upturn
- 0x06bd Ukrainian_GHE_WITH_UPTURN
- 0xfe60 dead_belowdot
- 0xfe61 dead_hook
- 0xfe62 dead_horn
Draft patches
These are now in a quite mature state, address all of the above issues, and are ready for final review:
Replacement for xc/doc/specs/XProtocol/X11.keysyms: http://www.cl.cam.ac.uk/~mgk25/ucs/X11.keysyms
(PDF)
Replacement for xc/include/keysymdef.h: http://www.cl.cam.ac.uk/~mgk25/ucs/keysymdef.h
Open issues
- Investigate the semantics of the added "Keyboard (XKB) Extension" set. E.g., some of these seem to come from ISO 9995-7, but cross-referencing with that document did not give a flawless match. Any additional information on that topic is highly welcome. Who added the "Keyboard (XKB) Extension" keysyms, and when? Is there any additional background documentation about the meaning of these keysyms? Are they all actually used and needed?
- Look at Microsoft's recent Multimedia/Internet function keys, which are in part already covered by XFree86 vendor extensions, and decide whether and how these should be moved into the X11 standard.
-- Main.MarkusKuhn - 16 Aug 2004


