X11 was defined in the mid-1980s. Since then, computers have changed almost beyond recognition; the simple framebuffer model used by computers at the time has been replaced by programmable graphics hardware that is both complex and powerful. Not only has graphics hardware changed, but the basic processing model has changed and continues to change; parallelism in core system design is becoming the norm, rather than a special case for 'large' systems.

And that's just desktop systems; now there are smartphones, netbooks and tablets, and there will probably soon be other device types that this author can't imagine (else I'd be working on them...).

In short, X11 was designed for a different era of computing.

This is not to say that there's an X12 project. There isn't. But if one day there is...

Requirements for the Successor to the X11 Protocol - “X12”

This section discusses the broad requirements for X12; the specific failings of X11 are addressed in later sections.

To call this subject controversial would be an understatement; there have been, and continue to be, many attempts either to freeze 'X11' in time or to discard it completely. Clearly, this author believes both approaches to be wrong.

What is Good about X11

Network transparency. Network transparency rocks! Run a program on a remote system and interact with it on your local terminal; write a program without caring whether it will run on a full workstation or a dumb terminal. Some may say this is unimportant, but as Windows and RDP have evolved, they have come to look a lot more like X in terms of features.

A Rough List of Requirements

Security designed-in from the start

Systems need to be secure. X12 needs to be designed with security in mind.

Multiple Platform Support

X12 should be designed with mobile phones, tablets, dumb terminals, netbooks and desktop workstations in mind; if it is to succeed, it must work well on all these systems.

Maintain Network Transparency

The future will be more interconnected and network-oriented, not less. Network transparency makes things easier for users and can't be considered an 'optional extra'.

Support Modern Graphics Hardware and Rendering

Programmable hardware, compositing; all that good stuff. X12 needs to support modern hardware natively, in a way that gives developers access to the hardware without having to bypass X completely (as happens currently).

The Framebuffer is dead, Long Live the Framebuffer

For all the talk of modern hardware, let's not forget that the framebuffer concept is still extremely useful in certain situations; killing it off completely is likely to be a serious mistake.

Be as Efficient as Possible

X has always been a low-level protocol; inefficiencies here will hurt applications.

Think Parallel

This is perhaps the hardest part of the design of X12; the approach to computation and rendering is changing, with a greater emphasis on parallelism. X12 should be amenable to implementation on massively parallel systems, if not actively support them.

On the other hand, trying too hard in this regard is likely to be a serious cause of difficulty in finalising the design of the protocol; if in doubt, the designers should work to the standards of the day rather than attempt to predict the future.

Errors, Oversights and Omissions

This section attempts to document the failings of the X protocol and rendering model. Learn from history, or be doomed to repeat it.

Object model

Windows cannot be zero-sized

Color maps are non-obvious

Grabs can block too much

Popping up a menu and walking away can leave your screen locker unable to lock the screen, since it won't be able to grab the pointer.

Server grabs are even worse when they lock out all other clients including those necessary for user interaction like compositing managers and accessibility helpers.

Current theory: Multiple clients can grab; when any grabs are active, only clients with grabs receive events.
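
A minimal sketch of that delivery rule, assuming a hypothetical per-client record with a `grabbing` flag and an `interested` flag for ordinary event selection (none of these names come from X11 or any existing server):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client record: only the state the rule needs. */
struct client {
    bool grabbing;    /* client currently holds a grab */
    bool interested;  /* client selected this event in the normal way */
};

/* Decide whether client `c` receives an event. If any client holds a
 * grab, only grabbing clients receive events; otherwise ordinary
 * selection applies. */
static bool should_deliver(const struct client *clients, size_t n,
                           const struct client *c)
{
    bool any_grab = false;
    for (size_t i = 0; i < n; i++) {
        if (clients[i].grabbing) {
            any_grab = true;
            break;
        }
    }
    return any_grab ? c->grabbing : c->interested;
}
```

Under this rule a screen locker can grab even while a menu holds a grab, and both keep receiving input; a compositing manager or accessibility helper would need to hold its own grab to stay in the loop.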

Windows and Pixmaps aren't split correctly

You can't resize a pixmap in the X11 protocol, because you can't get events on things that aren't windows. Lame. Really, a Window should be only an IPC name, with one or more associated pixmaps and so on.

Fine grained events

PropertyNotify is a disaster. And we probably want to be able to get events on things other than Windows.

Infeasible to change color depth with clients running

Possibly not worth fixing. However, composited by default might make it reasonable.

Rendering model

Composited by default

Probably. Note that we can more or less accomplish this within X11, but there are probably simplifications to be had by making this explicit in the protocol.

No override-redirect

Core rendering is largely useless

Wide lines bite

BackgroundNone makes security geeks cry

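A window whose background is None is not cleared when it is mapped or exposed, so its initial contents are whatever was on screen before, which may include another client's pixels. See Eamon Walsh's talk from XDevConf 2008.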

Borders are stupid

Which is really a special case of...

Implicit rendering is stupid

Borders and window backgrounds and the bg=None trick and backing store and saveunders and all that.

There's a speed/complexity tradeoff here, of course. Any time that implicit server rendering works, it saves you exposures and round trips. But the implicit mechanisms we have are poor fits for a composited model. Think very carefully about adding implicit rendering to the server process; it's probably a mistake.

Encoding bugs

Extension space is too small

The first 128 major opcodes belong to the core protocol; each of the remaining 128 is a single entry point through which an extension multiplexes all of its requests. It's sort of ridiculous to have PolyFillArc on the same footing as GLX.

The minor opcode "convention" should be formalized and made part of the standard. The core protocol should be assigned major number zero and use minor numbers.
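
For reference, every X11 request starts with the 4-byte header in the first struct below. The second struct is a hypothetical layout for the proposal above; its field widths are an assumption for illustration, not part of any specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 4-byte header that starts every X11 request. */
struct x11_req_header {
    uint8_t  major_opcode;  /* below 128: core request; 128-255: an extension */
    uint8_t  data;          /* by convention, the minor opcode for extensions */
    uint16_t length;        /* total request length in 4-byte units */
};

static bool is_extension_request(const struct x11_req_header *h)
{
    return h->major_opcode >= 128;
}

/* Hypothetical X12 header: every request, core included, is named by
 * (major, minor), and major 0 is the core protocol. */
struct x12_req_header {
    uint16_t major;   /* 0 = core protocol */
    uint16_t minor;   /* request within that protocol */
    uint32_t length;
};

static bool is_core_request(const struct x12_req_header *h)
{
    return h->major == 0;
}
```

With the hypothetical layout, the minor opcode stops being a per-extension convention living in a spare data byte, and core requests no longer consume scarce major opcodes.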

XIDs are too small

XIDs are 29 bits for some inexplicable lisp-related reason. Client-side allocation seems like a reasonable idea for avoiding round-trips, but the need to slice the 29-bit space into client and resource parts means we have an unpleasant tradeoff. Right now most X servers have a 256 client limit, which is uncomfortably close.

Should probably just bump this to uint32 for client ID and uint32 for resource ID.
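
A sketch of the current tradeoff and the proposed split, assuming (as is typical, e.g. in the X.Org server) that the per-client resource mask occupies the low bits of the XID; the function names are illustrative only:

```c
#include <stdint.h>

/* X11 XIDs have 29 usable bits (the top 3 must be zero). */
#define XID_BITS 29u

/* Number of distinct clients if `client_bits` of the 29 identify the client. */
static uint32_t max_clients(unsigned client_bits)
{
    return UINT32_C(1) << client_bits;
}

/* Number of resource IDs each client can allocate with that split. */
static uint32_t ids_per_client(unsigned client_bits)
{
    return UINT32_C(1) << (XID_BITS - client_bits);
}

/* Client-side allocation: OR a locally chosen value, restricted to the
 * mask bits, into the base handed out at connection setup. Assumes the
 * mask occupies the low bits. */
static uint32_t make_xid(uint32_t base, uint32_t mask, uint32_t n)
{
    return base | (n & mask);
}

/* The proposal above: separate 32-bit client and resource IDs. */
static uint64_t make_xid64(uint32_t client, uint32_t resource)
{
    return ((uint64_t)client << 32) | resource;
}
```

With 8 client bits, 256 clients each get 2^21 IDs; every bit moved to the client part doubles the client limit and halves each client's ID space. Separate 32-bit fields remove the tradeoff entirely.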

Sequence numbers are too small

It's easy to hit a 16-bit sequence number wrap without getting any responses from the server, which can make later events or errors impossible to interpret. Perhaps, though, it would be better to introduce a SequenceWrap event, sent every 64k requests, than to grow the sequence numbers.
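
To see why the wrap is a problem, here is a sketch of the usual client-side technique for recovering a full sequence number from the 16 bits on the wire; `widen_sequence` is an illustrative name, not an existing library function:

```c
#include <stdint.h>

/* Recover the full sequence number of a reply, event or error from its
 * 16-bit wire value, given the full number of the last request sent.
 * Correct only while fewer than 65536 requests are outstanding. */
static uint64_t widen_sequence(uint64_t last_sent, uint16_t wire_seq)
{
    /* High bits from last_sent, low 16 bits from the wire. */
    uint64_t candidate = (last_sent & ~UINT64_C(0xFFFF)) | wire_seq;
    /* A response cannot refer to a request not yet sent; if the
     * candidate lies in the future, the low bits wrapped, so step back
     * one 64k block. */
    if (candidate > last_sent && candidate >= UINT64_C(0x10000))
        candidate -= UINT64_C(0x10000);
    return candidate;
}
```

Once 65536 or more requests are outstanding, two different requests share the same 16-bit value and no client-side logic can tell them apart; wider wire sequence numbers or a periodic SequenceWrap event would both supply the missing high bits.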

15 bit coordinate limit

Ouch. 32768 pixels at 100dpi is 8.3 meters.
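
The arithmetic behind that figure, with 32768 being the number of non-negative values of X11's signed 16-bit coordinates:

```latex
\[
\frac{32768\ \text{px}}{100\ \text{px/in}} = 327.68\ \text{in},
\qquad
327.68\ \text{in} \times 0.0254\ \text{m/in} \approx 8.32\ \text{m}
\]
```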

64 bit, nanosecond-precision timestamps

585 years ftw. The server time should be included in every event, reply, and error. It should also be present in every request (but with the "CurrentTime" option available).
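
A quick check of the 585-year figure, assuming an unsigned 64-bit counter and 365.25-day years:

```c
#include <stdint.h>

/* Span of an unsigned 64-bit nanosecond counter, in Julian years. */
static double ns64_span_years(void)
{
    const double ns_per_s = 1e9;
    const double s_per_year = 365.25 * 24 * 60 * 60;   /* 31,557,600 s */
    return (double)UINT64_MAX / ns_per_s / s_per_year;
}
```

This comes to roughly 584.5 years, whereas X11's 32-bit millisecond TIMESTAMP wraps after about 49.7 days.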

Cursor image encoding

Cursors are encoded as rectangular bitmaps, so a full-screen crosshair, which must reach every screen edge wherever its hotspot is, would be a bitmap 4 times the size of the screen. A more flexible format, such as SVG, would be good, and would also allow smoother scaling for screen magnifiers.
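
To put numbers on this, a sketch that sizes such a crosshair, assuming a core cursor made of two depth-1 bitmaps (source and mask) with 32-bit scanline padding, and a 32-bit ARGB image for the RENDER-style alternative:

```c
#include <stdint.h>

/* Bytes per scanline of a depth-1 bitmap padded to `pad_bits`
 * (the server advertises this as bitmap-scanline-pad; 32 is assumed). */
static uint64_t bitmap_stride(uint64_t width, uint64_t pad_bits)
{
    return (width + pad_bits - 1) / pad_bits * (pad_bits / 8);
}

/* A crosshair that reaches every screen edge wherever its hotspot sits
 * needs an image twice the screen size in each dimension; a core cursor
 * is two such depth-1 bitmaps. */
static uint64_t core_crosshair_bytes(uint64_t screen_w, uint64_t screen_h)
{
    uint64_t w = 2 * screen_w, h = 2 * screen_h;
    return 2 * bitmap_stride(w, 32) * h;
}

/* The same crosshair as a 32-bit ARGB cursor image. */
static uint64_t argb_crosshair_bytes(uint64_t screen_w, uint64_t screen_h)
{
    return (2 * screen_w) * (2 * screen_h) * 4;
}
```

For a 1920x1080 screen this is about 2 MB for the core cursor and about 33 MB as ARGB, all to draw two one-pixel lines that a vector description could express as two segments.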

ListFontsWithInfo-style requests with multiple replies

Kill them or standardize them, preferably the former. Note that XCB-style asynchronous requests with replies make some uses of these less important.

The KeymapNotify event special-case should go away

All events provide a sequence number, except for KeymapNotify, which uses that space for more data.

XGE by default

Events and errors have a fixed packet size (32 bytes) to make parsing easy, but that isn't big enough to convey all the information you want. Some extensions, like XInput, literally use every byte, and then some. The X Generic Event Extension (XGE) adds a mechanism for larger events.

XGE should become the default, rather than fixed-size events.
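
As an illustration of the parsing cost, a sketch of how a client determines an incoming event's size once XGE is in play, assuming the connection uses the host's byte order:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X_EVENT_SIZE    32u  /* every core event and error is 32 bytes */
#define X_GENERIC_EVENT 35u  /* event code of XGE's GenericEvent */

/* Size in bytes of the event starting at `buf`. A GenericEvent carries
 * a 32-bit length at offset 4: the extra data in 4-byte units beyond
 * the fixed 32 bytes. */
static size_t x_event_size(const uint8_t *buf)
{
    if ((buf[0] & 0x7f) != X_GENERIC_EVENT)   /* high bit marks SendEvent */
        return X_EVENT_SIZE;
    uint32_t length;
    memcpy(&length, buf + 4, sizeof length);
    return X_EVENT_SIZE + (size_t)length * 4;
}
```

With XGE as the default, the special case disappears and every event carries its own length.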


Kill this off. It's only used for core text, which should die.

Responses should assume less client state

Replies should include the major and minor opcode of the request that triggered them, as errors already do, to ease debugging.

Latin-1 (ISO 8859-1)

Strings for Atom names and the like are required to be Latin-1 encoded; they should be UTF-8 instead.
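
Because Latin-1 maps directly onto the first 256 Unicode code points, existing names convert to UTF-8 losslessly. A minimal converter, as a sketch (`latin1_to_utf8` is an illustrative name):

```c
#include <stddef.h>
#include <stdint.h>

/* Convert `n` bytes of Latin-1 to UTF-8. Each Latin-1 byte equals its
 * Unicode code point, so bytes below 0x80 are copied and the rest
 * become two UTF-8 bytes. `out` must hold 2 * n bytes; returns the
 * number of bytes written. */
static size_t latin1_to_utf8(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t c = in[i];
        if (c < 0x80) {
            out[o++] = c;
        } else {
            out[o++] = (uint8_t)(0xC0 | (c >> 6));
            out[o++] = (uint8_t)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

Pure ASCII names such as WM_NAME come through unchanged.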

Length fields require constant conversion

All requests and replies in X11 have a length field. This is cleverly encoded as a number of 32-bit words, since all packets are padded to 32-bit alignment. Annoyingly, this results in tons of << 2 and >> 2 conversions everywhere to get byte counts usable for reading and writing across the sockets.
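
The conversions in question look like the sketch below; note that requests and replies do not even count the same thing (the helper names are illustrative):

```c
#include <stdint.h>

/* A request's length counts 4-byte units for the whole request,
 * header included. */
static uint32_t request_bytes(uint16_t length_field)
{
    return (uint32_t)length_field << 2;
}

/* A reply's length counts only the 4-byte units that follow its fixed
 * 32-byte header. */
static uint32_t reply_bytes(uint32_t length_field)
{
    return 32u + (length_field << 2);
}

/* Padding needed to bring `n` bytes of variable data up to a 4-byte
 * boundary before sending. */
static uint32_t pad4(uint32_t n)
{
    return (4u - (n & 3u)) & 3u;
}
```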

Input

Core input is useless

Xi and (at least some elements of) XKB need to be folded into the core protocol and made mandatory.

Resource limits

We need more than 255 keycodes, and more than 4 groups. We also need a better mechanism for expressing state than the current 16-bit field, which limits us to 5 buttons and 8 modifiers (of which only Mod1 to Mod5 are general-purpose).
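
These limits are baked into the bit layout of the 16-bit state field carried by core input events. A sketch of that layout, with constant values matching the masks in X11's X.h, and of how XKB squeezes the keyboard group into bits 13 and 14:

```c
#include <stdint.h>

/* Bit layout of the core 16-bit "state" field. */
enum {
    SHIFT_MASK   = 1 << 0,
    LOCK_MASK    = 1 << 1,
    CONTROL_MASK = 1 << 2,
    MOD1_MASK    = 1 << 3,   /* Mod1 to Mod5 occupy bits 3-7 */
    BUTTON1_MASK = 1 << 8,   /* Button1 to Button5 occupy bits 8-12 */
};

/* XKB keeps the group in bits 13-14, hence at most 4 groups. */
static unsigned xkb_group(uint16_t state)
{
    return (state >> 13) & 0x3u;
}

/* Is pointer button `button` (1 to 5) down in this state? */
static int button_pressed(uint16_t state, unsigned button)
{
    return (state & (BUTTON1_MASK << (button - 1))) != 0;
}
```

There is no spare room: a sixth button, a ninth modifier or a fifth group cannot be expressed without a new field.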

Redirection should be integrated

The current deep binding of input delivery to window coordinates is garbage. The redirection mesh idea is nice but it should be the only way.

Keysym names are a mess

Come up with a vastly more coherent set of keysym names than the current scattergun approach.

Design issues

Circulation APIs in core protocol are emacs disease

These are wildly unhelpful. CirculateWindow and RotateProperties can taste the curb.

Screens are not helpful

('Screen' is used here in the protocol sense, and 'display' in the sense of a physical output device.)

This is partly an intractably large implementation problem (resources cannot be shared among screens, and screens cannot be hotplugged) and partly a protocol problem (screens as defined are static, and no relationships between them are expressed). Pushing most of RandR down into the core will help, as will rewriting core server code.

Events should always go to a window

Events should have a fixed destination window field, to support the idea of events delivered to windows, not to clients directly.

Move stuff from the "random fixes" extensions into core

Decide what from XC-Misc, MIT-SUNDRY, XFree86-Misc, and XFixes needs to go in core.

Don't split or duplicate a class of requests across core and extensions

For instance, the core ForceScreenSaver and the MIT-SCREEN-SAVER extension.

Predefined atoms

ICCCM gets special pre-defined atoms, but newer standards like EWMH don't. One approach would be to identify a set of predefined atoms by the hash of the names of the atoms, allowing extensibility in which atoms to predefine in the future. The connection setup response could include a list of the atom-sets this server provides, eliminating a round-trip at startup in the common case that all the atoms a client wants are already known.
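
One way the hashing could work, as a sketch: the set identifier is a hash over the atom names in the set's defined order. FNV-1a is an arbitrary choice here (the proposal does not name a hash), and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical atom-set identifier: 64-bit FNV-1a over the names in
 * the set's defined order, each including its terminating NUL so that
 * {"AB","C"} and {"A","BC"} hash differently. */
static uint64_t atom_set_id(const char *const *names, size_t n)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV-1a offset basis */
    for (size_t i = 0; i < n; i++) {
        const unsigned char *p = (const unsigned char *)names[i];
        do {
            h ^= *p;
            h *= UINT64_C(0x100000001b3);        /* FNV-1a prime */
        } while (*p++ != '\0');
    }
    return h;
}

/* Client side: does the setup reply advertise this atom set? */
static int server_has_set(const uint64_t *advertised, size_t n, uint64_t id)
{
    for (size_t i = 0; i < n; i++)
        if (advertised[i] == id)
            return 1;
    return 0;
}
```

If every set a client needs is advertised, and atom values are assigned by position within each set (another assumption), the client can skip InternAtom for those atoms; otherwise it falls back to InternAtom as today.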

Extension initialization

With XCB we can, in principle, initialize all the extensions a client needs in two round-trips. But there aren't so many extensions in a server that we couldn't just provide the list in the connection setup reply. If that list included all the data that QueryExtension returns, that would eliminate one round-trip. If we standardize that every extension has a major and minor version number, and include those in the setup data as well, we can eliminate the other round-trip.
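
A sketch of what such setup data could look like, with a purely local lookup replacing the QueryExtension round-trip; the struct and function are hypothetical, not part of X11 or XCB:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-extension entry in the connection setup reply:
 * what QueryExtension returns today, plus a standardized version. */
struct ext_info {
    const char *name;
    uint8_t major_opcode;
    uint8_t first_event;
    uint8_t first_error;
    uint16_t version_major;
    uint16_t version_minor;
};

/* Find an extension without a round-trip; NULL if the server lacks it. */
static const struct ext_info *find_extension(const struct ext_info *table,
                                             size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}
```

Opcodes and event and error bases are assigned per server, so a client would still read them from the reply rather than hard-code them; it just would not need to ask.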

Race Conditions

Need a basic sync request

Need a dedicated request that just sends back an empty reply. Today, clients borrow a cheap round-trip instead; Xlib's XSync, for instance, sends GetInputFocus.

Implement the ICCCM suggestions

An appendix to the ICCCM lists mostly trivial improvements that would simplify the procedures set forth in that document.

Reference material