X11 was defined in the mid-1980s. Since then, computers have changed almost beyond recognition; the simple framebuffer model used by computers at the time has been replaced by programmable graphics hardware that is both complex and powerful. Not only has graphics hardware changed, but the basic processing model has changed and continues to change; parallelism in core system design is becoming the norm, rather than a special case for 'large' systems.
And that's just desktop systems; now there are smart phones, netbooks, and tablets, and there will shortly be other device types that this author can't imagine (else I'd be working on them...).
In short, X11 was designed for a different era of computing.
This is not to say that there's an X12 project. There isn't. But if one day there is...
This section is a discussion of the broad requirements for X12; the specific failings of X11 are addressed in later sections.
To call this subject controversial would be an understatement; there have been and continue to be many attempts to either freeze 'X11' in time or discard it completely. Clearly, this author believes both approaches to be wrong.
Network transparency. Network transparency rocks! Run a program on a remote system and interact with it on your local terminal; write a program and not need to care whether it's going to be run on a full workstation or a dumb terminal. Some may say this is unimportant, but when one looks at the development of Windows and the evolution of RDP, RDP starts to look a lot more like X in terms of its features.
Systems need to be secure. X12 needs to be designed with security in mind.
X12 should be designed with mobile phones, tablets, dumb terminals, netbooks and desktop workstations in mind; if it is to succeed, it must work well on all these systems.
The future will be more interconnected and network-oriented, not less. Network transparency makes things easier for users and can't be considered an 'optional extra'.
Programmable hardware, composition; all that good stuff. X12 needs to naturally support modern hardware in a way that allows developers to gain access to the hardware without having to completely bypass X (as happens currently).
For all the talk of modern hardware, let's not forget that the framebuffer concept is still extremely useful in certain situations; killing it off completely is likely to be a serious mistake.
X has always been a low-level protocol; inefficiencies here will hurt applications.
This is perhaps the hardest part of the design of X12; the approach to computation and rendering is changing, with a greater emphasis on parallelism. X12 should be sympathetic to being implemented on massively-parallel systems, if not actively support such systems.
On the other hand, trying too hard in this regard is likely to be a serious cause of difficulty in finalising the design of the protocol; if in doubt, the designers should work to the standards of the day rather than attempting to predict the future.
This section attempts to document the failings of the X protocol and rendering model. Learn from history, or be doomed to repeat it.
Popping up a menu and walking away can leave your screen locker unable to lock the screen, since it won't be able to grab the pointer.
Server grabs are even worse: they lock out all other clients, including those necessary for user interaction, such as compositing managers and accessibility helpers.
Current theory: Multiple clients can grab; when any grabs are active, only clients with grabs receive events.
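That theory can be sketched in a few lines. Everything here is hypothetical (the class, the method names, the delivery rule); nothing like it exists in X11 today:

```python
# Sketch of the proposed grab semantics: any number of clients may hold a
# grab; while at least one grab is active, only grabbing clients get events.
# All names here are hypothetical -- this is the "current theory", not X11.

class EventDispatcher:
    def __init__(self, clients):
        self.clients = set(clients)
        self.grabbers = set()

    def grab(self, client):
        self.grabbers.add(client)

    def ungrab(self, client):
        self.grabbers.discard(client)

    def recipients(self):
        # No grabs active: events flow normally to every client.
        # Otherwise: only the grabbing clients see them.
        return set(self.grabbers) if self.grabbers else set(self.clients)
```

Note that under this rule the screen locker's grab always coexists with anyone else's, so the menu-plus-walk-away failure above can't happen.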
You can't resize a pixmap in the X11 protocol, because you can't get events on things that aren't windows. Lame. Really, a Window should only be an IPC name, with one or more associated pixmaps and so on.
PropertyNotify is a disaster. And we probably want to be able to get events on things other than Windows.
Possibly not worth fixing. However, being composited by default might make it reasonable.
Probably. Note that we can more or less accomplish this within X11, but there are probably simplifications to be had by making this explicit in the protocol.
See the talk from XDevConf 2008.
Which is really a special case of...
Borders and window backgrounds and the bg=None trick and backing store and saveunders and all that.
There's a speed/complexity tradeoff here, of course. Any time that implicit server rendering works, it saves you exposures and round trips. But the implicit mechanisms we have are poor fits for a composited model. Think very carefully about adding implicit rendering to the server process; it's probably a mistake.
The first 128 requests are core protocol; the remaining 128 are single-entry multiplex functions for extensions. It's sort of ridiculous to have XPolyFillArc on the same footing as GLX.
The minor opcode "convention" should be formalized and made part of the standard. The core protocol should be assigned major number zero and use minor numbers.
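A sketch of what a unified header might look like, with the core protocol as major opcode zero and a 16-bit minor. The field widths, the little-endian packing, and the reuse of X11's PolyFillArc number as a minor are all assumptions for illustration, not a spec:

```python
import struct

# Proposed request header: every request carries a major opcode (0 = core
# protocol) and a minor opcode, instead of X11's split between 1-byte core
# opcodes and per-extension minor numbers.

def pack_request_header(major, minor, length_words):
    # CARD8 major, CARD8 pad, CARD16 minor, CARD32 length. Little-endian
    # here; a real server would use the byte order negotiated at setup.
    return struct.pack("<BxHI", major, minor, length_words)

CORE = 0                                  # core protocol is just "extension 0"
hdr = pack_request_header(CORE, 69, 3)    # 69 = X11's PolyFillArc, as a minor
```

This puts XPolyFillArc and GLX on exactly the same footing: both are (major, minor) pairs.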
XIDs are 29 bits for some inexplicable lisp-related reason. Client-side allocation seems like a reasonable idea for avoiding round-trips, but the need to slice the 29-bit space into client and resource parts means we have an unpleasant tradeoff. Right now most X servers have a 256-client limit, which is uncomfortably close to being hit in practice.
Should probably just bump this to uint32 for client ID and uint32 for resource ID.
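The arithmetic behind the squeeze, as a sketch. The 21-bit resource mask is the typical value in today's servers; the base/mask scheme itself is how X11 connection setup actually works:

```python
# At connection setup the server hands each client a resource-id-base and
# resource-id-mask; the client forms XIDs as base | id.  With 29-bit XIDs
# and the common 21-bit mask, only 8 bits are left to tell clients apart.

XID_BITS = 29
RESOURCE_MASK_BITS = 21            # typical mask width in today's X servers

max_clients = 1 << (XID_BITS - RESOURCE_MASK_BITS)      # the 256-client limit
ids_per_client = 1 << RESOURCE_MASK_BITS

# The proposed fix: a separate 32-bit client ID and 32-bit resource ID,
# so neither space has to be carved out of the other.
```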
It's pretty easy to hit 16-bit sequence number wrap without getting any responses from the server, which can make interpreting later events or errors impossible. Although perhaps it would be better to introduce a SequenceWrap event sent every 64k requests than to grow the sequence numbers.
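For context, here is roughly how a client library widens the 16-bit wire field back to a full sequence number today (this is essentially what XCB does internally). The function name is made up; the point is that the trick silently fails once more than 64k requests are outstanding, which is what a SequenceWrap event every 64k requests would fix:

```python
# Reconstruct a full sequence number from the 16-bit field in an event,
# given the last full sequence number the client has seen.  Only correct
# if fewer than 64k requests were issued since then -- hence the problem.

def widen_sequence(last_full, low16):
    full = (last_full & ~0xFFFF) | low16
    if full < last_full:            # the 16-bit counter wrapped
        full += 0x10000
    return full
```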
Ouch. 16-bit coordinates top out at 32768 pixels, which at 100dpi is 8.3 meters.
64-bit nanosecond timestamps: 585 years before wrapping, ftw. The server time should be included in every event, reply, and error. It should also be present in every request (but with the "CurrentTime" option available).
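The back-of-envelope numbers, for the record: X11's 32-bit millisecond timestamps wrap in under 50 days, while 64-bit nanoseconds last centuries.

```python
# X11: 32-bit milliseconds.  Proposed: 64-bit nanoseconds.

MS_PER_DAY = 1000 * 60 * 60 * 24
NS_PER_YEAR = 1_000_000_000 * 60 * 60 * 24 * 365.25

x11_wrap_days = 2**32 / MS_PER_DAY       # ~49.7 days until wraparound
x12_wrap_years = 2**64 / NS_PER_YEAR     # ~585 years until wraparound
```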
Cursors are encoded as rectangular bitmaps, so a full-screen crosshair, which must be positionable anywhere on screen, would need a bitmap twice the screen's width and height: four times its area. Allowing a more flexible format, such as SVG, would be good, as would smoother scaling for screen magnifiers.
Kill them or standardize them, preferably the former. Note that XCB-style asynchronous requests with replies make some uses of these less important.
All events provide a sequence number, except for the KeymapNotify event which wants the space for more data.
Events and errors are a fixed packet size to make parsing easy, but they aren't big enough to convey all the information you want. Some extensions like Xinput literally use every byte, and then some. The X Generic Events (XGE) extension adds a mechanism for larger events.
XGE should become the default, rather than fixed-size events.
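For reference, a sketch of the XGE wire shape: a fixed 32-byte header whose length field counts additional 4-byte words, so events can grow past X11's fixed size. The field layout follows the existing XGE extension (type 35 = GenericEvent, then extension opcode, sequence, length, event subtype), but this sketch simplifies by putting the whole payload in the extra words; real XGE events also use the 22 spare header bytes:

```python
import struct

# Simplified XGE-style event: 32-byte header + length*4 payload bytes.

def pack_ge_event(extension, sequence, evtype, payload=b""):
    extra_words = (len(payload) + 3) // 4          # pad payload to 32 bits
    padded = payload.ljust(extra_words * 4, b"\x00")
    header = struct.pack("<BBHIH22x",
                         35,           # GenericEvent type code
                         extension,    # which extension this event belongs to
                         sequence,     # low 16 bits of the sequence number
                         extra_words,  # event size beyond the fixed 32 bytes
                         evtype)       # extension-defined event subtype
    return header + padded
```

An Xinput-sized event simply carries a nonzero length instead of spilling over a fixed packet.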
Kill this off. It's only used for core text, which should die.
Replies, and probably errors, should include the major and minor opcode of the request that triggered them, to ease debugging.
Strings for Atom names and the like are required to be in Latin-1 encoding; this should be replaced with UTF-8.
All requests and replies in X11 have a length field, cleverly encoded as a number of 32-bit words, since all packets are padded to 32-bit alignment. This annoyingly results in tons of << 2 and >> 2 conversions everywhere to get useful byte counts when reading and writing the socket.
Xi and (at least some elements of) XKB need to be folded into the core protocol and made mandatory.
We need more than 255 keycodes, and more than 4 groups. We also need a better mechanism for expressing state than the current field, which limits us to 5 buttons and 4 modifiers, or so.
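For reference, the actual core state-field layout, with the bit values as defined in X.h: eight modifier bits (Shift, Lock, Control, Mod1..Mod5) and five button bits, leaving only three bits of the 16 free.

```python
# Core X11 state field bit assignments (values from X.h).

ShiftMask, LockMask, ControlMask = 1 << 0, 1 << 1, 1 << 2
Mod1Mask, Mod2Mask, Mod3Mask, Mod4Mask, Mod5Mask = (1 << i for i in range(3, 8))
Button1Mask, Button2Mask, Button3Mask, Button4Mask, Button5Mask = (
    1 << i for i in range(8, 13))

used = (ShiftMask | LockMask | ControlMask | Mod1Mask | Mod2Mask | Mod3Mask
        | Mod4Mask | Mod5Mask | Button1Mask | Button2Mask | Button3Mask
        | Button4Mask | Button5Mask)
free_bits = 16 - bin(used).count("1")   # everything that's left for growth
```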
The current deep binding of input delivery to window coordinates is garbage. The redirection mesh idea is nice but it should be the only way.
Come up with a vastly more coherent set of keysym names than the current scattergun approach.
These are wildly unhelpful. CirculateWindow and RotateProperties can taste the curb.
(Screens is used in the protocol sense, with displays used in the physical output device sense.)
This is partly an intractably large implementation problem (no resource sharing among screens, no screen hotplugging), and partly a protocol problem (screens as defined are static, and there are no expressed relationships between them). Pushing most of RandR down into the core will help, as will rewriting the core code.
Events should have a fixed destination window field, to support the idea of events delivered to windows, not to clients directly.
Decide what from XC-Misc, MIT-SUNDRY, XFree86-Misc, and XFixes needs to go in core.
For instance, the core ForceScreenSaver and the MIT-SCREEN-SAVER extension.
ICCCM gets special pre-defined atoms, but newer standards like EWMH don't. One approach would be to identify a set of predefined atoms by the hash of the names of the atoms, allowing extensibility in which atoms to predefine in the future. The connection setup response could include a list of the atom-sets this server provides, eliminating a round-trip at startup in the common case that all the atoms a client wants are already known.
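The atom-set idea can be sketched concretely. The hash function, the sorting, and the separator here are all assumptions; any canonical encoding both sides agree on would do:

```python
import hashlib

# Identify a predefined set of atoms by a digest over its names; the
# server's connection setup reply would list the digests of the sets it
# predefines, so a client whose sets are all known skips the round-trip.

def atom_set_id(names):
    # Sort so the ID is independent of the order the atoms are listed in.
    canonical = b"\x00".join(n.encode("utf-8") for n in sorted(names))
    return hashlib.sha256(canonical).hexdigest()

# A hypothetical EWMH subset, purely for illustration.
ewmh_subset = atom_set_id(["_NET_WM_NAME", "_NET_ACTIVE_WINDOW", "_NET_WM_PID"])
```

New standards could then define new sets without the protocol needing to bless each atom individually.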
With XCB we can, in principle, initialize all the extensions a client needs in two round-trips. But there aren't so many extensions in a server that we couldn't just provide the list in the connection setup reply. If that list included all the data that QueryExtension returns, that would eliminate one round-trip. If we standardize that every extension has a major and minor version number, and include those in the setup data as well, we can eliminate the other round-trip.
Need a dedicated request that just sends back an empty reply.
An appendix to the ICCCM lists mostly trivial improvements that would simplify the procedures set forth in that document.