These are rough notes (provided by SimonFarnsworth, whose travel to XDC was paid for by ONELAN Digital Signage), plus links to the video recordings. Please feel free to improve the notes; they're there to help you decide whether to watch the recording or not.

Day 1

Board presentation

The foundation supports development, rather than controlling it.

Foundation is now a US 501(c)(3) charity (thanks to SFLC). Board meetings open and public.

Book sprints to write developer docs have happened (one in March, one in September). Results will be published on wiki.

Xorg has just joined OIN patent pool.

The war chest is shrinking back to a sensible size; the foundation will soon need to fundraise. Current balance is around $85,000, run rate is around $30k/year. Can burn for 3 years before going bankrupt, but would prefer to find new funding sources. Old funding sources were big UNIX workstation vendors; foundation expects new funding sources to be smaller sums per contributor.

Infrastructure (shared with is a challenge, but being worked on; new sysadmin hired and will be working on web stability.

EVoC had 5 students this year; 4 succeeded. Question - does EVoC replace Google Summer of Code, or does X.Org need to work out why we weren't accepted and get back into GSoC in future years?

Various IP issues (new logo etc). Also need to revise foundation bylaws. Note: foundation is not tied to X11 (25 years old!), and will continue with Wayland or any other graphical stack that subsumes X11.

Usual need to get more developers on board.


Endless Vacation of Code is inspired by Google Summer of Code, but funded by X.Org foundation. No time restrictions, but 3 month long projects (doesn't have to be summer).

Goal is to get students to become X.Org developers (productive output from EVoC is a bonus); covers the entire graphics stack, basically from drivers (OpenCL, DDX, OpenGL etc) up to the layer beneath the toolkits. It's not meant to provide a wage, just to let you work on X.Org instead of flipping burgers over a long vacation.

Puts extra load on mentors; honestly interested students are welcomed wholeheartedly and are worth the extra load, but if you're not actually interested, please don't waste their time (EVoC is not free money for students). Mentor has to establish that student is competent to tackle project, and provide regular assistance to keep them on track. Board relies on mentor's judgement to evaluate students.

Graphics stack security

Inspired by driver development book sprint. These guys aren't driver developers, and had to learn for this presentation.

Security is a trifecta - confidentiality, integrity, availability.

Users have expectations - e.g. cross-app communication is drag-and-drop or cut-and-paste, and therefore under my control. X breaks this - get the auth cookie, you have full access (can keylog, can screenshot as credit card number is typed). Isolation is therefore between users, not between one user's apps.

Problem - all apps grow, and have bugs. Any exploitable bug in any app lets you have full access to the user's desktop session.

So, confidentiality breach; keyloggers, screenshots at interesting moments. Integrity breach; draw over Firefox, so that the user can't tell that you've redirected them from their bank to a phishing site. Any application can act as a "virtual keyboard", and type - another integrity breach. Availability breach - any application can act as a screen locker. ClearGrab bug - a virtual keyboard app can kill a screen locker.

Current mitigations: XSELinux has fine grained access control, but normally deactivated as default distro policy doesn't confine users. Xephyr can provide virtual X screens - but coarse grained, so tricky to use right.

QubesOS uses VMs to solve security problem. PIGA-OS uses SELinux + XSELinux + PIGA-SYSTRANS daemon to solve it.

QubesOS groups apps into security domains. Each security domain is a Xen domU; X server provided for each domain has colourful marking to indicate security domain, inter-domain comms via a daemon in dom0, which implements MAC.

PIGA maps security domains to SELinux types, labels everything. SYSTRANS grants rights as needed, prompting users if this is a cross-domain move.

Wayland moves the security problem into the compositor - events unicast, so compositor has control needed to secure everything. However, compositor is then attack target - how to solve this? Privilege separation? We have an opportunity to fix things here, let's not waste it.

Driver/hardware story not too bad - CPU can't access arbitrary VRAM unless root. Opensource drivers not too bad at GPU isolation between users - mix of VM contexts and switching contexts on GPU, and command validation - scan commands and stop user doing things it shouldn't. Trades context switch cost for CPU time.

Goal is per-GPU-process isolation, just like we have per-CPU-process isolation on the main CPU. Think about information leakage (uncleared buffers), privileged plugins (e.g. to compositor) scanning address space (ASLR helps?) etc.

Solaris privilege separated X11

Solaris can run X without root privileges. Aim is to upstream as much of this as possible.

Solaris Xorg creates a named pipe for GDM to X11 comms, and runs Xorg as root. At login, GDM tells X (via pipe) about new user. X switches UID, but keeps root as its saved UID (POSIX-compliant) so it can become root again when it needs to (VT switch, regen).
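The saved-UID trick can be sketched with the standard POSIX calls. This is only an illustration of the mechanism, not the Solaris patch itself; the function names drop_to_user and temporarily_privileged are made up for this sketch, and drop_to_user only works if the process starts as root.

```python
import os
from contextlib import contextmanager


def drop_to_user(uid):
    """Run as `uid` but keep root (0) as the saved UID. Needs to start as root."""
    os.setresuid(uid, uid, 0)


@contextmanager
def temporarily_privileged():
    """Raise the effective UID to the saved UID, then drop back afterwards."""
    _real, effective, saved = os.getresuid()
    os.seteuid(saved)          # allowed because `saved` is our saved UID
    try:
        yield                  # e.g. handle a VT switch here
    finally:
        os.seteuid(effective)  # back to the logged-in user
```

Run as an ordinary user, the saved UID equals the effective UID, so the context manager is a harmless no-op; after drop_to_user it would switch to root and back.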

Solaris has facilities to set device ownership/permissions on a VT and all associated on login; Xorg uses those facilities to ensure that it can open devices as the user, rather than becoming root.

Patches linked from slides. Side note - UEFI secure boot locks out most non-KMS drivers, so we have to work out what we do about hardware like MGA.

Dante: chasing performance

Oliver took Doom3 (8 years old) GPL3+ release, ported from OpenGL 1.x with extensions (depending on backend used) to EGL and OpenGL ES 2.0 (including clean-room creation of GLSL shaders to replace the ARBfp/ARBvp shaders), then tried to make the OpenGL ES 2.0 version perform well.

No good performance analysis tools for Mesa. "Best" available is intel_gpu_top for i965 driver hardware, but it's a top-like coarse-grained performance tool. Also not user-friendly, as it represents load as hardware units (so you need to read HW docs to have a clue what's going on if it's showing an issue).

Every closed driver vendor has decent tools - AMD bought gDEBugger, and made it fglrx-only, other vendors have equivalent tools. Older version of gDEBugger sort-of works with Mesa, sometimes. Mesa has nothing.

Linux has perf infrastructure - lots of performance counters and sampling with source code annotation of results. Nice profiler; can we reuse for GPU performance? Hook DRM and intel_gpu_top data in, rely on existing tools for UI. Need userspace co-operation to get all the way (per-frame indication without GPU stall, so that profile can be interpreted per-frame).

Mesa's best infra so far is perf_debug - but not tied to ARB_debug_output, just to stdout. Also not frame boundary aware - so no way to tie perf_debug output to render operations.

Given all these hints, how do we cope with separate debugger processes?

Oliver found a GLX versus EGL bug. Others may exist.

Intel avoided the need for tools to fix Valve's L4D2 - they sent skilled engineers in instead - this does not scale, so tools are needed.

Comment at end from audience - apitrace is being developed into one profiling tool (although only runs on captured traces, not on live apps); work is underway on a shimGL that hooks live.

Phoronix benchmarking

No video. Contact MichaelLarabel for more information.

Michael would like developers to expose performance information (clocks, utilization %age, thermals) in a consistent fashion (e.g. standard sysfs files, with speeds in kHz). Please all behave the same, though.

Remaining discussion was Michael asking how Phoronix can be useful to driver devs, even though benchmarks are end-user targeted.

Some discussion of goals of benchmarks, agreement from floor that devs can write their own benchmarks to target individual bits of the driver. Big thing Michael can do is benchmark git snapshots regularly, and bisect performance regressions.


DRI2 fixed the SAREA disaster of DRI1. Experience (including implementing Wayland) has shown that DRI2 has pain points, so let's discuss DRI3 to fix these pain points.

Underlying issue is X server allocating buffers, and thus having to synchronise tightly with clients. DRI3 aims to make GLX buffer management similar to Wayland (clients allocate, "present frame" from glSwapBuffers becomes send buffer to X). Can we avoid having the server tell the client how big the buffer is (e.g. by relying on glViewport and providing an extension for apps where the bounding box of all viewports is not the buffer size)?

DMABUF may let us remove GEM handles from the pile - pass a private DMABUF fd via the UNIX socket, instead of a guessable GEM handle. Lots to consider here - YUV buffers, buffer sharing etc.
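Passing a buffer as a file descriptor over a UNIX socket can be sketched as below. A temporary file stands in for the DMABUF (an assumption made so the example runs anywhere), and the "WIDTHxHEIGHT" header is invented for illustration; nothing here is the DRI3 wire protocol. Needs Python 3.9 or later for socket.send_fds.

```python
import os
import socket
import tempfile


def send_buffer(sock, fd, width, height):
    """Client side: send the buffer's fd plus a made-up size header."""
    header = f"{width}x{height}".encode()
    socket.send_fds(sock, [header], [fd])


def receive_buffer(sock):
    """Server side: receive one fd and parse the size header."""
    header, fds, _flags, _addr = socket.recv_fds(sock, 64, 1)
    width, height = (int(n) for n in header.decode().split("x"))
    return fds[0], width, height


def demo():
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as backing:  # stand-in for a DMABUF
        backing.write(b"pixels")
        backing.flush()
        send_buffer(client, backing.fileno(), 640, 480)
        fd, width, height = receive_buffer(server)
    try:
        os.lseek(fd, 0, os.SEEK_SET)  # the offset is shared with the sender
        data = os.read(fd, 6)
    finally:
        os.close(fd)
        client.close()
        server.close()
    return data, width, height
```

The receiver gets its own descriptor for the same underlying object, so there is no global name for a third process to guess.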

DRI3 can now flip when window size matches buffer size - blits only needed if size mismatch (so client/server disagree on window size, so something will go wrong anyway).

Identifying reusable buffers is also a challenge.

Discussion will be on mailing list - this talk is to get people thinking about it.

Board meeting

New logo is needed. The board will fund a contest if someone steps up to run it, provided the board gets trademark, copyright on resulting logo.

Election cycle upcoming - 4 to replace as per bylaws.

Finances are now about what the board wants to run with - time to raise money to stay put.

Foundation no longer paying for hosting at MIT - now hosted at PSU, sharing infrastructure with Machines donated by Sun Microsystems.

New sysadmin starting this week, will start by working on web service reliability.

Day 2

X Server Integration Testing

This is around 1 month's experience so far - most things up for change as a result.

Rationale - testing is hard. Manual testing hurts. Regressions happen if there's no easy tests to find them.

XTS is not the right test suite for lots of good reasons, but has 1,000 test cases to learn from. We need combined server + driver tests. It must be really easy to write test cases (otherwise no-one bothers). They must be easy to run, and the output must be easy to parse, so that people get in the habit of trying tests.

New thing is XIT - based on xorg-gtest which is based on googletest C++ framework. Peter showed some example tests.

Some problems; no shared library => Is gtest the right framework? Server can be slow to start - Keith thinks this should be treated as a bug (server start should always be fast). Issues with version combinations and distro-specific results. Documentation of test cases also an issue - remembering why you wrote it, and what it's supposed to tell you.

Googletest good at controlled execution of subset of tests - can select by regex, for example. Would be nice to combine XIT with tinderbox, for continuous testing - we're good at fixing compile fails flagged by tinderbox.

On to Q&A. Currently Linux-tied by uinput - can do equivalents for non-Linux systems. However, this also means no special hardware needed - events are faked by uinput, so you can test things you don't have.

XTS is broken - written as a contracted piece of work by non-X people, so many tests are broken. How do we recover the valuable bits of XTS? Peter currently not interested in this, as wants driver tests first, protocol tests like XTS can come later.

Googletest not a Google service, just a chunk of code from Google. No tie-in to G+ etc.

Tests aim to observe user-visible behaviour - no driver internal state examination needed.

Contrast to Wayland's test infra (simpler); will Wayland test infra become as complex as googletest, or is googletest too complex for the use case?

Can be extended for output tests - rendercheck is apparently broken, but we have the concept to hand.

Can also produce nice XML for processing into tinderbox-style HTML.

If we redo XTS, we need to avoid the code dump problem we have - but formal specification and RFP process valuable.

Hardware independent accelerated graphics

Today, we have multiple DDXes; a select few do modern acceleration, and only the big three (Intel, Radeon, Nouveau) have high quality modern acceleration. We removed XAA from all DDXes, as not useful in the modern world.

Hard to implement acceleration - XRender does a lot, and drivers only accelerate "popular" XRender paths - but popular changes with time. We're developer time limited - unless we acquire many developers suddenly, we won't be able to accelerate everything easily.

Three attempts so far at generic acceleration - glamor, st/xorg, st/xa (OpenGL, Gallium, Gallium). Benchmarks used to determine if any of them help.

All tests run on AMD E-450 with 8GB RAM, as Radeon supports EXA, glamor and Gallium. 25 cairo traces used to provide representative sample of operations. All tests baselined against image backend (CPU rendering).

No graphs - Phoronix will produce if considered helpful - just some interesting points. First, only CPU and EXA accel could survive all traces - glamor triggered one panic, st/xorg triggered 3 and fails to cope with tiled scanout (interesting pictures). st/xa is not complete enough to test easily.

CPU rendering usually faster (except when test is fillrate limited). Only Intel's SNA acceleration is as fast as CPU rendering in almost all cases - but don't get fooled by speed, as acceleration is also about responsiveness under load (reduced CPU use by graphics => more CPU for user applications).

Glamor is only maintained option - st/xorg, st/xa not actively developed, and both have issues. But glamor is slow, and "fights" OpenGL to do 2D ops (claim from floor that this is mostly because Gallium is a heavyweight layer, and not fast).

Q and A - Radeon developers at AMD open to using st/xorg, st/xa, but prefer glamor as it's maintained already (so no need for more manpower).

Any measurements of what's improved? Text rendering is everything in 2D - only common operations are blit scrolling and text rendering (word processing etc); glamor will get better at text rendering when GL_ARB_blend_func_extended is supported and used by it.

In the long run, apps that care about render performance will use OpenGL (matches hardware capabilities with buffer objects etc, when X is immediate mode and therefore not modern-hardware-friendly).

Can Gallium overhead be reduced enough to make glamor competitive? Time will tell.

Joystick input driver module

This module is for special cases - most users don't need it. It's meant to let you operate X with a joystick or gamepad instead of a mouse (not needed for games). Uses are cheap remote, kiosk system (joystick instead of mouse), consoles, media centres etc.

Needed over evdev because you want to do special axis/button mapping - e.g. absolute axes into acceleration of a moving pointer instead of position. Lots of mapping options provided, to cover most common use cases.
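As an illustration of this kind of mapping, here is a minimal sketch that turns an absolute axis deflection into pointer speed rather than pointer position. The function names, the deadzone, the speed limit and the quadratic curve are all assumptions made for the example, not the driver's actual options.

```python
def axis_to_velocity(value, deadzone=0.1, max_speed=800.0):
    """Map an axis deflection in [-1, 1] to pointer speed in pixels/second."""
    value = max(-1.0, min(1.0, value))
    magnitude = abs(value)
    if magnitude <= deadzone:
        return 0.0  # ignore stick noise around the centre
    # Rescale so speed starts at 0 just outside the deadzone.
    scaled = (magnitude - deadzone) / (1.0 - deadzone)
    speed = max_speed * scaled * scaled  # quadratic curve: fine control near centre
    return speed if value > 0 else -speed


def step_pointer(position, axis_value, dt):
    """Advance one pointer coordinate by dt seconds of stick deflection."""
    return position + axis_to_velocity(axis_value) * dt
```

Holding the stick still keeps the pointer moving, which is the behaviour plain evdev position mapping cannot give you.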

Does not grab device, so concurrent use by (e.g.) games is possible.

Distro users mostly install it in error. Needs sensible on-the-fly configuration (graphical frontends on hotplug etc), so that it doesn't cause problems.

Server has a design problem; we want to be both a pointing device (mouse-like) and a slightly funky keyboard - needs to be two server-side devices, not one (leading to mess). Server doesn't cope well with funky keyboard (special driver-controlled keymap etc).

Focus is on Linux systems, but works on FreeBSD as well. However, Linux consoles tend to use kernel drivers or userspace daemons to fake a "real" mouse instead of this driver. Console distro awareness therefore needed.

Wiimote/Kinect not the focus of this driver - they're not joysticks, they're a different sort of input device.

Some fun with force feedback (and nostalgia for ASR-33s) - XI1 allows some events, but toolkits don't use them so never ported to XI2. Doesn't mean it can't be done, though.

Request from audience - please update documentation to cover everything in this talk, to help reduce confusion.

XCWM and XtoQ

Started as Portland State University capstone project (5/6 students doing an externally sponsored project). Goal is library to run X clients in a non-X window system (Wayland, Mac OS X, Microsoft Windows...) without needing special hacks in the server like XQuartz and XWin.

Uses Damage and Composite to do the heavy lifting, with dummy drivers as needed. XtoQ was proof of concept for XCWM - can XCWM replace XQuartz?

Currently has basic WM functionality - limited ICCCM, EWMH, functional mouse + keyboard, and support for override-redirect windows.

Basic idea is that XCWM runs its own XCB event loop, tracking important data in its own data structures (accessible from native window system event loop) and callbacks on interesting X-side changes. Certain amount of cross-thread excitement as native system and XCWM interact.

Example code flowcharts for XCB_MAP_NOTIFY and Damage event. Damage handling is interesting - XCWM basically gives the native system damage notification, which results in the native system coming back to ask for pixmap for damaged area. When native grabs pixmap, XCWM subtracts the damage, and relies on native rendering the pixmap to get good looking output.
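The damage flow described above can be modelled roughly as follows. This is a toy sketch, not libxcwm code: DamageTracker and its method names are invented, and damage is simplified to a single bounding box.

```python
class DamageTracker:
    """Accumulates damage for one window until the native side collects it."""

    def __init__(self, notify_native):
        self._damage = None                  # (x1, y1, x2, y2) or None
        self._notify_native = notify_native  # callback into the native toolkit

    def on_damage_event(self, x, y, width, height):
        """X side: grow the pending damage and tell the native system."""
        rect = (x, y, x + width, y + height)
        if self._damage is None:
            self._damage = rect
        else:
            old = self._damage
            self._damage = (min(old[0], rect[0]), min(old[1], rect[1]),
                            max(old[2], rect[2]), max(old[3], rect[3]))
        self._notify_native(self._damage)

    def grab_damaged_area(self):
        """Native side: take the area to redraw; pending damage is subtracted."""
        rect, self._damage = self._damage, None
        return rect
```

The native system decides when to call grab_damaged_area, so several X damage events can be folded into one native redraw.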

Input event passed by native to XCWM, then XCWM uses XTest to inject into server. Nice and simple.

Future work: more ICCCM, EWMH. Rethink event loop. XtoSomethingElse (Wayland or Windows) to prove that there's no Mac OS X dependency in XCWM. Real drivers for input, instead of XTest.

Contributors welcome - libxcwm git repo, or e-mail Jess <>.

Languages for X client development

This talk was interrupted by a power cut, so two recordings: then when power returned.

Question; why are desktop apps apparently harder to write than mobile or web apps? Is choice of language (usually C or C++) the problem? Note that webapps are rarely C or C++, mobile is usually Java or Objective C.

One problem is that desktop apps are bigger with more requirements; however, mobile and web gain from GC, OOP, dynamic binding. Mobile/web is often not modular, nor do they provide big mega-apps.

Any interesting domain-specifics?

Issues to consider:

  • Correctness
    • Threading versus event streams
    • Callbacks and control flow
    • Memory management
  • Modularity
  • Widget manipulation from code
  • Configuration
    • I18N

Lots of language choices out there.

C and C++ popular with XDC attendees - we write the server and drivers in C. Is that why we use it for desktop apps? Does mainstreamness help?

C libraries work well due to FFIs. Toolkits seem to continue C or C++ theme (Qt, GTK are entrenched). Structure makes using other languages hard.

Basically, it's all too hard. Big Q is requirements from a programming language - is it even the right question? How do we fix things so that people go back to building X11 apps for "geek cred" instead of mobile or web apps? Code generators not the answer - too hard to work with generated code.

Question - is X's age part of why other things seem simpler? We have 25 years of finding and resolving problems; web apps and mobile don't yet, and have fewer problems as a direct result (see also Android fragmentation).

Question - is JavaScript the answer? Note history of JS - it's an accident that it dominates web. However, people learn it.

Qt's QML suggested. Point from floor - packaging is the unsolved problem. I have a browser already, I might not have Qt, QML or other important libraries installed.


Basic principle of Wayland is to remove the bits of X11 that toolkits don't use any more, and then clean up the cruft.

In the X11 world, compositor runs as a client of the server. X server sits in the middle of everything, getting in the way. Wayland makes the compositor and the display server one process.

Current plan is to have a stable release later this year; not a final finished release, but stable protocol and API so you can build against it.

Buffer management is client side (see DRI3 ideas). Client allocates and manages buffers including releasing them when unused. Protocol exists to hand buffers to compositor, and to discover when the buffer is no longer used by the compositor (and when the buffer was presented).
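A toy model of that buffer lifecycle, from the client's point of view. ClientBufferPool and its methods are hypothetical names; the real protocol works with Wayland buffer objects and release events rather than Python strings.

```python
class ClientBufferPool:
    """Client-side pool: the client allocates, the compositor only borrows."""

    def __init__(self, count):
        self._free = [f"buffer-{i}" for i in range(count)]
        self._held_by_compositor = set()

    def next_free(self):
        """Return a buffer that is safe to draw into, or None if all are in use."""
        return self._free.pop() if self._free else None

    def attach(self, buffer):
        """Hand a finished frame to the compositor."""
        self._held_by_compositor.add(buffer)

    def on_release(self, buffer):
        """Compositor reports it no longer reads from this buffer."""
        self._held_by_compositor.remove(buffer)
        self._free.append(buffer)
```

When next_free returns None the client either waits for a release or allocates another buffer; that policy is the client's choice, not the server's.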

XVideo is replaced by YUV buffer support (described by fourcc). There will be a libva backend that can feed YUV buffers direct to compositor. Compositor can use YUV samplers for texture sampling; there is a Mesa extension for this already. Presentation timestamp for buffers lets you tie application clocks to presentation clocks (e.g. for movie playback with lipsync).

Even better, because all you're passing is GPU buffer objects, the compositor can use hardware overlays (aka sprite planes on some hardware) instead of sampling, as and when the compositor thinks it's a good idea. The client doesn't need to know what's going on - atomic pageflip (Rob Clark) will make it faster than today (where using sprite planes isn't full frame rate).

Couple of demos - first a simple spinning triangle, then a libva movie trailer.

Testing tools are Weston compositor and a toy Wayland toolkit (used for things like a terminal). Weston is a simple, usable compositor (maybe good for embedded), but desktops will end up taking your X11 compositor and making it the Wayland compositor (using same libraries as Weston to talk Wayland).

List of Wayland features next; Wayland doesn't give apps global co-ordinates, so no risk of confusion by apps. Makes input redirection totally reliable, even when window has arbitrary rotation. XWayland clients find themselves at global 0,0, which leads to weirdness for things like xeyes. Events tell applications about subpixel layout, which can change live.

Weston uses a framebuffer per output, so some of the reasons for clone mode go away - but clone mode still possible.

X integration works a lot like XCWM; there's an XWayland module that translates X to and from Wayland as needed. DDXes need some changes to provide acceleration, but not hard to do. Basically the same idea as composited X11, but with the X server being a client of the compositor. Note that even GLX works under XWayland, although Shape extension is not yet present.

Some work needs to be done on X window management - WM is in Weston, leading to a loop (Weston is a client of X is a client of Weston). Needs splitting up before deadlock bites.

Wayland has a handshake protocol for popup menus, to avoid offscreen problem without global geometry.

Weston reuses libxkbcommon for keymaps, to share the work done on X11 keymaps. Zoom in Weston has a "follows mouse or typing, whichever was last" policy. Also screen recording present, with useful tools. There is a Wayland "ping" protocol to check for dead clients - Weston only uses it when it needs to. Because it's UNIX socket connectivity, Wayland compositors can terminate misbehaving apps by PID; this is more reliable than X, which uses a window property.

On-screen keyboard is an external process, launched by Weston as a specially privileged client. Remoting is currently a VNC-like differencing protocol, either whole-desktop or single window. As it happens, because Wayland is optimized for minimal context switches (no round trips), remoting is responsive, even with VNC-like behaviour.

XWayland will be sent to xorg-devel@ list for review soon.

GBM is part of Mesa, but created for Wayland use of OpenGL. It's an EGL display, providing the functions EGL needs (display, surface creation etc), plus extras for buffer swap via compositor, KMS, headless operation, cursors etc. It's been used in KMSCon as well as Weston.

Day 3

New i915 modesetting

The new code is entirely kernel changes backing the same userspace ABI. It's really all about why the helpers were unhelpful.

Naming is a bit problematic - prepare/commit map to enable/disable. Kernel "encoder" is Intel "port", kernel "crtc" is Intel "pipe", terms get used interchangeably.

The crtc helper assumes encoder/crtcs are completely split and can be enabled, disabled and routed independently without influencing each other.

This is both too flexible (can call disable at unexpected times, reorders encoder and crtc enable and disable sequences depending on what changes it's making, which breaks hardware and requires driver to do state tracking to elide dangerous hardware prodding), and is also not flexible enough to cope with hardware quirks (Intel CPU eDP PLL, LVDS pin pair, Haswell DDI port).

New code simplifies DPMS - with the exception of old VGA, Intel hardware only does DPMS on and DPMS off. This is connector-driven; Intel had to map to encoder state as well.

Modeset sequencing is now fixed, which helps with hardware quirks. Callbacks in place to get things done in the same order at the same times, no need to elide any hardware poking.

Output state got staged, so that when things go wrong, you can get back to a sane state. Hardest part of conversion job, but has had two benefits - can read out firmware state during takeover (flicker-free boot) and can compare hardware state to software intended state at suitable points, making it possible to sprinkle WARNs in to make debugging simpler.

The incomplete split between crtc helper and fb helper is obvious WIP; modeset semantics that fb helper depends on were "fun" to duplicate. Cleanup in progress, including training helper to not call the driver to do things if it shouldn't have any effect.

A few WTFs found - DPMS On after modeset, disabling of disconnected outputs, helper vtables not typed. Again, being fixed.

Remaining todos include HPD IRQ handling - may have to move to driver to deal with things like HDMI versus DP signalling on Intel DP++ ports. Not sure why all connectors get polled on HPD IRQ - bug workaround? Tread carefully.

Atomic modeset and pageflip (aka nuclear pageflip) need to understand shared resources like PLLs. Doesn't work with the current disable everything then enable pattern, but fixable.

Always need to clean up driver code, but it's happening.

Seamless takeover of firmware state is happening. Output routing, firmware FB storage done, just have to deal with resources like PLLs, reference clocks, panel fitter, and state like interlacing.

DP MST is going to make things interesting (but no sinks yet); currently, one crtc feeds multiple encoders. MST means multiple crtcs feed one encoder. Two ideas to consider, either DP encoder as a shared resource like a PLL, or pipes grouped and controlled as a block. Can wait until sinks exist.

Question on whether this fixes flicker during modeset; answer is no, but gets code into shape to fix it. If it does fix it, it'll be because they've made a hardware quirk workaround fix actually take place now.

Release planning

1.13 was 2012-09-05. Target for 1.14 is thus 2013-03-05. Any reasons to delay? No.

Keith would like bugs filed once in stabilization period, rather than private complaints to developers. We want to hold up the release to get the bugs out - this won't happen if Keith is unaware of the problems.

There are features planned - "DRI3" (which may turn out to be just DRI2 fixes), XWayland if we can, some GLX stuff.

Atomic modesetting in X as well as KMS is a target, too.

Rough plan is 3 months to build new features, 2 months for general bugfixing, last month for critical bugs only.

Atomic modeset and pageflip

AKA nuclear pageflip if you've been confused on IRC.

Atomic pageflip updates all plane and crtc fbs and properties within a single vblank. Atomic modeset relaxes the time constraint, but sets all connectors to required state in a single kernel call.

Both IOCTLs have a test flag, to confirm that the hardware will do it successfully, as well as a live flag to do the operation requested. If test says it'll fail, you can fall back to a different option (e.g. draw a textured quad instead of using a plane, or reduce resolution on an output).

Most state is now stored in property lists, to reduce the number of codepaths in pageflip/modeset/property set (avoids unexpected bugs).

There's been work done to split out kernel state from user-visible state. Test flag means you need to build up proposed state and then rollback, so helpers have been written to assist.

Atomic sequences become begin, check, commit, cleanup. Only commit should touch hardware, and check should be the same between the actual sequence and the test version.
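The begin/check/commit/cleanup shape, with a test-only mode, can be sketched like this. It is a toy model in userspace Python, not the kernel code: Device, atomic_update and the "planes" limit are invented for illustration.

```python
import copy


class Device:
    """Stand-in for a display device with a limited number of planes."""

    def __init__(self, max_planes=1):
        self.max_planes = max_planes
        self.state = {"planes": 0, "mode": "1920x1080"}
        self.hw_writes = 0  # counts how often "hardware" was touched


def atomic_update(device, changes, test_only=False):
    """Apply all changes or none; with test_only, just report feasibility."""
    proposed = copy.deepcopy(device.state)        # begin: build proposed state
    proposed.update(changes)
    ok = proposed["planes"] <= device.max_planes  # check: same for test and live
    if ok and not test_only:
        device.state = proposed                   # commit: only step touching hardware
        device.hw_writes += 1
    return ok                                     # cleanup: proposed copy is dropped
```

A caller would first try atomic_update(dev, {"planes": 2}, test_only=True) and, if that returns False, fall back to something like drawing a textured quad.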

Patchset also cleans up the code a bit - some object properties added, the idea of "dynamic" properties (which can be changed freely, so test will always return true), signed ranges for properties.

TODOs: remove plane->update_plane, crtc->page_flip. They're now obsolete. Fix the IOCTLs - still not quite stable.

Question - how does core code handle things like memory bandwidth requirements? Answer - it doesn't, it punts to driver. Core checks things that are common (e.g. sane size, fbs being passed actually exist), then lets the driver cope.

All checking must be done in check, commit shouldn't fail if check passed, otherwise users can't do the right thing.

Can we provide better diagnostic information (e.g. an error string)? Preference is to use error numbers, so that we can evade I18N issues.

DRI2 video

Replace XV with DRI2 by pushing more buffer types.

XV downsides - at a minimum, two memcpys (Xshm, XvShmPutImage). More if you don't do Xshm. Not ideal for any hardware acceleration, as can't allow for HW decoders with special memory requirements (alignment etc). Not ideal for GPUs - at best, map/unmap, at worst copy.

Today's DRI2 is unscaled RGB buffers only. VA-API and VDPAU do a GPU YUV->scaled RGB blit, then compositor does at least one blit. Memory bandwidth becomes an issue.

Wayland does this - get X to match capabilities on local machine.

So, teach DRI2 to allocate unscaled YUV buffers (video-size, with codec borders available), including multiplanar formats. If driver/display can support it, we can scan out directly (overlays) from the unscaled buffer (zero-copy scanout).

Removes at least one blit, so always helpful.

New protocol to support this.

Some missing bits - interlace, stereo. Client control of buffer/plane mapping (e.g. NV12 in one buffer or two) - can we fix that with DRI3 proposal? Destination co-ordinates (for fixing aspect ratio).

Optimus and cross-device sync

This uses DMA-BUF, server, patched DDXes and xrandr 1.4 to make it work.

DMA-BUF wraps buffer objects in special (shareable) fds, that can let you remap across hardware.

Server needs GPU hotplug support (platform bus), xrandr 1.4 for configuration. DDXes have to learn about Optimus roles (render slave, output slave).

Would be nice to use robustness extensions to enable compositors to do seamless muxed GPU switching - ideally completely user-invisible.

Sync is hard - you have multiple hardware devices, and arbitrary buffer ordering and sharing, with arbitrary locking. Avoiding deadlock is done using TTM reservation code, and TTM fence primitive for cross-device sync. Make lockdep annotations possible (helps with bug fixing).

All APIs are unstable - not even the name is tied down.

Fence is a dumb primitive - literally "block until signalled". Can implement purely in software, but hardware implementations also work to kick things. Have one exclusive fence, and a few shared fences per device.
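A pure software fence really is this small. The sketch below uses a Python threading.Event as the "block until signalled" primitive; the Fence class and demo are illustrative only and have nothing to do with the kernel API, which was still unstable at the time.

```python
import threading


class Fence:
    """Block-until-signalled primitive, implemented purely in software."""

    def __init__(self):
        self._event = threading.Event()

    def signal(self):
        self._event.set()

    def wait(self, timeout=None):
        return self._event.wait(timeout)


def demo():
    fence = Fence()
    order = []

    def consumer():
        fence.wait()             # e.g. display waits for rendering to finish
        order.append("consume")

    worker = threading.Thread(target=consumer)
    worker.start()
    order.append("produce")      # e.g. GPU finishes writing the buffer
    fence.signal()
    worker.join()
    return order
```

The consumer cannot run ahead of the producer, whichever thread the scheduler picks first.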

Reservation is based on tickets to let clients make progress without deadlocking. Rules exist for using both APIs.

Demo on Ubuntu system, showing that it's still not working well (although it does work), and can crash the system.

The example for shared fences is a camera feeding both the GPU (for a preview display) and a hardware H.264 encoder (for recording).


Problems are authentication and buffer sharing.

Security model is that master has control, but can literally only stop clients authorizing. Once authorized, clients are unrestricted.

Only one master at a time (needs root), but you drop master when you don't need it.

Display servers need master; X uses it for limiting modesetting to active server. Isolation between clients (or at least between masters) would be nice, but not currently provided.

There's a VT switch race; VT switches need us to release master in one process, gain in another. Malicious process can race for the gap.

GEM buffer sharing is flink-based, and dangerous - any client can access any buffer if it guesses the handle. No confidentiality or integrity.

Non-GUI apps need the GUI to authenticate them - makes GPGPU feel weird (why do I need X to co-operate)?

Idea - split master into two, one for modeset, one for GEM. flink restricted to only working between a master and clients it has authed (won't break current user space). Driver + HW can then isolate clients completely.
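The flink problem and the proposed restriction can be shown with a toy model. Everything here (class, method and process names) is hypothetical and only mimics the behaviour described in the talk; it is not the kernel interface.

```python
class ToyGem:
    """Toy model of flink-style global names; not the kernel interface."""

    def __init__(self, restrict_to_master=False):
        self.restrict_to_master = restrict_to_master
        self._names = {}     # global name -> (owner, contents)
        self._next_name = 1  # small sequential integers are easy to guess
        self._authed = {}    # master -> set of clients it authorised

    def auth(self, master, client):
        self._authed.setdefault(master, set()).add(client)

    def flink(self, owner, contents):
        name = self._next_name
        self._next_name += 1
        self._names[name] = (owner, contents)
        return name

    def _related(self, a, b):
        # Same process, or a master and one of the clients it authorised.
        return (a == b
                or b in self._authed.get(a, set())
                or a in self._authed.get(b, set()))

    def open(self, client, name):
        owner, contents = self._names[name]
        if self.restrict_to_master and not self._related(owner, client):
            raise PermissionError(f"{client} may not open name {name}")
        return contents


def guess_first_name(gem):
    gem.auth("master-a", "app-a")
    gem.auth("master-b", "app-b")
    gem.flink("master-a", "secret pixels")
    try:
        return gem.open("app-b", 1)  # app-b simply guesses the name
    except PermissionError:
        return None


print(guess_first_name(ToyGem()))                         # guess succeeds
print(guess_first_name(ToyGem(restrict_to_master=True)))  # guess is refused
```

In the restricted model "app-a" can still open the buffer, since "master-a" authorised it, so existing sharing between a display server and its own clients keeps working.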

KMSCon could, in the long run, replace VT switches. It takes master, it forwards things between the running display server (e.g. Weston compositor) and the hardware, and pays attention to what should and should not have access.

DMA-BUF is nearly as secure as reasonable, but could be slightly enhanced by LSM (SELinux) hooks. Not used by most mortals, but some people have plans.

Details as patches over the next month.

EVoC Nouveau project experience

Student started as a web developer. Moved onto KDE, but developing X seemed scary and hard.

EVoC was an accident; tried to get an onsite internship at Mozilla, Google, Apple, but failed. GSoC deadline had gone past, so either Android app development, or EVoC.

Student wanted a fast GPU anyway, for gaming. Bought an NVIDIA card, and found out that Nouveau couldn't do reclocking on modern cards.

Presented a brief history of reclocking, from nv50 (where defaults were OK) to Fermi (default 10% of max performance).

Goal was to make it work as well as it did on nv50; nv50 had HWSQ, but Fermi replaced it with PDAEMON, which changed ISA to FµC (flexible microcode).

Martin Peres had already done fan management and host -> PDAEMON comms. Project had three parts; PDAEMON to Host comms, replace HWSQ, and write more documentation (audience enthused by documentation comment).

PDAEMON->host comms was a simple ringbuffer with sanity checks. Tested, proven working, merged, then onto the hard bit (Fermi Scripting Engine).
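For readers unfamiliar with the pattern, a ring buffer with sanity checks looks roughly like this. It is a generic sketch, not the Nouveau code; the 32-bit message size and the "one slot stays empty" convention are assumptions.

```python
class RingBuffer:
    """Fixed-size single-producer, single-consumer ring of 32-bit words."""

    def __init__(self, slots):
        if slots < 2:
            raise ValueError("need at least two slots")
        self._data = [0] * slots
        self._put = 0  # next slot the producer writes
        self._get = 0  # next slot the consumer reads

    def push(self, word):
        # Sanity check: reject anything that is not a 32-bit unsigned value.
        if not isinstance(word, int) or not 0 <= word <= 0xFFFFFFFF:
            raise ValueError("message must fit in 32 bits")
        nxt = (self._put + 1) % len(self._data)
        if nxt == self._get:
            return False  # full: one slot stays empty to tell full from empty
        self._data[self._put] = word
        self._put = nxt
        return True

    def pop(self):
        if self._get == self._put:
            return None  # empty
        word = self._data[self._get]
        self._get = (self._get + 1) % len(self._data)
        return word


ring = RingBuffer(4)
print([ring.push(w) for w in (1, 2, 3, 4)])  # the fourth push finds the ring full
print(ring.pop())
```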

FSE acts as a HWSQ replacement. Implemented in FµC, provides 6 basic operations - two types of delay, MMIO read/mask/write, message to host.
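The six operations can be pictured with a toy interpreter. The opcode names, the two delay units, the mask semantics and the "message carries the last value read" behaviour are all assumptions made for this sketch; the real FSE encoding is not described in these notes.

```python
def run_script(script, regs):
    """Interpret (opcode, *args) tuples against a dict of fake registers.

    Returns (messages_to_host, total_delay) so the effect can be inspected.
    """
    messages = []
    delay = 0
    last_read = 0
    for op, *args in script:
        if op == "delay_short":    # first assumed delay type
            delay += args[0]
        elif op == "delay_long":   # second assumed delay type, 1000x units
            delay += args[0] * 1000
        elif op == "mmio_read":
            last_read = regs.get(args[0], 0)
        elif op == "mmio_write":
            regs[args[0]] = args[1] & 0xFFFFFFFF
        elif op == "mmio_mask":
            addr, mask, value = args
            old = regs.get(addr, 0)
            # Replace only the bits selected by mask (assumed semantics).
            regs[addr] = ((old & ~mask) | (value & mask)) & 0xFFFFFFFF
        elif op == "msg_host":
            messages.append(last_read)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return messages, delay


registers = {0x4000: 0xFF}
script = [
    ("mmio_mask", 0x4000, 0x0F, 0x03),
    ("delay_short", 5),
    ("mmio_read", 0x4000),
    ("msg_host",),
    ("delay_long", 2),
]
print(run_script(script, registers))
```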

Send message to host is still not working, but rest works and is being tested.

Did lots of good documentation on his blog. Also got mwk to write intro.txt in envytools.

Still thinks we need more newbie docs; Supreet wrote a "Beginner's guide to KDE devel", X needs the same to get people in.

On to EVoC; EVoC is a 13 week project proposed by student. Money is $1k up front, $2k mid-term, $2k completion. Can start at any time, as long as you have 13 weeks.

Money was delayed, but did arrive.

Moved onto discussion of EVoC with audience - how do we make it better? Important bits are that X can't afford to pay wages, and is really aiming to get people over the "X is too scary - I can't possibly contribute to it" barrier - if it wanted work done, it'd have the mentors do it.

Student suggests publicity needed, and more documentation to ensure that mentor time doesn't get wasted (prerequisites for suggested projects).

Tegra and embedded graphics

Hardware is ARM CPUs + GeForce ULP (no unified shaders, no virtual memory). Sophisticated memory controller, to share bandwidth and control memory access latency for GPU/CPU accesses.

Two similar display controllers - different encoders (HDMI, DSI, TV). Scanout engines have 3 plane plus cursor support, 2 YUV-capable planes, separate colorkey and palette per plane. No rotation possible at scanout.

Tegra is OSS friendly - can get the TRM from developer website. No documentation for the acceleration engines yet, although there are plans to open up fully. Still a lot of work to upstream everything done so far (partly due to Android focus).

There's a start on a DRM driver; KMS works, but memory management not done. Efforts underway to RE the 3D engine.

Problem; no memory virtualisation at all. Security issues; all buffers must be contiguous DMA memory; misbehaves under memory pressure. Fixed for Tegra 3.

Challenges: low memory bandwidth - scanout can eat all the bandwidth. Copying data hurts more than ever before (so RandR with shadow buffers really hurts). Surface tiling required to have enough bandwidth to cope - fallbacks are proportionately more expensive.
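A back-of-the-envelope calculation shows why scanout and copies matter. The resolution, pixel size and refresh rate below are example values chosen for this sketch, not figures from the talk, and the copy estimate assumes a worst-case full-frame copy on every refresh.

```python
def scanout_bytes_per_second(width, height, bytes_per_pixel, refresh_hz):
    """Memory traffic needed just to read the framebuffer out to the display."""
    return width * height * bytes_per_pixel * refresh_hz


def shadow_copy_bytes_per_second(width, height, bytes_per_pixel, refresh_hz):
    """Extra traffic if a full shadow buffer is copied once per refresh.

    Each copy reads the shadow buffer and writes the scanout buffer,
    hence the factor of two.
    """
    return 2 * scanout_bytes_per_second(width, height, bytes_per_pixel, refresh_hz)


# Example: 1280x800, 32-bit pixels, 60 Hz (assumed values).
print(scanout_bytes_per_second(1280, 800, 4, 60))
print(shadow_copy_bytes_per_second(1280, 800, 4, 60))
```

Under these assumptions the shadow copy costs twice the scanout traffic again, which is why copies hurt so much on a bandwidth-starved system.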

Memory again - you don't have much of it - fixed "stolen" VRAM is a waste of memory, but lack of IOMMU makes smart sharing hard.

Compositing with OpenGL not preferable. Subpar use of 2D engines, no plane usage. Wayland provides some help here (makes it possible to write a 2D compositor).

Q&A - is CMA a fix for the memory problem? No, as you can't easily release memory from the GPU when the CPU needs it.

Nokia mentioned that hardware scanout rotation is overrated - you want things like animations, so you don't use the hardware rotate anyway.


Motivation was improving Piglit's GL tests. Piglit is a GLUT user - GLUT is simply outdated; on Linux, is GLX only and can only do legacy GL contexts, so can't do OpenGL 3.1 core profiles, or even request a specific GL version. Lots of ugly hacks to get GLES tests, so not many ES tests.

Piglit should really run on Wayland, Android, X/EGL. GLEW also an issue (for GLES and for GL > 3.0).

Qualities of an ideal solution:

  1. Build each test once, run on each of the available systems - GLX, X/EGL, Wayland, decision at runtime.
  2. Where sane, mux a test over multiple GL APIs (e.g. ES 2, ES 3, GL 3.2, legacy GL etc). This may be a first anywhere.
  3. Benefit projects other than Piglit - so should be a separate library (e.g. for apitrace to reuse with glretrace).

Solution so far; can do the first with libwaffle. Getting going on the second, but more work to go.

Piglit needs GL core profile support - not Waffle's fault.

Waffle API resembles EGL with extras for window creation, dlsyming your GL library (with OS abstraction), getting at the underlying native objects (e.g. for input behaviour that's outside Waffle's scope).

Supported platforms are all Linux display engines, Android (experimental - bugs may be Android's fault), Mac OS X (experimental, not fully tested). Patches welcome for Windows.

Demo of Dante ported on top of Waffle instead of EGL. Run same binary on Wayland and EGL/X11.

Waffle is not GLUT or SDL. Will never do input or audio. It's meant for demo applications and tests. See examples in repo (aims to use a GL-like attribute list API).

Question: considered a wrapper header to avoid the dlsym stuff? Yes - piglit has macros to do dispatch.

There is a KMS/GBM backend being written - will let you run piglit on a headless server. Side benefit - lets you work on new hardware as soon as 3D hardware works, while rest of team tackles the modesetting problem.

Question: Dynamic detection of GL API? Example does a dispatch based on environment variables. As yet, Waffle can't pick for you. What are sensible defaults? Consider X on Wayland and Wayland on X (both supported options) - which is the "best" window system?
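Dispatch on an environment variable can be as simple as the sketch below. The variable name `TOY_GL_PLATFORM` and the platform strings are invented for illustration and are not Waffle's; the function deliberately refuses to guess a default, mirroring the open question above.

```python
import os

# Hypothetical platform identifiers, for illustration only.
SUPPORTED = ("glx", "x11_egl", "wayland")


def pick_platform(environ=os.environ, variable="TOY_GL_PLATFORM"):
    """Return the window system named in the environment, or fail loudly."""
    choice = environ.get(variable)
    if choice is None:
        raise RuntimeError(f"set {variable} to one of: {', '.join(SUPPORTED)}")
    if choice not in SUPPORTED:
        raise ValueError(f"unsupported platform {choice!r}")
    return choice


print(pick_platform({"TOY_GL_PLATFORM": "wayland"}))
```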

Waffle dlsym just calls dlsym for you on the right library. But have to be aware of ABI - waffleGetProcAddress backs it the right way, though. EGL, GLX, CGL all have different rules on dlsym and GetProcAddress. Windows backend will trigger an API revision due to GetProcAddress weirdness. Chad is scared of Windows now, so will not be implementing that backend.


Starts with a look at the current position (set down in 2000, so outdated).

Lots of hoops for more OpenGL. GLES with GLX is worse. OpenGL with EGL is awkward. Indirect rendering - just don't start on it, we want to kill it with fire (can't exceed OpenGL 1.5, usually claimed in error, leading to bug reports).

Things to do; split libGL into libOpenGL for OpenGL (like libGLESv2 or libGLESv1_CM) and libGLX for GLX. libGL just links to the two of them.

Version libOpenGL. App developers hate GetProcAddress; how do we escape that? Versioned libOpenGL sonames? Can we bump the minimum GL version (noting old hardware is still out there)?

Want to deprecate GLX, as new GLX extensions are hassle, and we've already decided to kill indirect rendering.

Try and convince everyone to use EGL instead. Hassle is proprietary drivers - NVIDIA apologised, they're working on it (and already ship it for Tegra).

Make OpenGL ES part of the ABI; makes it easier to port between mobile and X.

Where do we go? Update loader/driver interface - closed source driver vendors showing interest, but this will be slow. Killing indirect makes it easier, because the X server stops having trouble with it.

TODO now: split libGL. Convince distros to drink the EGL Kool-Aid - GLX to EGL porting guide?

NVIDIA's Andy Ritger has a good proposal up and will help drive this. Discussion on mesa-dev to get it moving.

Discussion on how you version libOpenGL - soname? ELF versioning? We'll work it out.

Question - what is replacement for indirect? Answer is VNC-like remoting - note that we're now talking about limiting ourselves to 1999 OpenGL versions, as indirect GLX is not keeping up.

NVIDIA has extended indirect GLX to newer OpenGL. Maybe though, X should learn to ship images across the wire instead of GL commands - can be done without applications being aware of it.

OpenCL testing framework in piglit

This was an EVoC project: Use piglit to test OpenCL as well as OpenGL. Benefits TDD and regression finding.

Goals; test OpenCL compliance and versions. Make test writing easy.

Piglit provides concurrent execution of tests. Grouped tests. Results display.

Test as much as possible, aim for all-platform, all-device tests, but tests can be tied to a device or platform.

Try to make test code common. Helpers to deal with common blocks of code.

A test is configuration section to set up environment, then a test section. So far, three test types (API, OpenCL program execution, and custom). Tries to reduce code you need to write if you're focusing your test. Shown example of custom test.

Program tester aims to let you write an OpenCL program, specify inputs and expected outputs, and execute multiple tests on your OpenCL code, without writing a line of C.
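The idea of a declarative program test can be sketched as data plus a generic runner. This is not piglit's file format or API: the dictionary layout is invented, and a plain Python function stands in for the OpenCL kernel (called once per work item).

```python
def run_program_tests(kernel, tests):
    """Run each declarative test case and return {name: 'pass' or 'fail'}."""
    results = {}
    for case in tests:
        # zip(*inputs) yields one argument tuple per simulated work item.
        actual = [kernel(*args) for args in zip(*case["inputs"])]
        results[case["name"]] = "pass" if actual == case["expected"] else "fail"
    return results


def add_kernel(a, b):
    """Stand-in for a per-work-item OpenCL kernel."""
    return a + b


TESTS = [
    {"name": "add small", "inputs": [[1, 2, 3], [10, 20, 30]],
     "expected": [11, 22, 33]},
    {"name": "add wrong", "inputs": [[1], [1]], "expected": [3]},
]

print(run_program_tests(add_kernel, TESTS))
```

The test author only writes the data in `TESTS`; the runner is shared, which is the "without writing a line of C" point made above.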

Showed that it does run its 70 tests on 3 different OpenCL implementations - 32 passes on Clover, 53 on Intel (CPU-side implementation), 55 on AMD APP.
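As pass rates, assuming every implementation ran all 70 tests (the notes do not say whether any were skipped):

```python
TOTAL_TESTS = 70
PASSES = {"Clover": 32, "Intel (CPU)": 53, "AMD APP": 55}


def pass_rates(passes, total):
    """Return the percentage of passing tests per implementation."""
    return {name: round(100 * count / total, 1) for name, count in passes.items()}


print(pass_rates(PASSES, TOTAL_TESTS))
```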

Short term - more tests, clinfo (like glxinfo, but for OpenCL). Add support for half type.

Longer term, want OpenGL+OpenCL combination testing (buffer sharing). Support for SPIR (Standard Portable Intermediate Representation) binary OpenCL program format.