X.Org Foundation 6.8 release postpartum discussion notes
These notes serve to document the tasks and issues that arose during the 6.8 release cycle and are intended to be a starting point for further discussion. They are arranged into the following general categories: scheduling, testing, and finalizing the release.
While the discussion below mainly focuses on tasks that did not work well or need to be improved, it should be noted that the goal of the release team was not to perfect the release process but rather to improve it as much as possible. I (and, from the comments I received, many others) feel that the team was very successful and achieved its goal given the constraints of the release.
Scheduling
Discussion of this release started in late May 2004; however, due to the travel schedules of most X.Org Foundation BOD members, the schedule was not finalized until mid July 2004. The release was determined to be a time based release since it was driven by several companies (most notably Red Hat and SUSE) that needed to have a newer X Window System release for their upcoming products. Those companies needed to have the release ready at the beginning of September 2004, so the date for the release was initially set to 25 August 2004 in order to give a buffer for problems that might occur during the release cycle.
The initial deadline for the release left us with a very tight schedule, which had several consequences.
- The deadlines for the feature freeze and code freeze were severely compressed, which limited the features that could be added and limited the amount of testing that was possible.
- A few bugs that would otherwise have held up the release had to be postponed until after the release.
- The number of new features added in this release was significant; however, most (if not all) had received significant testing outside of the X.Org CVS tree before they were merged in. The majority of the remaining testing and bug fixing for these features was due to interactions they had with other new components.
- It was challenging to keep people working on the features and bugs in order to meet the deadlines. Gentle pressure was applied in most cases to help motivate people on the critical paths. This will likely always be an issue for the release manager.
- Because the release cycle was compressed, it was not difficult to keep people focused on the release. However, if the release cycle were longer, this would likely become a problem.
The schedule was broken down into three phases: adding new features, fixing bugs and updating documentation. The deadlines were set approximately two weeks apart for each phase in order for the release to be completed by the initial target date.
During the first phase, the tree was open to adding new features, fixing bugs and updating documentation. The release manager's primary responsibilities were setting up the initial wiki pages to describe the release plan and status, making sure that the community members were aware of the release schedule, and coordinating with the authors of the new code to make sure that everything was checked in before the feature freeze deadline. This phase went smoothly, with only a small amount of additional work required to encourage a few of the committers to have their code checked in before the deadline.
After the feature freeze, the work was limited to fixing bugs and updating documentation. The source tree remained open to all committers so that as many people as possible could find and fix bugs. For the release manager, the amount of time and effort required was significantly higher in this phase. The main tasks included:
- Managing the blocker bug list
- Holding regular release wranglers meetings (3 days/week)
- Keeping people focused on fixing bugs
- Reviewing and checking in fixes
- Resolving conflicts
- Encouraging testing (see testing section below)
Several suggestions were made by the release wranglers to help with fixing bugs. The most important of these was the release bug, which is a commonly practiced method of managing a release. Bugs that were considered serious enough to hold up (i.e., block) the release were to be marked as blocking the release bug. Bugzilla allows the release manager to list the dependency tree of all bugs that block the release. While there were several attempts to explain how this worked, there was still some confusion. For future releases, it should probably be explicitly explained on the release and status pages.
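To make the blocker relationship concrete, here is a minimal sketch of how a dependency tree resolves to a flat list of blockers. It does not talk to Bugzilla; the input file format (one "BLOCKER BLOCKED" pair of bug numbers per line) and the function name list_blockers are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical input format: one "BLOCKER BLOCKED" pair of bug numbers per
# line, meaning bug BLOCKER blocks bug BLOCKED. This is not a Bugzilla
# export format; it only illustrates the dependency idea.

# list_blockers FILE ROOT: print every bug that directly or indirectly
# blocks ROOT, one per line, sorted numerically.
list_blockers() {
    awk -v root="$2" '
        { blocker[NR] = $1; blocked[NR] = $2 }
        END {
            seen[root] = 1
            # Repeat passes over the pairs until no new blocker is found.
            do {
                changed = 0
                for (i = 1; i <= NR; i++) {
                    if ((blocked[i] in seen) && !(blocker[i] in seen)) {
                        seen[blocker[i]] = 1
                        changed = 1
                    }
                }
            } while (changed)
            for (bug in seen) if (bug != root) print bug
        }
    ' "$1" | sort -n
}
```

With pairs such as "101 100" and "103 101" in the file, asking for the blockers of bug 100 lists both 101 and 103, because a bug that blocks a blocker also holds up the release. The bug numbers here are made up.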
The release wranglers met several times per week -- usually Monday, Wednesday and Friday mornings -- to focus on the blocker bugs and any issues that had come up during the previous few days. During these meetings, the release manager asked for (and usually got) volunteers to work on certain bugs. The remaining bugs were left to the release manager to investigate and resolve. These meetings were invaluable to the release manager.
Since multiple people were working on fixing bugs at this time, the source tree remained open, which not only allowed the release wranglers to check in fixes, but also allowed other members of the community to work on and fix issues. The release manager monitored all of the check-ins to make sure that new features were not being added to the release.
Other important contributions during this stage came from those testing the release. There were quite a few people who were just starting to compile the tree and do testing. They reported bugs and marked them as blockers where appropriate. Some pre-packaged binaries were also made to help those who didn't have experience building the source tree, but could help with testing. These packages should be encouraged and made more formal in future releases. The general idea behind the testing for this release was loosely defined, but the details had not yet been worked out at this point, so the majority of the testing during this phase was devoted to build and daily usage testing.
Two other bugs were added during this phase: the "hold open but not block the release" bug and the release notes bug. The "hold open" bug turned out to be the less useful of the two, since there was so much other work to do that the bugs attached to it did not get attention. It is possible that this bug might be more useful in future releases if the schedule is not so compressed. The release notes bug was very useful and over the course of the release cycle it became the place where all documentation issues were placed.
This bug fixing phase was extended by three days to allow several major bug fixes to be completed and checked in. It could have been extended further, but the general feeling was that if we were going to keep on track for a late August release, then we should go ahead and freeze the code.
After the code freeze, the work was limited to fixing all major blocker bugs and updating the documentation. As noted above, the transition between the previous phase and this one was rather arbitrary to keep the release on schedule; however, it turned out that the main difference was that instead of everyone else checking in bug fixes, the release manager was the only person allowed to check in changes. Bug fixes were being proposed and attached to the release blocker bugs, and the release manager and/or the release wranglers would evaluate the change (where possible) and apply the patch if it was accepted.
Looking back, having all bug fixes funneled through a single person slowed down the bug fixing too much. For future releases, we should consider having a small team of people with write permission to continue to check in bug fixes during the critical bug fixing phase. Also, this transition phase should probably happen before the code freeze goes into effect, which would allow the code freeze phase to concentrate solely on documentation changes and last minute critical bug fixes.
It was during this code freeze phase that the testing was finally formalized. Once the formal testing procedures were documented, many more people started testing the release. The test matrix was updated as time permitted and as new test reports came in. Ideally, the testing should have been happening much earlier, but due to the compressed time schedule the test procedures were not formalized until late in the process. See the next section on testing for more details of the formal testing requirements for this release.
One action helped initiate the testing: tagging the tree with the first release candidate. This action along with the formalizing of the test procedure appeared to catalyze the community around the release. There were four release candidates tagged during this phase. Perhaps making snapshot tags in the previous release process and defining the test procedure earlier would have helped focus attention on testing before the code freeze.
As active formal testing began, more bugs were found and fixed in a relatively short period of time, but it soon became clear that the release would not be able to happen on the original schedule. The number of bugs remained relatively constant during this time. At this time, a list of the current blocker bugs was sent out each night to the mailing list to let people know the state of each blocker bug.
Over time the number of blocker bugs slowly shrank, and the focus shifted from bug fixing to updating the release documentation. Initially, the source tree was open to others making documentation-only changes, but as the release neared, the source tree was closed to all but the release manager. At that time, the release notes bug became even more valuable to keep track of the features and bugs that needed to be documented.
The documentation needed to be updated in several places. First, the release number is currently present in the following files in the xc/docs directory:
Note that the documentation listed above is current as of the 6.8 release, and might change in the future.
The documentation in the xc/programs/Xserver/hw/xfree86/doc directory also needed to be updated. This documentation was built from the sgml files in the sgml subdir. The README, BUILD and RELNOTES sgml files will probably need to be updated with every release. The other files should be updated by their respective maintainers as needed. One special file, defs.ent, contains the macro definitions for the current and previous releases, and it was updated. Next, the old XFree86 doctools were required to build the sgml documentation. A few patches were required to build the tools (thanks to Soeren). Egbert added a README.build-docs file that describes what is needed to build and update the docs in the source tree.
Once the documentation was complete, the last steps to finish the development phase of the release were:
- Set the final version number and release date in the config/cf/xorg.cf and config/cf/cygwin.cf files
- Tag the tree with the release tag, XORG-6_8_0
- Create the release branch, XORG-6_8-branch
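The tag and branch names above follow a regular pattern, so they can be derived from the version number. The sketch below shows that derivation and prints (without running) CVS commands that would create them. The notes do not record which CVS commands were actually used; the use of cvs rtag, the function names and the module name passed in are assumptions.

```shell
#!/bin/sh
# release_tag VERSION: map a dotted version such as 6.8.0 to the tag
# naming pattern used above (XORG-6_8_0).
release_tag() {
    printf 'XORG-%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# release_branch VERSION: derive the branch name from the major and minor
# numbers only (6.8.0 -> XORG-6_8-branch).
release_branch() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    printf 'XORG-%s_%s-branch\n' "$major" "$minor"
}

# print_tag_commands VERSION MODULE: print, without running them, CVS
# commands that would tag MODULE and then create a branch rooted at that
# tag. Assumption: rtag is used; the notes do not say.
print_tag_commands() {
    tag=$(release_tag "$1")
    printf 'cvs rtag %s %s\n' "$tag" "$2"
    printf 'cvs rtag -b -r %s %s %s\n' "$tag" "$(release_branch "$1")" "$2"
}
```

Printing the commands rather than executing them lets the release manager review them before touching the repository, which matters because tags and branches are visible to everyone as soon as they are created.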
It was noted in an earlier release wranglers call that the branch could have been created much earlier in the release. Due to the compressed time schedule, it was decided to hold off creating the branch until very late to keep people focused on the release, instead of on new development. This should be reevaluated for future releases.
Additional discussion points:
- How should new releases be scheduled (i.e., if someone has a need for a new release, what should they do to get it scheduled)?
- Who/what determines the feature set for a new release?
- When should the stable release branch be created? What are the consequences of creating it earlier or later in the release cycle?
- Who should have write access to the source tree during the various stages of the release cycle?
- When should the tree be tagged for snapshots and release candidates?
- Should the documentation be updated to a more modern format? If so, should all docs be updated?
Testing
For the release to be successful and accepted by the community, it was determined very early in the release cycle that testing the release would need to be a priority. The testing was broken down into two parts: what platforms were to be tested and what tests were to be run on those platforms. During OLS, Stuart Anderson and I discussed both of these issues and then presented our proposals to the BOD.
First, we determined that, given the scheduling constraints, it would not be possible to test every combination of OS vendor, release, architecture and video card, so a subset was proposed as sufficient. The subset was defined by the operating system, the architecture, and the distribution and release version number. Each combination would define a platform to be tested.
Next, we proposed a set of tests to be run on those platforms. The list included build, install, conformance and run test categories, and we outlined what was required to pass each test category. The tests and the platforms were organized into a matrix, which was added to the freedesktop wiki:
On that page, the test matrix was included and instructions were given for running each of the tests. Initially the instructions were quite sparse, but as more people ran the tests, they were expanded and improved.
In the test matrix, the first three columns of each row defined one platform to be tested, and the last four columns displayed the state of the testing on that platform. Entries were labeled with the release candidate version that was tested and were given a green background if the test passed or a red background if the test failed (or had not yet been tested).
Names responsible for testing (or gathering the test information) were put into the fourth column in an attempt to give people some ownership and responsibility for testing a particular platform. This was moderately successful; however, there were a few problems with this system:
- The release schedule was incredibly tight and it was not possible to fully test all of the platforms listed.
- The amount of time to run through all of the tests was on the order of 8+ hours (on a 1GHz PC running Linux). Other platforms were significantly slower and some took days to complete the tests.
- Finding volunteers for testing (i.e., adding their name to the table) was not difficult as this was done early in the process, but it was not managed well enough. Clear responsibilities should have been outlined so that this process could have been self-starting and self-regulating.
- Updating the test matrix was cumbersome. Either giving this responsibility to those that volunteered to test a particular platform or automating it so that anyone can update the table would be better. The process for this release required that the release manager monitor the mailing list and update the release matrix as new reports came in.
- Once testing had begun, it quickly became clear that there were problems with the tests, which had to be addressed before any testing could truly begin. These problems were worked out within a few days, but the delay caused confusion and slowed down the testing process, and ultimately led to the release being delayed.
- There was also confusion about exactly which tests could be run on each platform. Certain tests could only be run on Linux systems, and comparable tests were not investigated for other platforms.
- The X test suite used was chosen for expediency and ease of use. It was not necessarily the best one available.
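One of the items above suggests automating updates to the test matrix so that the release manager does not have to transcribe reports by hand. As a rough sketch of that idea, testers could append one line per report to a shared file and a small script could summarize the latest state per platform. The record format, the file and the function names below are hypothetical; nothing like this was used for 6.8.

```shell
#!/bin/sh
# Hypothetical record format, one report per line:
#   PLATFORM|TEST|RC|STATUS
# where TEST is build, install, conformance or run (the four test
# categories), and STATUS is pass or fail.

# record_result FILE PLATFORM TEST RC STATUS: append one test report.
record_result() {
    printf '%s|%s|%s|%s\n' "$2" "$3" "$4" "$5" >> "$1"
}

# matrix_row FILE PLATFORM: print the latest report for each test category
# on PLATFORM; categories with no report are shown as "untested".
matrix_row() {
    awk -F'|' -v platform="$2" '
        $1 == platform { latest[$2] = $3 ":" $4 }   # later lines override
        END {
            n = split("build install conformance run", tests, " ")
            line = platform
            for (i = 1; i <= n; i++) {
                t = tests[i]
                line = line " " t "=" ((t in latest) ? latest[t] : "untested")
            }
            print line
        }
    ' "$1"
}
```

Because each report is an appended line, anyone who volunteered for a platform could record results without editing the table itself, and a later report for the same category (for example, a newer release candidate) simply replaces the earlier one in the summary.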
The initial goal for testing was to fill in the entire test matrix before the final release. However, it became clear to the release wranglers during the release cycle that the test matrix would not be completely filled, so the goal was changed to fill in as much of the matrix as possible before the release.
As noted above, testing is a very time consuming process and certain tests lend themselves to automation. Many people do not have the extra test machines required to run tests; however, for those that do, automating the test process would certainly make it more likely that testing would be done. One key tool that automated part of the testing procedure was tinderbox. It allowed us to quickly notice when recent check-ins broke the build process. During many of the release wranglers calls, tinderbox and related tools were discussed, and it was generally agreed that these tools should be explored further to help automate as many of the tests as possible.
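The core of what tinderbox provided (run a build, record whether it broke) can be illustrated in a few lines. This is only a stand-in to show the idea, not how tinderbox works; the function name check_build and the log format are assumptions.

```shell
#!/bin/sh
# check_build LOGFILE COMMAND [ARGS...]: run a build (or test) command and
# append a one-line PASS or FAIL record, with a UTC timestamp, to LOGFILE.
# Returns the command's own exit status so callers can react to failures.
check_build() {
    log=$1
    shift
    if "$@" > /dev/null 2>&1; then
        status=0
        result=PASS
    else
        status=$?
        result=FAIL
    fi
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$result" "$*" >> "$log"
    return "$status"
}

# Example (the build command is a placeholder for whatever the tree uses):
#   check_build build.log make
```

Run from a scheduled job after each update of the tree, a log like this would show the first check-in window in which the build went from PASS to FAIL.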
Additional discussion points:
- What else should be done to improve the test instructions?
- How can the test matrix be better managed?
Finalizing the release
Once the main development tasks were complete (as outlined above in the schedule section), the release was ready to be packaged and distributed to the community. This finalization stage included building the tarballs and documentation, uploading everything to the appropriate websites and handling the announcement/press release.
Historically, the source code for each public release is made available through a set of tarballs. Egbert created a script to automate creating the set of tarballs from a checked out source tree. Here is an outline of the steps involved (to be run as root):
- Create a new directory that will hold the release
- Export the tagged tree to this new dir
    cvs export -r XORG-6_8_0 xc
- Untar Egbert's build scripts and cd to that directory
- Create a directory to hold the tarballs and run the source script
- Rename the tarballs to the appropriate names for the current release
    mv Xsrc1.tgz X11R6.8.0-src1.tar.gz
  Repeat for each of the other src files
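The rename step is repetitive and easy to get wrong by hand, so it could be done in a loop. The sketch below assumes the remaining tarballs follow the same XsrcN.tgz naming as the one shown above; the notes only show Xsrc1.tgz, so that pattern and the function name are assumptions.

```shell
#!/bin/sh
# rename_tarballs DIR RELEASE: rename every XsrcN.tgz in DIR to
# RELEASE-srcN.tar.gz, following the single mv shown above.
# Assumption: all source tarballs are named XsrcN.tgz.
rename_tarballs() {
    for old in "$1"/Xsrc*.tgz; do
        [ -e "$old" ] || continue      # the glob matched nothing
        n=${old##*/Xsrc}               # strip directory and "Xsrc" prefix
        n=${n%.tgz}                    # strip ".tgz" suffix, leaving N
        mv "$old" "$1/$2-src$n.tar.gz"
    done
}

# Example:
#   rename_tarballs /path/to/tarballs X11R6.8.0
```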
Currently, there are seven tarballs created. Their contents are described in the README file that is shipped with the release (and can be found in the documentation on the website -- see below).
In addition to the multiple tarballs, it was later determined during the 6.8.1 update release that creating one large tarball containing all source code was desirable. From the web logs, more people downloaded the one large tarball than the set of smaller ones.
The website on freedesktop includes not only the tarballs (above) but also the documentation for the release. The website is arranged as follows:
X11R6.8.0/
    binaries/
    doc/
    patches/
    PDF/
    src/
    src-single/
The binaries directory contains the pre-compiled binaries for various operating system releases. At this time, no pre-compiled binaries are being made available. We should consider doing this for future releases.
The doc directory contains the html formatted documentation for the full release. This documentation is taken from ProjectRoot/lib/X11/doc/html after doing both a "make install" and a "make install.man" from a full build of the release. These html files also reference the PDF docs, so the PDF sibling directory should contain the documentation from ProjectRoot/lib/X11/doc/PDF.
The src directory contains the set of seven tarballs (described above) along with the md5sums file. The md5sums file can be created with the following command: "md5sum *.tar.* > md5sums". The src-single dir contains the single source tarballs and their own md5sums. For the 6.8.1 release, two single source tarballs were created: one in gzipped tar format and one in bzip2'd tar format. The bzip2 compressed tarball was added since it has become very popular and is smaller than the gzip compressed tarball.
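Since the md5sums file is what downloaders use to check their copies, it is worth verifying it against the tarballs before upload. The sketch below wraps the md5sum command quoted above and adds a verification step using md5sum -c; the function names are made up for this example.

```shell
#!/bin/sh
# write_md5sums DIR: create DIR/md5sums covering every tarball in DIR,
# using the same md5sum invocation quoted above.
write_md5sums() {
    ( cd "$1" && md5sum *.tar.* > md5sums )
}

# verify_md5sums DIR: re-check every file listed in DIR/md5sums.
# Returns non-zero if any listed file is missing or has changed.
verify_md5sums() {
    ( cd "$1" && md5sum -c md5sums > /dev/null 2>&1 )
}

# Example:
#   write_md5sums src && verify_md5sums src && echo "checksums OK"
```

Running the verification again after copying the files to the website would catch a tarball that was truncated or replaced during upload.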
The patches directory is normally empty for full releases (i.e., releases that have a patch number of 0). For patch releases, this directory would contain the patches necessary to bring the release from the previous full or patch release up-to-date with the current patch release. See the 6.8.1 release for an example.
The next task of the finalization process is creating the press release. It took quite a while to get appropriate quotes from members of the community, companies, etc., so it is suggested for future releases that this task be started well in advance of the preparation of the website documentation and tarballs. There are other steps required here, but since I was not involved with this task, I will leave it to others to describe the process.
The goal was to complete the tasks described above and make the release available to the community on 9 September 2004. Unfortunately, several problems occurred and important lessons were learned about how to handle the release announcements:
- Many people were very excited about this release, and we hope that the excitement and enthusiasm carries over to future releases. However, there were some who snooped around the website and found the source tarballs before the official announcement had been made, and this got reported to slashdot. Since the X.Org website had not been updated and the press releases had not been finalized, this pre-announcement by slashdot caused confusion and "stole the thunder" from the official announcement. The lesson here is that the documentation and tarballs should be embargoed in a completely private place that no one other than those involved in the finalization stage has access to.
- The X.Org website and freedesktop website need to be made public at very nearly the same time. The official website should be X.Org with freedesktop as a mirror. However, since few people have access to the X.Org website, the freedesktop site was set up first and the X.Org site files were copied from there. This could have been handled better by embargoing the release.
- The press release needs to be prepared well ahead of time so that the official announcement sent to the press/mailing lists and the unveiling of the websites can be done simultaneously.
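As a very small illustration of the embargo idea, the release files could be staged in a directory that only its owner can enter, and opened up only at announcement time. This is a sketch using plain file permissions; a real web server setup would need more than this, and the function names and permission modes are assumptions.

```shell
#!/bin/sh
# stage_release DIR: create a staging directory that only its owner can
# read or enter, so its contents cannot be browsed before the announcement.
stage_release() {
    mkdir -p "$1" && chmod 700 "$1"
}

# publish_release DIR: lift the embargo by making the directory readable
# and searchable by everyone.
publish_release() {
    chmod 755 "$1"
}

# Example:
#   stage_release /path/to/X11R6.8.0     # while preparing tarballs and docs
#   publish_release /path/to/X11R6.8.0   # at the moment of the announcement
```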
Additional discussion points:
- How far ahead of time does the press release need to be sent to the appropriate press outlets in order for it to be released at a specific time (i.e., the time that the embargo is lifted)?
- What other mirror sites are available? What should be done to coordinate with them to make the release available on their sites as soon as possible after the announcement?