High Performance Graphics Hardware Design Requirements
This page attempts to spell out graphics hardware design requirements needed to
build high-performance graphics subsystems. This page is intended for h/w
graphics chip and board designers, as well as graphics software sub-system
designers and graphics device driver writers. Its intent is to broaden the
understanding of hardware design principles needed to create high-performance
graphics subsystems. These principles are well known to high-end folks, but
are sorely lacking in the Wintel PC clone marketplace.
This page is motivated by discussions on the
USENET group, and the efforts of the
Linux GGI group,
where it has been discovered that most PC-class/MS Windows
graphics hardware is sorely lacking in important graphics features.
Current work on hardware-accelerated 3D centers around the
OpenGL implementation. The
Graphics Advocacy page provides
the Linux background for accelerated 3D graphics.
The single most fundamental concept of high-performance graphics hardware
design is that the graphics program must have direct access to the hardware.
Depending on your experience, this may sound either obvious, or a damned-fool
bad idea. To people writing computer games, and to people building hardware,
this is obvious. To people writing operating systems and graphics
applications, who are used to device drivers, libraries and windowing systems,
this sounds stupid. In fact, both camps are correct: fast access is direct
access, and yes, with improperly designed hardware, it is dangerous.
The high-end Unix graphics hardware community has learned that both worlds are
possible: direct access from user-level programs (usually through libraries)
for performance, coupled to protected system modes that prevent out-of-control
or malicious programs from hanging the system and locking up the hardware.
However, to create such a system, certain principles must be adhered to in the
raster chip, bus interface chip, and graphics card design. These principles
are not terribly hard, and in fact are sometimes deceptively simple and
obvious. However, many schedules have been slipped due to a misunderstanding
of the required functions. The repercussions of these principles affect the
hardware, the graphics system, the operating system, the window system, and
the graphics application. "Minor" hardware bugs in these areas are
not easily worked around in software; indeed, it may not even be possible to
work around them.
There are two basic principles: (1) a recognition that there is a difference
between a protected mode, to which only the operating system has access, and
user-level drawing commands, which any program can bang on. (2) The concept of
context switching, whereby one graphics application can be stopped, and
another re-started, all without hanging the graphics adapter, or
losing/scrambling the state of the hardware. All of the other principles
follow from the above.
Without further ado, the list:
Well, that's all. There are in fact a large variety of more detailed design
issues, but these are too numerous to be discussed in this overview. All of
the principles discussed above are well-known and understood in the high-end
(UNIX) graphics hardware community. All of these have been discussed and
written about in public forums and journals. However, many of these are rare,
have low circulation, or are out-of-print. This is the ultimate reason for the
existence of this page. See Bibliography below.
- Protected Mode
- Certain graphics h/w registers/functions, such as cursor control and
colormap load, must be segregated into a distinct address space from other
functions, such as area clear and line drawing. This allows the operating
system to protect *privileged functions*, such as cursor movement or colormap
loading, from *user space programs*, which want to have direct access to
hardware registers for line drawing and area clear for (obvious) performance
reasons. Such functions must be separated by at least 4K bytes, since most
CPUs do not allow fine-grained memory protection (e.g. Intel x86, PowerPC,
MIPS, and Sparc only allow protection for 1K-4K byte pages).
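The page-separation rule can be checked mechanically against a register map. The following is only a sketch: the register names, the offsets, and the 4 KB page size are all hypothetical, chosen for illustration and not taken from any real chip.

```python
PAGE_SIZE = 4096  # assumed protection granularity; real CPUs may differ

# Hypothetical register map: offsets from the adapter's base address.
PRIVILEGED_REGS = {"CURSOR_XY": 0x0000, "COLORMAP_LOAD": 0x0010}
USER_REGS = {"LINE_START_XY": 0x1000, "LINE_END_XY": 0x1004, "COMMAND": 0x1008}


def page_of(offset, page_size=PAGE_SIZE):
    """Return the index of the protection page that contains this offset."""
    return offset // page_size


def layout_is_protectable(privileged, user, page_size=PAGE_SIZE):
    """True if no single page holds both a privileged and a user register."""
    priv_pages = {page_of(off, page_size) for off in privileged.values()}
    user_pages = {page_of(off, page_size) for off in user.values()}
    return priv_pages.isdisjoint(user_pages)
```

With this map, `layout_is_protectable(PRIVILEGED_REGS, USER_REGS)` is true, so the operating system could map the user page into an application while keeping the privileged page to itself. Putting a drawing register at offset 0x0800 instead would land it on the same page as the cursor register, and the check would fail.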
- Hardware Cursor
- It is impossible to build a high-performance graphics subsystem if the
cursor needs to be drawn using software. This is not much of an issue, since
many DACs today support hardware cursors, and many/most graphics cards
provide this function.
- Atomic Operations
- All drawing (i.e. non-protected) operations must be atomic. This allows the
operating system to suspend one program that is drawing, and start up another
program that is drawing, without hanging the graphics hardware. For example,
if it requires three registers to be written to draw a line or clear an area
(start-xy, end-xy, and "command"), it must be possible for the
software to write the start/end points, and never get around to writing the
command, without hanging the hardware. (If the command is never written, then
the line is never drawn).
In particular, this requires that command words be written last, and not first.
For commands that require multiple registers to be written, it must be
possible to break off the command at any point without hanging the hardware
(i.e. it must be possible to write some of the registers, without writing all
of them, without indefinitely hanging the hardware). If only a partial command
is written, then no operation is performed.
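The "command word last" rule is easy to see in a toy software model. This is a sketch only: the class and method names are hypothetical, and a real engine would of course be registers and state machines, not Python objects.

```python
class DrawEngine:
    """Toy model of a drawing engine whose command register is written last."""

    def __init__(self):
        self.start_xy = None   # operand register
        self.end_xy = None     # operand register
        self.drawn = []        # operations the engine has actually executed

    def write_start(self, xy):
        self.start_xy = xy     # latches a value; starts nothing

    def write_end(self, xy):
        self.end_xy = xy       # latches a value; starts nothing

    def write_command(self, opcode):
        """Only this write triggers work, using whatever operands are latched."""
        if self.start_xy is None or self.end_xy is None:
            return False       # incomplete command: ignore it, do not hang
        self.drawn.append((opcode, self.start_xy, self.end_xy))
        return True
```

A program that writes both endpoints and is then suspended before writing the command leaves `drawn` empty and the engine idle, so the next program finds the hardware in a usable state.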
- Interruptible Operations
- All drawing (i.e. user-level) operations must be interruptible. That is, if
a command requires that multiple registers must be written, it must be
possible to start writing data for this command, and then break this off and
perform another command instead. Thus, for example, it must be possible to
specify the line endpoints, then specify clear-area extents, then clear the
area, then move the cursor, and then ask for the line to be drawn (software
may have reloaded the line endpoints first). Such interrupted operations must
NOT leave the hardware in an unknown or hung state.
This, together with the atomic-operations requirement above, and the readable
registers requirement below, allows a multi-tasking operating system to stop a
drawing process at any time (on an instruction-by-instruction basis), put it
to sleep, and then allow another drawing process to run and do its drawing.
Non-atomic, non-interruptible drawing operations require the drawing
program to obtain a lock, do its stuff, then release the lock when it's done.
In general, locks are undesirable: they are slow. Even if a lock were fast,
just having to take one steals CPU cycles from what we really want to do:
drawing.
Note that after the operating system has suspended one client, it may do
housekeeping functions, such as updating the cursor or the colormap, before
allowing other processes to run. Thus, it must be possible to execute
privileged commands that interrupt user commands.
- Readable Registers
- All registers must be readable. This is vital for a multi-tasking operating
system. This allows the operating system to stop a graphics process, and save
its graphics hardware context. It then allows the OS to restore a possibly
different context from a different graphics process, allowing it to run, then
stopping it, saving, etc.
The concept introduced here is of "context switching" or
"multi-tasking". Basically, a graphics program can be suspended at
any time, and another graphics program can be started exactly where it last
left off. In order to be able to restart another process precisely where it
left off, it must be possible to set the graphics hardware into the exact same
state where the last program left off. To be able to get back to the exact
same state, it must be possible to somehow read and save this state.
Note that high-end hardware usually provides features that not only make it
possible to read and restore state information, but also make this operation
extremely fast. Hardware that does support save/restore usually supports this
at sub-millisecond speeds, thus allowing hundreds of context switches per
second, while still leaving the CPU and graphics card 90% free so that
drawing can continue with hardly any slowdown.
Note that more modern high-end hardware allows multiple graphics
contexts: these can be saved to, and restored from special RAM areas on the
card, without having to move all of the context information over the bus.
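Save and restore depend entirely on every register being readable. A minimal sketch follows, assuming a hypothetical four-register drawing state; the register names and the `saved` dictionary standing in for kernel memory are illustrative only.

```python
class RegisterFile:
    """Toy register file in which every register can be read back."""

    NAMES = ("start_xy", "end_xy", "color", "line_width")  # hypothetical set

    def __init__(self):
        self.regs = {name: 0 for name in self.NAMES}

    def write(self, name, value):
        self.regs[name] = value

    def read(self, name):
        return self.regs[name]


def save_context(hw):
    """Read every register and return a snapshot the kernel can keep."""
    return {name: hw.read(name) for name in hw.NAMES}


def restore_context(hw, snapshot):
    """Write a previously saved snapshot back into the hardware."""
    for name, value in snapshot.items():
        hw.write(name, value)


def context_switch(hw, outgoing, incoming, saved):
    """Save the outgoing process's state, then load the incoming one's."""
    saved[outgoing] = save_context(hw)
    # A process that has never run starts from an all-zero state.
    restore_context(hw, saved.get(incoming, {n: 0 for n in hw.NAMES}))
```

If even one register were write-only, `save_context` could not capture it and the resumed process would draw with whatever value its predecessor left behind. As a rough budget, 200 switches per second at 0.5 ms each would cost 100 ms, i.e. 10% of each second, which is in line with the 90%-free figure above; both numbers are illustrative.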
- Window Clipping Planes
- Window clipping planes prevent a program from drawing outside of its
window boundaries. This function isn't absolutely required, but is almost so.
A graphics program can achieve much higher performance by not worrying about
whether it is drawing outside of its window boundaries, or whether it is
obscured by another window. In addition, clipping planes provide an important
security function: they prevent errant or intentionally malicious programs from
drawing where they should not. Thus, an out-of-control program will not
scribble all over the screen.
The update of window clipping planes must be a reserved, protected operation.
That is, the control of window clipping planes must be segregated into a
different address space than other user-mode drawing operations.
Note that some graphics hardware provides user-mode clipping registers. These
are NOT what we are talking about here. Yes, it is nice to have user-mode clip
registers, but these cannot be used by the operating system to prevent
out-of-control or malicious programs from drawing where they shouldn't.
Note that hardware that supports directly-addressable frame buffers should also
support clip tests against data written to the directly addressable areas.
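The clip test itself amounts to a per-pixel ownership check. Here is a toy sketch, assuming one window-ID value per pixel; the class and its methods are hypothetical, and Python cannot enforce the privilege split, so the separation between `kernel_assign` and `user_write` is by convention only.

```python
class Framebuffer:
    """Toy frame buffer with a per-pixel window-ID plane used for clipping."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.pixels = [[0] * width for _ in range(height)]
        self.wid = [[0] * width for _ in range(height)]  # 0 = background

    def kernel_assign(self, wid, x0, y0, x1, y1):
        """Privileged: mark the rectangle [x0, x1) x [y0, y1) as owned by wid."""
        for y in range(y0, y1):
            for x in range(x0, x1):
                self.wid[y][x] = wid

    def user_write(self, wid, x, y, color):
        """Unprivileged: the write lands only where the clip plane matches."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            return False
        if self.wid[y][x] != wid:
            return False  # clipped: this pixel belongs to another window
        self.pixels[y][x] = color
        return True
```

If window 2 is assigned a rectangle that overlaps window 1, writes from window 1 into the overlap are silently dropped. The drawing program never has to test for obscured regions itself, and an errant program cannot scribble outside its own pixels.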
- Per-Window Double Buffering
- This is not strictly a requirement, but frankly, for high-performance,
animated 3D hardware, full-screen double buffering sucks. It is painful to
support in the operating system, in the graphics subsystem, and basically
looks bad once you have two or more windows animating at the same time.
- Per-Window Multiple Colormaps
- Again, not strictly a requirement, but if you want things to look nice on
the screen, you have got to allow applications to set their own private
colormaps, without ruining everything for the other windows on the screen.
- FIFOs
- Another non-requirement, but the fact is that most high-end graphics
hardware employs FIFOs to buffer drawing commands between the central CPU
and the graphics hardware. These FIFOs are typically anywhere from 64
Bytes to 64 KBytes long. This allows the CPU to write commands to the
graphics adapter without having to wait for it to finish, and it allows
the graphics hardware to process drawing commands without having to wait
for the CPU to provide more commands. As long as the buffer never
accumulates more than one-tenth of a second worth of drawing commands,
any delays or lags become essentially un-noticeable to the user.
Several common designs are seen: FIFOs in hardware (on the graphics
adapter), FIFOs in user memory, and "ping-pong" buffers. FIFOs on the
graphics card can present a problem: when a context switch occurs, the
FIFO contents must be saved and restored. They can be moved either to
other memory on the graphics card, or they can be sent across the bus,
back to the system. FIFOs in user memory present a different problem: data and
pointers can be corrupted by the user program (accidentally or
maliciously). Of course, it must not be possible to hang the hardware
due to corrupt data in the FIFO.
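The requirement that corrupt FIFO data must never hang the hardware can be sketched as a consumer that validates each entry before acting on it. This is a toy model: the opcode table, the entry format, and the class name are assumptions, not a description of any real adapter.

```python
from collections import deque

VALID_OPCODES = {"LINE": 2, "CLEAR": 2}  # hypothetical opcode -> operand count


class CommandFifo:
    """Toy bounded command FIFO between a CPU producer and a graphics consumer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def push(self, opcode, *operands):
        """Producer side: returns False when full so the CPU can back off."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((opcode, operands))
        return True

    def drain(self):
        """Consumer side: execute valid entries, discard malformed ones."""
        executed, rejected = [], 0
        while self.entries:
            opcode, operands = self.entries.popleft()
            if VALID_OPCODES.get(opcode) != len(operands):
                rejected += 1  # corrupt entry: skip it rather than stall
                continue
            executed.append((opcode, operands))
        return executed, rejected
```

The key design choice is that `drain` always makes progress: an unknown opcode or a wrong operand count costs one discarded entry, never a stalled engine.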
- Hardware Contexts
- Yet another non-requirement. However, almost all high-end hardware
keeps considerable graphics context information on the hardware itself.
Just as is the case with FIFOs, this context information must be saved
and restored when a context switch occurs. Again, this context is moved
either to another memory location on the adapter, or is sent back across
the bus to the system for temporary storage in the kernel.
The operating system kernel must address each of the hardware design
considerations expressed above. In particular, the kernel on SGI Irix
and IBM RS/6000 AIX systems supports the following functions:
- Grant and Retract
- A user application is granted direct access to the drawing subsystem
for the very first time by registering itself with the kernel. The
kernel returns addresses to the drawing subsystem hardware.
- Graphics Faults
- Access control to the graphics hardware is governed by a mechanism
similar in many ways to the page-fault mechanism. Let us review
page-faulting: when the CPU attempts to touch a page which is not in
real memory (is in the swap space, for instance), the CPU receives an
interrupt. The interrupt handler puts the process to sleep, and issues
a read request to the disk. When the disk has found the requested
page, that page is loaded into real memory, the virtual page tables are
updated, and the process is marked "ready-to-run". When a time slice is
available, the kernel will schedule the process and allow it to run.
A graphics fault proceeds in a similar manner: as long as there are no
other graphics processes that want to access the hardware, the current
process can bang away at it. Periodically, however (typically, every 4
milliseconds), the graphics time-slice expires. The kernel looks to see
if there are any other graphics processes that want to run. If so, then
it retracts write permission to the graphics hardware from the first
process, performs the graphics context switch, and then grants address
access to the second process. At this point, if the first process
attempts to touch the graphics i/o space, an interrupt will be
generated. The first process will be put to sleep. The kernel will then
schedule another process to run (not necessarily another graphics
process). Graphics time-slice scheduling and regular process scheduling
typically run independently of each other.
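The grant/retract cycle described above can be sketched as a small state machine. This is a toy model with hypothetical names; the real mechanism lives in page-table permissions and interrupt handlers, which Python can only imitate.

```python
from collections import deque


class GraphicsScheduler:
    """Toy model of grant/retract driven by a graphics time slice."""

    def __init__(self):
        self.owner = None       # process currently granted hardware access
        self.waiting = deque()  # processes asleep after a graphics fault
        self.switches = 0       # graphics context switches performed

    def touch(self, pid):
        """A process touches the graphics I/O space."""
        if self.owner is None:
            self.owner = pid    # first toucher is granted access
        if pid == self.owner:
            return "ok"
        if pid not in self.waiting:
            self.waiting.append(pid)  # graphics fault: put it to sleep
        return "fault"

    def slice_expired(self):
        """Called when the graphics time slice (e.g. 4 ms) runs out."""
        if not self.waiting:
            return self.owner   # nobody else wants the hardware
        # Retract access from the owner, switch contexts, grant to the next.
        # The old owner is not queued here: it faults only if it touches again.
        self.owner = self.waiting.popleft()
        self.switches += 1
        return self.owner
```

Note that a process which loses the hardware but never touches it again is never put to sleep, which matches the description above: the fault, not the retraction, is what suspends it.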
- Cursor Management
- The kernel must provide interfaces to allow a special process
(typically, the X Server) to update the position of the cursor.
- WID Management
- Most high-end graphics hardware has window-id (WID) planes. These planes
control not only which hardware color palette is used for pixel color
lookup, but also typically provide hardware clipping so that a process
cannot draw outside of its window and corrupt the screen.
The kernel must provide interfaces to manage these clipping planes,
and/or take over management itself. In particular, if a window is moved
(e.g. the user picks it up with the mouse and moves it), the WID
planes must be updated to reflect the new window position. Window ID
updates are by definition a privileged operation: user processes
must not be allowed to twiddle with them, as this would allow them to
corrupt window contents accidentally or intentionally. If the
corruption is accidental, then it is merely ugly: the user sees crap
drawn all over the screen, where it shouldn't be. A malicious example
might be a rogue program running on a CIA/NSA machine attempting to
read confidential information from another window.
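The two roles of the WID planes, palette selection and ownership tracking during a window move, can be sketched as follows. Everything here is hypothetical: the function names, the rectangle convention, and the use of color names in place of palette entries.

```python
def scan_out(pixels, wid_plane, colormaps):
    """Resolve each pixel's index through the colormap chosen by its WID."""
    return [
        [colormaps[wid][index] for index, wid in zip(pixel_row, wid_row)]
        for pixel_row, wid_row in zip(pixels, wid_plane)
    ]


def move_window(wid_plane, wid, old_rect, new_rect, background=0):
    """Privileged update: release the old rectangle, then claim the new one.

    Rectangles are (x0, y0, x1, y1), half-open on the right and bottom.
    """
    x0, y0, x1, y1 = old_rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            if wid_plane[y][x] == wid:  # leave other windows untouched
                wid_plane[y][x] = background
    x0, y0, x1, y1 = new_rect
    for y in range(y0, y1):
        for x in range(x0, x1):
            wid_plane[y][x] = wid
```

Because the same pixel index is resolved through a different palette depending on its WID, each window can load a private colormap without disturbing its neighbors. Only the kernel (or a process it trusts) would be allowed to call something like `move_window`.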
- Context Management
- If the graphics hardware has hardware contexts or hardware FIFOs,
then the kernel must shuffle this data around during a context switch.
If the adapter does not have a lot of memory on it, then this data must
be copied back across the bus, and stored in some temporary location
within the kernel. This memory must, of course, be cleaned up if the
graphics process exits.
- Double Buffering
- All high-end graphics hardware supports hardware double buffering.
Some support hardware quad-buffering (for double-buffered stereo
viewing). Buffer swaps need to be synchronized with vertical retrace
interrupts, so that image tearing does not occur. The kernel is often
involved with synchronizing the swap with the retrace interrupt.
Furthermore, the kernel must count the number of pending buffer swaps
for a graphics process, and put it to sleep if there are two. A graphics
program is still typically allowed to write to a FIFO or buffer while
there is one pending, outstanding swap request. But any more than that,
and things get ugly. For example, we once allowed a program to issue
600 buffer swaps without putting it to sleep. It then proceeded to
buffer swap 60 times a second for the next ten seconds, while everybody
wondered why it couldn't be control-C'd, and otherwise acted
unexpectedly! Never mind that what it was drawing was 10 seconds out of
date with respect to the current position of the mouse!
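The swap-counting rule can be sketched as a per-process counter. This is a toy model with hypothetical names; it assumes the policy stated above, namely that a process may run with one swap outstanding and is put to sleep when a second is pending.

```python
class SwapThrottle:
    """Toy per-process counter of buffer swaps awaiting vertical retrace."""

    MAX_PENDING = 2  # sleep once this many swaps are outstanding

    def __init__(self):
        self.pending = 0
        self.asleep = False
        self.completed = 0

    def request_swap(self):
        """Process asks for a swap; returns True if it may keep running."""
        if self.asleep:
            raise RuntimeError("a sleeping process cannot issue requests")
        self.pending += 1
        self.asleep = self.pending >= self.MAX_PENDING
        return not self.asleep

    def vertical_retrace(self):
        """Retrace interrupt: perform at most one swap, then maybe wake."""
        if self.pending:
            self.pending -= 1
            self.completed += 1
        self.asleep = self.pending >= self.MAX_PENDING
```

With this cap the displayed image can lag the program by at most a couple of frames. Without it, 600 queued swaps at 60 per second take 10 seconds to drain, which is exactly the lag described above.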
Many of the above principles are discussed in greater detail in the
following classical references. If my memory serves me correctly,
the papers by Voorhies and by Rhoden are particularly descriptive of the
issues and possible solutions. Yes, these would appear to be very old,
but, if anything, they illustrate how Unix and Unix workstations have
at times enjoyed a ten year lead in technology over PC's and PC
- Akeley, Kurt and Tom Jermoluk, "High Performance Polygon Rendering",
Conference Proceedings, SIGGRAPH, 1988, vol 22 no. 4, pp 239-246.
- Doyle, Brian, "All About Multi-Processing for Unix Workstations",
Conference Proceedings, NCGA 1990, pp. 228-253.
(National Computer Graphics Association).
- Haletky, Edward H. and Linas Vepstas, "Integration of GL with the X
Window System", Conference Proceedings, Xhibition 1991, pp. 105-113.
- Norrod, Forest and Larry Thayer, "An Advanced VLSI Chip Set for
Very High Speed Graphics Rendering", Conference Proceedings, NCGA 1991,
- Rhoden, Desi and Chris Wilcox. "Hardware Acceleration for Window
Systems", Conference Proceedings SIGGRAPH 1989 vol 23 no. 3 pp
- Stewart, Don. "VLSI: Key to Four Basic Strategies for Improving
Workstation Graphics", Conference Proceedings, NCGA 1990 pp 302-308.
- Vepstas, Linas. "Porting OpenGL to New Hardware Platforms", Course
Notes, OpenGL, SIGGRAPH 1992.
- Voorhies, Douglas, David Kirk and Olin Lathrop, "Virtual Graphics",
Conference Proceedings, SIGGRAPH 1988, vol 22 no. 4, pp 247-253.
Last updated 18 February 1996 by Linas Vepstas.
Linas can be reached at
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1;
with no Invariant Sections, with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included at the URL
the web page titled
"GNU Free Documentation License".