Emulation

Hardware emulation is an important means of achieving a short "time to market". A linear design process, in which each stage begins only when the previous one has ended, is often a poor choice. This holds especially in the computer field, where technologies change rapidly.

As an example, suppose the development of a completely new piece of hardware begins at time t+0. At t+1 the hardware works well and driver development can start. When the driver is ready at t+2, work on the higher-level driver/hardware adaptation can start (such as an interface to MPI in the case of communication hardware). Finally, at t+3, applications can benefit from the new features. By then, however, the hardware developed between t+0 and t+1 is typically no longer up to date.

In general, hardware development is the stage that takes the most time. It is therefore desirable to start work on the software (driver etc.) while hardware development is still in progress.
This requires a software emulation of the hardware. Work on the emulation software can begin as soon as the hardware is roughly specified, and all further software can then be developed on top of the emulation. When the hardware is ready, in the worst case some minor changes inside the driver are necessary, and the hardware can be used immediately.

The advantages of hardware emulation are not limited to earlier product availability. Another important point is that information gathered during the emulation phase can flow back into hardware development. Here the emulation serves as a high-level simulation under real operating conditions, which helps to detect possible conceptual mistakes before the hardware is finalized. By contrast, conventional high-level simulation of hardware (for example via VHDL models) is very complicated and time-consuming. Such simulations are better suited to low-level details than to conceptual checks.

The picture below shows the principal data/control flow for our SCI/VIA project in the emulated case (green) and the real case (red).
Accesses to the hardware are intercepted by the CPU's paging mechanism. From there the emulation software takes control and performs the operations normally done by the hardware.
This is true for ALL hardware accesses, whether they go to imported SCI shared memory (transparent mode) or to VIA-related doorbell pages.

At the moment the driver and the emulator are able to provide (software) distributed shared memory. Support for VIA functionality is under development.
In the following we give some details on how the emulation has been realized.

To really benefit from overlapping the hardware and software development stages, it must be easy to replace the emulator with the real hardware, ideally by simply inserting a different kernel module. To achieve this goal the driver has been split into a device-dependent low-level layer and a device-independent high-level layer, which communicate through a well-defined interface. The following two pictures compare the final system structure with the structure used during the emulation phase.
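
The layer split can be sketched in user-space C as a table of function pointers that the high-level layer calls without knowing which backend sits behind it. This is only a minimal illustration under stated assumptions: the names sci_lowlevel_ops, sci_device, emu_ctx, sci_store_u32 and sci_load_u32 are hypothetical and not taken from the actual driver, and a local buffer stands in for the remote memory that the real emulator would reach over TCP/IP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface between the device-independent high-level layer
 * and a device-dependent low-level layer. All names are illustrative. */
struct sci_lowlevel_ops {
    const char *name;
    int (*read)(void *ctx, uint32_t node, uint64_t offset, void *buf, size_t len);
    int (*write)(void *ctx, uint32_t node, uint64_t offset, const void *buf, size_t len);
};

/* Emulator backend: a local buffer stands in for remote memory
 * (assumption; the real emulator would forward the request over TCP/IP). */
struct emu_ctx {
    uint8_t remote[256];
};

static int emu_read(void *ctx, uint32_t node, uint64_t offset, void *buf, size_t len)
{
    struct emu_ctx *emu = ctx;
    (void)node; /* the sketch models a single remote node */
    if (offset > sizeof emu->remote || len > sizeof emu->remote - offset)
        return -1; /* access outside the emulated memory */
    memcpy(buf, emu->remote + offset, len);
    return 0;
}

static int emu_write(void *ctx, uint32_t node, uint64_t offset, const void *buf, size_t len)
{
    struct emu_ctx *emu = ctx;
    (void)node;
    if (offset > sizeof emu->remote || len > sizeof emu->remote - offset)
        return -1;
    memcpy(emu->remote + offset, buf, len);
    return 0;
}

static const struct sci_lowlevel_ops emu_ops = { "emulator", emu_read, emu_write };

/* High-level layer: it sees only the ops table, so a hardware backend
 * could be substituted without touching the functions below. */
struct sci_device {
    const struct sci_lowlevel_ops *ops;
    void *ctx;
};

int sci_store_u32(struct sci_device *dev, uint32_t node, uint64_t offset, uint32_t value)
{
    assert(dev && dev->ops);
    return dev->ops->write(dev->ctx, node, offset, &value, sizeof value);
}

int sci_load_u32(struct sci_device *dev, uint32_t node, uint64_t offset, uint32_t *value)
{
    assert(dev && dev->ops && value);
    return dev->ops->read(dev->ctx, node, offset, value, sizeof *value);
}
```

In this sketch, switching from the emulator to real hardware would mean supplying a second sci_lowlevel_ops table; the high-level functions stay unchanged, which is the property the layer split is meant to provide.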

We decided to base the emulation on traditional TCP/IP communication so that it can be run on any PC, without special hardware. The emulator consists of a kernel level part and a user level part. The kernel part intercepts the accesses to imported memory, provides the address translation tables, checks protection, manages the stream buffers and creates SCI packets. It also contains a special scilnk device which represents the interface to the user level part - the scilnk process. The next figure shows a block diagram of the emulator.
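
To make the address translation and protection check more concrete, here is a minimal user-space sketch of a per-page translation table lookup. The table layout, the 4 KB page size, the flag names and the function att_translate are assumptions chosen for illustration; the real emulator's tables may be organized quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EMU_PAGE_SHIFT 12                       /* assumed 4 KB pages */
#define EMU_PAGE_SIZE  (1u << EMU_PAGE_SHIFT)

enum { ATT_VALID = 1, ATT_WRITABLE = 2 };       /* hypothetical flag bits */

/* One hypothetical translation entry per imported page. */
struct att_entry {
    uint32_t flags;
    uint16_t node_id;       /* node that exports the page */
    uint64_t remote_page;   /* page number in that node's address space */
};

/* Result of a translation: where the access has to be sent. */
struct sci_target {
    uint16_t node_id;
    uint64_t remote_addr;
};

/* Translate an offset inside the imported area into a remote target.
 * Returns 0 on success, -1 if no valid mapping exists, and -2 if a
 * write hits a read-only page. */
int att_translate(const struct att_entry *table, size_t entries,
                  uint64_t local_offset, int is_write, struct sci_target *out)
{
    assert(table && out);

    size_t index = (size_t)(local_offset >> EMU_PAGE_SHIFT);
    if (index >= entries)
        return -1;

    const struct att_entry *e = &table[index];
    if (!(e->flags & ATT_VALID))
        return -1;
    if (is_write && !(e->flags & ATT_WRITABLE))
        return -2;

    out->node_id = e->node_id;
    out->remote_addr = (e->remote_page << EMU_PAGE_SHIFT)
                     | (local_offset & (EMU_PAGE_SIZE - 1));
    return 0;
}
```

The node ID and remote address produced by such a lookup are the kind of information that would go into the header of the SCI packet handed to the scilnk device.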

Intercepting the memory accesses is the most complicated part. Besides the page fault mechanism, it also exploits the CPU's debugging facilities.
Imported memory areas are realized internally as virtual memory areas with a special nopage handler living in the emulator. Since the driver invalidates all page table entries belonging to these areas on creation, every access to such an area results in a page fault. The Linux memory management system sees to it that the vma-specific nopage handler is invoked. That routine determines which imported area the referenced address lies within, allocates a physical memory page, and turns on the trap flag in the page fault handler's stack frame. In the case of a memory read operation, the data is fetched from the remote node and written to the appropriate place in the allocated page. Then the physical page is mapped to the virtual page and the "faulty" memory operation is restarted. This time it succeeds, but because of the activated trap flag the CPU raises a debug trap immediately after the instruction has completed. In the case of a write access, the emulator now has the data the CPU has written to the imported memory area. This data is passed to the write stream buffer section, where it may be merged with other write operations, packed into an SCI packet using the information from the downstream address translation table, and sent to the remote node via the scilnk device and the scilnk process.
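
The merging of write operations in a stream buffer can be illustrated with a small user-space sketch. It assumes a single buffer that covers one aligned 64-byte block and counts flushes instead of building real SCI packets; the block size and the names stream_buf, stream_write and stream_flush are illustrative assumptions, not the emulator's actual data structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STREAM_BLOCK 64u  /* assumed block size, for illustration only */

/* Hypothetical write stream buffer covering one aligned block. */
struct stream_buf {
    int      active;              /* buffer currently holds data */
    uint64_t base;                /* block-aligned remote address */
    uint64_t valid;               /* bit i set: data[i] was written */
    uint8_t  data[STREAM_BLOCK];
    unsigned flushes;             /* packets that would have been sent */
};

/* Stand-in for packing the buffer into an SCI packet and sending it. */
void stream_flush(struct stream_buf *sb)
{
    assert(sb);
    if (sb->active) {
        sb->flushes++;
        sb->active = 0;
        sb->valid = 0;
    }
}

/* Record a write of len bytes to remote address addr. Writes that fall
 * into the currently buffered block are merged; a write to a different
 * block flushes the buffer first. A write must not cross a block boundary. */
void stream_write(struct stream_buf *sb, uint64_t addr, const void *src, unsigned len)
{
    uint64_t base = addr & ~(uint64_t)(STREAM_BLOCK - 1);
    unsigned off = (unsigned)(addr - base);

    assert(sb && src && len > 0 && off + len <= STREAM_BLOCK);

    if (sb->active && sb->base != base)
        stream_flush(sb);
    if (!sb->active) {
        sb->active = 1;
        sb->base = base;
    }
    memcpy(sb->data + off, src, len);
    for (unsigned i = 0; i < len; i++)
        sb->valid |= (uint64_t)1 << (off + i);
}
```

In this sketch, two consecutive 4-byte stores to the same block end up in one buffer and would leave as a single packet, while a store to another block forces the pending data out first.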
The picture below illustrates the interplay of the nopage routine and the debug handler.
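
The same interplay can also be modelled as a small user-space simulation. It is not kernel code: the page fault and the debug trap are ordinary function calls, a local array stands in for the remote node, and all names (sim_area, sim_nopage, sim_debug_trap, sim_access) are hypothetical. The sketch further assumes that the debug handler invalidates the mapping again so that the next access faults as well; the description above implies this but does not spell it out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE 4096u

/* Simulated state of one imported page. */
struct sim_area {
    int      mapped;            /* page table entry currently valid? */
    int      trap_flag;         /* single-step flag armed by the fault handler */
    uint8_t  page[SIM_PAGE];    /* temporarily mapped physical page */
    uint8_t  remote[SIM_PAGE];  /* stand-in for the remote node's memory */
    unsigned faults, traps;     /* event counters */
};

/* Models the nopage handler: for reads, fetch the data from the remote
 * node; then map the page and arm the trap flag. */
static void sim_nopage(struct sim_area *a, uint32_t off, unsigned len, int is_write)
{
    a->faults++;
    if (!is_write)
        memcpy(a->page + off, a->remote + off, len);
    a->mapped = 1;
    a->trap_flag = 1;
}

/* Models the debug handler: for writes, the data is now in the page and
 * can be forwarded. Unmapping here is an assumption of this sketch. */
static void sim_debug_trap(struct sim_area *a, uint32_t off, unsigned len, int is_write)
{
    a->traps++;
    if (is_write)
        memcpy(a->remote + off, a->page + off, len);
    a->mapped = 0;
    a->trap_flag = 0;
}

/* One CPU access to the imported area: fault, restart, debug trap. */
void sim_access(struct sim_area *a, uint32_t off, void *buf, unsigned len, int is_write)
{
    assert(a && buf && off <= SIM_PAGE && len <= SIM_PAGE - off);

    if (!a->mapped)
        sim_nopage(a, off, len, is_write);   /* first attempt faults */

    /* The restarted instruction now succeeds on the mapped page. */
    if (is_write)
        memcpy(a->page + off, buf, len);
    else
        memcpy(buf, a->page + off, len);

    if (a->trap_flag)
        sim_debug_trap(a, off, len, is_write);
}
```

The ordering is the point of the sketch: for a read the remote data must be in place before the instruction is restarted, whereas for a write the data only becomes available afterwards, which is why the debug trap is needed in addition to the page fault.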

Last Updated: May 27th 1999
By Friedrich Seifert