Super NES Programming/DMA tutorial

=What is DMA, and why do we need it?=

One of the main limitations of the Super NES (apart from its slow memory access times) is the main processor. As already stated on the memory mapping page, this was mainly because the CPU was meant to be backward compatible: in the beginning, the SNES was supposed to run NES ROMs as well as SNES ROMs. For this purpose, the CPU featured a so-called emulation mode.

The processor in the original Nintendo Entertainment System was a Ricoh 2A03 (NTSC) or Ricoh 2A07 (PAL), essentially a cut-down 6502. It lacked the 6502's binary-coded decimal (BCD) support (which, for a time, was also used for floating-point numbers, until the IEEE 754 specification was published), but was otherwise compatible with that processor. The Super Nintendo Entertainment System ran on a Ricoh 5A22, basically an enhanced version of the 65816. This CPU featured the emulation mode mentioned above, which was intended to let it execute games the same way the NES did. To address more memory than its native 16-bit register size would suggest, the SNES relied on the same basic idea as the older NES: dividing memory into banks. Note that these methods can also be used while running in emulation mode. The 2A03 and the 5A22 had comparable clock speeds.

Note that the PAL versions always run a little slower than the NTSC versions; NTSC consoles render 60 (interlaced half-)frames per second, while PAL consoles render only 50.

The reason this page has so far covered the CPU specifications more than direct memory access (which it will get to very soon) is to illustrate how small the SNES's CPU advantage over the NES was: the SNES had 64 times as much memory as the NES (128 KB of work RAM instead of 2 KB), but only about 2.5 times the CPU power.

One problem that can slow down a CPU significantly (even today) is copying data (bytes) from device A to device B. A CPU is usually faster than the memory it stores its current state in, and waiting for the memory controller to fetch or store the requested bytes can waste several clock cycles. Unless the CPU is an out-of-order superscalar design, these wasted cycles completely block the execution of the current program (that is, the game's ROM). Needless to say, none of the Ricoh CPUs was ever meant to be that sort of processor.

Direct memory access (DMA) is a method whereby memory is copied to another location independently of the CPU. In any modern computer, DMA is an essential requirement. On the SNES, DMA can be used to quickly copy tile and palette data into video memory, which the slow CPU would otherwise have to do itself. Knowledge of DMA is a requirement for creating larger SNES programs, so this tutorial covers just that.

DMA is used for copying graphics data, such as 8x8 tiles and tile maps, to VRAM, and palette data to CGRAM. These memories can only be accessed by repeatedly reading or writing certain hardware registers (for more information, see the memory mapping and hardware registers pages).
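Why repeated register access is slow can be sketched with a toy model. The Python below (the names `CgramPort` and `bgr555` are hypothetical, made up for illustration) assumes the commonly documented behavior of the CGRAM registers: $2121 selects a color index, and $2122 must be written twice per 15-bit color, low byte first, after which the index advances. Without DMA, filling all 256 colors means 512 separate register writes issued by the CPU.

```python
class CgramPort:
    """Toy model of the CGRAM access registers (a sketch, not cycle-accurate)."""

    def __init__(self):
        self.cgram = [0] * 256   # 256 colors, 15 bits each (BGR555)
        self.index = 0           # color index selected through $2121
        self.low_byte = None     # low byte waiting for its matching high byte

    def write(self, address, value):
        value &= 0xFF
        if address == 0x2121:            # CGADD: select color, restart byte pair
            self.index = value
            self.low_byte = None
        elif address == 0x2122:          # CGDATA: two writes form one color
            if self.low_byte is None:
                self.low_byte = value
            else:
                self.cgram[self.index] = ((value << 8) | self.low_byte) & 0x7FFF
                self.index = (self.index + 1) & 0xFF   # advance to next color
                self.low_byte = None
        else:
            raise ValueError(f"register ${address:04X} is not modelled")


def bgr555(r, g, b):
    """Pack 5-bit red, green and blue components into one SNES color word."""
    return (b << 10) | (g << 5) | r


port = CgramPort()
port.write(0x2121, 1)                        # start at color 1
for color in (bgr555(31, 0, 0), bgr555(0, 31, 0)):
    port.write(0x2122, color & 0xFF)         # low byte first
    port.write(0x2122, color >> 8)           # then high byte
print(hex(port.cgram[1]), hex(port.cgram[2]))   # 0x1f 0x3e0
```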

=DMA in detail=

To understand the way DMA works, we will take a short look into how the SNES handles its memory.

The console basically has three buses, which connect the various devices inside it. These three buses are:


 * 1) Bus A: a 24-bit-wide address bus, which handles the communication between the CPU, the cartridge (ROM + SRAM), and the main memory (WRAM) - this is the main bus.
 * 2) Bus B: an 8-bit-wide address bus, which connects the APU (audio) and the PPUs (video); the CPU reaches it through special addresses in the main bus' address space (the hardware registers).
 * 3) The data bus: an 8-bit-wide bus, shared by both address buses, that carries the data to its various destinations. The CPU cannot use this bus while one or more DMA transfers are running.
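The mapping between the two address buses is simple enough to express as a pair of functions. This is a sketch, assuming the standard layout in which bus B register $xx is visible to the CPU at $21xx in banks $00–$3F and $80–$BF; the function names are hypothetical. The DMA controller's per-channel bus B address register stores only the 8-bit value (for example $18 for $2118).

```python
def b_bus_to_cpu(b_address):
    """Return the CPU (bus A) address at which a bus B register is visible.

    Bus B register $xx appears at $21xx in banks $00-$3F and $80-$BF;
    only the 16-bit offset is returned here.
    """
    if not 0x00 <= b_address <= 0xFF:
        raise ValueError("bus B addresses are 8 bits wide")
    return 0x2100 | b_address


def cpu_to_b_bus(cpu_address):
    """Inverse mapping: $2100-$21FF on bus A -> $00-$FF on bus B."""
    if not 0x2100 <= cpu_address <= 0x21FF:
        raise ValueError(f"${cpu_address:04X} is not a bus B register")
    return cpu_address & 0xFF


for name, b in (("VRAM data, low byte", 0x18), ("CGRAM data", 0x22), ("APU port 0", 0x40)):
    print(f"{name}: bus B ${b:02X} = CPU ${b_bus_to_cpu(b):04X}")
```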

Note that the data bus really is only one byte wide, while the CPU has several 16-bit registers. This means that a 16-bit register can never be written or read in a single bus cycle, only in two.

The CPU of the SNES contains a DMA controller with 8 DMA channels in total. This means that up to 8 transfers of data from one device to another can be set up and started together. Each channel can be configured individually. Once started, the channels are processed one after another in order of priority: the first channel (index 0) has the highest priority and the eighth channel (index 7) the lowest.
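Which channels run, and in what order, follows from the bitmask written to the DMA enable register ($420B, often called MDMAEN in documentation): bit n selects channel n. The helper names below are hypothetical; the sketch only assumes that lower-numbered channels are processed first.

```python
def enabled_channels(mask):
    """Return the channels selected by a value written to $420B, in run order.

    Bit n set means channel n takes part; lower-numbered channels have
    higher priority and are processed first.
    """
    if not 0 <= mask <= 0xFF:
        raise ValueError("the enable register is 8 bits wide")
    return [ch for ch in range(8) if mask & (1 << ch)]


def enable_mask(channels):
    """Build the 8-bit value that starts the given channels."""
    mask = 0
    for ch in channels:
        if not 0 <= ch <= 7:
            raise ValueError(f"channel {ch} does not exist")
        mask |= 1 << ch
    return mask


print(enabled_channels(0b10000101))   # [0, 2, 7]
print(hex(enable_mask([7, 0])))       # 0x81
```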

DMA controllers in common PCs can be configured to perform their tasks in different ways, called "modes". PC DMA controllers can also serve many other devices, such as anything connected to the system bus (ISA, PCI, AGP, PCIe).

The SNES supports only one such mode, "burst", which means that the CPU halts completely as long as at least one DMA transfer has not finished. The reason is that both the CPU and the DMA controller use the system bus to communicate with the other devices, but only one device (the bus master) can use the system bus at a time: either the CPU or the controller. More modern controllers can instead transfer small chunks of data, hand control of the bus back to the CPU and immediately request it again; this lets the CPU finish its pending operations before the controller resumes (or not, if the CPU has to perform many memory accesses). Also, every SNES DMA transfer runs between bus A and a register on bus B, such as those of the PPUs or the APU.
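Because the CPU is halted for the whole burst, it is useful to estimate how long a transfer takes. The sketch below assumes the commonly quoted rate of one byte per 8 master clock cycles and an NTSC master clock of about 21.477 MHz; it ignores the small per-channel setup overhead, and `dma_stall` is a hypothetical name.

```python
NTSC_MASTER_CLOCK_HZ = 21_477_272   # approx. 6 x the 3.579545 MHz NTSC color subcarrier
MASTER_CYCLES_PER_BYTE = 8          # assumed DMA rate: one byte per 8 master cycles


def dma_stall(byte_count, clock_hz=NTSC_MASTER_CLOCK_HZ):
    """Estimate how long a burst DMA of byte_count bytes keeps the CPU halted.

    Returns (master_cycles, microseconds); per-channel setup overhead is ignored.
    """
    if not 1 <= byte_count <= 0x10000:
        raise ValueError("one channel transfers between 1 and 65536 bytes")
    cycles = byte_count * MASTER_CYCLES_PER_BYTE
    return cycles, cycles / clock_hz * 1_000_000


cycles, micros = dma_stall(1024)     # e.g. 1 KB of tile data
print(cycles, round(micros, 1))      # 8192 381.4
```

The estimate grows linearly with the byte count, which is why large uploads are usually split across several frames.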

The controller's registers are exposed as hardware registers between $4200 and $43FF in the system banks $00–$3F and $80–$BF. Writing to the main enable register ($420B) starts every DMA channel whose bit is set in the written value. The per-channel registers ($43x0–$43xF) describe what each channel is supposed to do when it is activated.

=HDMA=

While the normal DMA controller can copy its data only in "burst" mode, moving up to 64 KB (one entire bank) per channel, the SNES offers another way of copying chunks to various devices that works a little more like PC DMA controllers. This kind of DMA is called HDMA (horizontal-blank DMA) and runs only during H-blank (the short period in which the electron beam returns to the start of the next line after drawing one). Since this period is very short, each active channel can transfer only 1 to 4 bytes per H-blank.
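How many bytes a channel moves per unit (and therefore per H-blank under HDMA) depends on its transfer mode, stored in bits 0–2 of the channel's parameter register ($43x0). The table below reproduces the bus B register patterns as commonly documented; `hdma_line_writes` is a hypothetical helper that pairs one scanline's bytes with their target registers. In normal DMA the same pattern simply repeats until the byte count runs out.

```python
# Bus B register offsets written by each transfer mode (bits 0-2 of $43x0),
# relative to the channel's bus B address in $43x1.
TRANSFER_PATTERNS = {
    0: (0,),
    1: (0, 1),
    2: (0, 0),
    3: (0, 0, 1, 1),
    4: (0, 1, 2, 3),
    5: (0, 1, 0, 1),
    6: (0, 0),          # same as mode 2
    7: (0, 0, 1, 1),    # same as mode 3
}


def hdma_line_writes(mode, b_address, data):
    """Pair the bytes of one HDMA unit with the bus B registers they go to."""
    pattern = TRANSFER_PATTERNS[mode]
    if len(data) != len(pattern):
        raise ValueError(f"mode {mode} moves {len(pattern)} byte(s) per line")
    return [((b_address + offset) & 0xFF, byte) for offset, byte in zip(pattern, data)]


for mode, pattern in TRANSFER_PATTERNS.items():
    print(f"mode {mode}: {len(pattern)} byte(s) per unit")

# Mode 2 suits write-twice registers such as BG1's horizontal scroll ($210D).
print(hdma_line_writes(2, 0x0D, bytes([0x10, 0x00])))   # [(13, 16), (13, 0)]
```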

=DMA registers=

Note: in the register addresses $43x0–$43xF, x is the index of the channel (0 to 7), so 16 bytes of address space are reserved for each channel.
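The per-channel register address follows directly from this layout. The sketch below lists the registers a normal DMA transfer uses, with the names common in SNES documentation (the HDMA-only registers $43x7–$43xA are left out), and computes an address from a channel index and an offset; `channel_register` is a hypothetical helper.

```python
# Per-channel registers used by a normal (non-HDMA) DMA transfer.
# Offsets are relative to $43x0; names follow common SNES documentation.
CHANNEL_REGISTERS = {
    0x0: "DMAPx: transfer parameters (direction, bus A step, transfer mode)",
    0x1: "BBADx: bus B address, e.g. $18 for $2118",
    0x2: "A1TxL: bus A address, low byte",
    0x3: "A1TxH: bus A address, high byte",
    0x4: "A1Bx: bus A address, bank",
    0x5: "DASxL: byte count, low byte",
    0x6: "DASxH: byte count, high byte",
}


def channel_register(channel, offset):
    """Return the CPU address of register $43x<offset> for the given channel."""
    if not 0 <= channel <= 7:
        raise ValueError("channels are numbered 0 to 7")
    if not 0x0 <= offset <= 0xF:
        raise ValueError("each channel owns 16 register addresses")
    return 0x4300 | (channel << 4) | offset


for offset, description in CHANNEL_REGISTERS.items():
    print(f"${channel_register(2, offset):04X}  {description}")
```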

 NOTE: This tutorial is incomplete and untested.

NOTE: A DMA channel's transfer size, when set to #$0000, is read as a transfer of 65536 bytes, not 0 bytes.
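Because the size register is only 16 bits wide, the special case for 65536 bytes falls out of simple wrap-around. A minimal sketch of encoding and decoding the two size bytes ($43x5 low, $43x6 high), with hypothetical function names:

```python
def encode_transfer_size(byte_count):
    """Return the (low, high) bytes for $43x5/$43x6 that request byte_count bytes."""
    if not 1 <= byte_count <= 0x10000:
        raise ValueError("one channel transfers between 1 and 65536 bytes")
    value = byte_count & 0xFFFF     # 65536 wraps around to $0000
    return value & 0xFF, value >> 8


def decode_transfer_size(low, high):
    """Return the number of bytes a channel will move for the stored size."""
    value = (high << 8) | low
    return value if value else 0x10000   # $0000 means 65536, not 0


print(encode_transfer_size(512))        # (0, 2)
print(encode_transfer_size(65536))      # (0, 0)
print(decode_transfer_size(0, 0))       # 65536
```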

=Some code=

==Loading Palettes==
Here is a macro for loading palette data into CGRAM (the place where palettes are stored):
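As a sketch of what such a macro has to do, the Python below lists, as (register, value) pairs, the writes a DMA-based palette loader performs: it programs one channel and then starts it. It assumes channel 0 by default, transfer mode 0 (every byte goes to $2122), a transfer from bus A to bus B, and that it runs during V-blank or forced blank; the function name and the source address $01:8000 are hypothetical.

```python
def palette_dma_writes(source, first_color, byte_count, channel=0):
    """List the (register, value) writes that load palette data into CGRAM by DMA.

    Hypothetical sketch: source is a 24-bit bus A address of BGR555 color
    data; transfer mode 0 writes every byte to the same register ($2122).
    """
    if not 1 <= byte_count <= 0x10000:
        raise ValueError("one channel transfers between 1 and 65536 bytes")
    if not 0 <= channel <= 7:
        raise ValueError("channels are numbered 0 to 7")
    base = 0x4300 | (channel << 4)
    size = byte_count & 0xFFFF                   # 65536 bytes is stored as $0000
    return [
        (0x2121, first_color & 0xFF),            # CGADD: first color to overwrite
        (base + 0x0, 0x00),                      # DMAPx: bus A -> bus B, step +1, mode 0
        (base + 0x1, 0x22),                      # BBADx: bus B $22 = CGDATA ($2122)
        (base + 0x2, source & 0xFF),             # A1TxL: source address, low byte
        (base + 0x3, (source >> 8) & 0xFF),      # A1TxH: source address, high byte
        (base + 0x4, (source >> 16) & 0xFF),     # A1Bx: source bank
        (base + 0x5, size & 0xFF),               # DASxL: byte count, low byte
        (base + 0x6, size >> 8),                 # DASxH: byte count, high byte
        (0x420B, 1 << channel),                  # MDMAEN: start the transfer
    ]


# 256 colors x 2 bytes, read from a made-up ROM address $01:8000.
for register, value in palette_dma_writes(0x018000, 0, 512):
    print(f"${register:04X} <- ${value:02X}")
```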

==Loading VRAM==
Here is a macro for loading data into VRAM (the place where tiles and tile maps are stored):
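Loading VRAM differs from loading palettes mainly in its destination: VRAM is addressed in 16-bit words, so the sketch below sets the word address through $2116/$2117, sets $2115 to $80 so the address advances after each high-byte write, and uses transfer mode 1 to alternate between $2118 and $2119. It only lists the (register, value) writes such a macro would perform; the function name, the default channel and the source address $02:8000 are hypothetical.

```python
def vram_dma_writes(source, vram_word_address, byte_count, channel=0):
    """List the (register, value) writes that copy data into VRAM by DMA.

    Hypothetical sketch: VRAM is addressed in 16-bit words; transfer mode 1
    alternates between $2118 (low byte) and $2119 (high byte).
    """
    if not 0 <= vram_word_address <= 0x7FFF:
        raise ValueError("VRAM holds 32768 words")
    if not 1 <= byte_count <= 0x10000:
        raise ValueError("one channel transfers between 1 and 65536 bytes")
    if not 0 <= channel <= 7:
        raise ValueError("channels are numbered 0 to 7")
    base = 0x4300 | (channel << 4)
    size = byte_count & 0xFFFF                   # 65536 bytes is stored as $0000
    return [
        (0x2115, 0x80),                          # VMAIN: +1 word after each $2119 write
        (0x2116, vram_word_address & 0xFF),      # VMADDL: destination word address, low
        (0x2117, vram_word_address >> 8),        # VMADDH: destination word address, high
        (base + 0x0, 0x01),                      # DMAPx: bus A -> bus B, step +1, mode 1
        (base + 0x1, 0x18),                      # BBADx: $2118, then $2119
        (base + 0x2, source & 0xFF),             # A1TxL: source address, low byte
        (base + 0x3, (source >> 8) & 0xFF),      # A1TxH: source address, high byte
        (base + 0x4, (source >> 16) & 0xFF),     # A1Bx: source bank
        (base + 0x5, size & 0xFF),               # DASxL: byte count, low byte
        (base + 0x6, size >> 8),                 # DASxH: byte count, high byte
        (0x420B, 1 << channel),                  # MDMAEN: start the transfer
    ]


# 2 KB of tiles from a made-up ROM address $02:8000 to VRAM word address $1000.
for register, value in vram_dma_writes(0x028000, 0x1000, 2048):
    print(f"${register:04X} <- ${value:02X}")
```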

=External Links=
 * Information on the NES's CPU
 * The base processor the 2A03 was built from
 * The (in)famous IEEE754 specification
 * Information on the SNES's CPU
 * The base processor the 5A22 was built from
 * The Von Neumann bottleneck, explained
 * What out-of-order execution means and why it is a good thing ... sometimes
 * How the CPU tries to execute one instruction for several parameters

=See Also=
 * SNES Memory Information
 * SNES hardware registers