Super NES Programming/Super FX tutorial

=Introduction=

The SuperFX is a custom 16-bit RISC processor with a special bitmap emulation function, designed to bring rudimentary 3D capabilities to the Super NES. It is programmed in its own Super FX assembly language. Each SuperFX title combines standard SNES assembly code with separately compiled SuperFX assembly routines stored as binary data in the cartridge. The SuperFX can run in parallel with the SNES under certain conditions. Each SuperFX cartridge has on-board RAM, which the SuperFX chip uses as a frame buffer and for general purpose operations, and which it can share with the Super NES.

Existing titles
The SuperFX chip was used in 8 released SNES games, in the unreleased Star Fox 2, and in multiple tech demos, two of which have binaries available.

Theory of Operation
The SuperFX is a co-processor for the SNES CPU. Its task is to execute complex mathematical calculations much faster than the SNES can, and to generate the bitmap pictures used for simple 3D rendering in SuperFX games. The SuperFX and SNES processors share access to a common Work RAM and game pak ROM bus. Only one of the two processors may access the game pak ROM or RAM at any time; access is controlled by special registers. Scheduling when the SNES and the SuperFX access the data buses is an art that largely determines the program's efficiency.

The RAM inside the SuperFX cart is different from battery backup RAM - it can be used for storing results of calculations, for storing a SuperFX program, for storing bulk data or for storing a PLOT picture the SuperFX is generating. There is 256 Kib (32KiB) or 512 Kib (64 KiB) of RAM.

The SuperFX can process instructions in 3 ways: reading them from game pak RAM, from the gamepak ROM (reading straight out of the ROM chip), or via a special 512 byte instruction cache.

It is possible for the SuperFX to run in parallel with the SNES CPU when using the 512 byte instruction Cache. It involves loading a program in, and then setting the SuperFX to start its work. The 512 byte cache is in general 3x faster compared to running the program in the game pak RAM or ROM. The SuperFX can interrupt the SNES CPU after it finishes processing.

When using the special bitmap functions of the SuperFX, it is possible to quickly load the bitmap out of the game pak into the SNES Video RAM and display it on the screen. The SNES is by default a tile and sprite based console, so the pixel based scene construction used in 3D rendered games is very inefficient with stock SNES hardware. In SuperFX games such as DOOM and Starfox/Starwing, the SuperFX rapidly paints pixel based scene bitmaps into the game pak RAM, and these are then transferred into the SNES VRAM for display many times per second.

Hardware revisions
There are 3 different hardware revisions of the SuperFX. All revisions are functionally compatible in terms of instruction set but support different ROM sizes.


 * MARIO Chip - which stands for Mathematical Argonaut Rotation Input Output. This was the first release of the SuperFX chip and was only used with Starfox/Starwing. There are two versions of the chip - one die-bonded directly to the PCB under epoxy, and one in a standard chip carrier package.
 * GSU-1 - the release used on most SuperFX games, in a standard chip carrier package. Functionally identical to the MARIO Chip. Supports a maximum ROM size of 8 Mbit (1 Megabyte).
 * GSU-2 - used on the final 3 SuperFX games, supports the full 16 Megabit (2 Megabyte) ROM size.

=Registers=

The SuperFX registers are mapped from. Some are 16-bit; some are 8-bit. The explanation of each register is shown in this section. is the Instruction Cache.

Overview
The Super FX chip has 16 general-purpose 16-bit registers labeled R0 to R15, plus 11 control registers. Additionally, a dedicated memory space forms the 512 byte instruction cache.

Control Registers

Instruction Cache

SFR Status Flag Register
The SFR is a very important register. It controls branching within the SuperFX after a calculation has been evaluated, and it lets the SNES CPU determine the status of the SuperFX.

BRAMBR Backup RAM Register
Used to protect the SRAM inside the Game Pak. This should normally be set to 0 (write disable), and to 1 (write enable) only when saving the game.

PBR Program Bank Register
When the SuperFX is loading code it references the  register to specify the bank being used. The  instruction is the general method used to change this register.

ROMBR Game Pak ROM Bank Register
When using the ROM buffering system, this register specifies the bank of the game pak ROM being copied into the buffer. The  instruction is the general method used to change this register.

CFGR Config Register
Controls the clock multiplier and interrupt mask.

Note: If set to run at 21.477 MHz through the  flag(1),   flag should be set to 0.

SCBR Screen Base Register
This register sets the starting address of the graphics storage area. It is written to directly, rather than through a specific instruction.

CLSR Clock Register
Controls the clock frequency of the Super FX chip.

SCMR Screen Mode Register
This register sets the number of colors and screen height for the  graphics acceleration routine and additionally controls whether the SuperFX or SNES has control of the game pak ROM and work RAM.

Screen Height Truth Table

Color Mode Truth Table

VCR Version Register
The version of the SuperFX chip in use can be read from this register.

RAMBR Game Pak RAM Bank Register
When writing between the game work RAM and the Super FX registers, this register specifies the bank of the game pak RAM being used. The  instruction is the general method used to change this register. Only one bit is used to set the RAM bank to  or

CBR Cache Base Register
This register specifies the address of either the game pak ROM or work RAM where data will be loaded from into the cache. Both the  and   instructions are accepted ways to change this register

=Memory Map=

From Super NES CPU point of view
Super FX Interface: Mapped to, in banks   and

Game ROM: Mapped to 2MiB from. Mirror mapped from bank, stored in 32KiB blocks.

Game Work RAM: Mapped to 128KiB starting from bank. 8KiB mapped from  in each of bank. RAM mirror is in banks.

Game Save RAM: Mapped to 128KiB from bank

SNES CPU ROM: 6MiB ROM is mapped from bank

From Super FX point of view
Game ROM: Mapped to 2MiB from. 2MiB mirror mapped from bank  onwards, stored in 32KiB blocks. Other memory locations viewable from the SNES should not be addressed.

Game Work RAM: Mapped to 128KiB starting from Bank.

Note: The Super FX accesses memory through three bank control registers: Program Bank Register, ROM Bank Register and RAM Bank Register

=Instruction Set=

The SuperFX instruction set is different from the Super Nintendo's native instruction set. It allows faster, more sophisticated 16-bit mathematical functions and includes some specific graphics manipulation functions.

Some instructions can be assembled as a single byte, in which the instruction (one nibble) and its argument (one nibble) are joined in the same storage byte. This allows faster execution and greater instruction density, both important objectives when designing a co-processor. One such instruction is, which starts as   and takes an argument of one of the 16 general purpose SuperFX registers.
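The nibble packing described above can be illustrated with a small Python sketch. It assumes the opcode sits in the high nibble and the register number in the low nibble; the function names and the example opcode value are hypothetical and are not taken from the SuperFX documentation.

```python
def encode_register_op(opcode_nibble: int, register: int) -> int:
    """Pack a 4-bit opcode and a 4-bit register number into one byte.

    Assumption: opcode in the high nibble, register in the low nibble.
    """
    if not 0 <= opcode_nibble <= 0xF:
        raise ValueError("opcode must fit in 4 bits")
    if not 0 <= register <= 0xF:
        raise ValueError("register must be in the range 0..15")
    return (opcode_nibble << 4) | register


def decode_register_op(byte: int) -> tuple[int, int]:
    """Split an encoded byte back into (opcode_nibble, register)."""
    return (byte >> 4) & 0xF, byte & 0xF


# Hypothetical opcode nibble, used only to show the packing.
EXAMPLE_OPCODE = 0xA
encoded = encode_register_op(EXAMPLE_OPCODE, 5)
print(f"{encoded:02X}")
```

Because the register number fits in four bits, any of the 16 general purpose registers can be selected without a second operand byte.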

Quite a few instructions require an  instruction to be executed before the opcode. This modifies the behavior of the same opcode to perform a slightly different operation. There are 3 possible  codes - ,  , and  +. In the table below, the specific  code is listed for each instruction.

Most instructions rely on pre-defined pointers for the locations of calculation variables. These are the,   and   instructions. The  and   commands specify the general purpose register that is the variable, and the calculation result respectively. defines both of the variable/result in the same command. The variable and result are known as the source and destination registers respectively.

Sreg and Dreg
For certain instructions, the Sreg and Dreg must be specified before the instruction is run. The Sreg is the "Source Register" and the Dreg is the "Destination Register" - each specified as one of the 16 general purpose registers. Use of the,  , and   instructions specifies the Sreg and Dreg.

=Bitmap Emulation=

The Bitmap Emulation function is one of the major acceleration functions of the SuperFX. It allows a pixel based shading approach within a frame buffer, as opposed to the tile based approach of the SNES VRAM. 3D rendering operations need a fast pixel by pixel shader. The SuperFX provides the framework to plot individual pixels to the frame buffer quickly, and then to transfer the plotted picture to the SNES VRAM.
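To show why plotting single pixels into tile based graphics memory is awkward without hardware help, here is a minimal Python sketch of a software pixel plot into a buffer of SNES 4bpp tiles (8x8 pixels, 32 bytes per tile, planes 0/1 interleaved in the first 16 bytes and planes 2/3 in the last 16). The function names are hypothetical, and the row-major tile ordering is an assumption made for simplicity; the tile layout the SuperFX actually uses in its frame buffer depends on the screen mode and is not modelled here.

```python
TILE_BYTES_4BPP = 32  # one 8x8 tile with 4 bitplanes


def tile_offsets(x: int, y: int, tiles_per_row: int) -> tuple[int, int, int]:
    """Return (tile base offset, row within tile, bit mask) for a pixel."""
    tile_index = (y // 8) * tiles_per_row + (x // 8)  # assumed row-major order
    return tile_index * TILE_BYTES_4BPP, y % 8, 0x80 >> (x % 8)


def plot(buffer: bytearray, x: int, y: int, color: int, tiles_per_row: int) -> None:
    """Write a 4-bit color for one pixel, touching one byte per bitplane."""
    base, row, mask = tile_offsets(x, y, tiles_per_row)
    for plane in range(4):
        offset = base + (plane // 2) * 16 + row * 2 + (plane % 2)
        if color & (1 << plane):
            buffer[offset] |= mask
        else:
            buffer[offset] &= ~mask & 0xFF


def read_pixel(buffer: bytearray, x: int, y: int, tiles_per_row: int) -> int:
    """Read the 4-bit color back from the planar tile data."""
    base, row, mask = tile_offsets(x, y, tiles_per_row)
    color = 0
    for plane in range(4):
        offset = base + (plane // 2) * 16 + row * 2 + (plane % 2)
        if buffer[offset] & mask:
            color |= 1 << plane
    return color


# A 16x16 pixel canvas: 2 tiles per row, 2 rows of tiles.
canvas = bytearray(TILE_BYTES_4BPP * 4)
plot(canvas, 9, 1, 0xF, tiles_per_row=2)
plot(canvas, 0, 0, 0x5, tiles_per_row=2)
```

Each plotted pixel needs a read-modify-write of four separate bytes, which is the kind of work the bitmap emulation hardware takes off the program.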

=Fast Multiply=

The SuperFX has 4 multiplication instructions.
 * - Signed 8 bit x Signed 8 bit, with Signed 16 bit result in Dreg.
 * - Unsigned 8 bit x Unsigned 8 bit, with Unsigned 16 bit result in Dreg.
 * - Signed 16 bit x Signed 16 bit, with Signed 32 bit result - MSB in Dreg, LSB in R4
 * - Signed 16 bit x Signed 16 bit, with Signed 32 bit result.

The /  instructions are faster than the  /  instructions.
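The arithmetic behind these variants can be modelled in Python as a rough sketch. The function names below are hypothetical stand-ins, not the SuperFX mnemonics, and the model covers only the operand widths and signedness listed above; it says nothing about flags or timing.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mul_signed_8(a: int, b: int) -> int:
    """Signed 8 bit x signed 8 bit, returned as a 16-bit register value."""
    return (to_signed(a, 8) * to_signed(b, 8)) & 0xFFFF


def mul_unsigned_8(a: int, b: int) -> int:
    """Unsigned 8 bit x unsigned 8 bit, returned as a 16-bit register value."""
    return ((a & 0xFF) * (b & 0xFF)) & 0xFFFF


def mul_signed_16(a: int, b: int) -> tuple[int, int]:
    """Signed 16 bit x signed 16 bit, returned as (high word, low word)."""
    product = (to_signed(a, 16) * to_signed(b, 16)) & 0xFFFFFFFF
    return product >> 16, product & 0xFFFF


print(hex(mul_signed_8(0xFF, 0x02)))    # -1 * 2 as a 16-bit value
print(hex(mul_unsigned_8(0xFF, 0x02)))  # 255 * 2
print(mul_signed_16(0x4000, 0x0004))    # 16384 * 4 split into two words
```

The same input bytes give different results depending on signedness, which is why the signed and unsigned 8 bit forms exist as separate instructions.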

=Compiling SuperFX routines=

Whilst SNES assembly language programs can be compiled using a regular 65c816 compiler, the SuperFX assembly language requires a custom compiler. The original compiler used on existing SuperFX games has not been released outside the closed development community.

An open source compiler called sfxasm is available for compiling SuperFX programs.

https://sourceforge.net/projects/sfxasm/

Once compiled, SuperFX programs are included in the SNES assembly language program as a binary library. The SNES program then directs the SuperFX to use the precompiled program packed into the ROM.

=Using the SuperFX in a SNES program=

When the SNES boots up with a SuperFX game, the SuperFX chip is idle, and nothing special is needed for the SNES to go through its normal routine of loading the ROM and executing code. Once the SNES has booted and performed its startup routines, the SuperFX can be activated from your program. Note, for emulators to support SFX instructions, the  byte in the header must be ,  ,  , or $. The  byte should be.

Initializing
The SuperFX chip should be initialized before running code. This includes setting the basic config registers.



Choosing the execution mode
As mentioned before, code can be loaded into the Super FX in 3 different ways - from ROM, from game pak RAM, or from the 512 byte cache. The procedure differs slightly depending on which one you choose.


 * The advantage of the ROM mode is simplicity at the cost of stopping the SNES CPU while SuperFX is processing.
 * The advantage of the RAM mode is to be able to run a large SuperFX program whilst the SNES CPU is already busy, but at the cost of having to write the program into Game PAK RAM before running.
 * The advantage of the Cache mode is to run a small program 3 times faster than ROM or RAM modes and additionally while the SNES is busy with both the ROM and game pak RAM, but at the cost of loading the program into cache memory before the execution process.

Setup - ROM Mode
1. Set up the Program Bank Register (PBR) for where the SFX program starts.
2. Program the program counter in the SuperFX.
3. Give the SuperFX exclusive access to the ROM by setting the  flag in the SFR register.

Setup - RAM Mode
1. Transfer the program from ROM into game pak RAM using copy routines.
2. Set up the Program Bank Register (PBR) for where the SFX program starts.
3. Write to the SuperFX program counter.

Setup - Cache Mode
1. Transfer the program from ROM into Cache RAM onwards using copy routines. The program needs to be in blocks of 16 bytes each, otherwise the SuperFX will not execute the instructions surplus to a 16 byte segment. This also applies to tiny programs under 16 bytes - to get around this, write something into the 16th byte.
2. Write to the SuperFX program counter; this is usually 0.
3. The SuperFX program will execute independently of the SNES until it hits a  instruction. When it finishes, depending on whether the  config interrupt is set, it will generate an interrupt (  instruction) on the SNES. If the interrupt is masked then the SuperFX will go to idle mode and wait for the next command from the SNES to start execution.
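The 16 byte block requirement for cache programs can be handled in a build script before the binary is packed into the ROM. The following Python sketch pads an assembled program up to the next 16 byte boundary and rejects programs larger than the 512 byte cache. The function name is hypothetical and the fill byte is an arbitrary placeholder; choose a value that is harmless for your own program.

```python
CACHE_SIZE = 512  # size of the instruction cache in bytes
BLOCK_SIZE = 16   # the cache is filled in 16 byte segments


def pad_for_cache(program: bytes, fill: int = 0x00) -> bytes:
    """Pad a program so its length is a whole number of 16 byte blocks.

    `fill` is an assumed placeholder value for the unused trailing bytes.
    """
    if len(program) > CACHE_SIZE:
        raise ValueError("program does not fit in the 512 byte cache")
    remainder = len(program) % BLOCK_SIZE
    if remainder == 0:
        return bytes(program)
    return bytes(program) + bytes([fill]) * (BLOCK_SIZE - remainder)


# A 5 byte stand-in for an assembled routine becomes one full block.
padded = pad_for_cache(bytes([0x11, 0x22, 0x33, 0x44, 0x55]))
print(len(padded))
```

Padding at build time means the SNES-side copy routine can always transfer whole 16 byte blocks, including for tiny programs under 16 bytes.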

Starting processing
Processing starts when the SuperFX notices that the SNES has written to its program counter register.

Stopping processing
The SuperFX can be stopped in one of two ways - by executing a  instruction in the SuperFX's program, or from the SNES by writing a "0" to the   flag in the SuperFX's   register.

Interrupt on stop
The SuperFX calls an  instruction when it reads a SuperFX   instruction. It is possible to mask the interrupt by setting the  bit in the SFR register. If interrupt is not masked, to figure out if it is a screen blanking interrupt or the Super FX, check the  flag bit in the   register.