Scheme Programming/Input and Output

Files
A file is essentially nothing but a string that is stored on your computer's hard drive (or a USB stick, SD card, or other storage device) with a name. There is one other type of object on a hard drive that has a name, and that is a directory. "Directory" is the word that programmers have always used for what later became known as "folders." They are lists of filenames.

Ports
When you want to work with the contents of a file in Scheme, you use functions such as <tt>read-char</tt> and <tt>write-char</tt>, which read or write one character at a time. Alternatively, you can use functions such as <tt>read-line</tt>, or <tt>read</tt> and <tt>write</tt>, all of which read or write more than one character at a time and parse the data in different ways.

Before you can read from or write to a file, you must open it. That means getting a file descriptor from the operating system, which is a value that keeps track of your place in the file: which character will be the next one you read, or where the next character you write will end up. On Windows, the file descriptor typically also gives you exclusive write access to the file, so other programs can't write to the same file you're writing to or delete it. Unix-like systems (such as Linux and macOS) provide no such guarantee, however.

Ports are Scheme's file-descriptor values. They are passed to input/output procedures to tell the I/O functions which file to read from or write to. Each port represents one file and one direction. That is, a port can be either an input port or an output port, but never both.
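The direction of a port can be checked with the standard predicates <tt>input-port?</tt> and <tt>output-port?</tt>. The following is a minimal sketch; it assumes your implementation provides <tt>open-input-string</tt> (R7RS or SRFI 6), which is used here only so that no file on disk is needed.

```scheme
;; A string port stands in for a file, so nothing on disk is required.
(define in (open-input-string "abc"))

(input-port? in)                       ; => #t
(output-port? in)                      ; => #f
(input-port? (current-input-port))     ; => #t
(output-port? (current-output-port))   ; => #t
```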

The keyboard and screen or terminal are also files. Typically, they're the same file: /dev/tty on Unix, or CON for a Windows console program. GUI programs such as your web browser typically have no console or TTY (which stands for teletype) associated with them, however.

At the Scheme REPL, the file representing the keyboard and teletype is open by default. It has three ports: <tt>(current-input-port)</tt> for the keyboard, <tt>(current-output-port)</tt> for the teletype, and <tt>(current-error-port)</tt>, also for the teletype, for error messages. It is customary to send error messages to <tt>(current-error-port)</tt> rather than <tt>(current-output-port)</tt>. That's because it is possible to redirect these ports. The <tt>(current-output-port)</tt> can be redirected to a file, for example, while error messages still get printed on the screen.
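As a small sketch of that convention, the following sends a normal result to the output port and a diagnostic to the error port. It assumes your implementation provides <tt>current-error-port</tt> (it is in R7RS and is a common extension elsewhere); the message texts are made up for illustration.

```scheme
;; Normal output goes to the current output port...
(display "result: 42" (current-output-port))
(newline (current-output-port))

;; ...while diagnostics go to the current error port, so they still reach
;; the screen if the output port has been redirected to a file.
(display "warning: input was empty" (current-error-port))
(newline (current-error-port))
```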

Displaying human-readable values to a port
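The <tt>display</tt> function writes a value to a port in human-readable form. It does not print a newline after its output. It strips quotation marks from strings and the <tt>#\</tt> prefix from characters, but otherwise displays Scheme values in the same format in which they're found in Scheme source.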
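A short sketch of <tt>display</tt> with an explicit port argument. The values are arbitrary examples.

```scheme
(display "Hello, world" (current-output-port))   ; prints Hello, world (no quotes)
(display '(1 2.5 foo) (current-output-port))     ; prints (1 2.5 foo)
(display 42 (current-output-port))               ; prints 42
```

Because <tt>display</tt> never adds a newline, all three results end up on one line: <tt>Hello, world(1 2.5 foo)42</tt>.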

To print a newline, use the newline function:
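For instance, a minimal sketch that prints two lines, passing the port explicitly each time:

```scheme
(display "first line" (current-output-port))
(newline (current-output-port))
(display "second line" (current-output-port))
(newline (current-output-port))
```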

The port argument is actually optional. If you don't include it, <tt>display</tt>, <tt>newline</tt>, and other input and output (I/O) functions will assume you mean <tt>(current-output-port)</tt> for output or <tt>(current-input-port)</tt> for input.
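The two calls in each pair below should therefore behave the same way. This is only an illustration; the strings are arbitrary.

```scheme
;; With the port written out...
(display "hello" (current-output-port))
(newline (current-output-port))

;; ...and with the port left to default to (current-output-port).
(display "hello")
(newline)
```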

Opening and Closing a File
You open a file with <tt>open-input-file</tt> to read from it, or <tt>open-output-file</tt> to write to it. The value returned by those functions is a port, which needs to be bound to a variable so that you can pass it to the I/O functions later.

It is important to close a port when you're done with it. There are a limited number of files that you can have open at the same time. This limit is imposed by the operating system, not by Scheme.
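Opening, using, and closing ports might look like the sketch below. The file name <tt>example.txt</tt> is hypothetical, and the sketch assumes the file does not exist yet, since some implementations refuse to overwrite an existing file.

```scheme
;; "example.txt" is a hypothetical file name used only for illustration.
(define out (open-output-file "example.txt"))   ; bind the port to a name
(display "first line" out)
(newline out)
(close-output-port out)                         ; give the descriptor back

(define in (open-input-file "example.txt"))
(define first-char (read-char in))              ; #\f, the first character written
(close-input-port in)
```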

Writing to a File
Opening a file for output erases its contents.

When you write a character to a file, some Scheme implementations will not actually write it right away, but will instead store it in an internal buffer until either enough characters have accumulated or a newline is written. Any buffered characters remaining will be written out when you close the file.

The operating system will automatically close all your files when Scheme exits.

On some Scheme implementations, <tt>open-output-file</tt> will raise an error if the file already exists. For example, on Racket:
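The sketch below is Racket-specific and is not the original example from this page. It relies on Racket's <tt>#:exists</tt> keyword argument to <tt>open-output-file</tt>, whose default value <tt>'error</tt> causes an exception when the file is already there. The file name <tt>example.txt</tt> is hypothetical.

```scheme
;; Racket only. With the default (#:exists 'error), this call raises an
;; exception if example.txt is already present:
;;   (open-output-file "example.txt")

;; Asking Racket to truncate an existing file instead:
(define out (open-output-file "example.txt" #:exists 'truncate))
(display "fresh contents" out)
(newline out)
(close-output-port out)
```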

Reading and Writing Scheme values
Scheme provides the <tt>read</tt> function, which reads and parses a Scheme value from a port (or <tt>(current-input-port)</tt> if no port is specified), and a corresponding <tt>write</tt> function. This is the easiest way to get data into and out of Scheme, and as a result, Scheme programmers prefer to store their data as Scheme code whenever it's feasible. For example, suppose you have the following file:

just-some-raw-data.scm

((0.00036277727 0.00024514514 0.00010899892 -0.00017201288 5.1782848e-05)
 (0.000252906 0.00015007147 -0.00023179696 -0.00037388649 8.3796775e-05)
 (-0.00037429505 -0.00020174753 0.00043324157 0.00015203918 0.0003337927)
 (0.0001250037 5.5220273e-05 -0.00049933029 -0.00010911703 -0.00019316927)
 (0.00018089121 4.254036e-05 0.00018602787 -2.7271702e-05 -0.00024643468))

You could write a program to read the file, do something to all the numbers in it, and write the result back to the same file:

manipulate-raw-data.scm
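The original listing is not reproduced here, so what follows is a reconstruction sketch rather than the author's program. The scale factor of 100000 is an assumption inferred from the sample data on this page, and the names <tt>data-file</tt>, <tt>scale-factor</tt>, <tt>scale-rows</tt>, and <tt>manipulate-raw-data</tt> are made up for illustration. On implementations where <tt>open-output-file</tt> refuses to overwrite an existing file, the output step would need that implementation's overwrite option.

```scheme
;; manipulate-raw-data.scm (reconstruction sketch)
(define data-file "just-some-raw-data.scm")
(define scale-factor 100000)   ; assumed; inferred from the sample output

;; Multiply every number in a list of lists by scale-factor.
(define (scale-rows rows)
  (map (lambda (row)
         (map (lambda (x) (* x scale-factor)) row))
       rows))

(define (manipulate-raw-data)
  ;; Read the whole nested list as a single Scheme value.
  (let* ((in (open-input-file data-file))
         (rows (read in)))
    (close-input-port in)
    ;; Reopen the same file for output and write the scaled data back.
    (let ((out (open-output-file data-file)))
      (write (scale-rows rows) out)
      (newline out)
      (close-output-port out))))
```

With this sketch loaded, evaluating <tt>(manipulate-raw-data)</tt> performs the update.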

Then, in the REPL:

The new contents of the file would be:

just-some-raw-data.scm

((36.27772699999999 24.514513999999998 10.899892 -17.201288 5.1782847999999974)
 (25.290600000000003 15.007147 -23.179696000000005 -37.388649 8.3796775)
 (-37.429505000000005 -20.174753 43.324157 15.203918 33.379269999999996)
 (12.500370000000003 5.5220273000000004 -49.933029 -10.911703 -19.316927)
 (18.089121000000002 4.254036 18.602787 -2.7271701999999997 -24.643468000000003))

Note that Scheme does not format the output in a way that looks nice or is easy for a human to read. But if you load this file with the same program, it will have no trouble reading the values and changing them again.

Reading from the keyboard
The read-line function reads a line of text from a port (or <tt>(current-input-port)</tt> if none is specified). When used at the REPL, the reading usually begins on the same line that the code is being read from:

As you can see, Scheme didn't even give the user a chance to enter the name. However:

This happens because Scheme stops reading as soon as it sees the closing parenthesis. <tt>read-line</tt> sees everything after that. This doesn't affect your program if it's loaded from a file.
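The effect can be reproduced without a REPL by letting a string port stand in for what the user types. This is a sketch under the assumption that your implementation provides <tt>open-input-string</tt> and <tt>read-line</tt> (both are in R7RS); the typed text is invented.

```scheme
;; Case 1: the user types the expression and presses Enter.
(define typed-1 (open-input-string "(read-line)\n"))
(define expression-1 (read typed-1))      ; reader stops right after ")"
(define leftover-1 (read-line typed-1))   ; "" : only the newline was left

;; Case 2: the user types more text after the closing parenthesis.
(define typed-2 (open-input-string "(read-line) Alice\n"))
(define expression-2 (read typed-2))      ; again the list (read-line)
(define leftover-2 (read-line typed-2))   ; " Alice" : the rest of the line
```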

Redirecting Ports and Automatically Closing the File
Scheme provides the <tt>with-input-from-file</tt> and <tt>with-output-to-file</tt> functions, each of which takes a file name and a function of no arguments. They redirect <tt>(current-input-port)</tt> or <tt>(current-output-port)</tt> to the specified file, call the function that you provide, and close the file when that function returns. Any function called in the meantime that reads from <tt>(current-input-port)</tt> or writes to <tt>(current-output-port)</tt> will read from or write to the file as well.

In some Scheme implementations, the file is closed even if an error occurs inside the function. That matters because R5RS Scheme has no standard way to trap errors, although various implementations provide extensions that allow it. In other implementations, <tt>with-input-from-file</tt> does not handle errors this way. It's also nice not to need to define port variables.

The above file-manipulating program could have been written with <tt>with-input-from-file</tt> and <tt>with-output-to-file</tt>. The program would then look like this:

manipulate-raw-data.scm
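As before, the listing below is a reconstruction sketch and not the original program. The scale factor of 100000 is inferred from the sample data, and the names <tt>data-file</tt>, <tt>scale-factor</tt>, <tt>scale-rows</tt>, and <tt>manipulate-raw-data</tt> are illustrative. Some implementations refuse to overwrite an existing file unless told to, in which case the output step would need that implementation's overwrite option.

```scheme
;; manipulate-raw-data.scm (reconstruction sketch, port-redirecting version)
(define data-file "just-some-raw-data.scm")
(define scale-factor 100000)   ; assumed; inferred from the sample output

(define (scale-rows rows)
  (map (lambda (row)
         (map (lambda (x) (* x scale-factor)) row))
       rows))

(define (manipulate-raw-data)
  ;; read is passed as the zero-argument function, so it reads from the
  ;; redirected (current-input-port); its result is returned to us.
  (let ((rows (with-input-from-file data-file read)))
    (with-output-to-file data-file
      (lambda ()
        ;; No port arguments: write and newline use the redirected output.
        (write (scale-rows rows))
        (newline)))))
```

No port is ever named in this version, and there is no explicit call to close anything.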

Reading a String As If It Was a File
Some Scheme implementations provide <tt>with-input-from-string</tt>, which redirects <tt>(current-input-port)</tt> just like <tt>with-input-from-file</tt>. SCM, however, only provides <tt>call-with-input-string</tt>, which is like <tt>with-input-from-string</tt> except the procedure you provide must accept the port as an argument.
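A small sketch of <tt>call-with-input-string</tt>, assuming your implementation provides it with the signature <tt>(call-with-input-string string procedure)</tt>. The helper <tt>sum-numbers</tt> is made up for this example.

```scheme
;; Read numbers from a port until end of input and add them up.
(define (sum-numbers port)
  (let loop ((total 0))
    (let ((value (read port)))
      (if (eof-object? value)
          total
          (loop (+ total value))))))

;; The string is treated as if it were the contents of a file.
(call-with-input-string "1 2 3" sum-numbers)   ; => 6
```

Where <tt>call-with-input-string</tt> is missing but R7RS string ports are available, <tt>(sum-numbers (open-input-string "1 2 3"))</tt> should give the same result.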

It is also possible to write to a string as if it was a port. <tt>call-with-output-string</tt> is used for this. The string is created from scratch. This is one way you can convert any value to a string:
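A minimal sketch of that conversion, assuming <tt>call-with-output-string</tt> is available and takes a single procedure that receives the port. The name <tt>value->string</tt> is hypothetical.

```scheme
;; Everything written to the port is collected into a new string.
(define (value->string value)
  (call-with-output-string
    (lambda (port)
      (write value port))))

(value->string '(1 "two" 3.5))   ; => "(1 \"two\" 3.5)"
```

Using <tt>display</tt> instead of <tt>write</tt> inside the procedure would drop the quotation marks around strings.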

Handling raw, binary data
Handling raw, binary data is not 100% portable between different Scheme implementations. Some implementations support SRFI-56, which defines <tt>read-byte</tt>, <tt>write-byte</tt>, <tt>peek-byte</tt>, and <tt>byte-ready?</tt>. If your Scheme implementation doesn't provide them, and its characters use a single-byte encoding like ASCII rather than Unicode, you can define them yourself:
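One possible set of definitions is sketched below. It is only valid under the assumption stated above, that each character is exactly one byte, and it should be skipped if your implementation already has these procedures.

```scheme
;; Each procedure takes an optional port, like the character procedures
;; it is built on.
(define (read-byte . maybe-port)
  (let ((c (apply read-char maybe-port)))
    (if (eof-object? c) c (char->integer c))))

(define (peek-byte . maybe-port)
  (let ((c (apply peek-char maybe-port)))
    (if (eof-object? c) c (char->integer c))))

(define (write-byte byte . maybe-port)
  (apply write-char (integer->char byte) maybe-port))

(define (byte-ready? . maybe-port)
  (apply char-ready? maybe-port))
```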

Then, bytes can be combined with OR (<tt>bitwise-ior</tt>), AND (<tt>bitwise-and</tt>) and bit shifting (<tt>arithmetic-shift</tt> or <tt>ash</tt>). Here is a function to convert a list of bytes to an integer, assuming the "big endian" encoding, which is commonly used in network packets:
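The sketch below is not the original function. To stay runnable without any bitwise library, it multiplies by 256 and adds, which gives the same result as shifting left by 8 bits and OR-ing in a byte between 0 and 255. The name <tt>bytes->integer</tt> is an assumption.

```scheme
;; Combine a list of bytes, most significant byte first, into an integer.
(define (bytes->integer bytes)
  (let loop ((bytes bytes) (n 0))
    (if (null? bytes)
        n
        ;; Same effect as (bitwise-ior (arithmetic-shift n 8) (car bytes)).
        (loop (cdr bytes) (+ (* n 256) (car bytes))))))

(bytes->integer '(1 2))            ; => 258
(bytes->integer '(18 52 86 120))   ; => 305419896, which is #x12345678
```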

You can use it to read an arbitrarily-sized integer from a port:
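A self-contained sketch of such a reader follows. All names in it are hypothetical. <tt>next-byte</tt> is a stand-in for <tt>read-byte</tt> that assumes single-byte characters, and there is no handling for reaching the end of the file early.

```scheme
;; Stand-in for read-byte; assumes one character is one byte.
(define (next-byte port)
  (char->integer (read-char port)))

;; Most significant byte first.
(define (bytes->integer bytes)
  (let loop ((bytes bytes) (n 0))
    (if (null? bytes)
        n
        (loop (cdr bytes) (+ (* n 256) (car bytes))))))

;; Consing each byte onto the front leaves the list in reverse order:
;; the last byte read comes first.
(define (read-bytes-reversed count port)
  (let loop ((remaining count) (acc '()))
    (if (= remaining 0)
        acc
        (loop (- remaining 1) (cons (next-byte port) acc)))))

;; Reversing restores file order, which is what big endian needs.
(define (read-big-endian-integer size port)
  (bytes->integer (reverse (read-bytes-reversed size port))))
```

For example, reading 2 bytes from a port that holds the bytes 1 and 2 yields 258.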

To read little-endian, which is the format used natively by Intel CPUs, just don't reverse the result. It might be convenient to have a function that can read both. Then you could define both reading functions in terms of it:
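One way to write this is sketched below, with the byte-order step passed in as a function. All names are hypothetical, the helpers are repeated so that the block runs on its own, and <tt>identity</tt> is defined here in case your implementation lacks it.

```scheme
;; Omit this definition if your implementation already provides identity.
(define (identity x) x)

(define (next-byte port)                 ; stand-in for read-byte
  (char->integer (read-char port)))

(define (bytes->integer bytes)           ; most significant byte first
  (let loop ((bytes bytes) (n 0))
    (if (null? bytes) n (loop (cdr bytes) (+ (* n 256) (car bytes))))))

(define (read-bytes-reversed count port) ; last byte read comes first
  (let loop ((remaining count) (acc '()))
    (if (= remaining 0)
        acc
        (loop (- remaining 1) (cons (next-byte port) acc)))))

;; order is applied to the reversed byte list before conversion.
(define (read-integer size port order)
  (bytes->integer (order (read-bytes-reversed size port))))

(define (read-big-endian-integer size port)
  (read-integer size port reverse))

(define (read-little-endian-integer size port)
  (read-integer size port identity))
```

Given the bytes 1 and 2, the big-endian reader returns 258 and the little-endian reader returns 513.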

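Many Scheme implementations provide the <tt>identity</tt> function, which simply returns its argument, for cases like the above, where we used it because we didn't want to reverse or do anything else to the result when reading little endian. It is not part of the R5RS standard, so if your implementation lacks it, you can define it as <tt>(lambda (x) x)</tt>.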

Reading Strings from Binary Files
In binary files, strings are stored either as a binary length followed by the data, or as a null-terminated string. In the case of a binary length, the length field itself can have different sizes, and it can be in either little or big endian byte order. The function below requires arguments that take all of that into account:
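The original function is not shown here. As a sketch under the same single-byte-character assumption, the two storage styles could be handled by two hypothetical procedures: <tt>read-counted-string</tt> takes the size of the length field and a byte-order function (<tt>reverse</tt> for big endian, a do-nothing function for little endian), and <tt>read-null-terminated-string</tt> reads up to a zero byte.

```scheme
(define (next-byte port)                 ; stand-in for read-byte
  (char->integer (read-char port)))

;; Read size bytes (collected last-first), reorder them, and combine them.
(define (read-integer size port order)
  (let collect ((remaining size) (acc '()))
    (if (> remaining 0)
        (collect (- remaining 1) (cons (next-byte port) acc))
        (let combine ((bytes (order acc)) (n 0))
          (if (null? bytes)
              n
              (combine (cdr bytes) (+ (* n 256) (car bytes))))))))

;; A length field of length-size bytes, then that many characters.
(define (read-counted-string length-size order port)
  (let ((len (read-integer length-size port order)))
    (let loop ((remaining len) (chars '()))
      (if (= remaining 0)
          (list->string (reverse chars))
          (loop (- remaining 1) (cons (read-char port) chars))))))

;; Characters up to, but not including, a zero byte (or end of file).
(define (read-null-terminated-string port)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (if (or (eof-object? c) (= (char->integer c) 0))
          (list->string (reverse chars))
          (loop (cons c chars))))))
```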

Finally, it might be convenient to have a function that can read whole structures from a file. You could specify the format of a structure as a list that includes the sizes of integers to be read, and also specifies when to expect a string. For example, you could call <tt>(read-binary '(big-endian 2 4 (counted 1)))</tt> to read a 16-bit big-endian integer, followed by a 32-bit one, followed by a counted string whose length is represented by an 8-bit integer.
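A possible shape for <tt>read-binary</tt> is sketched below. It is an illustration only: the optional port argument, the default of big endian, the helper names, and the decision to return the fields as a list are all assumptions, and only the format items used in the example call are supported.

```scheme
(define (next-byte port)                 ; stand-in for read-byte
  (char->integer (read-char port)))

(define (read-integer size port order)
  (let collect ((remaining size) (acc '()))
    (if (> remaining 0)
        (collect (- remaining 1) (cons (next-byte port) acc))
        (let combine ((bytes (order acc)) (n 0))
          (if (null? bytes)
              n
              (combine (cdr bytes) (+ (* n 256) (car bytes))))))))

(define (read-counted-string length-size order port)
  (let loop ((remaining (read-integer length-size port order)) (chars '()))
    (if (= remaining 0)
        (list->string (reverse chars))
        (loop (- remaining 1) (cons (read-char port) chars)))))

;; format items: big-endian, little-endian, an integer size in bytes,
;; or (counted n) for a string preceded by an n-byte length.
(define (read-binary format . maybe-port)
  (let ((port (if (null? maybe-port) (current-input-port) (car maybe-port))))
    (let loop ((items format) (order reverse) (fields '()))
      (cond ((null? items) (reverse fields))
            ((eq? (car items) 'big-endian)
             (loop (cdr items) reverse fields))
            ((eq? (car items) 'little-endian)
             (loop (cdr items) (lambda (bytes) bytes) fields))
            ((integer? (car items))
             (loop (cdr items) order
                   (cons (read-integer (car items) port order) fields)))
            ((and (pair? (car items)) (eq? (caar items) 'counted))
             (loop (cdr items) order
                   (cons (read-counted-string (cadar items) order port)
                         fields)))
            (else (error "read-binary: unknown format item" (car items)))))))
```

The fields come back in the order in which they were read, so the example format above would produce a list of two integers and a string.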

The reader functions above assume that all integers are unsigned, meaning there's no way to represent a negative number. But some of the integers you may find in a binary file are meant to be interpreted as "signed". Suppose you have a signed byte that you read as an unsigned byte. An unsigned byte, being 8 bits, can have a value ranging from 0 to 255. Anything bigger than that and you need more than 8 bits to store it. A signed byte can represent values from 0 to 127 in exactly the same format as an unsigned byte, but the value that is interpreted as 128 in an unsigned byte is -128 in a signed byte. Unsigned 129 maps to -127, and so on, until you get to unsigned 255, which maps to -1.

The following function converts an unsigned integer of any size (in bytes) into a signed integer:
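A minimal sketch of such a conversion, assuming the usual two's-complement interpretation described above. The name <tt>unsigned->signed</tt> and the argument order are assumptions.

```scheme
;; n is the unsigned value; size is the width of the integer in bytes.
(define (unsigned->signed n size)
  (let ((limit (expt 256 size)))        ; 256 for one byte, 65536 for two
    (if (>= n (quotient limit 2))       ; top bit set: the value is negative
        (- n limit)
        n)))

(unsigned->signed 127 1)     ; => 127
(unsigned->signed 128 1)     ; => -128
(unsigned->signed 129 1)     ; => -127
(unsigned->signed 255 1)     ; => -1
(unsigned->signed 65535 2)   ; => -1
```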