Standard ML Programming/Types

Standard ML has a particularly strong static type system. Unlike many languages, it does not have subtypes ("is-a" relationships) or implicit casts between types.

Static typing
Internally, all data in a computer is made up of bits (binary digits, either 0 or 1), which are grouped into bytes (minimally addressable groups of bits — eight bits in all modern machines), and usually in turn into words (groups of bytes that the CPU can treat as a single unit — nowadays usually either four bytes, i.e. 32 bits, or eight bytes, i.e. 64 bits).

However, it is fairly unusual that a program is intended to operate on bits, bytes, and words per se. Usually a program needs to operate on human-intelligible, abstract data-types: integers, say, or real numbers, or strings of characters — or all of these. Programs operate on bits and bytes mainly because these are how computers implement, or represent, these data-types. There are a variety of ways that programming languages address this discrepancy:


 * A language can provide no data-type abstractions, and require programmer code to deal explicitly in bits and bytes. It may provide operations intended for certain abstract data-types (such as addition of 32-bit integers), but leave it entirely up to program code to ensure that it only uses those operations on bytes representing those types. This approach is characteristic of assembly languages and, to some extent, C.
 * A language can provide exactly one data-type abstraction, used by all programmer code. This approach is characteristic of shell-scripting languages, which frequently operate almost exclusively on strings of characters.
 * A language can assign a type to each fragment of data (each value), and store that type assignment together with the value itself. When an operation is attempted on a value of an inappropriate data-type, the language either automatically converts it to the appropriate type (e.g., promoting an integer value to an equivalent value of a real-number type), or emits an error. This approach, where type information exists only at run-time, is called dynamic typing, and is characteristic of languages like Lisp, Python, Ruby and others.
 * A language can assign a type to each fragment of code (each expression). If a bit of code applies an operation to an expression of an inappropriate data-type, the compiler either infers additional code to perform the type conversion, or emits an error. This approach, where type information exists only at compile-time, is called static typing, and is characteristic of languages like Standard ML, OCaml, Haskell and others.

Most languages do not adhere strictly to one of the above approaches, but rather, use elements of more than one of them. The type system of Standard ML, however, uses static typing almost exclusively. This means that an ill-typed program will not even compile. (ML programmers consider this a good thing, as it allows many programming errors to be caught at compile-time that a dynamically-typed language would catch only at run-time.) To the extent that Standard ML does support dynamic typing, it is within the static-typing framework.

Strong typing
The term strong typing is used in a wide variety of different ways (see the Wikipedia article “Strongly typed programming language” for a fairly thorough listing); nonetheless, it is fair to say that the Standard ML type system provides strong typing by almost all definitions. Every expression in an SML program has a specific type at compile-time, and its type at run-time is never in contravention of this. All type conversions are explicit (using functions such as real, which accepts an integer and returns an equivalent real number), and take the form of meaningful translations rather than mere re-interpretations of raw bits.
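As a small sketch of what explicit conversion looks like in practice, the snippet below uses the top-level Basis functions real, floor, round and trunc. The variable names are illustrative only.

```sml
(* Explicit conversions between int and real; nothing is converted implicitly. *)
val n : int = 7
val x : real = real n             (* int -> real, gives 7.0 *)
val down : int = floor 3.7        (* real -> int, rounds toward negative infinity: 3 *)
val nearest : int = round 3.7     (* real -> int, rounds to the nearest integer: 4 *)
val toward0 : int = trunc ~3.7    (* real -> int, rounds toward zero: ~3 *)

(* val bad = n + 2.5   would be rejected: int and real cannot be mixed *)
```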

Basic types
There are a number of basic types that may be thought of as "built in", firstly in that they're predefined by the Standard ML Basis library (so that Standard ML programs do not need to define them), and secondly in that the language provides literal notations for them, such as 3 (which is an integer), or "this is a string" (which is a string of characters). Some of the most commonly used are:


 * int (integer), such as 3 or ~12. (Note that a tilde ~ is used for negative numbers.)
 * real (floating-point number), such as 4.2 or ~6.4.
 * Standard ML does not implicitly promote integers to floating-point numbers; therefore, an expression such as 2 + 5.67 is invalid. It must be written either as 2.0 + 5.67, or as real 2 + 5.67 (using the real function to convert 2 to 2.0).
 * string (string of characters), such as "this is a string" or "". (The latter is the empty string, which contains zero characters.)
 * char (one character), such as #"y" or #"\n". (The latter denotes the newline character, ASCII code 10.)
 * bool (Boolean value), which is either true or false.

The following code snippet declares two variables:

val n : int = 66
val x : real = ~23.0

After this snippet, n has type int and the value 66; x has type real and the value -23. Note that, unlike in some languages, these variable bindings are permanent; this n will always have the value 66 (though it's possible that other, unrelated variables, elsewhere in the program, will also have the name n, and those variables can have completely different types and values).
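The following sketch illustrates the point about permanence: a later declaration that reuses a name creates a new, unrelated variable and does not change the earlier one. The helper get_n is a hypothetical name introduced only for this illustration.

```sml
val n = 66
fun get_n () = n          (* refers to the n bound on the line above *)

val n = "sixty-six"       (* a new, unrelated n of type string; the old n is merely hidden *)

val still_66 = get_n ()   (* the original binding is unchanged: 66 *)
```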

Type inference
In the above examples, we provided explicit type annotations to inform the compiler of the type of a variable. However, such type annotations are optional, and are rarely necessary. In most cases, the compiler simply infers the correct type. Therefore, the following two code snippets are equivalent:

val s : string = "example"
val b : bool = true

val s = "example"
val b = true

In the examples below, we occasionally provide type annotations as a form of documentation; this has the nice property that the documentation's correctness is enforced, in that the compiler will report an error if any type annotation is incorrect. In other cases we may include ordinary comments, of the form (* this is a comment *); this is a more flexible form of documentation, in that it can include any sort of text (rather than just type information), but of course its accuracy cannot be enforced by the compiler.

Tuples
Types, including the above basic types, can be combined in a number of ways. One way is in a tuple, which is an ordered collection of values; for example, the expression (1, 2) is of type int * int, and ("foo", false) is of type string * bool. There is also a 0-tuple, (), whose type is denoted unit. There are no 1-tuples, however; or rather, there is no distinction between (for example) (1) and 1, both having type int.

Tuples may be nested, and (unlike in some mathematical formalisms) (1, 2, 3) is distinct from both ((1, 2), 3) and (1, (2, 3)). The first is of type int * int * int; the other two are of types (int * int) * int and int * (int * int), respectively.

The components of a tuple can be extracted in two ways: by pattern matching, which binds a variable to each component in a single declaration (fixing each variable's type along with its value), or by projection, which selects one component by position (#1 for the first, #2 for the second, and so on). This allows for a very convenient notation.
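A minimal sketch of such a snippet follows. The variable names pair, a, b and c and the values are assumptions chosen for illustration; the comments on the right show the resulting environment.

```sml
val pair = (66, "sixty-six")   (* pair : int * string *)
val (a, b) = pair              (* pattern matching: a : int = 66, b : string = "sixty-six" *)
val c = #2 pair                (* projection: c : string = "sixty-six" *)
```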

Records
Another way to combine values is in a record. A record is quite like a tuple, except that its components are named rather than ordered; for example, {a = 5.0, b = "five"} is of type {a : real, b : string} (which is the same as type {b : string, a : real}).

In fact, in Standard ML, tuples are simply a special case of records; for example, the type int * string * bool is the same as the type {1 : int, 2 : string, 3 : bool}.
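As a short illustration, record fields can be read with the same # selector notation or with a record pattern, and a tuple can be given a record type with numeric labels. The names point, lbl, px, t and first are hypothetical.

```sml
val point = {x = 1.5, label = "p"}             (* {label : string, x : real} *)
val lbl = #label point                         (* field selection: "p" *)
val {x = px, label = _} = point                (* record pattern: px = 1.5 *)

(* A tuple is a record whose labels are 1, 2, ... *)
val t : {1 : int, 2 : string} = (5, "five")
val first = #1 t                               (* 5 *)
```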

Functions
A function accepts a value and normally returns a value. For example, the factorial function we defined in the introduction:

fun factorial n =
  if n < 1
  then 1
  else n * factorial (n - 1)

is of type int -> int, meaning that it accepts a value of type int and returns a value of type int.

Even if a function doesn't return a value at run-time — for example, if it raises an exception, or if it enters an infinite loop — it has a static return type at compile-time.
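For instance, in the sketch below the hypothetical function fail_with never returns normally, yet it is still given the static return type int, which is what lets it appear in one branch of an if expression whose other branch is an int. It relies on the Basis exception Fail; safe_div is likewise an illustrative name.

```sml
(* Always raises, but still has the static type string -> int. *)
fun fail_with (msg : string) : int = raise Fail msg

(* Both branches have type int, so the function type-checks as int * int -> int. *)
fun safe_div (a, b) =
  if b = 0 then fail_with "division by zero" else a div b

val three = safe_div (7, 2)
val fallback = safe_div (1, 0) handle Fail _ => ~1
```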

As with other types, we can provide explicit type annotations, should we choose:

fun factorial (n : int) : int =
  if n < 1
  then 1
  else n * factorial (n - 1)

Tuples as arguments
Although a Standard ML function must accept exactly one value (rather than taking a list of arguments), tuples and the above-mentioned pattern matching make this no restriction at all. For example, this code snippet:

fun sum (a, b) = a + b
fun average pair = sum pair div 2

creates two functions of type int * int -> int. This approach may also be used to create infix operators. This code snippet:

infix averaged_with
fun a averaged_with b = average (a, b)
val five = 3 averaged_with 7

declares averaged_with to be an infix operator, then defines it as a function of type int * int -> int.

And since a tuple is an ordinary type, a function can also return one. In this code snippet:

fun pair (n : int) = (n, n)

the function pair is of type int -> int * int.

Polymorphic data type
In this code snippet:

fun pair x = (x, x)

the compiler has no way to infer a specific type for pair; it could be int -> int * int, real -> real * real, or even (int -> string) -> (int -> string) * (int -> string). Fortunately, it doesn't need to; it can simply assign it the polymorphic type 'a -> 'a * 'a, where 'a (pronounced "alpha") is a type variable, denoting any possible type. After the above definition, pair 3 and pair "x" are both well-defined, producing (3, 3) and ("x", "x") (respectively). A function can even depend on multiple type variables; in this snippet:

fun swap (x, y) = (y, x)

swap is of type 'a * 'b -> 'b * 'a. All or part of this can be indicated explicitly:

fun swap (x : 'a, y : 'b) : 'b * 'a = (y, x)
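
A brief sketch of the two polymorphic functions used at several different types; the result names are illustrative.

```sml
fun pair x = (x, x)           (* 'a -> 'a * 'a *)
fun swap (x, y) = (y, x)      (* 'a * 'b -> 'b * 'a *)

val p1 = pair 3               (* (3, 3) : int * int *)
val p2 = pair "x"             (* ("x", "x") : string * string *)
val s1 = swap (1, "one")      (* ("one", 1) : string * int *)
```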

Functions as arguments, and curried functions
A function can accept another function as an argument. For example, consider this code snippet:

fun pair_map (f, (x, y)) = (f x, f y)

This creates a function pair_map of type ('a -> 'b) * ('a * 'a) -> 'b * 'b, which applies its first argument (a function) to each element of its second argument (a pair), and returns the pair of results.
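As a usage sketch, pair_map can be applied both to a named Basis function (String.size) and to an anonymous function written with fn; the result names are hypothetical.

```sml
fun pair_map (f, (x, y)) = (f x, f y)

val lengths = pair_map (String.size, ("ab", "cde"))   (* (2, 3) *)
val negated = pair_map (fn n => ~n, (1, 2))           (* (~1, ~2) *)
```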

Conversely, a function can return a function. Above, we saw one way to create a two-argument function: accept a 2-tuple. Another approach, called currying, is to accept just the first argument, then return a function that accepts the second:

fun sum i j = i + j
val add_three = sum 3
val five = add_three 2
val ten = sum 5 5

This creates a function sum of type int -> int -> int (meaning int -> (int -> int)), a function add_three of type int -> int that returns three plus its argument, and integers five and ten.
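One practical benefit of currying, sketched below, is that a partially applied function can be passed directly to a higher-order function such as the Basis function map. The helper uncurry is a hypothetical name, defined here only to show how a curried function can be adapted to take a tuple instead.

```sml
fun sum i j = i + j

(* Partial application: sum 10 is a function of type int -> int. *)
val bumped = map (sum 10) [1, 2, 3]    (* [11, 12, 13] *)

(* Hypothetical helper: turn a curried function into one that accepts a pair. *)
fun uncurry f (x, y) = f x y
val eight = uncurry sum (3, 5)         (* 8 *)
```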

Type declarations
The type keyword may be used to create synonyms for existing data-types. For example, this code snippet:

type int_pair = int * int

creates the synonym int_pair for the data-type int * int. After that synonym has been created, a declaration like this one:

fun swap_int_pair ((i,j) : int_pair) = (j,i)

is exactly equivalent to one like this:

fun swap_int_pair (i : int, j : int) = (j,i)

As we shall see, this is mainly useful in modular programming, when creating a structure to match a given signature.

Datatype declarations
The datatype keyword may be used to declare new data-types. For example, this code snippet:

datatype int_or_string = INT of int | STRING of string | NEITHER

creates an entirely new data-type int_or_string, along with new constructors (a sort of special function or value) INT, STRING, and NEITHER; each value of this type is either an INT with an integer, or a STRING with a string, or a NEITHER. We can then write:

val i = INT 3
val s = STRING "qq"
val n = NEITHER
val INT j = i

where the last declaration uses the pattern-matching facility to bind j to the integer 3.

Conceptually, these types resemble the enumerations or unions of a language such as C++, but they are completely type-safe, in that the compiler will distinguish the int_or_string type from every other type, and in that a value's constructor will be available at run-time to distinguish between the type's different variants (the different arms/branches/alternatives).
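To make the run-time distinction between variants concrete, here is a minimal sketch of a function that inspects the constructor by pattern matching, with one clause per variant. The function name describe is an assumption made for this example.

```sml
datatype int_or_string = INT of int | STRING of string | NEITHER

(* One clause per constructor; the matching clause is chosen at run-time. *)
fun describe (INT n)    = "int " ^ Int.toString n
  | describe (STRING s) = "string " ^ s
  | describe NEITHER    = "neither"

val d1 = describe (INT 3)          (* "int 3" *)
val d2 = describe (STRING "qq")    (* "string qq" *)
val d3 = describe NEITHER          (* "neither" *)
```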

These data-types can be recursive:

datatype int_list = EMPTY | INT_LIST of int * int_list

creates a new type int_list, where each value of this type is either EMPTY (the empty list), or the concatenation of an integer with another int_list.
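A recursive data-type is naturally processed by a recursive function. The sketch below builds a three-element value and adds up its elements; sum_list, xs and total are illustrative names.

```sml
datatype int_list = EMPTY | INT_LIST of int * int_list

(* The recursion follows the shape of the type: one case per constructor. *)
fun sum_list EMPTY = 0
  | sum_list (INT_LIST (n, rest)) = n + sum_list rest

val xs = INT_LIST (1, INT_LIST (2, INT_LIST (3, EMPTY)))
val total = sum_list xs    (* 6 *)
```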

These data-types, like functions, can be polymorphic:

datatype 'a pair = PAIR of 'a * 'a

creates a new family of types, such as int pair, string pair, and so on.

Lists
One complex data-type provided by the Basis is the list. This is a recursive, polymorphic data-type, defined equivalently to this:

datatype 'a list = nil | :: of 'a * 'a list

where :: is an infix operator. So, for example, 3 :: 4 :: 5 :: nil is a list of three integers. Lists being one of the most common data-types in ML programs, the language also offers the special notation [3, 4, 5] for generating them.

The Basis also provides a number of functions for working with lists. One of these is length, which has type 'a list -> int, and which computes the length of a list. It may be defined like this:

fun length nil = 0
  | length (_::t) = 1 + length t

Another is rev, of type 'a list -> 'a list, which computes the reverse of a list (for example, it maps ["a", "b", "c"] to ["c", "b", "a"]) and may be defined like this:

local
  fun rev_helper (nil, ret) = ret
    | rev_helper (h::t, ret) = rev_helper (t, h::ret)
in
  fun rev L = rev_helper (L, nil)
end
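
As a quick check of the notation and of the two Basis functions just described, the following sketch uses the predefined length and rev rather than redefining them; the variable names are illustrative.

```sml
val xs = 3 :: 4 :: 5 :: nil
val same = (xs = [3, 4, 5])    (* the bracket notation builds the same list: true *)
val n = length xs              (* 3 *)
val r = rev xs                 (* [5, 4, 3] *)
```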

Exception declarations
The built-in type exn (exception) resembles the types created by datatype declarations: it has variants, each with its own constructor. However, unlike with those types, new variants, with new constructors, can be added to the type, using exception declarations. This code snippet:

exception StringException of string
val e = StringException "example"
val StringException s = e

creates a constructor StringException of type string -> exn, a variable e of type exn, and a variable s of type string (its value being "example").

The exn type is unique in this regard; a type created within a program cannot be "added to" in this way.
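Values of this type are most often used with raise and handle. A minimal sketch follows, in which the function check is a hypothetical example: the handler pattern-matches on the exception constructor just as a case expression would on a datatype constructor.

```sml
exception StringException of string

(* Raises for negative input; otherwise returns its argument unchanged. *)
fun check n = if n < 0 then raise StringException "negative" else n

val ok = check 5
val msg = (check ~1; "no error") handle StringException s => s
```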

Equality types
Above, the concept of polymorphic types was discussed, and we have seen examples such as 'a -> 'a * 'a and 'a * 'b -> 'b * 'a. In these examples, the type applies to all possible types 'a and 'b. There also exists a slightly more restrictive type of polymorphism, which is restricted to equality types, denoted ''a, ''b, and so on. This type of polymorphism is generated by the polymorphic equality operator, =, which determines if two values are equal, and which has the type ''a * ''a -> bool. This means that both of its operands must be of the same type, and this type must be an equality type.

Of the "basic types" discussed above (int, real, string, char, and bool) all are equality types except for real. This means that 3 = 3, "3" = "3", #"3" = #"3", and true = true are all valid expressions evaluating to true; that 3 = 4, "3" = "4", #"3" = #"4", and true = false are all valid expressions evaluating to false; and that 3.0 = 3.0 and 3.0 = 4.0 are invalid expressions that the compiler will reject. The reason for this is that IEEE floating point equality breaks some of the requirements for equality in ML. In particular, NaN (not a number) is not equal to itself, so the relation is not reflexive.
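As a sketch of how equality polymorphism arises, the hypothetical function member below uses = on its argument, so the compiler infers a type with the equality type variable ''a. For real values, the Basis provides the separate function Real.==, which follows the IEEE rules.

```sml
(* Inferred type: ''a * ''a list -> bool, because = is applied to x and y. *)
fun member (x, nil) = false
  | member (x, y :: ys) = x = y orelse member (x, ys)

val yes = member (3, [1, 2, 3])        (* true *)
val no  = member ("d", ["a", "b"])     (* false *)

(* member (1.0, [1.0]) would be rejected: real is not an equality type. *)

(* IEEE comparison for reals is available explicitly. *)
val same_real = Real.== (1.0, 1.0)     (* true *)
val nan = 0.0 / 0.0
val nan_eq = Real.== (nan, nan)        (* false: NaN is not equal to itself *)
```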

Tuple and record types are equality types if and only if each component type is an equality type; for example, int * string * bool, {b : bool, c : char}, and unit are equality types, whereas int * real and {x : real} are not.

Function types are never equality types, since in the general case it is impossible to determine whether two functions are equivalent.

A type created by a datatype declaration is an equality type if every variant is either a null constructor (one without an argument) or a constructor with an equality-type argument, and (in the case of polymorphic types) every type argument is an equality type. For example, this code snippet:

datatype suit = HEARTS | CLUBS | DIAMONDS | SPADES
datatype int_pair = INT_PAIR of int * int
datatype real_pair = REAL_PAIR of real * real
datatype 'a option = NONE | SOME of 'a

creates the equality types suit (null constructors only), int_pair (only one constructor, and its argument is the equality type int * int), and int option (one null constructor, and one constructor with the equality-type argument int), not to mention char option and string option and so on. It also creates the non-equality types real_pair (a constructor with argument of non-equality type real * real), real option, (int -> int) option, and so on.
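A short sketch of these rules in use follows. It declares suit and int_pair as above but relies on the option type already provided by the Basis instead of redeclaring it; the result names are illustrative.

```sml
datatype suit = HEARTS | CLUBS | DIAMONDS | SPADES
datatype int_pair = INT_PAIR of int * int

val a = (HEARTS = HEARTS)                        (* true *)
val b = (INT_PAIR (1, 2) = INT_PAIR (1, 3))      (* false *)
val c = (SOME 3 = SOME 3)                        (* true: int option is an equality type *)
val d = (SOME "x" = NONE)                        (* false *)

(* SOME 1.0 = SOME 1.0 would be rejected: real option is not an equality type. *)
```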

A recursive type is an equality type if possible, and not otherwise. For example, consider the above-mentioned polymorphic type list:

datatype 'a list = nil | :: of 'a * 'a list

Certainly real list is not an equality type, both because its type argument is not an equality type, and because real * real list (the type of the argument of ::) cannot be an equality type. However, there is no reason that int list cannot be an equality type, and so it is one. Note that this equality typing is only within a type; something like (nil : int list) = (nil : char list) would be invalid, because the two expressions are of different types, even though they have the same value. However, nil = nil and (nil : int list) = nil are both valid (and both evaluate to true).

The mutable type 'a ref is an equality type even if its component type is not. This is because two references are said to be equal if they identify the same ref cell (i.e., the same pointer, generated by the same call to the ref constructor). Therefore, for example, (ref 1) = (ref 1) and (ref 1.0) = (ref 1.0) are both valid, and both evaluate to false, because even though both references happen to point to identical values, the references themselves are separate, and each one can be mutated independently of the other.
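The difference between comparing references and comparing their contents can be sketched as follows. The variable names are illustrative; ! dereferences a ref cell and := assigns to it.

```sml
val r1 = ref 1
val r2 = ref 1        (* a separate cell that happens to hold the same value *)
val alias = r1        (* the same cell as r1 *)

val different = (r1 = r2)      (* false: two distinct cells *)
val same_cell = (r1 = alias)   (* true: one cell, two names *)
val contents  = (!r1 = !r2)    (* true: compares the stored ints *)

val () = alias := 42           (* mutating through alias ... *)
val seen = !r1                 (* ... is visible through r1: 42 *)

(* References to reals can be compared even though reals cannot. *)
val fr = ref 1.0
val fsame = (fr = fr)          (* true *)
```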

However, a code snippet such as this one:

datatype 'a myref = myref of 'a ref

does not generate an equality type when its type argument is not one (real myref, for example, is not an equality type), even though its sole constructor accepts an argument of an equality type. As noted above, polymorphic types created with datatype declarations can only be equality types if their type arguments are, and though the built-in references are exempt from this restriction, they cannot be used to circumvent it. If it is desired that myref types always be equality types, one must use this approach:

datatype ''a myref = myref of ''a ref

which forbids real myref entirely (since the equality-type variable ''a cannot represent the non-equality type real).