Fortran/strings

Modern Fortran has a wide range of facilities for handling string or text data but some of these language-defined facilities have not been widely implemented by the compiler developers. It should be remembered that Fortran is designed for scientific computing and is probably not a good choice for writing a new word processor.

Character type
The main feature in Fortran that supports strings is the intrinsic character data type. A character literal constant can be delimited by either single or double quotes and, where necessary, the delimiter can be included in the literal by writing it twice in succession. The concatenation operator is // (but this cannot be used to concatenate character entities of different kind). Character scalar variables and arrays are allowed. Character variables have a sub-string notation to refer to and extract sub-strings.
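The points above can be sketched as follows. This is only a minimal illustration: the module name string_basics_demo, the subroutine basics and all the variable names are invented here and are not part of the original text.

```fortran
module string_basics_demo
    implicit none
contains
    ! Fill four fixed-length variables to show literals, sub-strings and //.
    ! All names here are hypothetical, chosen only for this sketch.
    subroutine basics(word, pair, both, quote)
        character(len=6), intent(out) :: word
        character(len=2), intent(out) :: pair
        character(len=8), intent(out) :: both
        character(len=4), intent(out) :: quote

        word  = 'abcdef'       ! single-quoted literal
        pair  = word(3:4)      ! sub-string: characters 3 to 4, i.e. 'cd'
        both  = word // pair   ! concatenation gives 'abcdefcd'
        quote = 'it''s'        ! a doubled quote stands for one apostrophe
    end subroutine basics
end module string_basics_demo
```

A program that uses this module and calls basics should find that pair holds 'cd' and both holds 'abcdefcd'.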

Character variables are declared with a fixed length: for example, declarations of the form character(len=6) and character(len=2) give variables that are 6 and 2 characters long respectively.

In character assignment operations, if the right-hand side of the assignment is shorter than the left-hand side, the remaining characters on the left-hand side are filled with blanks. If the right-hand side is longer than the left-hand side, then the right-hand side is truncated. In neither case is an error raised either by the compiler or at run time.
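A small sketch of the padding and truncation rules, using hypothetical names (assignment_demo, pad_and_truncate, padded, cut) that do not come from the original text:

```fortran
module assignment_demo
    implicit none
contains
    ! Show blank padding and silent truncation in character assignment.
    subroutine pad_and_truncate(padded, cut)
        character(len=8), intent(out) :: padded
        character(len=3), intent(out) :: cut

        padded = 'abc'        ! shorter right-hand side: 'abc' followed by 5 blanks
        cut    = 'abcdefgh'   ! longer right-hand side: only 'abc' is kept
    end subroutine pad_and_truncate
end module assignment_demo
```

Some compilers can be asked to warn about the truncating assignment, but it is not an error.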

Character arrays and coarrays are permitted and can be declared and accessed in the same way as any other Fortran array. Where the array index and substring notations are to be combined, the array indices appear first and the substring expression appears second, as in words(2)(1:3), where words is a hypothetical character array.
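The combined notation might be used as in the sketch below. The module char_array_demo and the function element_prefix are hypothetical names introduced for illustration only.

```fortran
module char_array_demo
    implicit none
contains
    ! Return the first n characters of element i of a character array.
    pure function element_prefix(words, i, n) result(part)
        character(len=*), intent(in) :: words(:)
        integer, intent(in) :: i, n
        character(len=n) :: part

        part = words(i)(1:n)   ! array index first, sub-string range second
    end function element_prefix
end module char_array_demo
```

For an array holding 'alpha', 'beta' and 'gamma' (each stored with length 5), element_prefix(words, 2, 3) should give 'bet'.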

Unlike some programming languages, Fortran character data and variables do not require an explicit character to terminate a string. Also, unlike C-type languages, Fortran character data do not accommodate embedded and escaped control characters (e.g. \n), and all processing of output control is done via an extensive format sub-system.

Character collating sequence
Internally, Fortran maintains a collating sequence for all the permitted characters. Non-printing characters may be included in the collating sequence. The collating sequence is not fully specified by the language standard, but most vendors support either ASCII or EBCDIC. This collating sequence means that lexical comparisons can be performed to ascertain whether, e.g., 'a' < 'A', but the outcome is essentially vendor-specific. Hence there is a difference between functions such as char and achar that is described below.

Character kind
Character data can also have a kind, but the available kinds are vendor-specific. Kinds can allow compilers to support Unicode, the Russian alphabet, Japanese characters etc. It is not necessary to specify the length or kind of a character variable. If a character variable is declared with neither, the result is a variable of default kind and one character long. In a declaration, a single number indicates the length, and two numbers indicate the length and kind in that order. It is generally much clearer, though slightly more verbose, to be explicit by using the len= and kind= keywords. The compiler vendor has control over which kinds of character are supported and the integer values assigned to access the corresponding character sets.

The intrinsic function selected_char_kind(name) returns the positive integer kind value of the character set with the corresponding name (e.g. default, ascii, kanji, iso_10646 etc.), but the only character set that must be supported is default, and if the name is not supported then -1 will be returned. Disappointingly, vendors have generally been slow to implement more than the default kind, although gfortran, for instance, is a notable exception.
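A hedged sketch of explicit length and kind declarations together with selected_char_kind is given below. The names char_kind_demo, ck_default, ck_ucs4, kind_is_supported and explicit_declaration are assumptions made for this example; whether ck_ucs4 is positive depends entirely on the compiler.

```fortran
module char_kind_demo
    implicit none
    ! Kind of the default character set; this one must always be supported.
    integer, parameter :: ck_default = selected_char_kind('default')
    ! Kind of ISO 10646 (Unicode), or -1 if the compiler does not provide it.
    integer, parameter :: ck_ucs4 = selected_char_kind('iso_10646')
contains
    ! True if the named character set is available on this processor.
    pure function kind_is_supported(name) result(ok)
        character(len=*), intent(in) :: name
        logical :: ok
        ok = selected_char_kind(name) /= -1
    end function kind_is_supported

    ! Declare a variable with explicit len= and kind= and report its length.
    function explicit_declaration() result(n)
        integer :: n
        character(len=10, kind=ck_default) :: text
        text = 'hello'
        n = len(text)
    end function explicit_declaration
end module char_kind_demo
```

Testing ck_ucs4 against -1 before declaring variables of that kind keeps the code portable to compilers that only support the default kind.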

Language-defined Intrinsic Functions and Subprograms
Fortran has a fairly limited set of intrinsic functions to support character manipulation, searching and conversion, but the basic set is enough to construct some powerful features as required. There are some strange absences, such as the ability to convert from lower case to upper case, but this can be understood and forgiven since these concepts may not exist in many of the languages or character sets that may be represented by different character kinds. Intrinsic functions which apply to arrays of any data type, including character type, are not described here.

achar
achar(i, kind) returns the ith character in the ASCII collating sequence for the characters of the specified kind. The integer i must be in the range 0 ≤ i ≤ 127. kind is an optional integer. If kind is not specified the default kind is assumed. For example, achar(72) has the value 'H'. One really useful feature of achar is that it permits access to the non-printing ASCII characters such as carriage return, achar(13). achar will always return the ASCII character even if the processor's collating sequence is not ASCII. If kind is present, the kind parameter of the result is that specified by kind; otherwise, the kind parameter of the result is that of default character. If the processor cannot represent the result value in the kind of the result, the result is undefined. Using achar is highly recommended in preference to char, described below, because it is portable from one processor to another.

adjustl
adjustl(string) left justifies by removing leading (left) blanks from string and filling the right of string with blanks so that the result has the same length as the input string.

adjustr
adjustr(string) right justifies by removing trailing (right) blanks from string and filling the left of the string with blanks so that the result has the same length as the input string.

char
char(i, kind) returns the ith character in the processor collating sequence for the characters of the specified kind. The integer i is not restricted to the range 0 ≤ i ≤ 127. kind is an optional integer. If kind is not specified the default kind is assumed. If the processor cannot represent the result value in the kind of the result, the result is undefined.

iachar
iachar(c, kind) is the inverse of achar described above. c is a single input character and iachar returns the position of c in the ASCII character set as a default integer. kind is an optional input integer and, if specified, it gives the kind of the integer returned by iachar.

ichar
ichar(c, kind) is the inverse of char described above. c is a single input character and ichar returns the position of c in the processor's character set as a default integer. kind is an optional input integer and, if specified, it gives the kind of the integer returned by ichar.
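As an illustration of the portable ASCII pair achar and iachar, here is a minimal sketch. The module ascii_demo and the functions next_ascii and tab_separated are hypothetical names, not part of the language.

```fortran
module ascii_demo
    implicit none
contains
    ! Return the character that follows c in the ASCII sequence.
    ! Assumes iachar(c) is below 127 so that the result stays in range.
    pure function next_ascii(c) result(next)
        character(len=1), intent(in) :: c
        character(len=1) :: next
        next = achar(iachar(c) + 1)
    end function next_ascii

    ! Join two strings with a horizontal tab, a non-printing character.
    pure function tab_separated(a, b) result(line)
        character(len=*), intent(in) :: a, b
        character(len=len(a) + len(b) + 1) :: line
        line = a // achar(9) // b   ! achar(9) is the ASCII horizontal tab
    end function tab_separated
end module ascii_demo
```

Since iachar('H') is 72, next_ascii('H') should be 'I' on any processor, whatever its native collating sequence.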

index
index(string, substring, back, kind) returns a default integer representing the position of the first instance of substring in string, searching from left to right. There are two optional arguments: back and kind. If the logical back is set true the search is conducted from right to left, and if the integer kind is specified, then the integer returned by index will be of that kind. If substring does not appear in string the result is 0.

len
len(c, kind) returns an integer representing the declared length of character c. This can be extremely useful in subprograms which receive character dummy arguments. c can be a character array, in which case the result is the length of a single element. kind is an optional integer which controls the kind of the integer returned by len.

len_trim
len_trim(c, kind) returns the length of c excluding any trailing blanks (but including leading blanks). If c is only blanks the result is 0. Hence expressions like len_trim(adjustl(c)) can be used to count the number of characters in c between the first and last non-blank characters. kind is an optional integer which controls the kind of the integer returned by len_trim.
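The difference between len, len_trim and the combined expression can be sketched as below; length_demo and inner_length are names assumed for this example.

```fortran
module length_demo
    implicit none
contains
    ! Count the characters between the first and last non-blank characters.
    pure function inner_length(c) result(n)
        character(len=*), intent(in) :: c
        integer :: n
        n = len_trim(adjustl(c))   ! shift left, then ignore trailing blanks
    end function inner_length
end module length_demo
```

For the 8-character string '  ab c  ', len gives 8, len_trim gives 6 and inner_length gives 4.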

new_line
new_line(c) is a character function that returns the new line character for the current processor. The kind of the returned character will be the same as the kind of c. A blank character may be returned if the character kind from which c is drawn does not contain a relevant newline character. This function is not likely to be used except in some very specific circumstances.

repeat
repeat(string, ncopies) concatenates integer ncopies of the string. Hence repeat('=', 72) is a string of 72 equals signs. string must be scalar but can be of any length. Trailing blanks in string are included in the result.

scan
scan(string, set, back, kind) returns a default integer (or an integer of the optional kind) that represents the first position at which any character in set appears in string. To search right to left, the optional logical back must be set true. string can be an array, in which case the result is an integer array. If string is an array then set can be an array of the same size and shape as string, and each element of set is scanned for in the corresponding element of string. index, described above, can be regarded as a special case of scan in which every character of set must be found, and in the order of the characters in set.

selected_char_kind
selected_char_kind(name) is an integer function that returns the kind value of the character set named. The only set that must be supported according to the language standard is default. If name is not supported the result is -1.

trim
trim(string) is a character-valued function that returns string with the trailing blanks removed. If string is all blanks the result has zero length.

verify
verify(string, set, back, kind) is an integer function that returns the position of the first character in string that is not in set. So verify is roughly the obverse of scan. In verify, back and kind are both optional and have the same role as described in scan above. If every character in string is also in set (or string has zero length), then the function returns 0.
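The three search functions index, scan and verify might be combined as in this sketch. The module search_demo, the constant digits and the three function names are invented for illustration.

```fortran
module search_demo
    implicit none
    character(len=*), parameter :: digits = '0123456789'
contains
    ! True if text is non-empty and contains nothing but digits.
    pure function is_all_digits(text) result(ok)
        character(len=*), intent(in) :: text
        logical :: ok
        ok = len(text) > 0 .and. verify(text, digits) == 0
    end function is_all_digits

    ! Position of the first digit in text, or 0 if there is none.
    pure function first_digit_position(text) result(pos)
        character(len=*), intent(in) :: text
        integer :: pos
        pos = scan(text, digits)
    end function first_digit_position

    ! Position of the last occurrence of substring in text, or 0.
    pure function last_occurrence(text, substring) result(pos)
        character(len=*), intent(in) :: text, substring
        integer :: pos
        pos = index(text, substring, back=.true.)
    end function last_occurrence
end module search_demo
```

For instance, index('banana', 'an') is 2, while last_occurrence('banana', 'an') searches from the right and should give 4.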

Regular expressions
Fortran does not have any language-defined regex or sorting capability for character data. Nor does it have a language-defined text tokenizer, although, with a little ingenuity, list-directed input can provide a partial solution. There are, however, Fortran libraries that wrap C regex libraries.

read formatting
A read of character data can be list-directed or formatted using the "a" or "an" forms of the a edit descriptor. In the "a" form, the width is taken from the length of the corresponding item in the input list. In the "an" form, the integer n specifies the number of characters to transfer. The general edit descriptor "gn" can also be used.

Example
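A minimal sketch of the "an" and "a" forms follows. To keep it self-contained, the record is read from a character variable rather than from an external file; the names read_format_demo, split_record, code and name are assumptions made for this example.

```fortran
module read_format_demo
    implicit none
contains
    ! Split a fixed-layout record into a 3-character code and a 5-character name.
    subroutine split_record(line, code, name)
        character(len=*), intent(in)  :: line
        character(len=3), intent(out) :: code
        character(len=5), intent(out) :: name

        ! a3 transfers exactly 3 characters, 1x skips one character, and the
        ! plain a takes its width from the length of name, which is 5.
        read (line, '(a3, 1x, a)') code, name
    end subroutine split_record
end module read_format_demo
```

Given the record 'ABC hello world', code should receive 'ABC' and name should receive 'hello'. The same format would apply unchanged when reading from a unit connected to a file.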

write Formatting
The a and g edit descriptors exist for write as described above. The "a" form will write the whole character variable including all the trailing blanks, so it is common to use trim or adjustl or both.

Example
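The effect of trim and adjustl on output can be sketched as below. The output is written into character variables so that the result can be inspected; write_format_demo, bracket, raw and tidy are hypothetical names.

```fortran
module write_format_demo
    implicit none
contains
    ! Write text between brackets, once as it is and once tidied up.
    subroutine bracket(text, raw, tidy)
        character(len=*), intent(in) :: text
        character(len=len(text) + 2), intent(out) :: raw, tidy

        ! The a descriptor writes every character of text, blanks included.
        write (raw, '(3a)') '[', text, ']'
        ! adjustl moves leading blanks to the end and trim then removes them.
        write (tidy, '(3a)') '[', trim(adjustl(text)), ']'
    end subroutine bracket
end module write_format_demo
```

With text set to '  hi  ', raw should hold '[  hi  ]' while tidy should hold '[hi]' followed by blank padding.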

Internal Read and Write
Fortran has many hidden secrets and one of the most useful is that read and write statements can be used on character variables as if they were files. This explains the otherwise mystifying lack of functions to convert numbers to strings and vice versa. The character variable is treated as an 'internal file'.
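Number and string conversion through an internal file might look like the sketch below. The module conversion_demo and the procedures int_to_string and string_to_real are names assumed for this example, and the 32-character buffer is an arbitrary choice that is ample for default integers.

```fortran
module conversion_demo
    implicit none
contains
    ! Convert an integer to a string by writing it to an internal file.
    function int_to_string(n) result(text)
        integer, intent(in) :: n
        character(len=:), allocatable :: text
        character(len=32) :: buffer   ! the internal file

        write (buffer, '(i0)') n      ! i0 uses the minimum width needed
        text = trim(buffer)
    end function int_to_string

    ! Try to convert a string to a real by reading it as an internal file.
    subroutine string_to_real(text, value, ok)
        character(len=*), intent(in) :: text
        real, intent(out) :: value
        logical, intent(out) :: ok
        integer :: ios

        value = 0.0
        read (text, *, iostat=ios) value   ! list-directed internal read
        ok = (ios == 0)
    end subroutine string_to_real
end module conversion_demo
```

Using iostat means that unreadable text, such as 'abc', sets ok to false instead of stopping the program.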

In addition to type conversion, this internal read/write can be used as a very flexible and robust method of reading files whose contents may be of uncertain format. The external file is read line by line into a character variable, search functions such as index, scan and verify can be used on the line to determine what is present, and then an internal read is done on the character variable to convert to integer, real or other types as appropriate.

character, allocatable
The length of character scalar data can be deferred by declaring it allocatable with len=:, so that it need not be declared with a specific length. The resulting scalar can then be explicitly allocated, or it can be automatically allocated on assignment.

Example
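Both routes to allocation are sketched here; deferred_length_demo, deferred_lengths and the argument names are hypothetical. Note the assumption, stated in the comments, that whole-variable assignment reallocates to the length of the right-hand side whereas assignment to the sub-string text(:) does not.

```fortran
module deferred_length_demo
    implicit none
contains
    ! Report the lengths obtained by automatic and by explicit allocation.
    subroutine deferred_lengths(len_auto, len_explicit)
        integer, intent(out) :: len_auto, len_explicit
        character(len=:), allocatable :: text

        text = 'hello world'        ! allocated on assignment with length 11
        len_auto = len(text)

        deallocate (text)
        allocate (character(len=20) :: text)   ! explicit allocation
        text(:) = 'hello'           ! sub-string assignment keeps length 20
        len_explicit = len(text)
    end subroutine deferred_lengths
end module deferred_length_demo
```

Calling deferred_lengths should return 11 and 20. Writing text = 'hello' instead of text(:) = 'hello' would reallocate text to length 5.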

It is even possible to declare an allocatable array whose elements have deferred length.
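A sketch of such an array is given below, with the hypothetical names deferred_array_demo, build_names and names. It assumes a reasonably recent compiler, since support for this feature arrived late in some implementations.

```fortran
module deferred_array_demo
    implicit none
contains
    ! Allocate an array of three strings, each 8 characters long.
    subroutine build_names(names)
        character(len=:), allocatable, intent(out) :: names(:)

        allocate (character(len=8) :: names(3))   ! one length for all elements
        names(1) = 'red'      ! each element is blank padded to 8 characters
        names(2) = 'green'
        names(3) = 'blue'
    end subroutine build_names
end module deferred_array_demo
```

The length is chosen at run time, but it is a property of the whole array, so len(names) is 8 for every element.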

However, this feature should be used carefully and some restrictions apply; in particular, every element of such an array has the same length.

Actual/Dummy arguments of type character
It is frequently the case that a procedure may be written with a character dummy argument where the length of that argument is not known in advance. Modern Fortran allows dummy arguments to be declared with assumed length using character(len=*). Functions of type character can be written so that the result has a length related to the length of the dummy arguments.
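An assumed-length dummy argument, together with a local variable sized from it, can be sketched as follows. The names assumed_length_demo, tag, tagged and tagged_length are invented for this illustration.

```fortran
module assumed_length_demo
    implicit none
contains
    ! Build a tagged copy of string and report how long the copy is.
    subroutine tag(string, tagged_length)
        character(len=*), intent(in) :: string   ! length taken from the caller
        integer, intent(out) :: tagged_length
        character(len=len(string) + 5) :: tagged ! always 5 characters longer

        tagged = string // '_copy'
        tagged_length = len(tagged)
    end subroutine tag
end module assumed_length_demo
```

Calling tag with a 3-character actual argument should report a length of 8, and the same subroutine works for an argument of any other length.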

Local variables in such a procedure can have lengths that depend on the dummy argument: a local character variable declared as character(len=len(string)+5) has 5 more characters than the dummy argument string, no matter how long the actual argument is. Similarly, a function can return a string whose length is related to the length of one or more of its arguments.

Example
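A function whose result length is computed from its arguments might be written as in this sketch; result_length_demo and join are hypothetical names.

```fortran
module result_length_demo
    implicit none
contains
    ! Concatenate a and b; the result is exactly as long as both together.
    pure function join(a, b) result(joined)
        character(len=*), intent(in) :: a, b
        character(len=len(a) + len(b)) :: joined

        joined = a // b
    end function join
end module result_length_demo
```

Here join('ab', 'cde') should return the 5-character string 'abcde'.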

In circumstances where the character function has to return a string and the length of this string is not simply related to the inputs, the deferred-length, allocatable form described above can be used for the function result.

character parameters
Character parameters can be declared without explicitly stating the length by using character(len=*), parameter, in which case the length is taken from the value.
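For instance, in the sketch below the lengths 10 and 3 are never written out. The module name char_parameter_demo and the constants digits and days are chosen only for this example.

```fortran
module char_parameter_demo
    implicit none
    ! The length of each constant is taken from its value.
    character(len=*), parameter :: digits = '0123456789'
    character(len=*), parameter :: days(3) = ['Mon', 'Tue', 'Wed']
end module char_parameter_demo
```

In an array constant such as days, all the elements of the constructor must have the same length.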

Approaches to Case Conversion
Here are some further examples of the ideas above, directed at case conversion for languages where case conversion exists as a concept. In the first example, the ASCII character set functions iachar and achar are used to check each character in a string consecutively.

Example
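One possible form of the first approach is sketched below. It is not necessarily the original author's routine; the names case_ascii_demo and to_upper_ascii are assumptions, and only the 26 unaccented letters a to z are handled.

```fortran
module case_ascii_demo
    implicit none
contains
    ! Convert a to z to upper case using their ASCII codes.
    pure function to_upper_ascii(string) result(upper)
        character(len=*), intent(in) :: string
        character(len=len(string)) :: upper
        integer :: i, code

        do i = 1, len(string)
            code = iachar(string(i:i))
            if (code >= iachar('a') .and. code <= iachar('z')) then
                ! In ASCII each upper-case letter is 32 below its lower-case form.
                upper(i:i) = achar(code - 32)
            else
                upper(i:i) = string(i:i)   ! leave everything else unchanged
            end if
        end do
    end function to_upper_ascii
end module case_ascii_demo
```

With this sketch, to_upper_ascii('Hello, World 1') should give 'HELLO, WORLD 1'.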

An alternative approach that does not rely on the ASCII representation function could be as follows:

Example
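A sketch of a lookup-based alternative follows, using index on two constant strings and a deferred-length, allocatable result. Again this is an illustration under assumed names (case_lookup_demo, lower_case, upper_case, to_upper_lookup), not the original routine.

```fortran
module case_lookup_demo
    implicit none
    character(len=*), parameter :: lower_case = 'abcdefghijklmnopqrstuvwxyz'
    character(len=*), parameter :: upper_case = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
contains
    ! Convert to upper case by looking each character up in lower_case.
    pure function to_upper_lookup(string) result(upper)
        character(len=*), intent(in) :: string
        character(len=:), allocatable :: upper
        integer :: i, k

        upper = string                  ! allocates upper with the same length
        do i = 1, len(upper)
            k = index(lower_case, upper(i:i))   ! 0 if not a lower-case letter
            if (k > 0) upper(i:i) = upper_case(k:k)
        end do
    end function to_upper_lookup
end module case_lookup_demo
```

Because the mapping lives in the two constant strings, it makes no assumption about character codes, and further letter pairs could be appended to both strings.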

Which routine is quicker will depend on the relative speed of the iachar and index intrinsics. In one rather unscientific test, the first method above seemed to be slightly more than twice as fast as the second method, but this will vary from vendor to vendor.