Common Lisp/Advanced topics/Strings

The most important thing to know about strings in Common Lisp is probably that they are arrays and thus also sequences. This implies that all concepts that are applicable to arrays and sequences also apply to strings. If you can't find a particular string function, make sure you've also searched for the more general array or sequence functions. We'll only cover a fraction of what can be done with and to strings here.

Accessing Substrings
As a string is a sequence, you can access substrings with the SUBSEQ function. The index into the string is, as always, zero-based. The optional third argument is the index of the first character that is not part of the substring; it is not the length of the substring.
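A minimal sketch of SUBSEQ on a string. The variable name *my-string* is only an illustration.

```lisp
(defparameter *my-string* (copy-seq "Groucho Marx"))

(subseq *my-string* 8)     ; => "Marx"     (from index 8 to the end)
(subseq *my-string* 0 7)   ; => "Groucho"  (index 7 itself is excluded)
(subseq *my-string* 1 5)   ; => "rouc"
```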

You can also manipulate the substring if you use SUBSEQ together with SETF.

But note that the string isn't "stretchable". To cite from the HyperSpec: "If the subsequence and the new sequence are not of equal length, the shorter length determines the number of elements that are replaced." For example:
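The following sketch shows both an in-place replacement and the truncation rule. The variable *name* is hypothetical, and COPY-SEQ is used because modifying a literal string has undefined consequences.

```lisp
(defparameter *name* (copy-seq "Karl Marx"))

;; Same length: the first four characters are replaced.
(setf (subseq *name* 0 4) "Carl")
*name*   ; => "Carl Marx"

;; The new sequence is longer than the subsequence (5 vs. 4 characters),
;; so only its first four characters are used.
(setf (subseq *name* 0 4) "Harpo")
*name*   ; => "Harp Marx"

;; Here the subsequence runs to the end of the string (5 characters),
;; and the sixth character of the new sequence is dropped.
(setf (subseq *name* 4) "o Marx")
*name*   ; => "Harpo Mar"
```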

Accessing Individual Characters
You can use the function CHAR to access individual characters of a string. CHAR can also be used in conjunction with SETF.
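A short illustration, using a hypothetical variable *comedian* that holds a fresh (non-literal) string:

```lisp
(defparameter *comedian* (copy-seq "Groucho Marx"))

(char *comedian* 11)   ; => #\x
(char *comedian* 7)    ; => #\Space

;; CHAR is a SETF-able place.
(setf (char *comedian* 6) #\y)
*comedian*             ; => "Grouchy Marx"
```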

Note that there's also SCHAR, which works only on simple strings (strings that are not adjustable, not displaced, and have no fill pointer). If efficiency is important, SCHAR can be a bit faster where appropriate.

Because strings are arrays and thus sequences, you can also use the more generic functions AREF and ELT. They work on a wider range of objects, while CHAR might be implemented more efficiently.
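All four accessors can be tried side by side. This sketch assumes a string returned by COPY-SEQ, which is a simple string and therefore acceptable to SCHAR; the variable name is made up for the example.

```lisp
(defparameter *groucho* (copy-seq "Groucho Marx"))

(char  *groucho* 0)   ; => #\G
(schar *groucho* 0)   ; => #\G  (simple strings only)
(aref  *groucho* 3)   ; => #\u  (any array)
(elt   *groucho* 8)   ; => #\M  (any sequence)
```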

Manipulating Parts of a String
There's a slew of (sequence) functions that can be used to manipulate a string and we'll only provide some examples here. See the sequences dictionary in the HyperSpec for more.
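A few of these sequence functions applied to strings, as a sketch. The variable *zeppo* is hypothetical; REPLACE modifies its first argument, so it is given a fresh copy.

```lisp
(remove #\o "Harpo Marx")                          ; => "Harp Marx"
(remove #\a "Harpo Marx")                          ; => "Hrpo Mrx"
(remove #\a "Harpo Marx" :start 2)                 ; => "Harpo Mrx"
(remove-if #'upper-case-p "Harpo Marx")            ; => "arpo arx"
(substitute #\u #\o "Groucho Marx")                ; => "Gruuchu Marx"
(substitute-if #\_ #'upper-case-p "Groucho Marx")  ; => "_roucho _arx"

;; REPLACE overwrites part of an existing string in place.
(defparameter *zeppo* (copy-seq "Zeppo Marx"))
(replace *zeppo* "Harpo" :end1 5)
*zeppo*                                            ; => "Harpo Marx"
```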

Another frequently used function, though not part of the ANSI standard, is replace-all. It provides simple search/replace functionality on a string by returning a new string in which all occurrences of 'part' in the string are replaced with 'replacement'.

One of the implementations of replace-all is as follows:
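What follows is one possible sketch built on SEARCH and WITH-OUTPUT-TO-STRING, not necessarily the only or best way to write it. It assumes 'part' is a non-empty string; an empty 'part' would make the loop run forever. The :TEST keyword is an addition for illustration.

```lisp
(defun replace-all (string part replacement &key (test #'char=))
  "Return a new string in which all occurrences of PART in STRING
are replaced with REPLACEMENT. PART must not be empty."
  (with-output-to-string (out)
    (loop with part-length = (length part)
          ;; OLD-POS is where the unprocessed rest of STRING begins.
          for old-pos = 0 then (+ pos part-length)
          ;; POS is the next match at or after OLD-POS, or NIL.
          for pos = (search part string :start2 old-pos :test test)
          ;; Copy the text between the previous match and this one.
          do (write-string string out
                           :start old-pos
                           :end (or pos (length string)))
          when pos do (write-string replacement out)
          while pos)))

(replace-all "all tall" "all" "oo")   ; => "oo too"
```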

However, bear in mind that the above code is not optimized for long strings. If you intend to perform such an operation on very long strings, files, etc., consider using CL-PPCRE, a heavily optimized regular expression and string processing library.

Concatenating Strings
The name says it all: CONCATENATE is your friend. Note that this is a generic sequence function and you have to provide the result type as the first argument.
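For instance, the same arguments give a string or a list of characters depending on the result type:

```lisp
(concatenate 'string "Karl" " " "Marx")
;; => "Karl Marx"

(concatenate 'list "Karl" " " "Marx")
;; => (#\K #\a #\r #\l #\Space #\M #\a #\r #\x)
```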

If you have to construct a string out of many parts, all of these calls to CONCATENATE seem wasteful, though. There are at least three other good ways to construct a string piecemeal, depending on what exactly your data is. If you build your string one character at a time, make it an adjustable VECTOR (a one-dimensional ARRAY) of type character with a fill-pointer of zero, then use VECTOR-PUSH-EXTEND on it. That way, you can also give hints to the system if you can estimate how long the string will be. (See the optional third argument to VECTOR-PUSH-EXTEND.)
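A minimal sketch of the character-at-a-time approach. The variable *buffer* and the extension value 50 are arbitrary choices for the example.

```lisp
(defparameter *buffer*
  (make-array 0
              :element-type 'character
              :fill-pointer 0
              :adjustable t))

(vector-push-extend #\a *buffer*)
(vector-push-extend #\b *buffer*)
;; The third argument is a hint: if the vector has to grow,
;; grow it by at least this many elements.
(vector-push-extend #\c *buffer* 50)

*buffer*            ; => "abc"
(stringp *buffer*)  ; => T
```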

If the string will be constructed out of (the printed representations of) arbitrary objects (symbols, numbers, characters, strings, ...), you can use FORMAT with an output stream argument of NIL. This directs FORMAT to return the indicated output as a string.
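Two small illustrations, assuming the default printer settings (base 10, no radix marker):

```lisp
(format nil "~A ~A" "Karl" "Marx")
;; => "Karl Marx"

(format nil "~A + ~A = ~A" 1 2 (+ 1 2))
;; => "1 + 2 = 3"
```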

We can use the looping constructs of the FORMAT mini language to emulate CONCATENATE.
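The iteration directive ~{ ... ~} walks over a list argument, so a sketch of this emulation could look like:

```lisp
;; ~{ starts iterating over the list, ~A prints each element, ~} ends the loop.
(format nil "~{~A~}" '("Karl" " " "Marx"))
;; => "Karl Marx"
```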

FORMAT can do a lot more processing, but it has a relatively arcane syntax. You can find the details in the CLHS section about formatted output.

Another way to create a string out of the printed representations of various objects is to use WITH-OUTPUT-TO-STRING. The value of this handy macro is a string containing everything that was output to the string stream within the body of the macro. This means you also have the full power of FORMAT at your disposal, should you need it.
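As a sketch, the body below mixes PRINC and FORMAT on the same string stream. The stream variable name is arbitrary.

```lisp
(with-output-to-string (stream)
  ;; Write some characters one at a time ...
  (dolist (char '(#\Z #\a #\p #\p #\a #\, #\Space))
    (princ char stream))
  ;; ... then let FORMAT append the rest.
  (format stream "~S - ~S" 1940 1993))
;; => "Zappa, 1940 - 1993"
```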

Joining Strings With a Delimiter
Even though the previous section provides enough hints on how to do that, it may be the right time and place to stress how to join strings using delimiters. Assume you have a list of numbers or strings such as (192 168 1 1) or ("192" "168" "1" "1") and you want to join them with the delimiter "." or ";" to create another string. Here are some examples:
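Two sketches follow. The first uses the FORMAT directive ~^, which stops the iteration before printing the delimiter after the last element. The second is a hypothetical helper, JOIN-WITH, for the case where the delimiter is only known at run time.

```lisp
(format nil "~{~A~^.~}" '(192 168 1 1))
;; => "192.168.1.1"

(format nil "~{~A~^;~}" '("192" "168" "1" "1"))
;; => "192;168;1;1"

;; JOIN-WITH is a made-up name, not a standard function.
(defun join-with (delimiter items)
  "Return a string of the printed ITEMS separated by the string DELIMITER."
  (with-output-to-string (out)
    (loop for (item . rest) on items
          do (princ item out)
          ;; Only write the delimiter if more items follow.
          when rest do (write-string delimiter out))))

(join-with "." '(192 168 1 1))   ; => "192.168.1.1"
```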

Processing a String One Character at a Time
Use the MAP function to process a string one character at a time.
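A brief sketch. The first argument to MAP is the result type; passing NIL means the function is called for its side effects only.

```lisp
(map 'string #'char-upcase "Groucho Marx")
;; => "GROUCHO MARX"

(map 'string (lambda (c) (if (char= c #\o) #\0 c)) "Groucho Marx")
;; => "Gr0uch0 Marx"

;; No result is collected here; each character is just printed.
(map nil (lambda (c) (print c)) "abc")
;; => NIL
```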

Or do it with LOOP.
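With LOOP, the ACROSS keyword iterates over the elements of a vector, which includes strings:

```lisp
(loop for char across "Zeppo"
      collect char)
;; => (#\Z #\e #\p #\p #\o)

(loop for char across "Groucho Marx"
      count (char= char #\o))
;; => 2
```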

Reversing a String by Word or Character
Reversing a string by character is easy using the built-in REVERSE function (or its destructive counterpart NREVERSE).
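A quick illustration. NREVERSE may destroy its argument, so the sketch applies it to a fresh copy and uses its return value; the variable name is arbitrary.

```lisp
(reverse "Groucho Marx")
;; => "xraM ohcuorG"

(defparameter *marx* (copy-seq "Groucho Marx"))
;; Always use the value NREVERSE returns.
(setf *marx* (nreverse *marx*))
*marx*
;; => "xraM ohcuorG"
```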

There's no one-liner in CL to reverse a string by word (like you would do it in Perl with split and join). You either have to use a function from an external library like SPLIT-SEQUENCE or you have to roll your own solution. Here's an attempt:
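This sketch assumes words are separated by exactly one space. The helper names SPLIT-BY-ONE-SPACE and JOIN-STRING-LIST are chosen for the example and are not standard functions.

```lisp
(defun split-by-one-space (string)
  "Return a list of the substrings of STRING delimited by single spaces.
Two consecutive spaces yield an empty string between them."
  (loop for i = 0 then (1+ j)
        as j = (position #\Space string :start i)
        collect (subseq string i j)
        while j))

(defun join-string-list (string-list)
  "Concatenate the strings in STRING-LIST, separated by single spaces."
  (format nil "~{~A~^ ~}" string-list))

(split-by-one-space "Reverse this sentence by word")
;; => ("Reverse" "this" "sentence" "by" "word")

(join-string-list
 (nreverse (split-by-one-space "Reverse this sentence by word")))
;; => "word by sentence this Reverse"
```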

Controlling Case
Common Lisp has a couple of functions to control the case of a string.

These functions take :START and :END keyword arguments so you can optionally only manipulate a part of the string. They also have destructive counterparts whose names start with "N".
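Some examples of the case functions, with and without bounds. The destructive variant is applied to a fresh copy rather than a literal.

```lisp
(string-upcase "cool")                        ; => "COOL"
(string-downcase "COOL")                      ; => "cool"
(string-capitalize "cool example")            ; => "Cool Example"

;; Restrict the conversion to part of the string.
(string-upcase "cool example" :end 4)         ; => "COOL example"
(string-capitalize "cool example" :start 5)   ; => "cool Example"

;; Destructive counterpart: modifies the string it is given.
(nstring-downcase (copy-seq "COOL"))          ; => "cool"
```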

Note this potential caveat: According to the HyperSpec, "for STRING-UPCASE, STRING-DOWNCASE, and STRING-CAPITALIZE, string is not modified. However, if no characters in string require conversion, the result may be either string or a copy of it, at the implementation's discretion." This implies the last result in the following example is implementation-dependent - it may either be "BIG" or "BUG". If you want to be sure, use COPY-SEQ.
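A sketch of the situation, with hypothetical variable names. In the first half the final value depends on the implementation; in the second half COPY-SEQ guarantees the original is left alone.

```lisp
(defparameter *string-1* (copy-seq "BIG"))
;; "BIG" needs no conversion, so this may or may not be *string-1* itself.
(defparameter *string-2* (string-upcase *string-1*))

(setf (char *string-2* 1) #\U)
*string-2*   ; => "BUG"
*string-1*   ; => "BIG" or "BUG", depending on the implementation

;; Safe variant: always work on an explicit copy.
(defparameter *string-3* (copy-seq "BIG"))
(defparameter *string-4* (copy-seq (string-upcase *string-3*)))

(setf (char *string-4* 1) #\U)
*string-4*   ; => "BUG"
*string-3*   ; => "BIG"
```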

Trimming Blanks from the Ends of a String
Not only can you trim blanks, but you can get rid of arbitrary characters. The functions STRING-TRIM, STRING-LEFT-TRIM and STRING-RIGHT-TRIM return a substring of their second argument where all characters that are in the first argument are removed from the beginning and/or the end. The first argument can be any sequence of characters.
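A few illustrations, with the character bag given once as a string and once as a list:

```lisp
(string-trim " " " trim me ")                     ; => "trim me"
(string-trim " et" " trim me ")                   ; => "rim m"
(string-left-trim " et" " trim me ")              ; => "rim me "
(string-right-trim " et" " trim me ")             ; => " trim m"

;; The characters to strip can be any sequence, for example a list.
(string-right-trim '(#\Space #\e #\t) " trim me ")      ; => " trim m"
(string-right-trim '(#\Space #\e #\t #\m) " trim me ")  ; => " tri"
```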

Note: The caveat mentioned in the section about Controlling Case also applies here.

Converting between Symbols and Strings
The function INTERN will "convert" a string to a symbol. Actually, it will check whether the symbol denoted by the string (its first argument) is already accessible in the package (its second, optional, argument which defaults to the current package) and enter it, if necessary, into this package. It is beyond the scope of this chapter to explain all the concepts involved and to address the second return value of this function. See the CLHS chapter about packages for details.

Note that the case of the string is relevant.
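The sketch below assumes the standard readtable, which upcases symbol names as they are read, and assumes the forms are read and evaluated in the same package.

```lisp
(intern "MY-SYMBOL")                   ; => MY-SYMBOL (plus a second value)
(eq (intern "MY-SYMBOL") 'my-symbol)   ; => T

;; A lowercase name denotes a different symbol.
(eq (intern "my-symbol") 'my-symbol)   ; => NIL
(symbol-name (intern "my-symbol"))     ; => "my-symbol"

;; The optional second argument names the package.
(eq (intern "MY-SYMBOL" :keyword) :my-symbol)   ; => T
```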

To do the opposite, convert from a symbol to a string, use SYMBOL-NAME or STRING.
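For example, again assuming the standard readtable's upcasing behavior:

```lisp
(symbol-name 'my-symbol)     ; => "MY-SYMBOL"
(symbol-name '|my-symbol|)   ; => "my-symbol"
(string 'my-symbol)          ; => "MY-SYMBOL"
(symbol-name :keyword-too)   ; => "KEYWORD-TOO"
```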

Converting between Characters and Strings
You can use COERCE to convert a string of length 1 to a character. You can also use COERCE to convert any sequence of characters into a string. You cannot use COERCE to convert a character to a string, though - you'll have to use STRING instead.
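The conversions described above, sketched as individual forms:

```lisp
(coerce "a" 'character)                   ; => #\a
(coerce (subseq "cool" 2 3) 'character)   ; => #\o
(coerce "cool" 'list)                     ; => (#\c #\o #\o #\l)
(coerce '(#\h #\e #\y) 'string)           ; => "hey"

;; Character to string: use STRING.
;; (coerce #\a 'string) would signal an error instead.
(string #\a)                              ; => "a"
```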

Finding an Element of a String
Use FIND, POSITION, and their -IF counterparts to find characters in a string.
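Some examples, including the :TEST, :START and :FROM-END keywords these sequence functions accept:

```lisp
(find #\o "Groucho Marx")                          ; => #\o
(find #\z "Groucho Marx")                          ; => NIL
(find #\g "Groucho Marx" :test #'char-equal)       ; => #\G  (case-insensitive)
(find-if #'digit-char-p "Marx 1890")               ; => #\1

(position #\o "Groucho Marx")                      ; => 2
(position #\o "Groucho Marx" :from-end t)          ; => 6
(position-if #'upper-case-p "Groucho Marx" :start 1)  ; => 8
(position-if #'digit-char-p "Marx 1890")           ; => 5
```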

Or use COUNT and friends to count characters in a string.
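Counting works along the same lines:

```lisp
(count #\o "Groucho Marx")                      ; => 2
(count #\g "Groucho Marx" :test #'char-equal)   ; => 1
(count-if #'upper-case-p "Groucho Marx")        ; => 2
(count-if #'alpha-char-p "Groucho Marx")        ; => 11
```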

Finding a Substring of a String
The function SEARCH can find substrings of a string.
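SEARCH returns the index of the match in its second argument, or NIL if there is none. A few illustrations:

```lisp
(search "we" "If we can't be free we can at least be cheap")
;; => 3
(search "we" "If we can't be free we can at least be cheap" :from-end t)
;; => 20
(search "we" "If we can't be free we can at least be cheap" :start2 4)
;; => 20
(search "FREE" "If we can't be free we can at least be cheap")
;; => NIL
(search "FREE" "If we can't be free we can at least be cheap" :test #'char-equal)
;; => 15
```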

Converting a String to a Number
CL provides the function PARSE-INTEGER to convert a string representation of an integer to the corresponding numeric value. The second return value is the index into the string where the parsing stopped.
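A few calls, showing both return values where they are unambiguous:

```lisp
(parse-integer "42")                                ; => 42, 2
(parse-integer "42" :start 1)                       ; => 2, 2
(parse-integer "FF" :radix 16)                      ; => 255, 2

;; Without :JUNK-ALLOWED, trailing junk signals an error.
(parse-integer "42 is forty-two" :junk-allowed t)   ; => 42, 2

;; With :JUNK-ALLOWED, the primary value is NIL if no integer is found.
(parse-integer "forty-two" :junk-allowed t)         ; primary value NIL
```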

PARSE-INTEGER doesn't understand radix specifiers like #X, nor is there a built-in function to parse other numeric types. You could use READ-FROM-STRING in this case, but be aware that the full reader is in effect if you're using this function.
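If you do go the READ-FROM-STRING route, one cautious sketch is to read with standard syntax, disable #. evaluation by binding *READ-EVAL* to NIL, and reject anything that is not a number. READ-NUMBER-FROM-STRING is a hypothetical helper name, and this is not a complete defense against hostile input (the reader may still intern symbols, for example).

```lisp
(defun read-number-from-string (string)
  "Read one object from STRING and return it if it is a number.
Signals an error otherwise. Hypothetical helper, for illustration only."
  (let ((value (with-standard-io-syntax
                 ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T,
                 ;; so it has to be turned off inside.
                 (let ((*read-eval* nil))
                   (read-from-string string)))))
    (if (numberp value)
        value
        (error "~S does not denote a number." string))))

(read-number-from-string "#X23")   ; => 35
(read-number-from-string "6/8")    ; => 3/4
(read-number-from-string "4.5")    ; => 4.5
```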

Converting a Number to a String
Common Lisp provides functions such as PRINC-TO-STRING and PRIN1-TO-STRING to convert numbers into strings. If you want to concatenate a string and a number you can use them as follows:
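The sketch assumes the default printer settings (base 10, no radix marker). Note that the two functions differ for objects such as strings, where PRIN1-TO-STRING includes the quotes.

```lisp
(princ-to-string 42)       ; => "42"
(prin1-to-string 42)       ; => "42"

(concatenate 'string "The answer is " (princ-to-string 42))
;; => "The answer is 42"

;; The difference shows up with strings, not with numbers.
(princ-to-string "foo")    ; => "foo"
(prin1-to-string "foo")    ; => "\"foo\""
```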

Comparing Strings
The general functions EQUAL and EQUALP can be used to test whether two strings are equal. The strings are compared element-by-element, either in a case-sensitive manner (EQUAL) or not (EQUALP). There's also a bunch of string-specific comparison functions. You'll want to use these if you're relying on implementation-defined attributes of characters. Check your vendor's documentation in this case.

Here are a few examples. Note that all functions that test for inequality return the position of the first mismatch as a generalized boolean. You can also use the generic sequence function MISMATCH if you need more versatility.
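In the sketch below, "true" stands for any non-NIL value. Comparisons between upper- and lowercase letters with the case-sensitive ordering functions are left out because their result depends on the implementation's character codes.

```lisp
(equal  "Marx" "marx")                    ; => NIL
(equalp "Marx" "marx")                    ; => T
(string= "Marx" "Marx")                   ; => true
(string= "Marx" "marx")                   ; => NIL
(string-equal "Marx" "marx")              ; => true  (ignores case)
(string= "Marx" "Groucho Marx" :start2 8) ; => true  (compare a substring)

;; Inequality tests return the index of the first mismatch.
(string< "Groucho" "Zeppo")               ; => 0
(string< "Mark" "Marx")                   ; => 3
(string/= "Marx" "Mark")                  ; => 3
(string-lessp "groucho" "Zeppo")          ; => 0  (ignores case)

;; MISMATCH is the generic sequence version.
(mismatch "Harpo Marx" "Zeppo Marx" :from-end t :test #'char=)
;; => 3
```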

Splitting Strings
SPLIT-SEQUENCE is part of the Common Lisp Utilities collection, available at http://www.cliki.net/SPLIT-SEQUENCE.

Copyright © 2002-2005 The Common Lisp Cookbook Project