Perl Programming/Strings

Strings
Any sequence of characters put together as one unit is a string. So, the word "the" is a string. This sentence is a string. Even this entire paragraph is a string. In fact, you could consider the text of this entire book as one string.

Strings can be of any length and can contain any characters: letters, numbers, punctuation, special characters (like !, #, and %), and even characters in natural languages besides English. In addition, a string can contain special control characters like newline, tab, and the bell character. We will discuss special characters more later on.

To begin our discussion of strings in Perl, we will consider how to work with string literals. The word literal here refers to the fact that these are used when you want to type a string directly into your Perl program. This can be contrasted with storing a string in a variable.

Any string literal can be used as an expression. We will find this useful when we want to store string literals in variables. However, for now, we will simply consider the different types of string literals that one can make in Perl. Later, we will learn how to assign these string literals to variables in the Scalar Variables section.

Single-quoted strings
String literals can be represented in primarily three ways in Perl. We have already used one type, the double-quoted string, in the simple programming examples. Double and single quote marks each have a special meaning in Perl.

Single quotes can be thought of as literal strings. In the previous examples, you may have noticed that variable names were included inside the strings with double quotes. When the results were printed, the value of the variable was placed in the printed line, not the name of the variable. If single quote marks were used, the actual variable name would have been printed because nearly all special characters that might be interpreted differently are taken at face value when using single quotes.

To see what is meant by this, try this simple program:
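A minimal sketch of such a program, assuming a scalar variable named $name that holds the string Fred (both taken from the output described next):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'Fred';

print "Hello $name\n";   # double quotes: $name and \n are interpolated
print 'Hello $name\n';   # single quotes: every character is printed as typed
```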

You should see "Hello Fred" on the first line and "Hello $name\n" on the second (without a newline after it). Putting the value of $name into the string in the first print statement is called "interpolation." If you don't need interpolation, you should use single quotes, because it makes your intent clearer.

Special characters in single-quoted strings
There are two characters in single quoted strings that do not always represent themselves. This is due to necessity, since single-quoted strings start and end with the ' character. We need a way to express inside a single-quoted string that we want the string to contain a ' character.

The solution to this problem is to precede any ' characters we actually want to appear in the string itself with a backslash (the \ character). Thus we have strings like this:
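A minimal sketch of such a literal, using the same string that appears in the print examples later in this chapter. The variable name $s and the use of the built-in length function are only there to make the result visible:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = 'xxx\'xxx';        # eight characters are typed between the quotes

print $s, "\n";            # prints xxx'xxx
print length($s), "\n";    # prints 7, since \' counts as one character
```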

We have in this example a string with seven characters exactly. Namely, this is the string: xxx'xxx. It can be difficult at first to become accustomed to the idea that two characters in the input to Perl actually produce only one character in the string itself. (C programmers are already probably used to this idea.) However, just keep in mind the rules and you will probably get used to them quickly.

Since we have used the \ character to do something special with the <tt>'</tt> character, we must now worry about the special cases for the backslash character itself. When we see a <tt>\</tt> character in a single-quoted string, we must carefully consider what will happen.

Under most circumstances, when a <tt>\</tt> is in a single-quoted string, it is simply a backslash, representing itself, as most other characters do. However, the following exceptions apply:


 * The sequence <tt>\'</tt> yields the character <tt>'</tt> in the actual string. (This is the exception we already discussed above.)
 * The sequence <tt>\\</tt> yields the character <tt>\</tt> in the actual string. In other words, two backslashes right next to each other actually yield only one backslash.
 * A backslash, by itself, cannot be placed at the end of a single-quoted string. This cannot happen because Perl will think that you are using the <tt>\</tt> to escape the closing <tt>'</tt>.

The following examples illustrate the various exceptions and use them properly:
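A sketch of such examples follows. The last string is the one discussed in the next paragraph; the other three strings, and all of the variable names, are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $escaped_quote = 'I don\'t think so.';           # \' yields '
my $lone_or_pair  = 'Need a \\ (backslash) or \?';  # \\ yields \, and \? stays \?
my $ends_in_slash = 'You can do this: \\';          # \\ at the end yields one \
my $three         = 'Three \\\'s: "\\\\\"';         # three \ chars between ""

# Printing a variable does not reinterpret the backslashes it contains.
print "$escaped_quote\n$lone_or_pair\n$ends_in_slash\n$three\n";
```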

In the last example, note that the resulting string is <tt>Three \'s: "\\\"</tt>. If you can follow that example, you have definitely mastered how single-quoted strings work!

Instead of unreadable backslash escapes, Perl offers other ways of quoting strings. The first example above could be written as:
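As a sketch, assuming the string in question contains an escaped apostrophe (the text and variable names here are only illustrations), the q operator lets you choose other delimiters so that the apostrophe needs no escape:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $escaped = 'I don\'t think so.';   # apostrophe escaped with a backslash
my $quoted  = q{I don't think so.};   # q{...} behaves like single quotes

print "same string\n" if $escaped eq $quoted;
```

Inside q{...} the same rules as for single quotes apply, except that the closing delimiter is } instead of '.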

Newlines in single-quoted strings
Note that there is no rule against having a single-quoted string span several lines. When you do this, the string has newline characters embedded in it.

A newline character is a special ASCII character that indicates that a new line should be started. In a text editor, or when printing output to the screen, this usually indicates that the cursor should move from the end of the current line to the first position on the line following it.

Since Perl permits the placement of these newline characters directly into single quoted strings, we are permitted to do the following:
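A minimal sketch of such a string; the variable name $text and the calls to length and index are assumptions added only to inspect the result:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'Time to
start anew.';

print $text, "\n";
print length($text), "\n";            # 7 + 1 + 11 = 19 characters
print index($text, "\n") + 1, "\n";   # the newline is character number 8
```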

This string has a total of nineteen characters. The first seven are <tt>Time to</tt>. The next character following that is a newline. Then, the eleven characters, <tt>start anew.</tt> follow. Note again that this is one string, with a newline as its eighth character.

Further, note that we are not permitted to put a comment in the middle of the string, even though we are usually allowed to place a <tt>#</tt> anywhere on the line and have the rest of the line be a comment. We cannot do this here, since we have yet to terminate our single-quoted string with a <tt>'</tt>, and thus, any <tt>#</tt> character and comment following it would actually become part of the single-quoted string! Remember that single-quoted strings are delimited by <tt>'</tt> at the beginning, and <tt>'</tt> at the end, and everything in between is considered part of the string, including newlines, <tt>#</tt> characters and anything else.

Examples of invalid single-quoted strings
In finishing our discussion of single-quoted strings, consider these examples of strings that are not legal because they violate the exceptions we talked about above:
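The strings below are illustrative assumptions. Each invalid literal is shown inside a comment so that the sketch still runs, followed by a corrected form:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# INVALID: 'You cannot do this: \'   (the final \ escapes the closing ')
my $fixed_backslash = 'You can do this: \\';

# INVALID: 'It is 5 o'clock!'        (the ' in o'clock ends the string early)
my $fixed_quote = 'It is 5 o\'clock!';

# INVALID: 'This is my string;       (the closing ' is missing)
my $fixed_close = 'This is my string';

print "$fixed_backslash\n$fixed_quote\n$fixed_close\n";
```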

Sometimes, when you have invalid string literals such as in the example above, the error message that Perl gives is not particularly intuitive. However, when you see error messages such as:

(Might be a runaway multi-line '' string starting on line X)
Bareword found where operator expected
Bareword "foo" not allowed while "strict subs" in use

It is often an indication that you have runaway or invalid strings. Keep an eye out for these problems. Chances are, you will forget and violate one of the rules for single-quoted strings eventually, and then need to determine why you are unable to run your Perl program.

Brief digression from strings alone: The <tt>print</tt> function
Before we move on to our consideration of double-quoted strings, it is necessary to first consider a small digression. We know how to represent strings in Perl, but, as you may have noticed, the examples we have given thus far do not do anything interesting. If you try placing the statements that we listed as examples in the Single-quoted strings section into a full Perl program, like this:

#!/usr/bin/perl

use strict;
use warnings;

'Three \\\'s: "\\\\\"'; # There are three \ chars between ""
'xxx\'xxx';             # xxx, a single-quote character, and then xxx
'Time to
start anew.';

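you probably noticed that nothing of interest happens. Perl gladly runs this program, but it prints nothing to the standard output; at most, because warnings are enabled, it warns about the useless use of a constant in void context.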

Thus, to begin to work with strings in Perl beyond simple hypothetical considerations, we need a way to have Perl display our strings for us. The canonical way of accomplishing this in Perl is to use the <tt>print</tt> function.

The <tt>print</tt> function in Perl can be used in a variety of ways. The simplest form is to use the statement <tt>print STRING;</tt>, where <tt>STRING</tt> is any valid Perl string.

So, to reconsider our examples, instead of simply listing the strings, we could instead print each one out:

#!/usr/bin/perl

use strict;
use warnings;

print 'Three \\\'s: "\\\\\"'; # Print first string
print 'xxx\'xxx';             # Print the second
print 'Time to
start anew.
';   # Print last string, with a newline at the end

This program will produce output. When run, the output goes to what is called the standard output. This is usually the terminal, console or window in which you run the Perl program. In the case of the program above, the output to the standard output is as follows:

Three \'s: "\\\"xxx'xxxTime to
start anew.

Note that a newline is required to break up the lines, because <tt>print</tt> does not add one for you. Thus, you need to put a newline at the end of a string if you want it to be the last thing on that line in the output.

It is particularly important to put a newline on the end of the last string of your output. If you do not, the prompt of the command interpreter that you are using will often run together with your last line of output, and this can be very disorienting. So, always remember to place a newline at the end of each line, particularly on your last line of output.

Finally, you may have noticed that formatting your code with newlines in the middle of single-quoted strings hurts readability. Since you are inside a single-quoted string, you cannot change the format of the continued lines within the print statement, nor put comments at the ends of those lines because that would insert data into your single-quoted strings. To handle newlines more elegantly, you should use double-quoted strings, which are the topic of the next section.

Double-quoted strings
Double-quoted strings are another way of representing scalar string literals in Perl. Like single-quoted strings, you place a group of ASCII characters between two delimiters (in this case, our delimiter is <tt>"</tt>). However, something called interpolation happens when you use a double-quoted string.

Interpolation in double-quoted strings
Interpolation is a special process whereby certain sequences of characters written in a string are replaced by something different. In the Single-quoted strings section, we noted that certain sequences in single-quoted strings (namely, <tt>\\</tt> and <tt>\'</tt>) were treated differently; these are called backslash escape sequences. This is very similar to what happens with interpolation.

For example, in interpolated double-quoted strings, various sequences preceded by a <tt>\</tt> character act differently according to the chart below:
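A few commonly used sequences can be sketched in code. This is not a complete list, and the built-in ord function is used here only to show the character code that each sequence produces:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Left: the sequence as typed (single quotes keep it visible).
# Right: the code of the single character it produces in double quotes.
print '\n  newline:     ', ord("\n"), "\n";   # 10
print '\t  tab:         ', ord("\t"), "\n";   # 9
print '\a  bell:        ', ord("\a"), "\n";   # 7
print '\\\\  backslash:   ', ord("\\"), "\n";   # 92
print '\$  dollar sign: ', ord("\$"), "\n";   # 36
print '\@  at sign:     ', ord("\@"), "\n";   # 64
```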

As you may have noticed in the previous chapter, you can put the name of a variable within a string with its leading dollar sign. This form of interpolation replaces the name of the variable in the string with the content of the variable.
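A minimal sketch of variable interpolation; the variable names and the text are hypothetical and chosen only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $language = 'Perl';

my $interpolated = "I am learning $language.";   # $language is replaced by its content
my $verbatim     = 'I am learning $language.';   # single quotes: no replacement

print $interpolated, "\n";   # prints I am learning Perl.
print $verbatim, "\n";       # prints I am learning $language.
```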

Examples of interpolation
Let us consider an example that uses a few of these characters:

#!/usr/bin/perl

use strict;
use warnings;

print "A backslash: \\\n";
print "Tab follows:\tover here\n";
print "Ring! \a\n";
print "Please pay someone\@example.org \$20.\n";

This program, when run, produces the following output on the screen:

A backslash: \
Tab follows:	over here
Ring! 
Please pay someone@example.org $20.

In addition, when the program runs, you may hear the computer beep, depending on how your terminal is configured. That is the output of the <tt>\a</tt> character, which you cannot see on the screen.

Notice that the <tt>\n</tt> character ends a line. <tt>\n</tt> should always be used to end a line. Those students familiar with the C language will be used to using this sequence to mean newline. When writing Perl, the word newline and the <tt>\n</tt> character are roughly synonymous.

String operators
String operators combine or transform strings to produce a new string.

The concatenation operator
Perl uses the <tt>.</tt> operator to concatenate or connect two strings together, like this:

"Hello". "World" # This is the same as "HelloWorld"

If you want a comma and a space between Hello and World, you could write it like this:

"Hello". ", " . "World" # This is the same as "Hello, World"

Or like this:

"Hello". ", World" # This is the same as "Hello, World"
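The expressions above produce no output on their own. A minimal sketch that stores and prints the results, with variable names chosen only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $three_parts = "Hello" . ", " . "World";
my $two_parts   = "Hello" . ", World";

print $three_parts . "\n";                           # prints Hello, World
print "identical\n" if $three_parts eq $two_parts;   # prints identical
```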

The <tt>x</tt> operator
This is called the string repetition operator and is used to repeat a string. All you have to do is put a string on the left side of the <tt>x</tt> and a number on the right side, like this:

"Hello" x 5 # This is the same as "HelloHelloHelloHelloHello"

If you wish to insert a line break after each output of the string, use:

"Hello\n" x 5
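A short sketch that prints the results of repetition; the variable names and the dashed rule are hypothetical additions for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rule     = "-" x 10;      # ten dashes
my $repeated = "Hello" x 5;   # HelloHelloHelloHelloHello

print $rule, "\n";
print length($repeated), "\n";   # prints 25 (five copies of five characters)
print "Hello\n" x 3;             # prints Hello on three separate lines
```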

Exercises

 * Write a program that uses the <tt>.</tt> operator to print "Hello, Sir!".


 * Write another program that uses the <tt>x</tt> operator to print "HelloHelloHelloHello". Put comments in this program that explain how it works.


 * Remember to take some time to play with single- and double-quoted strings; the more practice you get, the better you will be.
