Pascal Programming/Strings

The data type string is used to store a finite sequence of char values. It is a special case of an array, but unlike a plain array of char the data type string has some advantages that make it easier to use effectively.

The data type string as presented here is an Extended Pascal (EP) extension, as defined in the ISO standard 10206. Due to its high relevance in practice, this topic has been put into the Standard Pascal part of this Wikibook, right after the chapter on arrays.

Definition
The declaration of a string data type always entails a maximum capacity: After the word string follows a positive integer number surrounded by parentheses. This is not a function call.
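A declaration of this kind could look like the following minimal sketch. It assumes a compiler that implements Extended Pascal (for instance the GPC); the type name address, the variable name and the capacity of 60 are merely illustrative choices.

```pascal
program stringDemo(output);
type
    { Hypothetical type name; 60 is an arbitrarily chosen maximum capacity. }
    address = string(60);
var
    houseAndStreet: address;
begin
    houseAndStreet := '742 Evergreen Terrace';
    { Prints only the characters currently stored, not 60 characters. }
    writeLn(houseAndStreet);
end.
```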

Implications
A variable of a string data type declared this way can only store up to as many independent char values as its declared capacity. Of course it is possible to store fewer, or even none at all, but once this limit is set it cannot be expanded.

Inquiry
String variables “know” about their own maximum capacity: Every string variable automatically has a “field” called capacity. This field is accessed by writing the respective string variable’s name and the word capacity joined by a dot. Printing it with writeLn shows the number given in the declaration. This field is read-only: You cannot assign values to it. It can only appear in expressions.
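As a small sketch of such an inquiry, assuming an Extended Pascal compiler and a hypothetical variable city with an arbitrarily chosen capacity of 20:

```pascal
program capacityDemo(output);
var
    { Hypothetical variable; the capacity 20 is an arbitrary choice. }
    city: string(20);
begin
    city := 'Oslo';
    { Prints 20, regardless of how many characters are currently stored. }
    writeLn(city.capacity:1);
    { city.capacity := 30; would be rejected: the field is read-only. }
end.
```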

Length
All string variables have a current length. This is the total number of legitimate char values a string variable currently contains. To query this number, the EP standard defines a new function called length. The length function returns a non-negative integer value denoting the supplied string’s length. It also accepts char values. A char value has by definition a length of 1.

It is guaranteed that the length of a string variable will always be less than or equal to its corresponding capacity.
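The relation between length and capacity can be observed with a sketch like this one. The variable name and its capacity are assumptions made for illustration, and an Extended Pascal compiler is presumed.

```pascal
program lengthDemo(output);
var
    { Hypothetical variable with an arbitrary capacity of 40. }
    greeting: string(40);
begin
    greeting := 'Hello';
    writeLn(length(greeting):1);     { 5 }
    writeLn(greeting.capacity:1);    { 40, unaffected by the assignment }
    writeLn(length('x'):1);          { 1, because 'x' is a char value }
    greeting := '';
    writeLn(length(greeting):1);     { 0 }
end.
```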

Compatibility
You can copy entire string values using the := operator provided the variable on the left-hand side (LHS) has a capacity at least as large as the length of the string expression on the right-hand side (RHS). This is different from a regular array’s behavior, which would require dimensions and size to match exactly. As long as no clipping occurs, i.e. the omission of values because of a too short capacity, the assignment is fine.
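The following sketch illustrates assignments between strings of different capacities. The variable names and capacities are hypothetical, and an Extended Pascal compiler is assumed.

```pascal
program assignDemo(output);
var
    { Two hypothetical variables with deliberately different capacities. }
    small: string(8);
    large: string(80);
begin
    small := 'Pascal';
    large := small;     { fine: 6 characters fit into a capacity of 80 }
    writeLn(large);
    large := 'EP';
    small := large;     { also fine: 2 characters fit into a capacity of 8 }
    writeLn(small);
    { Assigning a value longer than 8 characters to small would be an error. }
end.
```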

Index
It is worth noting that otherwise strings are internally regarded as arrays. Like a character array you can access (and alter) every array element independently by specifying a valid index surrounded by brackets. However, there is a big difference with respect to the validity of an index. At any time, you are only allowed to specify indices that are within the range from 1 to the string’s current length. This range may be empty, specifically if the length is currently 0.
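Accessing single characters might look like the sketch below, again assuming an Extended Pascal compiler. The variable animal is a made-up example.

```pascal
program indexDemo(output);
var
    { Hypothetical variable; only indices 1..length(animal) are valid. }
    animal: string(10);
    i: integer;
begin
    animal := 'cat';
    animal[1] := 'b';    { alters the first character: the value is now 'bat' }
    for i := 1 to length(animal) do
    begin
        writeLn(i:1, ': ', animal[i]);
    end;
    { animal[4] would be illegal here, although the capacity is 10. }
end.
```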

Standard routines
In addition to the length function, EP also defines a few other standard functions operating on strings.

Manipulation
The following functions return strings.

Substring
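In order to obtain just a part of a string (or char) expression, the function subStr takes three arguments: the source value, a positive starting index, and a non-negative length. It returns a sub-string of the source having the given length and starting at the given index. It is important that the last requested character, i.e. starting index plus length minus one, does not lie beyond the end of the source, otherwise the function causes an error.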

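For string variables, the subStr function is the same as specifying a range index in brackets after the variable’s name, running from the first wanted index to the last wanted index, i.e. the starting index plus the length minus one. Evidently, if the starting index is some complicated expression, the subStr function should be preferred to prevent any programming mistakes. However, this range-index syntax can be used not just as a value in expressions, but also to overwrite parts of a string variable.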

Furthermore, the third parameter to subStr can be omitted: This will simply return the rest of the given string starting at the position indicated by the second parameter.
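The variants of subStr and the range-index syntax can be tried out with a sketch like the following. The variable sentence and its contents are hypothetical, and an Extended Pascal compiler is assumed.

```pascal
program subStrDemo(output);
var
    { Hypothetical variable; the value below has a length of 15. }
    sentence: string(40);
begin
    sentence := 'Extended Pascal';
    writeLn(subStr(sentence, 1, 8));     { Extended }
    writeLn(subStr(sentence, 10, 6));    { Pascal }
    writeLn(subStr(sentence, 10));       { Pascal: third parameter omitted }
    writeLn(sentence[10..15]);           { Pascal: range index on a variable }
    { A range index may also be the target of an assignment. }
    sentence[1..8] := 'Standard';
    writeLn(sentence);                   { Standard Pascal }
end.
```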

Remove trailing spaces
The trim function returns a copy of the string it is given without any trailing space characters, i.e. ' '. In left-to-right (LTR) scripts any blanks to the right are considered insignificant, yet in computing they take up (memory) space. It is advisable to prune strings before writing them, for example, to a disk or other long-term storage media, or before transmitting them via networks. Admittedly, memory requirements were a more relevant issue prior to the 21st century.
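The effect of trim can be made visible by comparing lengths, as in this sketch (hypothetical variable name, Extended Pascal compiler assumed):

```pascal
program trimDemo(output);
var
    { Hypothetical variable holding four letters and four trailing spaces. }
    padded: string(20);
begin
    padded := 'data    ';
    writeLn(length(padded):1);           { 8 }
    writeLn(length(trim(padded)):1);     { 4 }
    writeLn('[', trim(padded), ']');     { [data] }
end.
```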

First occurrence of substring
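The function index takes a string to search and a pattern to look for, in this order. It finds the first occurrence of the pattern in the searched string and returns the starting index. All characters of the pattern match the characters in the searched string at the returned offset. Note, to obtain the second or any subsequent occurrence, you need to search a proper substring of the original string.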

Because the “empty string” is, mathematically speaking, present everywhere, searching for an empty pattern always returns 1. Conversely, because any non-empty string cannot occur in an empty string, searching an empty string for a non-empty pattern always returns 0, in the context of strings an otherwise invalid index. The value zero is returned whenever the pattern does not occur in the searched string. This will always be the case if the pattern is longer than the searched string.
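The sketch below searches for a first and a second occurrence. The names haystack, first and offset are made up for this example, and an Extended Pascal compiler is assumed.

```pascal
program indexFunctionDemo(output);
var
    { Hypothetical names, chosen for illustration only. }
    haystack: string(40);
    first, offset: integer;
begin
    haystack := 'banana';
    first := index(haystack, 'an');
    writeLn(first:1);                    { 2 }
    { Second occurrence: search the remainder after the first match. }
    offset := index(subStr(haystack, first + 1), 'an');
    if offset > 0 then
    begin
        writeLn(first + offset:1);       { 4 }
    end;
    writeLn(index(haystack, 'x'):1);     { 0: the pattern does not occur }
end.
```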

Operators
The EP standard introduced an additional operator for strings of any length, including single characters. The + operator concatenates two strings or characters, or any combination thereof. Unlike the arithmetic +, this operator is non-commutative, which means the order of the operands matters. Concatenation is useful if you intend to save the data somewhere. Supplying concatenated strings to routines such as write/writeLn, however, may be disadvantageous: The concatenation, especially of long strings, first requires allocating enough memory to accommodate the entire resulting string. Then, all the operands are copied to their respective location. This takes time. Hence, in the case of write/writeLn it is advisable (for very long strings) to use their capability of accepting an arbitrary number of (comma-separated) parameters.
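A brief sketch of concatenation, with hypothetical variable names and assuming an Extended Pascal compiler:

```pascal
program concatDemo(output);
var
    { Hypothetical variables; the capacity 40 is an arbitrary choice. }
    givenName, familyName, fullName: string(40);
begin
    givenName := 'Grace';
    familyName := 'Hopper';
    { Concatenate in order to store the result. }
    fullName := givenName + ' ' + familyName;
    writeLn(fullName);                           { Grace Hopper }
    { The order of the operands matters. }
    writeLn(familyName + ', ' + givenName);      { Hopper, Grace }
    { For output alone, separate parameters avoid building a new string. }
    writeLn(givenName, ' ', familyName);         { Grace Hopper }
end.
```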

The GPC, the FPC and Delphi also ship with a function concat performing the very same task. Read the respective compiler’s documentation before using it, because there are some differences, or just stick to the standardized + operator.

Sophisticated comparison
All functions presented in this subsection return a Boolean value.

Order
Since every character in a string has an ordinal value, we can think of a method to sort them. There are two flavors of comparing strings:
 * One uses the relational operators already introduced, such as =, < or >.
 * The other one is to use dedicated functions like LT or GT.
The difference lies in their treatment of strings that vary in length. While the former will bring both strings to the same length by padding the shorter one with space characters, the latter simply clips them to the shortest length, but takes into account which one was longer (if necessary). All these functions and operators are binary, which means they expect and accept exactly two parameters or operands respectively. They can produce different results if supplied with the same input, as you will see in the next two sub-subsections.

Equality
Let’s start with equality.
 * Two strings (of any length) are considered equal by the EQ function if both operands are of the same length and the values, i.e. the character sequences that actually make up the strings, are the same.
 * An =-comparison, on the other hand, augments any “missing” characters in the shorter string by using the padding character space.
To put this relationship in other words, in Pascal terms you already know: an =-comparison yields the same result as applying EQ to the trimmed copies of both strings, as returned by trim. The actual implementation is usually different, because trim can be, especially for long strings, quite resource-consuming (time, as well as memory).

As a consequence, an =-comparison is usually used if trailing spaces are insignificant, but are still there for technical reasons (e.g. because you are using a fixed-size array of char). Only EQ ensures both strings are lexicographically the same. Note that the capacity of either string is irrelevant. The function NE, short for not equal, behaves accordingly.
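The difference between the two flavors of equality can be sketched as follows. The variables a and b are hypothetical, an Extended Pascal compiler is assumed, and the exact spelling of the printed Boolean values depends on the compiler.

```pascal
program equalityDemo(output);
var
    { Hypothetical variables differing only in trailing spaces. }
    a, b: string(20);
begin
    a := 'foo';
    b := 'foo  ';
    writeLn(a = b);                    { true: a is padded with spaces }
    writeLn(EQ(a, b));                 { false: the lengths differ }
    writeLn(EQ(trim(a), trim(b)));     { true: same as the =-comparison }
    writeLn(NE(a, b));                 { true: the opposite of EQ }
end.
```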

Less than
A string is determined to be “less than” another one by sequentially reading both strings simultaneously from left to right and comparing corresponding characters. If all characters match, the strings are said to be equal to each other. However, if we encounter a differing character pair, processing is aborted and the relation of the current characters determines the overall strings’ relation. If both strings are of equal length, the LT function and the <-operator behave the same. LT actually even builds on top of <. Things get interesting if the supplied strings differ in length.
 * 1) The LT function first cuts both strings to the same (shorter) length. (substring)
 * 2) Then a regular comparison is performed as described above. If the shortened, common-length versions turn out to be equal, the (originally) longer string is said to be greater than the other one.
The <-comparison, on the other hand, compares all remaining “missing” characters to ' ', the space character. This can lead to differing results. Such differences mostly show up in artificially constructed cases, but they can still become an issue if you are frequently using characters that are “smaller” than the regular space character, like for instance if you are programming on a 1980s 8‑bit Atari computer using ATASCII. The LE, GT, and GE functions act accordingly.
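A deliberately contrived sketch can provoke differing results of LT and the <-operator. It assumes an Extended Pascal compiler and an ASCII-based character set, in which the tabulator character chr(9) is “smaller” than the space character. The variable names are hypothetical.

```pascal
program lessThanDemo(output);
var
    { Hypothetical variables; b is a followed by one tabulator character. }
    a, b: string(10);
begin
    a := 'x';
    { Assumption: ASCII, where chr(9) (tabulator) sorts before ' '. }
    b := 'x' + chr(9);
    { Both are cut to 'x', which is equal, so the longer b is greater. }
    writeLn(LT(a, b));     { true }
    { a is padded to 'x' plus a space; the space is greater than chr(9). }
    writeLn(a < b);        { false }
    { Strings of equal length: both flavors agree. }
    writeLn(LT('abc', 'abd'));     { true }
    writeLn('abc' < 'abd');        { true }
end.
```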

Inclusion of delimiter
In Pascal, string literals start with and are terminated by the same character. Usually this is a straight (typewriter’s) apostrophe. Troubles arise if you want to actually include that character in a string literal, because the character you want to include into your string is already understood as the terminating delimiter. Conventionally, two straight typewriter’s apostrophes back-to-back are regarded as an apostrophe image. In the produced computer program, each such double-apostrophe is replaced by a single apostrophe. The string still needs delimiting apostrophes, so you might end up with three consecutive apostrophes, or even four consecutive apostrophes if you want a char value consisting of a single apostrophe.
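The following sketch shows apostrophe images in various positions. The literals are arbitrary examples.

```pascal
program apostropheDemo(output);
begin
    writeLn('It''s a string.');    { It's a string. }
    writeLn('Rock ''n'' roll');    { Rock 'n' roll }
    { Apostrophe image followed by the closing delimiter: three in a row. }
    writeLn('the dogs''');         { the dogs' }
    { A char value consisting of a single apostrophe: four in a row. }
    writeLn('''');                 { ' }
end.
```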

Non-permissible characters
A string is a linear sequence of characters, i.e. along a single dimension. Consequently, a string literal cannot span multiple lines of source code.

You are nevertheless allowed to use the OS-specific code indicating the end of a line (EOL), yet the only cross-platform (i.e. guaranteed to work regardless of the used OS) procedure is writeLn. Although not standardized, many compilers provide a constant representing the environment’s character/string necessary to produce line breaks. In FPC it is called lineEnding. Delphi has sLineBreak, which is also understood by the FPC for compatibility reasons. The GPC’s standard module GPC supplies the constant lineBreak. You will first need to import this module before you can use that identifier.

Remainder operator
The final Standard Pascal arithmetic operator you are introduced to, after learning to divide, is the remainder operator mod (short for modulo). Every integer division (div) may yield a remainder. This operator evaluates to this value. Similar to all other division operations, the mod operator does not accept a zero value as the second operand. Moreover, the second operand to mod must be positive. There are many definitions, among computer scientists and mathematicians, of what the result should be if the divisor is negative. Pascal avoids any confusion by simply declaring negative divisors illegal.

The mod operator is frequently used to ensure a certain value remains in a specific range starting at zero. Furthermore, you will find modulo in number theory. For example, the definition of prime numbers says “not divisible by any other number”. This expression can be translated into Pascal using mod: a number is not divisible by a candidate divisor exactly when the remainder of their division is not zero.
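Both uses of mod can be sketched in a few lines. The numbers and variable names are arbitrary examples, and the primality check is a naive trial division meant for illustration only. The exact spelling of the printed Boolean value depends on the compiler.

```pascal
program modDemo(output);
var
    { Hypothetical names; 97 is an arbitrarily chosen test value. }
    candidate, divisor: integer;
    isPrime: Boolean;
begin
    writeLn(17 mod 5:1);     { 2, because 17 = 3 * 5 + 2 }
    { Keeping a value within the range 0..23, e.g. hours on a clock. }
    writeLn(25 mod 24:1);    { 1 }
    candidate := 97;
    isPrime := candidate > 1;
    for divisor := 2 to candidate - 1 do
    begin
        { A remainder of zero means candidate is divisible by divisor. }
        if candidate mod divisor = 0 then
        begin
            isPrime := false;
        end;
    end;
    writeLn(candidate:1, ' is prime: ', isPrime);
end.
```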

Tasks
More exercises can be found in:
 * Programming Fundamentals, Chapter “Practice: Strings and Files”, § “String Activities”
