Pascal Programming/Sets

This chapter introduces you to a new custom data type. Sets are one of the basic structured data types. When programming you will frequently find that some logic can be modeled with sets. Learning and mastering usage of sets is a key skill, since you will encounter them a lot in Pascal.

Notion
Sets are (possibly empty) aggregations of distinguishable objects. Either a set contains an object, or it does not. An object being part of a set is also referred to as an element of that set.

Let's say we know the objects “apple”, “banana” and “pencil”. The set Fruit ≔ {“apple”, “banana”} contains the objects “apple” and “banana”. “Pencil” is not a member of the set Fruit.

Digitization
When a computer is supposed to store and process a set, it actually handles a series of Boolean values. Every one of those Boolean values tells us whether a certain element is part of a set.

Sets in Pascal
The computer needs to know how many Boolean values it needs to set aside. In order to achieve this, a set in Pascal requires an ordinal type as the set’s base type. An ordinal type always has a finite range of permissible discrete values, thus the computer knows beforehand how many Boolean values to reserve, that is, how many elements we can expect a set to contain at most. In consequence, a valid set data type declaration consists of the reserved words set of followed by an ordinal base type. A variable of such a data type can only contain values of that base type. It cannot contain a value of any other data type, nor is information about such values stored in any way.
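A declaration of this kind might look as follows. This is only a minimal sketch: the names characters and vowels are assumptions chosen for illustration, with char serving as the ordinal base type.

```pascal
program setDeclaration;
type
    { hypothetical name: a set whose base type is the ordinal type char }
    characters = set of char;
var
    vowels: characters;
begin
    { only char values may appear in this set }
    vowels := ['a', 'e', 'i', 'o', 'u'];
    { vowels := [42]; would be rejected, because 42 is an integer value }
end.
```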

Sets are particularly useful in conjunction with enumeration data types, which you just learned about in the previous chapter. Let’s consider an example in Pascal: an enumeration data type lists five skills, and a variable, here called slob, is declared as a set of that enumeration data type. Assigning a set literal to slob populates the set, in this example with two objects. The brackets indicate a set literal: a bracketed, comma-separated list of values is a set expression, which can be assigned to the slob variable.
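Such a program could be sketched like this. The identifiers skill, skills and slob as well as the five enumeration values are assumptions made for illustration.

```pascal
program setDemo;
type
    { assumed enumeration of five skills }
    skill = (cooking, cleaning, driving, videogames, eating);
    skills = set of skill;
var
    slob: skills;
begin
    { populate the set with two objects using a set literal }
    slob := [videogames, eating];
end.
```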

The set variable slob contains no other objects. However, the computer still stores a Boolean value for every potential member of that set, five in total. The number five is the number of elements in the set’s base type. The information that the three remaining values are not part of the set slob is stored explicitly (by the proper Boolean value false).

Inspecting a set
If we want to learn whether a certain object is part of a set, the set operator in yields the corresponding Boolean value the computer uses to store that information. The in operator is one of Pascal’s non-commutative operators. This means you cannot swap the operands. On the RHS you always need to write an expression evaluating to a set value, whereas the LHS has to be an expression evaluating to a value of this set’s base type.
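A short sketch of the in operator, reusing an assumed enumeration of five skills (all identifiers are illustrative):

```pascal
program inspectSet;
type
    skill = (cooking, cleaning, driving, videogames, eating);
    skills = set of skill;
var
    slob: skills;
begin
    slob := [videogames, eating];
    { LHS: a value of the base type; RHS: a set expression }
    writeLn(eating in slob);   { evaluates to true }
    writeLn(cooking in slob);  { evaluates to false }
end.
```

How writeLn formats Boolean values (for instance their letter case) depends on the compiler.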

Asking whether a value of a different data type is a member of a set is illegal, even though we, as humans, can say that the answer is obviously wrong, i.e. false. Per definition, a set can only contain values of its base type.

Operations
So far, sets probably seemed like a really complicated way of using Boolean values. The true power of sets lies in a number of distinct operations, making sets an easier, and thus better, alternative to handling two or more individual (but related) Boolean values directly.

Combinations
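In Pascal, two sets of the same kind, the same data type, can be combined forming a new set of the respective data type. The following operators are available:
 * union, written +
 * difference, written -
 * intersection, written *
 * symmetric difference, written >< †
† The symmetric difference operator is only defined in EP.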

Union
The result of unifying two sets into one is called union. Let’s say our slob has recently learned how to drive and does that now too. This can be written by assigning slob the union of slob and a set literal containing the new skill. Now, slob contains all objects it previously held, plus all objects from the other set.
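The union described here might be written as in the following sketch. The enumeration values and identifiers are assumptions made for illustration.

```pascal
program unionDemo;
type
    skill = (cooking, cleaning, driving, videogames, eating);
    skills = set of skill;
var
    slob: skills;
begin
    slob := [videogames, eating];
    { union: everything slob held before, plus the members of [driving] }
    slob := slob + [driving];
    writeLn(driving in slob);  { evaluates to true }
end.
```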

Difference
Of course, sets can be deprived of a set of elements by using the difference operator, in source code written as a minus sign (-). This removes all objects present in the second set from the first set. Subtracting the empty set, written [], is a special case: the empty set does not contain any objects, thus removing no objects has no effect on the first set.
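A minimal sketch of the difference operator, again with assumed identifiers:

```pascal
program differenceDemo;
type
    skill = (cooking, cleaning, driving, videogames, eating);
    skills = set of skill;
var
    slob: skills;
begin
    slob := [driving, videogames, eating];
    { remove every member of the second set from the first }
    slob := slob - [driving];
    writeLn(driving in slob);  { evaluates to false }
    { subtracting the empty set changes nothing }
    slob := slob - [];
    writeLn(eating in slob);   { evaluates to true }
end.
```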

Intersection
Furthermore, you can intersect sets with the * operator. The intersection of two sets is defined as the set of elements both operands contain. An object that is a member of only one operand, or of neither, is not part of the result.
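As an illustration, the intersection of two small sets could be computed as follows. The set type digits and the variable names are hypothetical.

```pascal
program intersectionDemo;
type
    digits = set of 0..9;
var
    a, b, common: digits;
begin
    a := [1, 2, 3, 4];
    b := [3, 4, 5, 6];
    { only 3 and 4 are members of both operands }
    common := a * b;
    writeLn(3 in common);  { evaluates to true }
    writeLn(1 in common);  { evaluates to false }
end.
```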

Symmetric difference
The symmetric difference, written ><, yields a result that is disjoint from the intersection. It is the union of the operands without the elements contained in both sets, that is, the values from either set, but not both.
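A sketch of the symmetric difference, assuming a compiler that implements EP’s >< operator. The identifiers are illustrative.

```pascal
program symmetricDifferenceDemo;
type
    digits = set of 0..9;
var
    a, b, either: digits;
begin
    a := [1, 2, 3, 4];
    b := [3, 4, 5, 6];
    { members of exactly one operand: 1, 2, 5 and 6 }
    either := a >< b;
    writeLn(1 in either);  { evaluates to true }
    writeLn(3 in either);  { evaluates to false }
end.
```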

Comparisons
Two sets of the same kind, the same data type, can be compared by looking at each element in both sets. All comparison operators, as before, evaluate to a Boolean value.

Inclusion
The inclusion of two sets means that all objects one set contains are present in another set. If an expression comparing two sets with the <= operator evaluates to true, all objects present in the left-hand set are also present in the right-hand set. In a Venn diagram you will notice that one circle’s area is completely surrounded by another circle, if not identical to the other circle.
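The inclusion test can be sketched like this. The names small and big are assumptions, and >= is simply the same test with swapped operands.

```pascal
program inclusionDemo;
type
    digits = set of 0..9;
var
    small, big: digits;
begin
    small := [2, 3];
    big := [1, 2, 3, 4];
    writeLn(small <= big);  { true: every member of small is in big }
    writeLn(big <= small);  { false: 1 and 4 are missing from small }
    writeLn(big >= small);  { true: same test, operands swapped }
end.
```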

Equality and inequality
The equality of two sets, tested with the = operator, is defined as mutual inclusion: all objects contained in the left-hand set are present in the right-hand set and vice versa. In other words, there is not a single object that is present in just one of the sets. The inequality, tested with the <> operator, is just the negation thereof.
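A brief sketch of both comparisons, using hypothetical variables a and b:

```pascal
program equalityDemo;
type
    digits = set of 0..9;
var
    a, b: digits;
begin
    a := [1, 2, 3];
    b := [3, 2, 1];
    writeLn(a = b);           { true: both sets hold the same members }
    writeLn(a <> (b + [4]));  { true: 4 is present in just one of the sets }
end.
```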

Element of
The in operator is the only set operator that does not act on two sets but on one potential set member candidate and a set. It has been introduced above. With respect to Venn diagrams, though, you can say that the in operator is “like” pointing with your index finger to a point inside a circle, or outside of it.

Cardinality
(After initialization) a set contains a certain number of elements at any time. In mathematics the number of objects being part of a set is called cardinality. The cardinality of a set can be retrieved using the function card, an EP extension. Applied to the empty set, card returns 0, as there are no elements in an empty set.

Unfortunately, not all compilers implement the card function. The FPC does not have one. The GPC does supply one, though.
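Where card is unavailable, the cardinality can be determined by testing every value of the base type. The following is a sketch under that assumption: countMembers is a hypothetical helper, not a standard routine, and it relies on a counting loop, which is explained later in this chapter.

```pascal
program cardinalityDemo;
type
    digit = 0..9;
    digits = set of digit;

{ hypothetical helper: counts members by testing every base type value }
function countMembers(s: digits): integer;
var
    candidate: digit;
    count: integer;
begin
    count := 0;
    for candidate := 0 to 9 do
    begin
        if candidate in s then
        begin
            count := count + 1;
        end;
    end;
    countMembers := count;
end;

begin
    writeLn(countMembers([]));         { the empty set has 0 members }
    writeLn(countMembers([2, 3, 5]));  { this set has 3 members }
end.
```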

Universe
Originally, Wirth proposed a function that takes a data type and returns the set of all values of that type.

For example, applied to an enumeration data type with five values, this function would return the set containing all five of them.

Unfortunately, this proposal never made it into the ISO standards, nor do the FPC or GPC support that function, or provide an equivalent. The only alternative is to use an appropriate set constructor (an EP extension): a set literal holding the range from the first to the last value of the base type is equivalent, provided that the bounds really are the first and the last value (referring to the order in which these items were listed in the data type declaration).
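Such a constructor might look like the following sketch. The enumeration and the variable name everything are assumptions, with cooking being the first and eating the last declared value.

```pascal
program universeDemo;
type
    skill = (cooking, cleaning, driving, videogames, eating);
    skills = set of skill;
var
    everything: skills;
begin
    { range from the first to the last declared value of skill }
    everything := [cooking..eating];
    writeLn(driving in everything);  { evaluates to true }
end.
```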

Inclusion and exclusion
Not standardized, but convenient, is BP’s definition of the include and exclude procedures. These are shorthand for very frequent set manipulations.

The procedures allow you to quickly add one object to, or remove one object from, a set. Given a set variable s and a value x of its base type, include(s, x) is identical to s := s + [x], but you do not need to type out the set name twice, thus reducing the chance of typing mistakes. Likewise, exclude(s, x) will do the same as s := s - [x].

Both the FPC and the GPC support these handy routines, which are in fact in all cases implemented as compiler intrinsics, not actual procedures.
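A sketch of both routines, assuming a compiler that provides them (such as the FPC or GPC). The enumeration and variable names are illustrative.

```pascal
program includeExcludeDemo;
type
    skill = (cooking, cleaning, driving, videogames, eating);
    skills = set of skill;
var
    slob: skills;
begin
    slob := [videogames, eating];
    include(slob, driving);  { same effect as slob := slob + [driving] }
    exclude(slob, eating);   { same effect as slob := slob - [eating] }
    writeLn(driving in slob);  { evaluates to true }
    writeLn(eating in slob);   { evaluates to false }
end.
```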

Set literals
Effectively stating sets is a required skill when handling sets. It is important to understand that sets merely store the information whether an object is a member of a set, or not. A set literal that lists the same object several times is identical to one that lists it only once. Specifying an object multiple times does not make it “more” part of that set.

Also, it is not necessary to list members in any particular order. A set literal listing its members in descending order is just as acceptable as one listing them in ascending order. Mathematically speaking, sets are not ordered. Pascal’s requirement that a set’s base data type has to be an ordinal type is purely a technical requirement. For readability reasons it is usually sensible, though, to list elements in ascending order.

The EP standard gives you a nice short notation for set literals containing a continuous series of values. Instead of listing every value individually, you can also write a range, that is, the first and the last value separated by two dots (..), evaluating to the very same set. Of course, both bounds could also be variables, or expressions in general.
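The properties of set literals described in this section can be checked with a short sketch. The set type digits and the variables a and b are hypothetical.

```pascal
program literalDemo;
type
    digits = set of 0..9;
var
    a, b: digits;
begin
    { unordered, and 1 is listed twice }
    a := [3, 1, 2, 1];
    { range notation for 1, 2 and 3 }
    b := [1..3];
    writeLn(a = b);  { true: duplicates and order do not matter }
end.
```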

Set literals are always a positive statement about which objects are in a set. If we wanted a set holding a range of values without a few values in the middle, but did not want to write this set out entirely (i.e. value by value), we can either write a set literal consisting of the two remaining ranges, or an expression subtracting the unwanted values from the full range. With the latter it is probably a little easier to grasp which objects are in the final set and which are not.
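Both ways of writing such a set are sketched below. The concrete numbers are assumptions chosen for illustration: the values 0 to 10 without 3, 4 and 5.

```pascal
program exclusionDemo;
type
    smallNumbers = set of 0..10;
var
    a, b: smallNumbers;
begin
    { positive statement: the two remaining ranges }
    a := [0..2, 6..10];
    { full range minus the unwanted values }
    b := [0..10] - [3..5];
    writeLn(a = b);  { true: both expressions denote the same set }
end.
```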

Memory restrictions
Although a set of integer is legal and complies with all Pascal standards, many compilers do not support such large sets. Per definition, a set of integer can contain (at most) all values in the range -maxInt..maxInt. That is a lot (try writeLn(maxInt) or read your compiler’s documentation to find out this value). On a 64‑bit platform this value (usually) is 2^63−1, i.e. 9,223,372,036,854,775,807. As of the year 2020, many computers would quickly run out of main memory if they attempted to hold that many Boolean values.

As a consequence, BP restricts the permissible base types of sets. In BP the ordinal values of the base type’s smallest and largest values have to be in the range 0..255. The value 255 is 2^8−1. As of version 3.2.0, the FPC imposes the same limitation.

The GPC allows set definitions beyond 2^8 elements, although some configuration is required: you need to specify the --setlimit command-line parameter or a specially crafted comment, a compiler directive, in your source code. This instructs the GPC how many values a set can store at most.

Loops
Now that you have made the acquaintance of enumeration data types and sets, you see yourself faced with a growing amount of data to deal with. Pascal, like many other programming languages, supports a language construct called loops.

Characteristics
Loops are (possibly empty) sequences of statements that are repeated over and over again, or even never, based on a Boolean value. The sequence of statements is termed loop body. The loop head contains (possibly implicitly) a Boolean expression determining whether the loop body is executed. Every time the loop body is run, an iteration is in progress.

The term loop originates from the circumstance that some early models of computers required programs to be fed (“loaded”) via punched paper tape. If a portion of that paper tape was meant to be processed multiple times, that piece of paper tape was cut, bent and temporarily fixated so it formed a physical loop. Thankfully, advancements in computer technology have made it far more convenient to handle repeating code.

Pascal (and many other programming languages) differentiates between two groups of loops:
 * counting loops, presented here, and
 * conditional loops, presented in a chapter to come.
Counting loops have in common that, before running the first iteration, it can already be determined how many times the loop body will be executed just by evaluating the loop head. Conditional loops, on the other hand, are based on an abort condition, i.e. a Boolean expression. Except for infinite loops, there is no way to tell in advance how many iterations a conditional loop will have without thoroughly (mathematically) analyzing the loop body and loop head, and possibly even considering circumstances beyond the loop.

Counting loops
Counting loops do not necessarily count a quantity. They are named after the fact that they employ a variable, a counting variable. This variable of any ordinal data type (de facto) assigns every iteration a number.

A counting loop is introduced by the reserved word for. After for follows a specially crafted assignment to the counting variable.

Range of counting variable
The start value and the final value, joined by the auxiliary reserved word to, denote the range of values the counting variable will assume while executing the loop body. Both are expressions possessing the counting variable’s data type, which means that variables or more complex expressions could appear there too, not just constant literals.

This range may be empty: a range whose start value is greater than its final value contains no values at all. In consequence, the counting variable will not be assigned any value out of this empty range, as there simply are none available, and the loop body is never executed. A range whose start value equals its final value, by contrast, contains exactly one value.

During the first iteration the corresponding counting variable will have the first value out of the given range, the start value. In the successive iteration the variable has the next value, the successor of the start value, and so forth up to and including the final value of the given range.
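The behavior of counting loops described so far can be sketched as follows. The counting variable i and the concrete bounds are assumptions chosen for illustration.

```pascal
program countingLoop;
var
    i: integer;
begin
    { i assumes the values 1, 2, 3, 4 and 5, one per iteration }
    for i := 1 to 5 do
    begin
        writeLn(i);
    end;

    { empty range: the loop body is never executed }
    for i := 5 to 4 do
    begin
        writeLn('never printed');
    end;

    { the range contains exactly one value: one iteration }
    for i := 4 to 4 do
    begin
        writeLn(i);
    end;
end.
```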

Immutability of counting variable
It is not necessary to actually utilize the counting variable inside the loop body, but you can use it if you are just obtaining its current value. Inside the loop body of for‑loops it is forbidden to assign any values to the counting variable. Forbidden assignments include, but are not limited to, putting the counting variable on the LHS of :=; read and readLn may not use the counting variable as a destination either. Tampering with the counting variable is forbidden, because the loop head will effectively employ succ to obtain the next iteration’s value. The loop head implicitly contains the Boolean expression counting variable ≠ final value. If the counting variable were manipulated, this condition might never be met, thus destroying the characteristics of a for‑loop. Preventing the programmer from doing any such assignments ensures that such an infinite loop is not created, whether accidentally or deliberately.
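A sketch of permitted and forbidden uses of the counting variable. The variable name i is an assumption, and the forbidden statement is shown only inside a comment.

```pascal
program immutableCounter;
var
    i: integer;
begin
    { the counting variable does not have to be used at all }
    for i := 1 to 3 do
    begin
        writeLn('hello');
    end;

    { merely obtaining its current value is fine }
    for i := 1 to 3 do
    begin
        writeLn(i * i);
        { i := i + 1; would be forbidden here }
    end;
end.
```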

Reverse direction
Pascal also allows for‑loops in a reversed direction using the reserved word downto instead of to. Here, the range runs from the start value down to and including the final value. The loop’s terminating condition is still counting variable ≠ final value, but in this case the counting variable becomes its predecessor, as obtained by pred (not its successor, as obtained by succ), at the end of each iteration, after the loop body has been executed.
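A minimal sketch of a loop in reversed direction, with assumed bounds:

```pascal
program reverseLoop;
var
    i: integer;
begin
    { i assumes the values 5, 4, 3, 2 and 1, in this order }
    for i := 5 downto 1 do
    begin
        writeLn(i);
    end;
end.
```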

Loops on collections
EP allows you to iterate over discrete aggregations, such as sets. This is particularly useful if you have a routine that needs to be applied to every value of an aggregation. To this end, the loop head names a variable, followed by the word in and an aggregation. Here you see the word in again, but in this context in does not form an expression. The data type restrictions for in are still in effect: on the RHS an aggregation expression is given, whereas the LHS is in this case a variable that has the aggregation’s base type. This variable will be assigned every value out of the given aggregation, one value per iteration.

Since the RHS just needs to be an expression, not necessarily a variable, you can shorten such a loop even further by writing a set literal directly in the loop head.

Note that, unlike with the counting loops above, you are not supposed to make any assumptions about the order in which the loop variable is assigned values. It may be ascending, descending, or a completely mixed-up “order”. The specific order is “implementation defined”, i.e. it depends on the compiler used. The documents accompanying the compiler explain in which order the loop is processed.
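Loops over a set might be sketched as follows, assuming a compiler that supports this EP construct. The enumeration and the variable names are illustrative, and ord is used only to obtain a printable value.

```pascal
program forInDemo;
type
    skill = (cooking, cleaning, driving, videogames, eating);
    skills = set of skill;
var
    slob: skills;
    ability: skill;
begin
    slob := [videogames, eating];

    { ability is assigned every member of slob, one per iteration }
    for ability in slob do
    begin
        writeLn(ord(ability));
    end;

    { the RHS may be any set expression, for instance a set literal }
    for ability in [cooking, cleaning] do
    begin
        writeLn(ord(ability));
    end;
end.
```

The order in which the ordinal values are printed is implementation defined, so no particular output should be relied upon.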

Tasks