Ada Programming/Libraries/GNAT.String Split

Introduction
Exploding a string into several components based on a set of separators can be done in many different ways. In this article we focus on a solution involving the GNAT.String_Split package.

Caveat
If you use the following example in a program of your own, that program becomes less portable. The GNAT packages are only found in the GNAT GPL and GNU GCC GNAT compilers, so your program probably won't compile with other Ada compilers.

The Problem
You want to split a string into a set of individual components, such as

This is a string

into

This
is
a
string

And this is exactly what you can do with the GNAT.String_Split package.

The GNAT.String_Split Solution
Let's dive straight into the code necessary to solve our string split problem. Create a file named explode.adb to hold the program.
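A sketch of explode.adb that matches the output and walk-through in this article. The object names Data, Subs, Seps and Sub, the comments and the layout are assumptions chosen for illustration; Create, Slice_Count and Slice are the GNAT.String_Split subprograms discussed below.

```ada
with Ada.Characters.Latin_1;
with Ada.Text_IO;
with GNAT.String_Split;

procedure Explode is
   use Ada.Characters;
   use Ada.Text_IO;
   use GNAT;

   --  The input string (name assumed). Latin_1.HT is a horizontal tab.
   Data : constant String :=
     "This becomes a " & Latin_1.HT & " bunch of     substrings";

   --  Container that receives the slices (name assumed).
   Subs : String_Split.Slice_Set;

   --  Separator characters: space and horizontal tab (name assumed).
   Seps : constant String := " " & Latin_1.HT;
begin
   Put_Line ("Splitting '" & Data & "' at whitespace.");

   --  Multiple mode treats a run of separators as a single separator.
   String_Split.Create (S          => Subs,
                        From       => Data,
                        Separators => Seps,
                        Mode       => String_Split.Multiple);

   --  Report the number of slices.
   Put_Line
     ("Got" &
      String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) &
      " substrings:");

   for I in 1 .. String_Split.Slice_Count (Subs) loop
      declare
         --  Copy the current slice into a local constant for easy handling.
         Sub : constant String := String_Split.Slice (Subs, I);
      begin
         Put_Line (String_Split.Slice_Number'Image (I) &
                   " -> " &
                   Sub &
                   " (length" & Natural'Image (Sub'Length) &
                   ")");
      end;
   end loop;
end Explode;
```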

You compile and execute the explode program like this:

$ gnatmake explode.adb
$ ./explode

You should see output similar to this:

Splitting 'This becomes a  bunch of     substrings' at whitespace.
Got 6 substrings:
1 -> This (length 4)
2 -> becomes (length 7)
3 -> a (length 1)
4 -> bunch (length 5)
5 -> of (length 2)
6 -> substrings (length 10)

The comments in the example should more or less explain what's going on, but for the sake of clarity, we're going to do a step-by-step walk-through of the code, starting with the dependencies and use clauses.

The three with lines list the packages on which our program depends: Ada.Characters.Latin_1, Ada.Text_IO and GNAT.String_Split. When the compiler encounters these, it retrieves those packages from its library. The "procedure Explode is" line names our program and marks the start of its declarative part, where we declare and initialize our constants and variables. Note the use clauses. Adding these lets us write, for example, Put_Line instead of Ada.Text_IO.Put_Line in the program. Very handy.
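As a small sketch of what a use clause buys you, the following stand-alone program (the name Use_Demo and the messages are made up for illustration) makes the same call twice, once fully qualified and once through a use clause:

```ada
with Ada.Text_IO;

procedure Use_Demo is
begin
   --  Without a use clause, the package name must be written out.
   Ada.Text_IO.Put_Line ("fully qualified call");

   declare
      --  The use clause makes Put_Line directly visible in this block.
      use Ada.Text_IO;
   begin
      Put_Line ("call through a use clause");
   end;
end Use_Demo;
```

Placing the use clause in a nested block, as here, limits the shortened names to that block; the tutorial program puts its use clauses in the declarative part of the procedure instead, so they apply to the whole body.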

As an exercise, try commenting out the three use clauses, and prefix the actual package names to all types and procedures in the program.

Next up we have the first declaration.

This is the String we're going to split into individual components. Latin_1.HT is a constant declared in Ada.Characters.Latin_1. It inserts a horizontal tab in the string. Since we don't change the value of the string throughout the program, we've declared it as a constant.

The String_Split.Slice_Set variable is the container for the individual components, or "slices".

The last declaration holds our separators. In this case we want to split the string on spaces (" ") and horizontal tabs (Latin_1.HT). Note that the separators are NOT included as part of the resulting slices. Try experimenting with different separators.
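As one such experiment, here is a minimal sketch that splits on commas and semicolons instead of whitespace. The procedure name, the object name Subs and the input text are made up for illustration; Create, Slice_Count and Slice are the same GNAT.String_Split subprograms the tutorial program uses.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Separators_Demo is
   use GNAT;

   Subs : String_Split.Slice_Set;
begin
   --  Every character in Separators acts as a separator in its own right.
   String_Split.Create (S          => Subs,
                        From       => "red,green;blue",
                        Separators => ",;",
                        Mode       => String_Split.Multiple);

   --  Print each slice on its own line; the separators are not included.
   for I in 1 .. String_Split.Slice_Count (Subs) loop
      Ada.Text_IO.Put_Line (String_Split.Slice (Subs, I));
   end loop;
end Separators_Demo;
```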

begin marks the beginning of the body of our program. Immediately after begin we output a short message.

The call to String_Split.Create is the meat of the program. In this one statement the input String is split into individual slices based on the given separators, and the resulting slices are placed in the Slice_Set. Note the Mode parameter. When using Multiple mode, String_Split.Create will treat consecutive spaces and horizontal tabs as one separator.

As an exercise, try changing Multiple to Single and see what happens.
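If you want to compare the two modes side by side, a minimal sketch along these lines may help. The procedure name, object names and input string are made up for illustration. With two consecutive spaces in the input, Single mode should report an extra, empty slice between them, while Multiple mode should merge them into one separator.

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Mode_Demo is
   use Ada.Text_IO;
   use GNAT;

   --  Two consecutive spaces between "a" and "b".
   Data : constant String := "a  b";
begin
   --  Run the same split once per separator mode (Single, then Multiple).
   for M in String_Split.Separator_Mode loop
      declare
         Subs : String_Split.Slice_Set;
      begin
         String_Split.Create (S          => Subs,
                              From       => Data,
                              Separators => " ",
                              Mode       => M);
         Put_Line
           (String_Split.Separator_Mode'Image (M) & ":" &
            String_Split.Slice_Number'Image
              (String_Split.Slice_Count (Subs)) &
            " slices");
      end;
   end loop;
end Mode_Demo;
```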

Next comes the Put_Line statement that's responsible for this output:

Got 6 substrings:

Yes, it looks like an awfully long statement for very little output, but there's method to the madness.

The String_Split.Slice_Number'Image (String_Split.Slice_Count (Subs)) expression is responsible for the "6" part of the output. What it does is transform the Slice_Number value 6 into the String value " 6" (with a leading space), and it does so using the 'Image attribute. Slice_Count returns a Slice_Number type, which is basically just an Integer with a value >= 0, and 'Image then converts this to a String suitable for output.
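The 'Image attribute works the same way for any integer type. As a minimal sketch, using Natural in place of Slice_Number and a made-up constant named Count:

```ada
with Ada.Text_IO;

procedure Image_Demo is
   Count : constant Natural := 6;
begin
   --  'Image of a non-negative integer starts with a space, which is why
   --  there is no space after "Got" in the literal.
   Ada.Text_IO.Put_Line ("Got" & Natural'Image (Count) & " substrings:");
end Image_Demo;
```

This prints "Got 6 substrings:", the same line the tutorial program produces.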

Next we start a loop that repeats once per slice, which in our case is 6 times. So on the first iteration the loop counter is 1 and on the final iteration it is 6. Inside the loop we declare a new block. This enables us to locally initialize a String constant, which on each repeat of the loop is initialized anew with the next slice from our split. This is done using the String_Split.Slice function, which takes our Slice_Set and the loop counter as parameters and returns a String. In the body of the block we output each slice, along with its index in the Slice_Set and its length. As you can see, we once again make use of the 'Image attribute to convert numeric values to String.

You can get rid of the block inside the loop by calling String_Split.Slice directly wherever the slice is needed.

That way we no longer need the local String constant. It works just the same, but it is perhaps a bit less readable.
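A sketch of such a block-free loop, written as a stand-alone program. The procedure name, the object name Subs and the input string are made up for illustration:

```ada
with Ada.Text_IO;
with GNAT.String_Split;

procedure Explode_No_Block is
   use Ada.Text_IO;
   use GNAT;

   Subs : String_Split.Slice_Set;
begin
   String_Split.Create (S          => Subs,
                        From       => "one two three",
                        Separators => " ",
                        Mode       => String_Split.Multiple);

   for I in 1 .. String_Split.Slice_Count (Subs) loop
      --  Slice is called twice per iteration: once for the text itself
      --  and once to obtain its length.
      Put_Line (String_Split.Slice_Number'Image (I) &
                " -> " &
                String_Split.Slice (Subs, I) &
                " (length" &
                Natural'Image (String_Split.Slice (Subs, I)'Length) &
                ")");
   end loop;
end Explode_No_Block;
```

The price of dropping the block is the repeated call to Slice, which is part of why the version with the local constant reads more easily.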

Another option is to use an. You can see a possible solution here:

Finally we have the closing "end Explode;" line, which simply ends the program.

And with that, we've concluded this small tutorial on how to split a string into individual parts (slices) based on a set of separators. I hope you enjoyed reading it as much as I enjoyed writing it.

Wikibook

 * Ada Programming/Libraries/GNAT
