Programming with ooc/Strings

Usage
The String type is defined in the core module lang/types. String literals, such as "abc", are of this type.

str := "This is a string literal"

The string literal on the right-hand side creates a String object, and a reference to that object is assigned to the str variable.

A few operators are overloaded for strings by default. For example, ( + ) is used as the string concatenation operator, so that:

str := "I love" + " french cheese"

is equivalent to:

str := "I love french cheese"

( + ) is overloaded for pretty much every type inherited from C (strings, chars, numeric types), so you can write:

str := "The answer is " + 42

Also overloaded in the standard SDK is [ ], used as a slicing operator:

"Sand" println
"Sandwich"[0..4] println

Note: the slicing operator works on bytes, not characters. See the UTF-8 Support section below for more information.

Many more operators are overloaded on strings, such as == for comparison, * for repeating, [ ]= for bound-checked modification, etc.

Strings in ooc aren't immutable, but every method in the standard type String returns a copy of the given string, and never modifies the original.

The standard type String provides a nice set of methods for string manipulation. Since they all return copies, you should assign the result back, like this:

name = name trim

But never:

name trim

This would create a new trimmed string and throw it away. It is good to study the String type and know its methods in order not to reinvent the wheel.
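As a minimal sketch of the copy semantics described above (the variable names are illustrative, and trim is assumed here to strip leading and trailing whitespace when called without arguments):

```ooc
name := "  Ada  "

// trim returns a new String; name itself is left untouched
trimmed := name trim

// Wrap both in brackets so any surrounding whitespace is visible
original := "[" + name + "]"
result := "[" + trimmed + "]"

original println
result println
```

If the untrimmed value is no longer needed, reassigning with name = name trim avoids the extra variable.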

UTF-8 support
At the time of this writing, there is no built-in UTF-8 support in ooc. The length function returns the number of bytes used to store the String, never the number of characters.

The reason for that is that there are no clearly defined boundaries between characters in the Unicode standard. One can roughly determine 'grapheme clusters', i.e. associate modifiers with the glyphs that correspond to characters, but it's a very difficult problem, and there are edge cases with non-European/non-American languages.

Therefore, for now, there is no UTF-8 character type, no codepoint type, and so on, but that doesn't prevent one from using UTF-8 in ooc programs. The ICU and utf8proc libraries seem especially interesting for handling such encoding matters in an ooc codebase.

Note that the language design on this issue is not definitive, and is subject to changes in the future, as soon as other, more pressing matters are decided upon.

Length in bytes
As a consequence of the lack of UTF-8 support, the length method returns the number of bytes, so that:

"o/" length

is 2, but

"漢字" length

is 6, because 3 bytes are used to store each of the two characters that make up the Japanese word "Kanji".
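The two cases can be put side by side in a short sketch. It assumes the source file is saved as UTF-8, and the variable names are only illustrative:

```ooc
ascii := "o/"
kanji := "漢字"

// Each ASCII character is stored in one byte
ascii length toString println   // 2

// Each of these two characters is stored in three bytes in UTF-8
kanji length toString println   // 6
```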

Creating new strings
You can create a new String from a char:

str := String new('\n')

or just allocate a fixed number of chars:

str := String new(128)

A String literal, such as "abc", is also of type String:

str := "Curiosity killed the cat."

Iterating through a string
You can iterate over the bytes of a String, because it implements the iterator method.

for (c in "Hello, vertical world!") {
    c println
}

Comparing strings
You can use the == operator to compare two Strings, because it's overloaded in the SDK. It calls the equals method, so that, in ooc, the following two expressions are equivalent:

name == "JFK"
name equals("JFK")

This behavior greatly enhances the readability of ooc code, as opposed to, say, Java.

You can still compare the addresses of Strings by casting them to Pointers first:

name as Pointer == otherName

The compare method can be used to test parts of strings for equality, for example:

"awesome" compare("we", 1, 2)
"Turn right" compare("Turn left", 0, 6)

Substrings and slicing
The [ ] operator can be used with a range to obtain the same effect as calling substring:

"happiness"[3..6] == "pin"
"happiness" substring(3, 6) equals("pin")

Searching in a string
The indexOf and lastIndexOf methods let you search for the first and last occurrence, respectively, of a byte or a string in another string.

str := "Proud of you, son."
str indexOf("of") toString println
str lastIndexOf('o') toString println

Repeating a string
A String can be repeated multiple times using the overloaded ( * ) operator, or with the times method:

println("The cake is a lie!" * 5)
"The cake is a lie!" times(5) println

Note that because of precedence, we can't write:

"The cake is a lie!" * 5 println

Because the compiler would read that as:

"The cake is a lie!" * (5 println)

Which is definitely not what we intended.
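One way to sidestep the precedence issue, sketched here with an illustrative variable name, is to store the repeated string before printing it:

```ooc
// Evaluate the repetition first, then call println on the result
repeated := "The cake is a lie!" * 5
repeated println
```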

Reversing a string
A String can be reversed using the reverse method.

"le bon nobel" reverse println

Be aware that reverse works on bytes, not characters. See the UTF-8 Support section for more information.

Appending strings
You can use either the ( + ) operator or the append and prepend methods:

"in" + "direct"
"in" append("direct")
"direct" prepend("in")

As with all other string methods, a copy is returned and the original string is not modified.

However, if you are building a string from many smaller parts, it is better to use a Buffer instead, as detailed below.