Introducing Julia/Strings and characters

Strings
A string is a sequence of characters, usually enclosed in double quotes, for example "this is a string".

There are two important things you need to know about strings.

The first is that they're immutable: you can't change a string once it's created. But it's easy to make new strings from parts of existing ones.

The second is that you have to be careful when using two specific characters: double quotes (") and dollar signs ($). If you want to include a double quote character in a string, it has to be preceded by a backslash; otherwise the string ends at that quote and the rest is interpreted as Julia code, with potentially interesting results. A dollar sign ($) in a string should also be preceded by a backslash, because an unescaped dollar sign starts string interpolation.

julia> demand = "You owe me \$50!"
"You owe me \$50!"

julia> println(demand)
You owe me $50!

julia> demandquote = "He said, \"You owe me \$50!\""
"He said, \"You owe me \$50!\""

Strings can also be enclosed in triple double quotes. This is useful because you can use ordinary double quotes inside the string without having to put backslashes before them:

julia> """this is "a" string"""
"this is \"a\" string"

You'll encounter a few specialized types of string too, which consist of one or more characters immediately followed by the opening double quote:

 * r" " indicates a regular expression
 * v" " indicates a version string
 * b" " indicates a byte literal
 * raw" " indicates a raw string that doesn't do interpolation
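As a minimal sketch of the prefixes listed above (the variable names are chosen only for illustration), each prefixed literal produces a different kind of object:

```julia
# Each prefix changes how the text between the quotes is interpreted.
pattern = r"[aeiou]"        # a Regex object
version = v"1.2.3"          # a VersionNumber object
bytes   = b"abc"            # the UInt8 code units of the text
literal = raw"cost: $5 \n"  # no interpolation and no escape processing

println(typeof(pattern))
println(version < v"1.10")  # compares version components, not text
println(literal)
```

The raw string keeps the dollar sign and the backslash exactly as typed, which is handy for text that would otherwise need a lot of escaping.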

String interpolation
You often want to use the results of Julia expressions inside strings. For example, suppose you want to say:

"The value of x is n."

where n is the current value of x. Any Julia expression can be inserted into a string with the $() construction:

julia> x = 42
42

julia> "The value of x is $(x)."
"The value of x is 42."

You don't have to use the parentheses if you're just using the name of a variable:

julia> "The value of x is $x."
"The value of x is 42."

To include the result of a Julia expression in a string, enclose the expression in parentheses first, then precede it with a dollar sign:

julia> "The value of 2 + 2 is $(2 + 2)."
"The value of 2 + 2 is 4."

Substrings
To extract a smaller string from a string, use getindex(s, range) or s[range] syntax. For basic ASCII strings, you can use the same techniques that you use to extract elements from arrays:

julia> s = "a load of characters"
"a load of characters"

julia> s[1:end]
"a load of characters"

julia> s[3:6]
"load"

julia> s[3:end-6]
"load of char"

which is equivalent to:

julia> s[begin+2:end-6]
"load of char"

You can easily iterate through a string:
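A minimal sketch of such a loop, assuming the string s from the substring examples (the variable name vowel_count is hypothetical):

```julia
s = "a load of characters"

# Iterating over a string yields one Char at a time.
for c in s
    print(c, "_")
end
println()

# Functions that accept an iterator, such as count, also see one Char at a time.
vowel_count = count(c -> c in "aeiou", s)
```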

Watch out if you take a single element from the string, rather than a string of length 1 (i.e. with the same start and end positions):

julia> s[1:1]
"a"

julia> s[1]
'a'

The second result isn't a string, but a character (inside single quotes).

Unicode strings
Not all strings are ASCII. To access individual characters in Unicode strings, you can't always use simple indexing, because some characters occupy more than one index position. Don't be fooled just because some of the index numbers appear to work:

julia> su = "AéB𐅍CD"
"AéB𐅍CD"

julia> su[1]
'A'

julia> su[2]
'é'

julia> su[3]
ERROR: StringIndexError: invalid index [3], valid nearby indices [2]=>'é', [4]=>'B'

length(str) counts the characters in a string, so it doesn't tell you the last valid index. To find that, use lastindex(str):

julia> length(su)
6

julia> lastindex(su)
10

The isascii() function tests whether a string is ASCII or contains Unicode characters:

julia> isascii(su)
false

In this string, the 'second' character, é, has 2 bytes, and the 'fourth' character, 𐅍, has 4 bytes. Listing each valid index with its character gives:

1 -> A
2 -> é
4 -> B
5 -> 𐅍
9 -> C
10 -> D

The 'third' character, B, starts with the 4th element in the string.

You can also do this even more easily using the pairs() function:

1 => A
2 => é
4 => B
5 => 𐅍
9 => C
10 => D

As an alternative, use the eachindex iterator.

There are other useful functions for working with strings like this, including collect(), thisind(), nextind(), and prevind():

julia> collect(su)
6-element Array{Char,1}:
 'A'
 'é'
 'B'
 '𐅍'
 'C'
 'D'

1 2 2 4 5 5 5 5 9 10
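The index functions mentioned above can be tried together in a short sketch. The string is the same su as before, written here with escape sequences so that the byte layout is unambiguous; the variable names are hypothetical.

```julia
su = "A\u00e9B\U0001014dCD"   # the same text as "AéB𐅍CD"

# eachindex yields only the valid starting positions of characters.
valid = collect(eachindex(su))

# pairs couples each valid index with its character.
for (i, c) in pairs(su)
    println(i, " => ", c)
end

# thisind, nextind and prevind move between valid indices.
start_of_char = thisind(su, 3)   # index 3 is inside the character that starts at 2
after_e       = nextind(su, 2)   # start of the character after 'é'
before_c      = prevind(su, 9)   # start of the character before 'C'
```

Using these functions instead of adding or subtracting 1 keeps code correct for strings that contain multi-byte characters.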

Splitting and joining strings
You can stick strings together (a process often called concatenation) using the multiplication operator (*):

julia> "s" * "t"
"st"

If you've used other programming languages, you might expect to use the addition operator:

julia> "s" + "t"
ERROR: MethodError: no method matching +(::String, ::String)

so use * instead.

If you can 'multiply' strings, you can also raise them to a power:

julia> "s" ^ 18
"ssssssssssssssssss"

You can also use string():

julia> string("s", "t")
"st"

but if you want to do a lot of concatenation, inside a loop, perhaps, it might be better to use the string buffer approach (see below).

To split a string, use the split() function. Given this simple string:

julia> s = "You know my methods, Watson."
"You know my methods, Watson."

a simple call to the split() function divides the string at the spaces, returning a five-piece array:

julia> split(s)
5-element Array{SubString{String},1}:
 "You"
 "know"
 "my"
 "methods,"
 "Watson."

Or you can specify the string of 1 or more characters to split at:

julia> split(s, "e")
2-element Array{SubString{String},1}:
 "You know my m"
 "thods, Watson."

julia> split(s, " m")
3-element Array{SubString{String},1}:
 "You know"
 "y"
 "ethods, Watson."

The characters you use to do the splitting don't appear in the final result:

julia> split(s, "hod")
2-element Array{SubString{String},1}:
 "You know my met"
 "s, Watson."

If you want to split a string into separate single-character strings, use the empty string ("") which splits the string between the characters:

julia> split(s,"")
28-element Array{SubString{String},1}:
 "Y"
 "o"
 "u"
 " "
 "k"
 "n"
 "o"
 "w"
 " "
 "m"
 "y"
 " "
 "m"
 "e"
 "t"
 "h"
 "o"
 "d"
 "s"
 ","
 " "
 "W"
 "a"
 "t"
 "s"
 "o"
 "n"
 "."

You can also split strings using a regular expression to define the splitting points. Use the special regex string construction r" ". Inside this, you can use regular expression characters with special meanings:

julia> split(s, r"a|e|i|o|u")
8-element Array{SubString{String},1}:
 "Y"
 ""
 " kn"
 "w my m"
 "th"
 "ds, W"
 "ts"
 "n."

Here, r"a|e|i|o|u" is a regular expression string and, as you'll know if you love regular expressions, it matches any of the vowels. So the resulting array consists of the string split at every vowel. Notice the empty string in the results: if you don't want those, add the keyword argument keepempty=false:

julia> split(s, r"a|e|i|o|u", keepempty=false)
7-element Array{SubString{String},1}:
 "Y"
 " kn"
 "w my m"
 "th"
 "ds, W"
 "ts"
 "n."

If you wanted to keep the vowels, rather than use them for splitting work, you have to delve deeper into the world of regex literal strings. Read on.

You can join the elements of a split string in array form using join():

julia> join(split(s, r"a|e|i|o|u", keepempty=false), "aiou")
"Yaiou knaiouw my maiouthaiouds, Waioutsaioun."

Splitting using a function
Many functions in Julia let you pass a function as one of the arguments. Anonymous functions are useful here, because you can make function calls which have smart choices built in. For example, split() lets you provide a function in place of the delimiter character. In the next example, the delimiter is (bizarrely) specified to be any upper-case character whose ASCII code is a multiple of 8:

julia> split(join(Char.(65:90)), c -> Int(c) % 8 == 0)
4-element Array{SubString{String},1}:
 "ABCDEFG"
 "IJKLMNO"
 "QRSTUVW"
 "YZ"

Character objects
Above we extracted smaller strings from larger strings:

julia> s[1:1]
"a"

But when we extracted a single element from a string:

julia> s[1]
'a'

Note the single quotes. In Julia, these are used to mark character objects, so 'a' is a character object, but "a" is a string with length 1. These are not equivalent.

You can convert character objects to strings easily enough:

julia> string('s') * string('d')
"sd"

or

julia> string('s', 'd')
"sd"

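It's easy to input 32-bit Unicode characters using the \U escape sequence (the uppercase means 32 bits). The lowercase escape sequence \u can be used for 16-bit and 8-bit characters:

julia> ('\U1014d', '\u2640', '\u26')
('𐅍','♀','&')

For strings, the \U and \u syntax is more strict: the escape takes as many hexadecimal digits as it can (up to eight for \U and four for \u), so pad the code with leading zeros when the next character could be read as a digit.

julia> "\U0001014d2\U000026402\u26402\U000000a52\u00a52\U000000352\u00352\x352"
"𐅍2♀2♀2¥2¥2525252"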

Converting between numbers and strings
Turning integers into strings is the job of the string() function. The keyword base lets you specify the number base for the conversion, which you can use to convert decimal numbers to a binary, octal, or hexadecimal string:

julia> string(11, base=2)
"1011"

julia> string(11, base=8)
"13"

julia> string(11, base=16)
"b"

julia> string(11)
"11"

julia> a = BigInt(2)^200
1606938044258990275541962092341162602522202993782792835301376

julia> string(a)
"1606938044258990275541962092341162602522202993782792835301376"

julia> string(a, base=16) "1000000000000000000000000000000000000000000000000"

To convert strings to numbers, use parse(). You can also specify the number base (such as binary or hex) if you want the string to be interpreted as using that base:

julia> parse(Int, "100")
100

julia> parse(Int, "100", base=2)
4

julia> parse(Int, "100", base=16)
256

julia> parse(Float64, "100.32")
100.32

julia> parse(Complex{Float64}, "0 + 1im")
0.0 + 1.0im
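The two directions can be combined into a round trip, as in this minimal sketch. The tryparse() function is not covered in the text above; it is a Base function that returns nothing instead of raising an error when the text can't be converted. The variable names are hypothetical.

```julia
# Round trip: integer -> string in a given base -> integer again.
n = 255
hex = string(n, base=16)
back = parse(Int, hex, base=16)

# tryparse returns nothing instead of throwing when the text is not a number.
ok  = tryparse(Int, "42")
bad = tryparse(Int, "forty-two")

println(hex, " ", back, " ", ok, " ", bad === nothing)
```

Using tryparse is one way to handle user input that might not be numeric without wrapping parse in a try block.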

Converting characters to integers and back again
Int() converts a character into an integer, and Char() turns an integer into a character.

julia> Char(8253)
'‽': Unicode U+203d (category Po: Punctuation, other)

julia> Char(0x203d) # the Interrobang is Unicode U+203d in hexadecimal
'‽': Unicode U+203d (category Po: Punctuation, other)

julia> Int('‽')
8253

julia> string(Int('‽'), base=16)
"203d"

To go from a single-character string to its code number (its ASCII or Unicode code point), try this:

julia> Int("S"[1])
83

For a quick alphabet:

julia> string.(Char.("A"[1]:"Z"[1])) |> collect
26-element Array{String,1}:
 "A"
 "B"
 ...
 "Y"
 "Z"

printf formatting
If you're deeply attached to C-style printf() functionality, you'll be able to use a Julia macro (you call macros by prefacing them with the @ sign). The @printf macro is provided in the Printf package, which you'll need to load first:

julia> using Printf

julia> @printf("pi = %0.20f", float(pi))
pi = 3.14159265358979311600

or you can create another string using the @sprintf macro, also to be found in the Printf package:

julia> @sprintf("pi = %0.20f", float(pi))
"pi = 3.14159265358979311600"

Convert a string to an array
To read from a string into an array, you can use the IOBuffer() function, which wraps a string so that functions expecting a stream can read from it. The examples here use a string of data called data (it could have been read from a file).
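As a minimal sketch, assume the data is three lines of four space-separated digits; the contents of data below are an assumption chosen for illustration, not taken from a real file:

```julia
using DelimitedFiles   # standard library that provides readdlm

# Hypothetical sample data: three lines of four space-separated digits.
data = "1 2 3 4\n5 6 7 8\n9 0 1 2"

# IOBuffer wraps the string so that it can be read like a file.
matrix = readdlm(IOBuffer(data), Int)
println(size(matrix))
```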

Now you can "read" this string using functions such as readdlm(), the "read with delimiters" function. This can be found in the package DelimitedFiles.

julia> using DelimitedFiles

julia> readdlm(IOBuffer(data))
3×4 Array{Float64,2}:
 1.0  2.0  3.0  4.0
 5.0  6.0  7.0  8.0
 9.0  0.0  1.0  2.0

You can add an optional type specification:

julia> readdlm(IOBuffer(data), Int)
3×4 Array{Int64,2}:
 1  2  3  4
 5  6  7  8
 9  0  1  2

Sometimes you want to do things to strings that you can do better with arrays. Here's an example.

julia> s = "/Users/me/Music/iTunes/iTunes Media/Mobile Applications";

You can explode the pathname string into an array of character objects using collect(), which gathers the items in a collection or string into an array:

julia> collect(s)
55-element Array{Char,1}:
 '/'
 'U'
 's'
 'e'
 'r'
 's'
 '/'
 ...

Similarly, you can use split() to split the string and count the results. This time the elements are one-character strings rather than character objects:

julia> split(s, "")
55-element Array{SubString{String},1}:
 "/"
 "U"
 "s"
 "e"
 "r"
 "s"
 "/"
 ...

To count the occurrences of a particular character object, you can use an anonymous function:

julia> count(c -> c == '/', collect(s))
6

although here converting to an array is unnecessary and inefficient. Here's a better way:

julia> count(c -> c == '/', s)
6

Finding and replacing things inside strings
If you want to know whether a string contains a specific character, use the general-purpose in() function.

But the occursin() function, which accepts two strings, is more generally useful, because you can use substrings with one or more characters. Notice that you place the search term first, then the string you're looking in, as in occursin(needle, haystack). With s set back to "You know my methods, Watson.":

julia> occursin("Wat", s)
true

julia> occursin("m", s)
true

julia> occursin("mi", s)
false

julia> occursin("me", s)
true

You can get the location of the first occurrence of a substring using findfirst(). The thing you search for can be a string, a regular expression, or a function that tests each character:

julia> s = "You know my methods, Watson.";

julia> findfirst("meth", s)
13:16

julia> findfirst(r"[aeiou]", s)  # first vowel
2:2

julia> findfirst(isequal('a'), s) # first occurrence of character 'a'
23

In each case, the result contains the indices of the characters, if present. String and regex searches return a range; a character test returns a single index. If there's no match, the result is nothing.
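Because the result may be nothing, it's worth checking before using it as an index. A minimal sketch, with hypothetical variable names:

```julia
s = "You know my methods, Watson."

r1 = findfirst("meth", s)          # a range covering the match
r2 = findfirst(isequal('a'), s)    # a single index for a character test
r3 = findfirst("Moriarty", s)      # nothing, because there is no match

# Check for nothing before using the result as an index.
found = r3 === nothing ? "not found" : s[r3]
println(s[r1], " ", r2, " ", found)
```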

Replacing
The replace() function returns a new string with a substring of characters replaced with something else:

julia> replace("Sherlock Holmes", "e" => "ee")
"Sheerlock Holmees"

You use the => operator to specify the pattern you're looking for, and its replacement. Usually the replacement (the right-hand side of =>) is another string, as here. But you can also supply a function that processes the match:

julia> replace("Sherlock Holmes", "e" => uppercase)
"ShErlock HolmEs"

where the function (here, the built-in uppercase() function) is applied to the matching substring.

There's no replace!() function for strings, where the "!" indicates a function that changes its argument. That's because you can't change a string: strings are immutable.

Replacing using functions
Many functions in Julia allow you to supply functions as part of the function call, and you can make good use of anonymous functions for this. Here, for example, is how to use a function to provide random replacements in a replace() call.

julia> t = "You can never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant.";

julia> replace(t, r"a|e|i|o|u" => (c) -> rand(Bool) ? "0" : "1")
"Y00 c1n n0v0r f1r0t1ll wh1t 0ny 0n0 m0n w1ll d0, b0t y01 c1n s1y w0th pr1c1s10n wh0t 1n 1v0r0g0 n1mb0r w0ll b0 0p t1. Ind1v0d11ls v0ry, b0t p1rc0nt0g0s r0m01n c1nst0nt."

julia> replace(t, r"a|e|i|o|u" => (c) -> rand(Bool) ? "0" : "1")
"Y11 c0n n1v0r f0r1t0ll wh1t 1ny 0n1 m0n w1ll d1, b1t y10 c1n s1y w1th pr0c1s01n wh0t 0n 0v1r0g0 n1mb1r w0ll b0 1p t1. Ind1v0d01ls v0ry, b1t p0rc1nt1g0s r0m01n c1nst0nt."

Regular expressions
You can use regular expressions to find matches for substrings. Some functions that accept a regular expression are:

 * replace() changes occurrences of regular expressions
 * match() returns the first match or nothing
 * eachmatch() returns an iterator that lets you search through all matches
 * split() splits a string at every match

Use replace() to replace every character that isn't a lowercase vowel (consonants, but also spaces, punctuation, and the capital E) with an underscore:

julia> replace("Elementary, my dear Watson!", r"[^aeiou]" => "_")
"__e_e__a________ea___a__o__"

and the following code replaces each vowel with the results of running a function on each match:

julia> replace("Elementary, my dear Watson!", r"[aeiou]" => uppercase)
"ElEmEntAry, my dEAr WAtsOn!"

With replace() you can access the captured groups if you provide a special substitution string s"...", where \1 refers to the first captured group, \2 to the second, and so on. With this regex operation, each lowercase letter preceded by a space is repeated three times:

julia> replace("Elementary, my dear Watson!", r"(\s)([a-z])" => s"\1\2\2\2")
"Elementary, mmmy dddear Watson!"

For more regular expression fun, there are the match() and eachmatch() functions.

Here I've loaded the complete text of "The Adventures of Sherlock Holmes" from a file into the string called text:

julia> f = "/tmp/adventures-of-sherlock-holmes.txt"

julia> text = read(f, String);

To use the possibility of a match as a Boolean condition, suitable for use in an if statement for example, use occursin().

julia> occursin(r"Opium", text)
false

That's odd. We were expecting to find evidence of the great detective's peculiar pharmacological recreations. In fact, the word "opium" does appear in the text, but only in lower-case, hence this false result: regular expressions are case-sensitive.

julia> occursin(r"(?i)Opium", text)
true

This is a case-insensitive search, set by the flag (?i), and it returns true.

You could check every line for the word using a simple loop:

For more usable output (in the REPL), add enumerate() to number the lines, and some highlighting:

5087 opium. The habit grew upon him, as I understand, from some
5140 he had, when the fit was on him, made use of an opium den in the
5173 brown opium smoke, and terraced with wooden berths, like the
5237 wrinkled, bent with age, an opium pipe dangling down from between
5273 very short time a decrepit figure had emerged from the opium den,
5280 opium-smoking to cocaine injections, and all the other little
5429 steps - for the house was none other than the opium den in which
5486 lives upon the second floor of the opium den, and who was
5510 learn to have been the lodger at the opium den, and to have been
5593 doing in the opium den, what happened to him when there, where is
5846 "Had he ever showed any signs of having taken opium?"
6129 room above the opium den when I looked out of my window and saw,
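A line-by-line search of this kind can be sketched as follows. The sample text and the helper name matching_lines are hypothetical stand-ins; with the real file you would pass the text string loaded earlier.

```julia
# Hypothetical three-line sample standing in for the full text.
sample = "He smoked a pipe.\nAn opium den in the city.\nNo OPIUM was found."

# Collect (line number, line) pairs for every line matching the regex.
function matching_lines(text, pattern)
    hits = Tuple{Int,String}[]
    for (n, line) in enumerate(split(text, "\n"))
        if occursin(pattern, line)
            push!(hits, (n, String(line)))
        end
    end
    return hits
end

for (n, line) in matching_lines(sample, r"opium"i)
    printstyled(n, " "; color = :red)   # highlight the line number
    println(line)
end
```

Returning the hits from a function, rather than printing inside the loop, keeps the search reusable and easy to check.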

There's an alternative syntax for adding regex modifiers, such as case-insensitive matches. Notice the "i" immediately following the regex string in the second example:

julia> occursin(r"Opium", text)
false

julia> occursin(r"Opium"i, text)
true

With the eachmatch() function, you apply the regex to the string to produce an iterator. For example, to look for substrings in our text matching the letter "L", followed by some other characters, ending with "ed":

julia> lmatch = eachmatch(r"L.*?ed", text)

The result in lmatch is an iterable object containing all the matches, as RegexMatch objects:

julia> collect(lmatch)[1:10]
10-element Array{RegexMatch,1}:
 RegexMatch("London, and proceed")
 RegexMatch("London is a pleasant thing indeed")
 RegexMatch("Looking for lodgings,\" I answered")
 RegexMatch("London he had received")
 RegexMatch("Lied")
 RegexMatch("Life,\" and it attempted")
 RegexMatch("Lauriston Gardens wore an ill-omened")
 RegexMatch("Let\" card had developed")
 RegexMatch("Lestrade, is here. I had relied")
 RegexMatch("Lestrade grabbed")

We can step through the iterator and look at each match in turn. You can access a number of fields of a RegexMatch to extract information about the match. These include captures, match, offset, offsets, and regex. For example, the match field contains the matched substring.

Other fields include captures, the captured substrings as an array of strings, offset, the offset into the string at which the whole match begins, and offsets, the offsets of the captured substrings.
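These fields can be inspected on a single match, as in this minimal sketch. The input sentence and variable names are hypothetical, and the regex uses two capture groups so that captures and offsets have something to show.

```julia
m = match(r"(\w+) (\w+)", "Hello, Sherlock Holmes!")

whole    = m.match       # the matched substring
groups   = m.captures    # the text of each capture group
position = m.offset      # index where the whole match starts
starts   = m.offsets     # index where each capture group starts

println(whole, " ", groups, " ", position, " ", starts)
```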

To get an array of matching strings, use something like this:

julia> collect(m.match for m in eachmatch(r"L.*?ed", text))
58-element Array{SubString{String},1}:
 "London - quite so! Your Majesty, as I understand, became entangled"
 "Lodge. As it pulled"
 "Lord, Mr. Wilson, that I was a red"
 "League of the Red"
 "League was founded"
 "London when he was young, and he wanted"
 "Leadenhall Street Post Office, to be left till called"
 "Let the whole incident be a sealed"
 "Lestrade, being rather puzzled"
 "Lestrade would have noted"
 "Lestrade looked"
 "Lestrade laughed"
 "Lestrade shrugged"
 "Lestrade called"
 ...
 "Lord St. Simon shrugged"
 "Lady St. Simon was decoyed"
 "Lestrade,\" drawled"
 "Lestrade looked"
 "Lord St. Simon has not already arrived"
 "Lord St. Simon sank into a chair and passed"
 "Lord St. Simon had by no means relaxed"
 "Lordship. \"I may be forced"
 "London. What could have happened"
 "London, and I had placed"

The basic match() function looks for the first match for your regex. Use the match field to extract the information from the RegexMatch object:

julia> match(r"She.*",text).match
"Sherlock Holmes she is always THE woman. I have seldom heard\r"

A more streamlined way of obtaining matching lines from a file is this:

julia> f = "adventures of sherlock holmes.txt"

julia> filter(s -> occursin(r"(?i)Opium", s), map(chomp, readlines(open(f))))
12-element Array{SubString{String},1}:
 "opium. The habit grew upon him, as I understand, from some"
 "he had, when the fit was on him, made use of an opium den in the"
 "brown opium smoke, and terraced with wooden berths, like the"
 "wrinkled, bent with age, an opium pipe dangling down from between"
 "very short time a decrepit figure had emerged from the opium den,"
 "opium-smoking to cocaine injections, and all the other little"
 "steps - for the house was none other than the opium den in which"
 "lives upon the second floor of the opium den, and who was"
 "learn to have been the lodger at the opium den, and to have been"
 "doing in the opium den, what happened to him when there, where is"
 "\"Had he ever showed any signs of having taken opium?\""
 "room above the opium den when I looked out of my window and saw,"

Making a Regex
Sometimes you want to make a regular expression from within your code. You can do this by making a Regex object. Here is one way you could count the number of vowels in the text:

there are 219626 letter "a"s in the text.
there are 337212 letter "e"s in the text.
there are 167552 letter "i"s in the text.
there are 212834 letter "o"s in the text.
there are 82924 letter "u"s in the text.
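One way to produce a count like this is sketched below. It is a minimal example under stated assumptions: the short sample string and the helper name vowel_counts are hypothetical, so the numbers it prints are for the sample, not for the full text.

```julia
# Hypothetical short sample; the counts listed above come from a much larger file.
sample = "You know my methods, Watson."

# Build one Regex per vowel from a string made at run time, then count its matches.
function vowel_counts(text)
    counts = Dict{Char,Int}()
    for vowel in "aeiou"
        r = Regex(string(vowel))
        counts[vowel] = length(collect(eachmatch(r, text)))
    end
    return counts
end

counts = vowel_counts(sample)
for vowel in "aeiou"
    println("there are $(counts[vowel]) letter \"$vowel\"s in the sample.")
end
```

The Regex() constructor takes an ordinary string, so the pattern can be assembled from variables, which the r"..." literal doesn't allow.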

Making a substitution string
Sometimes you'll want to assemble a substitution string. To do this, you can use SubstitutionString() instead of s"...".

For example, say you want to do some string interpolation in the replacement string. Perhaps you have a list of files, and you want to renumber them, so that "file2.png" becomes "file1.png":

files = ["file2.png", "file3.png", "file4.png", "file5.png", "file6.png", "file7.png"]

for (n, f) in enumerate(files)
    newfilename = replace(f, r"(.*)\d\.png" => SubstitutionString("\\g<1>$(n).png"))
    # now to do the renaming...
end

Notice that you can't simply use \1 in the SubstitutionString to refer to the first captured expression, because the argument is an ordinary string: you have to escape the backslash, as \\1. Here the \g<1> form (escaped as \\g<1>) is used instead, so that the group number stays separate from the interpolated digit that follows it.
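The renumbering can be checked without touching any files by collecting the new names into an array. This is a minimal sketch with a shortened, hypothetical file list:

```julia
files = ["file2.png", "file3.png", "file4.png"]

# \\g<1> inserts the first capture group; $(n) is ordinary string interpolation.
renumbered = [
    replace(f, r"(.*)\d\.png" => SubstitutionString("\\g<1>$(n).png"))
    for (n, f) in enumerate(files)
]
println(renumbered)
```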

Testing and changing strings
There are lots of functions for testing and changing strings:


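 * length(str) - length of string in characters
 * sizeof(str) - length/size in bytes
 * startswith(strA, strB) - does strA start with strB?
 * endswith(strA, strB) - does strA end with strB?
 * occursin(strA, strB) - does strA occur in strB?
 * all(isletter, str) - is str entirely letters?
 * all(isnumeric, str) - is str entirely number characters?
 * isascii(str) - is str ASCII?
 * all(iscntrl, str) - is str entirely control characters?
 * all(isdigit, str) - is str 0-9?
 * all(ispunct, str) - does str consist of punctuation?
 * all(isspace, str) - is str whitespace characters?
 * all(isuppercase, str) - is str uppercase?
 * all(islowercase, str) - is str entirely lowercase?
 * all(isxdigit, str) - is str entirely hexadecimal digits?
 * uppercase(str) - return a copy of str converted to uppercase
 * lowercase(str) - return a copy of str converted to lowercase
 * titlecase(str) - return copy of str with the first character of each word converted to uppercase
 * uppercasefirst(str) - return copy of str with first character converted to uppercase
 * lowercasefirst(str) - return copy of str with first character converted to lowercase
 * chop(str) - return a copy with the last character removed
 * chomp(str) - return a copy with the last character removed only if it's a newline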
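A few of these functions in use, as a minimal sketch (the sample strings are hypothetical):

```julia
str = "Sherlock Holmes\n"

@show length(str)                     # number of characters
@show sizeof(str)                     # number of bytes
@show startswith(str, "Sher")
@show endswith(chomp(str), "mes")
@show all(isletter, "Watson")
@show all(isdigit, "221B")            # false, because of the letter B
@show uppercasefirst("elementary")
@show titlecase("the red-headed league")
@show chop("Holmes")                  # drops the last character
@show chomp(str)                      # drops only a trailing newline
```

The character tests such as isletter and isdigit work on single characters, which is why they are combined with all() to test a whole string.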

Streams
To write to a string, you can use a Julia stream. The sprint() (String Print) function lets you use a function as the first argument, and uses the function and the rest of the arguments to send information to a stream, returning the result as a string.

For example, consider a function f whose body maps an anonymous 'print' function over the arguments, enclosing them with angle brackets. When used by sprint(), the function f processes the remaining arguments and sends them to the stream.

julia> sprint(f, "fred", "jim", "bill", "fred blogs")
"<fred><jim><bill><fred blogs>"
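A function f matching that description could be sketched like this. It is an assumption about the function's shape, not a definitive version; it uses foreach rather than map so that no array of results is built.

```julia
# Print each argument to the stream io, wrapped in angle brackets.
function f(io::IO, args...)
    foreach(a -> print(io, "<", a, ">"), args)
end

# sprint creates a temporary stream, calls f(io, args...), and returns the text.
result = sprint(f, "fred", "jim", "bill", "fred blogs")
println(result)
```

The only requirement sprint places on f is that its first argument is the stream to write to.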

Functions like println() can take an IOBuffer or stream as their first argument. This lets you print to streams instead of printing to the standard output device:

julia> iobuffer = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> for i in 1:100
           println(iobuffer, string(i))
       end

After this, the in-memory stream called iobuffer is full of numbers and newlines, even though nothing was printed on the terminal. To copy the contents of iobuffer from the stream to a string or array, you can use take!():

julia> String(take!(iobuffer))
"1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14 ... \n98\n99\n100\n"
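The same buffer technique is a reasonable way to build up a long string piece by piece inside a loop, instead of concatenating repeatedly. A minimal sketch, with a hypothetical helper name:

```julia
# Build a string by writing pieces to an in-memory buffer.
function join_numbers(n)
    buf = IOBuffer()
    for i in 1:n
        print(buf, i)
        i < n && print(buf, ", ")
    end
    return String(take!(buf))   # take! empties the buffer and returns its bytes
end

csv = join_numbers(5)
println(csv)
```

Each print call appends to the buffer, so no intermediate strings are created until take! is called at the end.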

Colored / Styled Output
The following prints out messages in their respective colors using printstyled():

julia> for color in [:red, :green, :blue, :magenta]
           printstyled("Hello in $(color)\n"; color = color)
       end
Hello in red
Hello in green
Hello in blue
Hello in magenta

Printing a formatted Backtrace
Inside the catch block of a try-catch statement, you can print the original backtrace that caused the exception.

If you are outside a try-catch and want to print a stack trace without throwing an exception or stopping execution, you can pass the current stack trace to the same printing function.
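Both cases can be sketched with showerror(), catch_backtrace(), and stacktrace(), all of which are in Base. The functions risky and describe_failure are hypothetical, and sprint is used here only so that the output ends up in a string; passing stderr as the first argument of showerror prints it directly instead.

```julia
# Hypothetical function that always fails, used only to trigger an exception.
risky() = error("something went wrong")

# Inside catch: format the exception together with the backtrace of the failure.
function describe_failure(f)
    try
        f()
        return ""
    catch e
        return sprint(showerror, e, catch_backtrace())
    end
end

report = describe_failure(risky)
println(report)

# Outside try/catch: format the current call stack without throwing anything.
trace = sprint(showerror, ErrorException("show stacktrace"), stacktrace())
println(trace)
```

Execution continues normally after both calls, since nothing is rethrown.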