Introducing Julia/Dictionaries and sets

Dictionaries
Many of the functions introduced so far have been shown working on arrays (and tuples). But arrays are just one type of collection. Julia has others.

A simple look-up table is a useful way of organizing many types of data: given a single piece of information, such as a number, string, or symbol, called the key, what is the corresponding data value? For this purpose, Julia provides the Dictionary object, called Dict for short. It's an "associative collection" because it associates keys with values.

Creating dictionaries
You can create a simple dictionary using the following syntax:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3)
Dict{String,Int64} with 3 entries:
  "c" => 3
  "b" => 2
  "a" => 1

dict is now a dictionary. The keys are "a", "b", and "c", and the corresponding values are 1, 2, and 3. The => operator creates a Pair, so "a" => 1 is the same as Pair("a", 1). In a dictionary, keys are always unique: you can't have two keys with the same name.

If you know the types of the keys and values in advance, you can (and probably should) specify them in curly braces after Dict:

julia> dict = Dict{String,Integer}("a" => 1, "b" => 2)
Dict{String,Integer} with 2 entries:
  "b" => 2
  "a" => 1

You can also create dictionaries using the generator/comprehensions syntax:

julia> dict = Dict(string(i) => sind(i) for i = 0:5:360)
Dict{String,Float64} with 73 entries:
  "320" => -0.642788
  "65"  => 0.906308
  "155" => 0.422618
  ⋮     => ⋮

Use the following syntax to create a typed empty dictionary:

julia> dict = Dict{String,Int64}()
Dict{String,Int64} with 0 entries

or you can omit the types, and get an untyped dictionary:

julia> dict = Dict()
Dict{Any,Any} with 0 entries

It's sometimes useful to create dictionary entries using a for loop.

This is one way you could create a set of 'variables' stored in a dictionary:

julia> fvars
Dict{Any,Any} with 3 entries:
  "x_1" => "a.txt"
  "x_2" => "b.txt"
  "x_3" => "c.txt"
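A minimal sketch of a loop that would produce the fvars dictionary above, assuming a hypothetical array files of file names (chosen to match that output). enumerate supplies a counter n alongside each file name f:

```julia
# Hypothetical list of file names, matching the output above
files = ["a.txt", "b.txt", "c.txt"]

# An untyped Dict{Any,Any}, filled in by the loop
fvars = Dict()

for (n, f) in enumerate(files)
    # Build the keys "x_1", "x_2", ... using string interpolation
    fvars["x_$(n)"] = f
end
```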

Looking things up
To get a value, if you have the key:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

julia> dict["a"]
1

if the keys are strings. Or, if the keys are symbols:

julia> symdict = Dict(:x => 1, :y => 3, :z => 6)
Dict{Symbol,Int64} with 3 entries:
  :z => 6
  :x => 1
  :y => 3

julia> symdict[:x]
1

Or if the keys are integers:

julia> intdict = Dict(1 => "one", 2 => "two", 3 => "three")
Dict{Int64,String} with 3 entries:
  2 => "two"
  3 => "three"
  1 => "one"

julia> intdict[2]
"two"

Alternatively, use the get() function and provide a fail-safe default value for when there's no value for that particular key:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

julia> get(dict, "a", 0)
1

julia> get(dict, "Z", 0)
0

If you don't want to provide a default value, use a try ... catch block:
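A minimal sketch of such a block: looking up a missing key throws a KeyError, which the catch clause handles, while any other kind of error is re-raised with rethrow(). Because try ... catch is an expression in Julia, its value (here nothing) can be assigned to a variable; the name result is just for illustration.

```julia
dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5)

result = try
    dict["Z"]                     # throws a KeyError: there is no key "Z"
catch err
    err isa KeyError || rethrow() # let any other kind of error propagate
    println("sorry, I couldn't find anything")
    nothing                       # value of the whole try/catch expression
end
```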

To change a value assigned to an existing key (or assign a value to a hitherto unseen key):

julia> dict["a"] = 10
10

Keys
Keys must be unique for a dictionary. There's only ever one key called "a" in this dictionary, so when you assign a value to a key that already exists, you're not creating a new one, just modifying the existing one.
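A quick illustration using a small example dictionary: assigning to an existing key replaces its value without changing the number of entries, while assigning to a new key adds an entry.

```julia
dict = Dict("a" => 1, "b" => 2, "c" => 3)

dict["a"] = 10   # "a" already exists: its value is replaced, no new key is created
length(dict)     # 3

dict["z"] = 26   # "z" is new: a fourth entry is added
length(dict)     # 4
```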

To see if the dictionary contains a key, use haskey():

julia> haskey(dict, "Z")
false

To check for the existence of a key/value pair:

julia> in(("b" => 2), dict)
true

To add a new key and value to a dictionary, use this:

julia> dict["d"] = 4
4

You can delete a key from the dictionary using delete!():

julia> delete!(dict, "d")
Dict{String,Int64} with 4 entries:
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1

You'll notice that the dictionary doesn't seem to be sorted in any way: the keys are in no particular order. This is due to the way they're stored (a Dict is a hash table), so you can't sort them in place. (But see Sorting a dictionary, below.)

To get all the keys, use the keys() function:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5);

julia> keys(dict)
Base.KeySet for a Dict{String,Int64} with 5 entries. Keys:
  "c"
  "e"
  "b"
  "a"
  "d"

The result is an iterator that has just one job: to iterate through a dictionary key by key:

julia> collect(keys(dict))
5-element Array{String,1}:
 "c"
 "e"
 "b"
 "a"
 "d"

julia> [uppercase(key) for key in keys(dict)]
5-element Array{String,1}:
 "C"
 "E"
 "B"
 "A"
 "D"

This uses an array comprehension, which collects each new element into an array. An alternative would be:

julia> map(uppercase, collect(keys(dict)))
5-element Array{String,1}:
 "C"
 "E"
 "B"
 "A"
 "D"

Values
To retrieve all the values, use the values() function:

julia> values(dict)
Base.ValueIterator for a Dict{String,Int64} with 5 entries. Values:
  3
  5
  2
  1
  4

If you want to go through a dictionary and process each key/value pair, you can make use of the fact that dictionaries themselves are iterable objects:

julia> for kv in dict
           println(kv)
       end
"c"=>3
"e"=>5
"b"=>2
"a"=>1
"d"=>4

where kv is a Pair holding each key and its value in turn.

Or you could do:

julia> for k in keys(dict)
           println(k, " ==> ", dict[k])
       end
c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4

Even better, you can unpack each pair into a (key, value) tuple to simplify the iteration:

julia> for (key, value) in dict
           println(key, " ==> ", value)
       end
c ==> 3
e ==> 5
b ==> 2
a ==> 1
d ==> 4

Here's another example:
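A sketch of this kind of loop, assuming a small hypothetical dictionary of chemical elements. Each entry arrives as a Pair, which can be indexed with [1] for the key and [2] for the value, and $( ) inserts each expression into the string. Because a Dict is unordered, the lines may print in any order.

```julia
elements = Dict("1" => "Hydrogen", "2" => "Helium", "3" => "Lithium")

for pair in elements
    # pair[1] is the key, pair[2] is the value
    println("Element $(pair[1]) is $(pair[2])")
end
```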

(Notice the string interpolation operator, $. This allows you to use a variable's name in a string and get the variable's value when the string is printed. You can include any Julia expression in a string using $( ).)

Sorting a dictionary
Because dictionaries don't store the keys in any particular order, you might want to output the dictionary to a sorted array to obtain the items in order:

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
Dict{String,Int64} with 6 entries:
  "f" => 6
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1
  "d" => 4

julia> for key in sort(collect(keys(dict)))
           println("$key => $(dict[key])")
       end
a => 1
b => 2
c => 3
d => 4
e => 5
f => 6

If you really need to have a dictionary that remains sorted all the time, you can use the SortedDict data type from the DataStructures.jl package (after having installed it).

julia> import DataStructures

julia> dict = DataStructures.SortedDict("b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6

julia> dict["a"] = 1
1

julia> dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
  "a" => 1
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6

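Once DataStructures has been loaded (it brings in the OrderedCollections.jl package), you can also call sort() on an ordinary dictionary and get back a sorted OrderedDict: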

julia> dict = Dict("a" => 1, "b" => 2, "c" => 3, "d" => 4, "e" => 5, "f" => 6)
Dict{String,Int64} with 6 entries:
  "f" => 6
  "c" => 3
  "e" => 5
  "b" => 2
  "a" => 1
  "d" => 4

julia> sort(dict)
OrderedCollections.OrderedDict{String,Int64} with 6 entries:
  "a" => 1
  "b" => 2
  "c" => 3
  "d" => 4
  "e" => 5
  "f" => 6

Simple example: counting words
A simple application of a dictionary is to count how many times each word appears in a piece of text. Each word is a key, and the value of the key is the number of times that word appears in the text.

Let's count the words in the Sherlock Holmes stories. I've downloaded the text from the excellent Project Gutenberg and stored it in a file "sherlock-holmes-canon.txt". To create a list of words from the text, we'll split each line using a regular expression and convert every word to lowercase. (There are probably faster methods.)

julia> f = open("sherlock-holmes-canon.txt")

julia> wordlist = String[]

julia> for line in eachline(f)
           words = split(line, r"\W")
           map(w -> push!(wordlist, lowercase(w)), words)
       end

julia> filter!(!isempty, wordlist)

julia> close(f)

wordlist is now an array of nearly 700,000 words:

julia> wordlist[1:20]
20-element Array{String,1}:
 "THE"
 "COMPLETE"
 "SHERLOCK"
 "HOLMES"
 "Arthur"
 "Conan"
 "Doyle"
 "Table"
 "of"
 "contents"
 "A"
 "Study"
 "In"
 "Scarlet"
 "The"
 "Sign"
 "of"
 "the"
 "Four"
 "The"

To store the words and the word counts, we'll create a dictionary:

julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64} with 0 entries

To build the dictionary, loop through the list of words, and use get() to look up the current tally, if any. If the word has already been seen, the count can be increased. If the word hasn't been seen before, the fall-back third argument of get() ensures that its absence doesn't cause an error: 0 is returned instead, so a count of 1 is stored.
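A minimal sketch of that loop. It assumes wordlist has been built as shown above; here a tiny hypothetical stand-in list is used so that the snippet runs on its own.

```julia
# Hypothetical stand-in for the full wordlist built from the text file
wordlist = ["the", "game", "is", "afoot", "the", "game"]

wordcounts = Dict{String,Int64}()

for word in wordlist
    # get() returns 0 for a word not seen before, so the first count stored is 1
    wordcounts[word] = get(wordcounts, word, 0) + 1
end

wordcounts["the"]     # 2
wordcounts["afoot"]   # 1
```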

Now you can look up words in the wordcounts dictionary and find out how many times they appear:

julia> wordcounts["watson"]
1040

julia> wordcounts["holmes"]
3057

julia> wordcounts["sherlock"]
415

julia> wordcounts["lestrade"]
244

Dictionaries aren't sorted, but you can use the collect() and keys() functions on the dictionary to collect the keys, and then sort them. In a loop you can then work through the dictionary in alphabetical order:
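A short sketch of such a loop. To keep it self-contained, it uses a small stand-in dictionary with three of the counts shown above rather than the full wordcounts:

```julia
# Stand-in for the full wordcounts dictionary
wordcounts = Dict("watson" => 1040, "holmes" => 3057, "lestrade" => 244)

for word in sort(collect(keys(wordcounts)))
    println(word, ", ", wordcounts[word])
end
# prints:
# holmes, 3057
# lestrade, 244
# watson, 1040
```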

But how do you find the most common words? One way is to use collect() to convert the dictionary to an array of pairs, and then sort the array by looking at the last value of each pair:

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)
19171-element Array{Pair{String,Int64},1}:
 ("the",36244)
 ("and",17593)
 ("i",17357)
 ("of",16779)
 ("to",16041)
 ("a",15848)
 ("that",11506)
 ⋮
 ("enrage",1)
 ("smuggled",1)
 ("lounges",1)
 ("devotes",1)
 ("reverberated",1)
 ("munitions",1)
 ("graybeard",1)

To see only the top 20 words:

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)[1:20]
20-element Array{Pair{String,Int64},1}:
 ("the",36244)
 ("and",17593)
 ("i",17357)
 ("of",16779)
 ("to",16041)
 ("a",15848)
 ("that",11506)
 ("it",11101)
 ("in",10766)
 ("he",10366)
 ("was",9844)
 ("you",9688)
 ("his",7836)
 ("is",6650)
 ("had",6057)
 ("have",5532)
 ("my",5293)
 ("with",5256)
 ("as",4755)
 ("for",4713)

In a similar way, you can use the filter() function to find, for example, all words that start with "k" and occur fewer than four times:

julia> filter(tuple -> startswith(first(tuple), "k") && last(tuple) < 4, collect(wordcounts))
73-element Array{Pair{String,Int64},1}:
 ("keg",1)
 ("klux",2)
 ("knifing",1)
 ("keening",1)
 ("kansas",3)
 ⋮
 ("kaiser",1)
 ("kidnap",2)
 ("keswick",1)
 ("kings",2)
 ("kratides",3)
 ("ken",2)
 ("kindliness",2)
 ("klan",2)
 ("keepsake",1)
 ("kindled",2)
 ("kit",2)
 ("kicking",1)
 ("kramm",2)
 ("knob",1)

More complex structures
A dictionary can hold many different types of values. Here is a dictionary where the keys are strings and the values are arrays of arrays of points (assuming that the Point type has been defined already). This could be used, for example, to store graphical shapes describing the letters of the alphabet (some of which have two or more loops):

julia> p = Dict{String, Array{Array}}();

julia> p["a"] = Array[[Point(0,0), Point(1,1)], [Point(34, 23), Point(5,6)]]
2-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]

julia> push!(p["a"], [Point(34.0,23.0), Point(5.0,6.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(1.0,1.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]
 [Point(34.0,23.0), Point(5.0,6.0)]

Or create a dictionary with some already-known values:

julia> d = Dict("shape1" => Array[[Point(0,0), Point(-20,57)], [Point(34, -23), Point(-10,12)]])
Dict{String,Array{Array{T,N},1}} with 1 entry:
  "shape1" => Array[[Point(0.0,0.0), Point(-20.0,57.0)], [Point(34.0,-23.0), Point(-10.0,12.0)]]

Add another array of points to the "shape1" entry:

julia> push!(d["shape1"], [Point(-124.0, 37.0), Point(25.0,32.0)])
3-element Array{Array{T,N},1}:
 [Point(0.0,0.0), Point(-20.0,57.0)]
 [Point(34.0,-23.0), Point(-10.0,12.0)]
 [Point(-124.0,37.0), Point(25.0,32.0)]
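These examples assume a Point type. The sketch below supplies a hypothetical two-field definition (so Point(0, 0) converts its integer arguments to Float64). It also uses the concrete value type Vector{Vector{Point}} instead of the abstract Array{Array}, which generally lets Julia store and access the values more efficiently:

```julia
# Hypothetical Point type: the examples above assume one exists
struct Point
    x::Float64
    y::Float64
end

# Each value is a vector of shapes; each shape is a vector of points
shapes = Dict{String, Vector{Vector{Point}}}()

shapes["a"] = [[Point(0, 0), Point(1, 1)], [Point(34, 23), Point(5, 6)]]

# Add a third loop to the letter "a"
push!(shapes["a"], [Point(3, 4), Point(7, 8)])

length(shapes["a"])   # 3
```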

Sets
A set is a collection of elements, just like an array or dictionary, with no duplicated elements.

The two important differences between a set and other types of collection are that in a set you can have only one of each element, and that in a set the order of elements isn't important (whereas an array can hold multiple copies of an element, and their order is remembered).

You can create an empty set using the Set() constructor function:

julia> colors = Set()
Set{Any}()

As elsewhere in Julia, you can specify the type:

julia> primes = Set{Int64}()
Set{Int64}()

You can create and fill sets in one go:

julia> colors = Set{String}(["red","green","blue","yellow"])
Set(String["yellow","blue","green","red"])

or you can let Julia "guess the type":

julia> colors = Set(["red","green","blue","yellow"])
Set{String}({"yellow","blue","green","red"})

Quite a few of the functions that work with arrays also work with sets. Adding elements to sets, for example, is a bit like adding elements to arrays. You can use push!():

julia> push!(colors, "black")
Set{String}({"yellow","blue","green","black","red"})

But you can't use pushfirst!(), because that works only for things that have a concept of "first", like arrays.

What happens if you try to add something to the set that's already there? Absolutely nothing. You don't get a copy added, because it's a set, not an array, and sets don't store repeated elements.
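A quick check of this: pushing the same element twice leaves the set with only one copy, so its length stops growing after the first push.

```julia
colors = Set(["red", "green", "blue", "yellow"])

push!(colors, "black")   # new element: the set grows to 5 elements
push!(colors, "black")   # already present: nothing changes

length(colors)           # 5
```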

To see if something is in the set, you can use in():

julia> in("green", colors)
true

There are some standard operations you can do with sets, namely finding their union, intersection, and difference, with the functions union(), intersect(), and setdiff():

julia> rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set(String["indigo","yellow","orange","blue","violet","green","red"])

The union of two sets is the set of everything that is in either set. The result is another set, so you can't have two "yellow"s here, even though there's a "yellow" in each set:

julia> union(colors, rainbow)
Set(String["indigo","yellow","orange","blue","violet","green","black","red"])

The intersection of two sets is the set that contains every element that belongs to both sets:

julia> intersect(colors, rainbow)
Set(String["yellow","blue","green","red"])

The difference between two sets is the set of elements that are in the first set but not in the second. This time, the order in which you supply the sets matters. The setdiff() function finds the elements that are in the first set, colors, but not in the second set, rainbow:

julia> setdiff(colors, rainbow)
Set(String["black"])
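To see why the order matters, a small sketch that repeats the calculation with the arguments swapped, rebuilding both sets so that it runs on its own:

```julia
colors  = Set(["red", "green", "blue", "yellow", "black"])
rainbow = Set(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

setdiff(colors, rainbow)   # only "black": in colors but not in rainbow
setdiff(rainbow, colors)   # "orange", "indigo", "violet": in rainbow but not in colors
```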

Other functions
Functions that work on arrays and sets sometimes work on dictionaries and other collections too. For example, some of the set operations can be applied to dictionaries, not just sets and arrays:

julia> d1 = Dict(1 => "a", 2 => "b")
Dict{Int64,String} with 2 entries:
  2 => "b"
  1 => "a"

julia> d2 = Dict(2 => "b", 3 => "c", 4 => "d")
Dict{Int64,String} with 3 entries:
  4 => "d"
  2 => "b"
  3 => "c"

julia> union(d1, d2)
4-element Array{Pair{Int64,String},1}:
 2=>"b"
 1=>"a"
 4=>"d"
 3=>"c"

julia> intersect(d1, d2)
1-element Array{Pair{Int64,String},1}:
 2=>"b"

julia> setdiff(d1, d2)
1-element Array{Pair{Int64,String},1}:
 1=>"a"

Notice that the results are returned as arrays of Pairs, rather than as Dictionaries.

Functions such as filter(), which we've already seen being used with arrays, also work with dictionaries. For a dictionary, filter() passes each entry to the test function as a key => value Pair:

julia> filter(kv -> first(kv) == 1, d1)
Dict{Int64,String} with 1 entry:
  1 => "a"

There's a merge() function which can merge two dictionaries:

julia> merge(d1, d2)
Dict{Int64,String} with 4 entries:
  4 => "d"
  2 => "b"
  3 => "c"
  1 => "a"
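merge() returns a new dictionary and leaves its arguments unchanged. When the same key appears in more than one dictionary, the value from the one listed last is kept. A small sketch with example values, where key 2 has a different value in each dictionary:

```julia
d1 = Dict(1 => "a", 2 => "b")
d2 = Dict(2 => "B", 3 => "c")   # key 2 also appears in d1, with a different value

merged = merge(d1, d2)
merged[2]   # "B": for a shared key, the value from the last dictionary wins
d1[2]       # "b": d1 itself is not modified
```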

The findmin() function finds the minimum value in a dictionary and returns that value together with its key:

julia> d1 = Dict(:a => 1, :b => 2, :c => 0)
Dict{Symbol,Int64} with 3 entries:
  :a => 1
  :b => 2
  :c => 0

julia> findmin(d1)
(0, :c)
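findmin() has a counterpart, findmax(), which returns the largest value and its key in the same (value, key) form:

```julia
d1 = Dict(:a => 1, :b => 2, :c => 0)

findmin(d1)   # (0, :c): the smallest value and its key
findmax(d1)   # (2, :b): the largest value and its key
```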