Introducing Julia/Working with text files

Reading from files
The standard approach for getting information from a text file is to use the open(), read(), and close() functions.

Open
To read text from a file, first obtain a file handle by calling open() with the file name and assigning the result to a variable such as f.

f is now Julia's connection to the file on disk. When you've finished with the file, you should close the connection, using close(f).
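As a minimal sketch of the open/close pattern just described, the following creates a small temporary file first so that it runs on its own. The path from tempname() and the sample text are stand-ins for a real file such as "sherlock-holmes.txt".

```julia
# Create a small sample file so the example is self-contained
# (hypothetical content; any text file would do).
path = tempname()
write(path, "first line\nsecond line\n")

f = open(path)            # open for reading, which is the default mode
firstline = readline(f)   # read one line through the handle
close(f)                  # release the connection to the file

println(firstline)
println(isopen(f))        # the handle reports that it is closed
```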

In general, the recommended way to work with a file in Julia is to wrap any file-processing functions inside a do block:

The open file is automatically closed when this block finishes. See Controlling the flow for more about do blocks.
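A sketch of the do-block form, assuming a throwaway file created with tempname() in place of a real document. Whatever happens inside the block, Julia closes the file when the block ends.

```julia
# Sample file so the example runs on its own (hypothetical content).
path = tempname()
write(path, "alpha\nbeta\ngamma\n")

# The handle is bound to `file` only inside the block and is closed afterwards.
open(path) do file
    for line in eachline(file)
        println(uppercase(line))
    end
end
```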

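Because of the scope of local variables in blocks, you might want to keep some of the information that was processed. To do this, make the values you want the last expression of the do block and assign the result of open() to variables outside it. For example, a block that counts the lines of the file and times itself can return both values:

julia> totaltime, totallines
(0.004484679, 76803)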
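One way such a pair of values might be produced is sketched below. The three-line temporary file is an assumption made so the code is self-contained, so the numbers will differ from those shown for the Sherlock Holmes text.

```julia
# Sample file standing in for a large text (hypothetical content).
path = tempname()
write(path, "one\ntwo\nthree\n")

# The tuple on the last line of the block becomes the value of open(...).
totaltime, totallines = open(path) do f
    linecounter = 0
    timetaken = @elapsed for l in eachline(f)
        linecounter += 1
    end
    (timetaken, linecounter)
end

println(totallines)
```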

Slurp – reading a file all at once
You can read the entire contents of an open file at once with read():

julia> s = read(f, String)

This stores the contents of the file in the string s.
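A short sketch of slurping a file, again using a temporary file as a stand-in. The second form, which passes the file name straight to read(), is a convenience that opens and closes the file for you.

```julia
# Sample file (hypothetical content).
path = tempname()
write(path, "To Sherlock Holmes she is always the woman.\n")

f = open(path)
s = read(f, String)       # the whole file as one String
close(f)

s2 = read(path, String)   # same result without managing a handle
println(length(s))
```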

You can use readlines() to read in the whole file as an array, with each line an element:

julia> f = open("sherlock-holmes.txt");
julia> lines = readlines(f)
76803-element Array{String,1}:
 "THE ADVENTURES OF SHERLOCK HOLMES by SIR ARTHUR CONAN DOYLE\r\n"
 "\r\n"
 "  I. A Scandal in Bohemia\r\n"
 " II. The Red-headed League\r\n"
 ...
 "Holmes, rather to my disappointment, manifested no further\r\n"
 "interest in her when once she had ceased to be the centre of one\r\n"
 "of his problems, and she is now the head of a private school at\r\n"
 "Walsall, where I believe that she has met with considerable success.\r\n"
julia> close(f)

Now you can step through the lines:
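A minimal sketch of stepping through an array of lines with a hand-kept counter. The three sample strings and the function name printnumbered are illustrative only. The loop is wrapped in a function so that updating the counter works the same in a script as at the REPL.

```julia
# A few sample lines standing in for the result of readlines().
lines = ["THE ADVENTURES OF SHERLOCK HOLMES", "", "  I. A Scandal in Bohemia"]

# Hypothetical helper: print each line with its number, return the line count.
function printnumbered(lines)
    counter = 1
    for l in lines
        println("$counter $l")
        counter += 1
    end
    return counter - 1
end

printnumbered(lines)
```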

There's a better way to do this: see enumerate(), below.

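You might find the chomp() function useful: it removes the trailing newline from a string. (In current Julia versions, readlines() and eachline() already drop the line ending unless you pass keep=true.)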

Line by line
The eachline() function turns a source into an iterator. This allows you to process a file a line at a time:
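A sketch of line-by-line processing with eachline(), assuming a small temporary file. eachline() accepts either an open handle or a file name.

```julia
# Sample file (hypothetical content).
path = tempname()
write(path, "red\ngreen\nblue\n")

# Iterate over an open handle; only one line is held in memory at a time.
open(path) do f
    for line in eachline(f)
        println(length(line), " ", line)
    end
end

# eachline also takes a file name directly.
lengths = [length(line) for line in eachline(path)]
```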

Another approach is to read until you reach the end of the file. You might want to keep track of which line you're on:
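The read-until-end approach might look like the sketch below, which uses eof() to test for the end of the file and keeps its own line number. The function name numberlines and the sample file are assumptions for illustration.

```julia
# Hypothetical helper: return each line of the file prefixed with its number.
function numberlines(path)
    numbered = String[]
    open(path) do f
        linenumber = 0
        while !eof(f)              # stop once nothing is left to read
            line = readline(f)
            linenumber += 1
            push!(numbered, "$linenumber $line")
        end
    end
    return numbered
end

path = tempname()
write(path, "a\nb\n")
foreach(println, numberlines(path))
```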

A better approach is to use enumerate() on an iterable object, which gives you the line numbering 'for free':
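A sketch of the enumerate() approach, assuming a temporary sample file. Each iteration yields a (number, line) pair, so no separate counter is needed.

```julia
# Sample file (hypothetical content).
path = tempname()
write(path, "red\ngreen\nblue\n")

# enumerate wraps the line iterator and supplies the index.
numbered = open(path) do f
    ["$i $line" for (i, line) in enumerate(eachline(f))]
end

foreach(println, numbered)
```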

If you have a specific function that you want to call on a file, you can use this alternative syntax:

julia> shoutversion = open(shout, "sherlock-holmes.txt");

julia> shoutversion[30237:30400]
"ELEMENTARY PROBLEMS. LET HIM, ON MEETING A\nFELLOW-MORTAL, LEARN AT A GLANCE TO DISTINGUISH THE HISTORY OF THE\nMAN, AND THE TRADE OR PROFESSION TO WHICH HE BELONGS. "

This opens the file, runs the shout() function on it, then closes it again, assigning the processed contents to the variable.
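The definition of shout() is not shown here. One plausible definition, offered only as an assumption, reads the whole stream and converts it to upper case. The temporary file stands in for the Sherlock Holmes text.

```julia
# Hypothetical definition: read everything from an open stream and upper-case it.
function shout(f::IO)
    return uppercase(read(f, String))
end

# Sample file (hypothetical content).
path = tempname()
write(path, "elementary problems.\n")

# open(function, filename) opens the file, calls the function, then closes the file.
shoutversion = open(shout, path)
print(shoutversion)
```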

You can use the CSV.jl package to read and write comma-separated-values (.csv) files. It's recommended over the readdlm() function because it handles more corner cases and can be faster, especially for larger files. readdlm() reads lines delimited with certain characters, such as data files, arrays stored as text files, and tables. If you use the DataFrames package, the data can also be read directly into a table (for example, with CSV.jl's CSV.read).
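For simple, regular data, readdlm() from the DelimitedFiles standard library may be enough. The sketch below assumes a small comma-delimited file of integers created on the spot. It does not attempt the quoting and corner cases that CSV.jl handles.

```julia
using DelimitedFiles

# Sample comma-delimited file (hypothetical content).
path = tempname()
write(path, "1,2,3\n4,5,6\n")

# Read with ',' as the delimiter and parse every cell as an Int.
data = readdlm(path, ',', Int)
println(size(data))
```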

Working with paths and filenames
These functions will be useful for working with filenames:


 * cd() changes the current directory.
 * pwd() gets the current working directory.
 * readdir() returns a list of the contents of a named directory, or the current directory.
 * abspath() adds the current directory's path to a filename to make an absolute pathname.
 * joinpath() assembles a pathname from pieces.
 * isdir() tells you whether the path is a directory.
 * splitdir() splits a path into a tuple of the directory name and file name.
 * splitdrive() splits a path, on Windows, into the drive letter part and the path part. On Unix systems, the first component is always the empty string.
 * splitext() splits the path, if the last component contains a dot, into everything before the dot and everything including and after the dot. Otherwise, it returns a tuple of the argument unmodified and the empty string.
 * expanduser() replaces a tilde character at the start of a path with the current user's home directory.
 * normpath() normalizes a path, removing "." and ".." entries.
 * realpath() canonicalizes a path by expanding symbolic links and removing "." and ".." entries.
 * homedir() gets the current user's home directory.
 * dirname() gets the directory part of a path.
 * basename() gets the file name part of a path.
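A few of these path functions can be tried without touching the disk. The sketch below uses a made-up relative path. Building it with joinpath() keeps the example independent of the platform's path separator.

```julia
# A made-up relative path, assembled from pieces.
p = joinpath("projects", "julia", "notes.txt")

dir, file = splitdir(p)      # directory part and file name
stem, ext = splitext(file)   # name before the dot, and the extension with its dot

println(basename(p))
println(dirname(p))
println(abspath(p))          # prefixed with the current working directory

# normpath removes the ".." entry together with the component before it.
cleaned = normpath(joinpath("a", "b", "..", "c"))
```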

To work on a restricted selection of files in a directory, use filter() and an anonymous function to filter the file names and just keep the ones you want. (filter() is more of a fishing net or sieve than a coffee filter, in that it catches what you want to keep.)
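As a sketch, the following keeps only the names ending in ".jl". The temporary directory and its three empty files are assumptions made so that the result is predictable.

```julia
# Build a scratch directory with a few empty files (hypothetical names).
dir = mktempdir()
for name in ["a.jl", "b.txt", "c.jl"]
    touch(joinpath(dir, name))
end

# The anonymous function returns true for the names to keep.
juliafiles = filter(x -> endswith(x, ".jl"), readdir(dir))
println(juliafiles)
```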

If you want to match a group of files using a regular expression, then use occursin(). Let's look for files with ".jpg" or ".png" suffixes (remembering to escape the "."):

034571172750.jpg
034571172750.png
51ZN2sCNfVL._SS400_.jpg
51bU7lucOJL._SL500_AA300_.jpg
Voronoy.jpg
kblue.png
korange.png
penrose.jpg
r-home-id-r4.png
wave.jpg
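A listing like the one above could come from a filter along the following lines. The regular expression and the scratch directory are assumptions. The pattern anchors the suffix at the end of the name and escapes the dot.

```julia
# Scratch directory with a few empty files (hypothetical names).
dir = mktempdir()
for name in ["wave.jpg", "kblue.png", "notes.txt"]
    touch(joinpath(dir, name))
end

# Keep names that end in .jpg or .png; "\." matches a literal dot.
images = filter(x -> occursin(r"\.(jpg|png)$", x), readdir(dir))
foreach(println, images)
```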

To examine a file hierarchy, use walkdir(), which lets you work through a directory tree and examine the files in each directory in turn.
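A sketch of walking a small tree with walkdir(). The scratch directory layout is an assumption. Each iteration yields the current directory path, its subdirectory names, and its file names.

```julia
# Build a tiny hierarchy: root/top.txt and root/sub/inner.txt (hypothetical names).
root = mktempdir()
mkpath(joinpath(root, "sub"))
touch(joinpath(root, "top.txt"))
touch(joinpath(root, "sub", "inner.txt"))

# Collect every file, expressed relative to the starting directory.
found = String[]
for (dirpath, dirs, files) in walkdir(root)
    for file in files
        push!(found, relpath(joinpath(dirpath, file), root))
    end
end
foreach(println, found)
```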

File information
If you want information about a specific file, use stat(), and then use one of the fields to find out the information. Here's how to get all the information and the field names listed for a file "i":
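One way to list every field, sketched with a temporary file in place of "i". The set of field names depends on the Julia version, so no particular output is shown.

```julia
# Sample file with known contents (5 bytes).
path = tempname()
write(path, "hello")

info = stat(path)

# Print each field name of the stat structure next to its value.
for n in fieldnames(typeof(info))
    println(n, ": ", getfield(info, n))
end
```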

You can access these fields via a 'stat' structure:

julia> s = stat("Untitled1.ipynb")
StatStruct(mode=100644, size=64424)

julia> s.ctime
1.446649269e9

and you can also use some of them directly:

julia> ctime("Untitled2.ipynb")
1.446649269e9

although not size, which you read from the structure instead:

julia> s.size
64424

To work on specific files that meet conditions – all Jupyter files (i.e. files with the extension "ipynb") modified after a certain date, for example – you could use something like this:
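A sketch of such a filter, under several assumptions: the scratch directory and file names are made up, the cutoff date is arbitrary, and the join=true keyword of readdir() (which returns full paths) requires Julia 1.4 or later. mtime() returns seconds since the Unix epoch, so the cutoff is converted with datetime2unix() from the Dates standard library.

```julia
using Dates

# Scratch directory with freshly created files (hypothetical names).
dir = mktempdir()
for name in ["a.ipynb", "b.ipynb", "c.txt"]
    touch(joinpath(dir, name))
end

# Arbitrary cutoff, converted to seconds since the Unix epoch.
cutoff = datetime2unix(DateTime(2015, 11, 3, 9))

# Keep notebook files whose modification time is later than the cutoff.
recent = filter(readdir(dir, join=true)) do f
    endswith(f, ".ipynb") && mtime(f) > cutoff
end

for f in recent
    println(stat(f).size, ": ", f)
end
```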

Interacting with the file system
The cp(), mv(), rm(), and touch() functions have the same names and functions as their Unix shell counterparts.
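A short sketch that exercises these four functions inside a scratch directory. All file names are hypothetical.

```julia
dir = mktempdir()

original = joinpath(dir, "draft.txt")
touch(original)                       # create an empty file

backup = joinpath(dir, "backup.txt")
cp(original, backup)                  # copy it

final = joinpath(dir, "final.txt")
mv(backup, final)                     # rename (move) the copy

rm(original)                          # delete the first file

remaining = readdir(dir)
println(remaining)
```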

To convert filenames to pathnames, use abspath(). You can map this over a list of files in a directory:

julia> map(abspath, readdir())
67-element Array{String,1}:
 "/Users/me/.CFUserTextEncoding"
 "/Users/me/.DS_Store"
 "/Users/me/.Trash"
 "/Users/me/.Xauthority"
 "/Users/me/.ahbbighrc"
 "/Users/me/.apdisk"
 "/Users/me/.atom"
 ...

To restrict the list to filenames that contain a particular substring, use an anonymous function inside filter(), something like this:

julia> filter(x -> occursin("re", x), map(abspath, readdir()))
4-element Array{String,1}:
 "/Users/me/.DS_Store"
 "/Users/me/.gitignore"
 "/Users/me/.hgignore_global"
 "/Users/me/Pictures"
 ...

To restrict the list to regular expression matches, try this:

julia> filter(x -> occursin(r"recur.*\.jl", x), map(abspath, readdir()))
2-element Array{String,1}:
 "/Users/me/julia/recursive-directory-scan.jl"
 "/Users/me/julia/recursive-text.jl"

Writing to files
To write to a text file, open it using the "w" flag and make sure that you have permission to create the file in the specified directory:
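A minimal sketch of writing with the "w" flag, using a file inside a temporary directory as the assumed destination. Opening with "w" creates the file, or truncates it if it already exists.

```julia
# Destination inside a scratch directory (hypothetical name).
path = joinpath(mktempdir(), "output.txt")

open(path, "w") do f
    write(f, "A line of text\n")   # write does not add a newline for you
    println(f, "Another line")     # println does
end

contents = read(path, String)
print(contents)
```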

Here's how to write 20 lines of 4 random numbers between 1 and 10, separated by commas:
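One way this might be written is sketched below. The output path is an assumption, and the numbers differ on every run because they are random.

```julia
# Destination inside a scratch directory (hypothetical name).
path = joinpath(mktempdir(), "numbers.txt")

open(path, "w") do f
    for i in 1:20
        # Four random integers between 1 and 10, joined with commas.
        println(f, join(rand(1:10, 4), ", "))
    end
end

written = readlines(path)
println(written[1])
```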

A quicker alternative to this is to use the writedlm() function, described in the next section.

Writing and reading arrays to and from a file
The DelimitedFiles package provides two convenient functions, writedlm() and readdlm(). These let you read/write an array or collection from/to a file.

writedlm() writes the contents of an object to a text file, and readdlm() reads the data from a file into an array:

julia> using DelimitedFiles

julia> numbers = rand(5,5)
5x5 Array{Float64,2}:
 0.913583  0.312291  0.0855798  0.0592331  0.371789
 0.13747   0.422435  0.295057   0.736044   0.763928
 0.360894  0.434373  0.870768   0.469624   0.268495
 0.620462  0.456771  0.258094   0.646355   0.275826
 0.497492  0.854383  0.171938   0.870345   0.783558

julia> writedlm("/tmp/test.txt", numbers)

You can see the file using the shell (type a semicolon ";" to switch):

shell> cat "/tmp/test.txt"
.9135833328830523	.3122905420350348	.08557977218948465	.0592330821115965	.3717889559226475
.13747015238054083	.42243494637594203	.29505701073304524	.7360443978397753	.7639280496847236
.36089432672073607	.43437288984307787	.870767989032692	.4696243851552686	.26849468736154325
.6204624598015906	.4567706404666232	.25809436255988105	.6463554854347682	.27582613759302377
.4974916625466639	.8543829989347014	.17193814498701587	.8703447748713236	.783557793485824

The elements are separated by tabs unless you specify another delimiter. Here, a colon is used to delimit the numbers:

julia> writedlm("/tmp/test.txt", rand(1:6, 10, 10), ":")

shell> cat "/tmp/test.txt"
3:3:3:2:3:2:6:2:3:5
3:1:2:1:5:6:6:1:3:6
5:2:3:1:4:4:4:3:4:1
3:2:1:3:3:1:1:1:5:6
4:2:4:4:4:2:3:5:1:6
6:6:4:1:6:6:3:4:5:4
2:1:3:1:4:1:5:4:6:6
4:4:6:4:6:6:1:4:2:3
1:4:4:1:1:1:5:6:5:6
2:4:4:3:6:6:1:1:5:5

To read in data from a text file, you can use readdlm().

julia> numbers = rand(5,5)
5x5 Array{Float64,2}:
 0.862955  0.00827944  0.811526  0.854526  0.747977
 0.661742  0.535057    0.186404  0.592903  0.758013
 0.800939  0.949748    0.86552   0.113001  0.0849006
 0.691113  0.0184901   0.170052  0.421047  0.374274
 0.536154  0.48647     0.926233  0.683502  0.116988

julia> writedlm("/tmp/test.txt", numbers)

julia> numbers = readdlm("/tmp/test.txt")
5x5 Array{Float64,2}:
 0.862955  0.00827944  0.811526  0.854526  0.747977
 0.661742  0.535057    0.186404  0.592903  0.758013
 0.800939  0.949748    0.86552   0.113001  0.0849006
 0.691113  0.0184901   0.170052  0.421047  0.374274
 0.536154  0.48647     0.926233  0.683502  0.116988

There are also a number of Julia packages specifically designed for reading and writing data to files, including DataFrames.jl and CSV.jl. Search in JuliaHub or JuliaPackages for these and more. Many of these packages live at the home of the JuliaData organization.