How to Think Like a Computer Scientist: Learning with Python 2nd Edition/Modules and files

= Modules and files =

Modules
A module is a file containing Python definitions and statements intended for use in other Python programs. There are many Python modules that come with Python as part of the standard library. We have seen two of these already, the doctest module and the string module.

pydoc
You can use pydoc to search through the Python libraries installed on your system. At the command prompt, type <tt>pydoc -g</tt> and a small pydoc window with an open browser button will appear.

(note: see exercise 2 if you get an error)

Click on the open browser button to launch a web browser window containing the documentation generated by pydoc:

This is a listing of all the Python libraries found by Python on your system. Clicking on a module name opens a new page with documentation for that module. Clicking keyword, for example, opens the documentation page for the keyword module.

Documentation for most modules contains three color coded sections:


 * Classes in pink
 * Functions in orange
 * Data in green

Classes will be discussed in later chapters, but for now we can use pydoc to see the functions and data contained within modules.

The keyword module contains a single function, iskeyword, which as its name suggests is a boolean function that returns True if a string passed to it is a keyword:

The data item, kwlist contains a list of all the current keywords in Python:
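As a minimal sketch of how the two items in the keyword module can be tried out, the session below calls <tt>iskeyword</tt> and inspects <tt>kwlist</tt>. The sample strings are arbitrary choices, and the exact contents of <tt>kwlist</tt> depend on the Python version in use.

```python
import keyword

# iskeyword reports whether a string is one of Python's reserved words
is_for_keyword = keyword.iskeyword('for')
is_banana_keyword = keyword.iskeyword('banana')
print(is_for_keyword)
print(is_banana_keyword)

# kwlist is an ordinary list of strings, so the usual list operations apply
has_if = 'if' in keyword.kwlist
print(has_if)
print(len(keyword.kwlist))
```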

We encourage you to use <tt>pydoc</tt> to explore the extensive libraries that come with Python. There are so many treasures to discover!

Creating modules
All we need to create a module is a text file with a <tt>.py</tt> extension on the filename:
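A module file along these lines would fit the discussion that follows, which refers to a <tt>remove_at</tt> function in a module named <tt>seqtools</tt>. The function body and the order of its parameters are assumptions made for illustration, since the text does not show them.

```python
# seqtools.py
# The file name comes from the text; the body of remove_at is an assumed sketch.

def remove_at(pos, seq):
    """Return a copy of seq with the item at index pos left out."""
    return seq[:pos] + seq[pos + 1:]
```

Because the function is built from slices, it should work on any sequence that supports slicing and concatenation, such as strings and lists.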

We can now use our module in both scripts and the Python shell. To do so, we must first import the module. There are two ways to do this:

and:

In the first example, <tt>remove_at</tt> is called just like the functions we have seen previously. In the second example the name of the module and a dot (.) are written before the function name.

Notice that in either case we do not include the <tt>.py</tt> file extension when importing. Python expects the file names of Python modules to end in <tt>.py</tt>, so the file extension is not included in the import statement.
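The two import styles can be compared side by side in a short sketch. The standard library <tt>math</tt> module stands in for <tt>seqtools</tt> here so that the snippet runs without any extra file; that substitution is an assumption made for convenience.

```python
# Style 1: bring a single name into the current namespace
from math import sqrt
root_a = sqrt(16)          # called like any other function

# Style 2: import the module, then write the module name and a dot
import math
root_b = math.sqrt(16)     # module name, dot, function name

print(root_a == root_b)
```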

The use of modules makes it possible to break up very large programs into manageable parts, and to keep related parts together.

Namespaces
A namespace is a syntactic container which permits the same name to be used in different modules or functions (and as we will see soon, in classes and methods).

Each module determines its own namespace, so we can use the same name in multiple modules without causing an identification problem.

We can now import both modules and access <tt>question</tt> and <tt>answer</tt> in each:

If we had used <tt>from module1 import *</tt> and <tt>from module2 import *</tt> instead, we would have a naming collision and would not be able to access <tt>question</tt> and <tt>answer</tt> from <tt>module1</tt>.
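The same effect can be observed with two standard library modules that both define a function named <tt>sqrt</tt>. This is only a sketch: <tt>math</tt> and <tt>cmath</tt> are stand-ins chosen because they need no extra files, not the <tt>module1</tt> and <tt>module2</tt> of the text.

```python
import math
import cmath

# Each module keeps its own sqrt; the module name says which one we mean.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)
print(real_root)
print(complex_root)

# Importing the bare name twice rebinds it, so only the second sqrt survives.
from math import sqrt
from cmath import sqrt
collided = sqrt(4)         # this is cmath.sqrt, so the result is a complex number
print(collided)
```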

Functions also have their own namespace:

Running this program produces the following output:

The three <tt>n</tt>'s here do not collide since they are each in a different namespace.
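A minimal sketch of three separate names <tt>n</tt> is shown below. The function names and the values are assumptions chosen for illustration.

```python
def f():
    n = 7            # this n lives in f's namespace
    return n

def g():
    n = 42           # this n lives in g's namespace
    return n

n = 11               # this n lives in the enclosing namespace

results = (f(), g(), n)
print(results)
```

Calling <tt>f</tt> and <tt>g</tt> does not change the outer <tt>n</tt>, because each assignment creates a name in a different namespace.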

Namespaces permit several programmers to work on the same project without having naming collisions.

Attributes and the dot operator
Variables defined inside a module are called attributes of the module. They are accessed by using the dot operator (<tt>.</tt>). The <tt>question</tt> attributes of <tt>module1</tt> and <tt>module2</tt> are accessed using <tt>module1.question</tt> and <tt>module2.question</tt>.

Modules contain functions as well as attributes, and the dot operator is used to access them in the same way. <tt>seqtools.remove_at</tt> refers to the <tt>remove_at</tt> function in the <tt>seqtools</tt> module.

In Chapter 7 we introduced the <tt>find</tt> function from the <tt>string</tt> module. The <tt>string</tt> module contains many other useful functions:

You should use pydoc to browse the other functions and attributes in the string module.
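As a starting point for that exploration, the sketch below reads two data attributes and calls one function of the <tt>string</tt> module. Which functions the module offers depends on the Python version (several of the older ones, including <tt>find</tt>, are no longer present in Python 3), so the names used here were picked because they are widely available.

```python
import string

# Data attributes of the string module
print(string.digits)
print(string.ascii_lowercase)

# A function of the string module, reached with the same dot syntax
title = string.capwords("modules and files")
print(title)
```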

String and list methods
As the Python language developed, most of the functions from the <tt>string</tt> module were also added as methods of string objects. A method acts much like a function, but the syntax for calling it is a bit different:

String methods are built into string objects, and they are invoked (called) by following the object with the dot operator and the method name.
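A few method calls illustrate the syntax. The example strings are arbitrary.

```python
word = "banana"

shouted = word.upper()          # the object, a dot, then the method name
position = word.find("nan")     # index where the substring starts
pieces = "a,b,c".split(",")     # methods can be called on literals too

print(shouted)
print(position)
print(pieces)
```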

We will be learning how to create our own objects with their own methods in later chapters. For now we will only be using methods that come with Python's built-in objects.

The dot operator can also be used to access built-in methods of list objects:

<tt>append</tt> is a list method which adds the argument passed to it to the end of the list. Continuing with this example, we show several other list methods:

Experiment with the list methods in this example until you feel confident that you understand how they work.
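A session of the following kind is one way to carry out that experiment. The particular methods and values are an illustrative selection, not necessarily the ones used in the original listing; the comments show the list after each step.

```python
mylist = []
mylist.append(5)
mylist.append(27)
mylist.append(3)
mylist.append(12)             # [5, 27, 3, 12]

mylist.insert(1, 12)          # [5, 12, 27, 3, 12]
twelves = mylist.count(12)    # how many times 12 occurs
mylist.extend([5, 9])         # [5, 12, 27, 3, 12, 5, 9]
where = mylist.index(9)       # position of the first 9
mylist.reverse()              # [9, 5, 12, 3, 27, 12, 5]
mylist.sort()                 # [3, 5, 5, 9, 12, 12, 27]
mylist.remove(12)             # removes the first 12 only

print(mylist)
```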

Reading and writing text files
While a program is running, its data is stored in random access memory (RAM). RAM is fast and inexpensive, but it is also volatile, which means that when the program ends, or the computer shuts down, data in RAM disappears. To make data available the next time you turn on your computer and start your program, you have to write it to a non-volatile storage medium, such as a hard drive, USB drive, or CD-RW.

Data on non-volatile storage media is stored in named locations on the media called files. By reading and writing files, programs can save information between program runs.

Working with files is a lot like working with a notebook. To use a notebook, you have to open it. When you're done, you have to close it. While the notebook is open, you can either write in it or read from it. In either case, you know where you are in the notebook. You can read the whole notebook in its natural order or you can skip around.

All of this applies to files as well. To open a file, you specify its name and indicate whether you want to read or write.

Opening a file creates a file object. In this example, the variable <tt>myfile</tt> refers to the new file object.

The open function takes two arguments. The first is the name of the file, and the second is the mode. Mode <tt>'w'</tt> means that we are opening the file for writing.

If there is no file named <tt>test.dat</tt>, it will be created. If there already is one, it will be replaced by the file we are writing.

When we print the file object, we see the name of the file, the mode, and the location of the object.

To put data in the file we invoke the <tt>write</tt> method on the file object:

Closing the file tells the system that we are done writing and makes the file available for reading:
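The steps so far (open for writing, write, close) can be sketched as follows. The snippet works in a temporary directory so that no existing <tt>test.dat</tt> is replaced; that precaution and the exact strings written are assumptions, not part of the text's own session.

```python
import os
import tempfile

# A scratch location, used here instead of the current directory
path = os.path.join(tempfile.mkdtemp(), "test.dat")

myfile = open(path, "w")      # mode 'w' creates the file or replaces an old one
print(myfile)                 # shows the name, the mode, and other details

myfile.write("Now is the time")
myfile.write("to close the file")
myfile.close()                # finish writing and release the file

is_closed = myfile.closed
written = open(path, "r").read()
```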

Now we can open the file again, this time for reading, and read the contents into a string. This time, the mode argument is <tt>'r'</tt> for reading:

If we try to open a file that doesn't exist, we get an error:

Not surprisingly, the <tt>read</tt> method reads data from the file. With no arguments, it reads the entire contents of the file into a single string:

There is no space between time and to because we did not write a space between the strings.

<tt>read</tt> can also take an argument that indicates how many characters to read:

If not enough characters are left in the file, <tt>read</tt> returns the remaining characters. When we get to the end of the file, <tt>read</tt> returns the empty string:
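The behavior of <tt>read</tt> described above can be checked with a short sketch. It first creates its own small file in a temporary directory (an assumption made so the snippet is self-contained) and then reads it back in pieces.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.dat")
myfile = open(path, "w")
myfile.write("Now is the timeto close the file")
myfile.close()

myfile = open(path, "r")
first = myfile.read(5)        # exactly five characters
rest = myfile.read(1000)      # fewer than 1000 remain, so we get what is left
at_end = myfile.read()        # nothing is left, so this is the empty string
myfile.close()

# Opening a file that does not exist for reading raises an error.
# IOError is the Python 2 name; in Python 3 it is an alias of OSError.
try:
    open(os.path.join(tempfile.mkdtemp(), "no_such_file.dat"), "r")
    missing = False
except IOError:
    missing = True
```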

The following function copies a file, reading and writing up to fifty characters at a time. The first argument is the name of the original file; the second is the name of the new file.

This function continues looping, reading up to 50 characters from <tt>infile</tt> and writing the same characters to <tt>outfile</tt> until the end of <tt>infile</tt> is reached, at which point <tt>text</tt> is empty and the <tt>break</tt> statement is executed.
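A function matching this description might look like the sketch below. The names <tt>infile</tt>, <tt>outfile</tt>, and <tt>text</tt> come from the text; the function name <tt>copy_file</tt>, its parameter names, and the scratch files used to try it out are assumptions.

```python
import os
import tempfile

def copy_file(oldfile, newfile):
    """Copy oldfile to newfile, at most 50 characters at a time."""
    infile = open(oldfile, "r")
    outfile = open(newfile, "w")
    while True:
        text = infile.read(50)
        if text == "":            # empty string means end of file
            break
        outfile.write(text)
    infile.close()
    outfile.close()

# Try it on a 120-character file, which takes three reads plus one empty read
scratch = tempfile.mkdtemp()
source = os.path.join(scratch, "original.txt")
target = os.path.join(scratch, "copy.txt")
sample = open(source, "w")
sample.write("x" * 120)
sample.close()

copy_file(source, target)
copied = open(target, "r").read()
```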

Text files
A text file is a file that contains printable characters and whitespace, organized into lines separated by newline characters. Since Python is specifically designed to process text files, it provides methods that make the job easy.

To demonstrate, we'll create a text file with three lines of text separated by newlines:

The <tt>readline</tt> method reads all the characters up to and including the next newline character:

<tt>readlines</tt> returns all of the remaining lines as a list of strings:

In this case, the output is in list format, which means that the strings appear with quotation marks and the newline character appears as an escape sequence: <tt>\n</tt> in current versions of Python, or the equivalent octal form <tt>\012</tt> in very old ones.

At the end of the file, <tt>readline</tt> returns the empty string and <tt>readlines</tt> returns the empty list:
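The following sketch exercises both methods on a three-line file. The file contents and the temporary location are assumptions chosen for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.dat")
outfile = open(path, "w")
outfile.write("line one\nline two\nline three\n")
outfile.close()

infile = open(path, "r")
first_line = infile.readline()     # up to and including the first newline
remaining = infile.readlines()     # every line that is left, as a list
no_line = infile.readline()        # at the end of the file: empty string
no_lines = infile.readlines()      # at the end of the file: empty list
infile.close()

print(first_line)
print(remaining)
```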

The following is an example of a line-processing program. <tt>filter</tt> makes a copy of <tt>oldfile</tt>, omitting any lines that begin with <tt>#</tt>:

The continue statement ends the current iteration of the loop, but continues looping. The flow of execution moves to the top of the loop, checks the condition, and proceeds accordingly.

Thus, if <tt>text</tt> is the empty string, the loop exits. If the first character of <tt>text</tt> is a hash mark, the flow of execution goes to the top of the loop. Only if both conditions fail do we copy <tt>text</tt> into the new file.
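A function with the control flow just described could be sketched like this. The names <tt>filter</tt>, <tt>oldfile</tt>, and <tt>text</tt> come from the text; the second parameter <tt>newfile</tt> and the sample data are assumptions. Note that defining a function called <tt>filter</tt> hides Python's built-in function of the same name in the program that defines it.

```python
import os
import tempfile

def filter(oldfile, newfile):
    """Copy oldfile to newfile, leaving out lines that begin with '#'."""
    infile = open(oldfile, "r")
    outfile = open(newfile, "w")
    while True:
        text = infile.readline()
        if text == "":            # end of file: leave the loop
            break
        if text[0] == "#":        # comment line: skip to the next iteration
            continue
        outfile.write(text)
    infile.close()
    outfile.close()

# Try it on a small file containing two comment lines
scratch = tempfile.mkdtemp()
source = os.path.join(scratch, "with_comments.txt")
target = os.path.join(scratch, "without_comments.txt")
sample = open(source, "w")
sample.write("# heading\nfirst\n# note\nsecond\n")
sample.close()

filter(source, target)
kept = open(target, "r").read()
```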

Directories
Files on non-volatile storage media are organized by a set of rules known as a file system. File systems are made up of files and directories, which are containers for both files and other directories.

When you create a new file by opening it and writing, the new file goes in the current directory (wherever you were when you ran the program). Similarly, when you open a file for reading, Python looks for it in the current directory.

If you want to open a file somewhere else, you have to specify the path to the file, which is the name of the directory (or folder) where the file is located:

This example opens a file named <tt>words</tt> that resides in a directory named <tt>dict</tt>, which resides in <tt>share</tt>, which resides in <tt>usr</tt>, which resides in the top-level directory of the system, called <tt>/</tt>. It then reads in each line into a list using <tt>readlines</tt>, and prints out the first 5 elements from that list.
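Since <tt>/usr/share/dict/words</tt> is not present on every system, the sketch below builds a small stand-in file inside nested temporary directories and then opens it by its full path. The directory layout and the six words are assumptions made so the snippet can run anywhere.

```python
import os
import tempfile

# Build <temporary directory>/share/dict/words as a stand-in
top = tempfile.mkdtemp()
directory = os.path.join(top, "share", "dict")
os.makedirs(directory)
path = os.path.join(directory, "words")

wordsfile = open(path, "w")
wordsfile.write("aardvark\nabacus\nabbey\nable\nabout\nabove\n")
wordsfile.close()

# Open the file by its path, read every line, and show the first five
wordsfile = open(path, "r")
wordlist = wordsfile.readlines()
wordsfile.close()
first_five = wordlist[:5]
print(first_five)
```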

You cannot use <tt>/</tt> as part of a filename; it is reserved as a delimiter between directory and filenames.

The file <tt>/usr/share/dict/words</tt> should exist on Unix-based systems, and contains a list of words in alphabetical order.

Counting letters
The <tt>ord</tt> function returns the integer representation of a character:

This example explains why <tt>'Apple' &lt; 'apple'</tt> evaluates to <tt>True</tt>.

The <tt>chr</tt> function is the inverse of <tt>ord</tt>. It takes an integer as an argument and returns its character representation:
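Both functions, and the string comparison they explain, can be tried in a few lines. The characters chosen are arbitrary.

```python
lower_a = ord("a")
upper_a = ord("A")
print(lower_a)
print(upper_a)

# Upper case letters have smaller codes than lower case ones,
# so a string starting with 'A' sorts before one starting with 'a'.
capital_first = "Apple" < "apple"
print(capital_first)

# chr undoes ord
letter = chr(65)
round_trip = chr(ord("z"))
print(letter)
```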

The following program, <tt>countletters.py</tt>, counts the number of times each character occurs in the book Alice in Wonderland:

Run this program and look at the output file it generates using a text editor. You will be asked to analyze the program in the exercises below.
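As a much smaller sketch of the same counting idea, and not the <tt>countletters.py</tt> program itself, the function below tallies the characters of a short string using <tt>ord</tt> as a list index. The function name and the sample sentence are assumptions.

```python
def count_characters(text):
    """Return a list in which entry i counts the character whose code is i."""
    counts = 128 * [0]            # one slot for each ASCII code
    for letter in text:
        code = ord(letter)
        if code < 128:            # ignore anything outside the ASCII range
            counts[code] += 1
    return counts

sentence = "Alice was beginning to get very tired"
counts = count_characters(sentence)

# Report only the characters that actually occur
for i in range(len(counts)):
    if counts[i]:
        print(repr(chr(i)) + " " + str(counts[i]))
```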

The <tt>sys</tt> module and <tt>argv</tt>
The <tt>sys</tt> module contains functions and variables which provide access to the environment in which the Python interpreter runs.

The following example shows the values of a few of these variables on one of our systems:

Starting Jython on the same machine produces different values for the same variables:

The results will be different on your machine of course.
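To see the values on your own machine, a few lines like the following are enough. The three variables shown are a selection made for this sketch.

```python
import sys

platform_name = sys.platform      # a short name for the operating system
version_text = sys.version        # describes the interpreter that is running
search_path = sys.path            # directories searched when importing modules

print(platform_name)
print(version_text)
print(search_path)
```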

The <tt>argv</tt> variable holds a list of strings read in from the command line when a Python script is run. These command line arguments can be used to pass information into a program at the same time it is invoked.
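A program that simply displays this list needs only two lines. The file name <tt>demo_argv.py</tt> is taken from the sample session in this section.

```python
# demo_argv.py
import sys

arguments = sys.argv      # a list of strings; the first is the program name
print(arguments)
```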

Running this program from the Unix command prompt demonstrates how <tt>sys.argv</tt> works:

 $ python demo_argv.py this and that 1 2 3
 ['demo_argv.py', 'this', 'and', 'that', '1', '2', '3']
 $

<tt>argv</tt> is a list of strings. Notice that the first element is the name of the program. Arguments are separated by white space and split into a list in the same way that <tt>string.split</tt> operates. If you want an argument with white space in it, use quotes:

 $ python demo_argv.py "this and" that "1 2" 3
 ['demo_argv.py', 'this and', 'that', '1 2', '3']
 $

With <tt>argv</tt> we can write useful programs that take their input directly from the command line. For example, here is a program that finds the sum of a series of numbers:

In this program we use the <tt>from &lt;module&gt; import &lt;attribute&gt;</tt> style of importing, so <tt>argv</tt> is brought into the module's main namespace.
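A summing program in this style could be sketched as follows. The function name <tt>sum_arguments</tt> and the file name <tt>sum.py</tt> are hypothetical, and a fixed list stands in for a real command line so that the snippet gives the same result wherever it is run.

```python
from sys import argv      # argv becomes a name in this module's own namespace

def sum_arguments(arguments):
    """Add up the numbers given as strings after the program name."""
    nums = arguments[1:]              # position 0 holds the program name
    total = 0.0
    for value in nums:
        total += float(value)         # command line arguments arrive as strings
    return total

# Stand-in for a real command line; in a script, pass argv instead.
example_total = sum_arguments(["sum.py", "3", "4", "5"])
print(example_total)
```

Converting with <tt>float</tt> lets the program accept both whole numbers and decimals, at the cost of always printing a floating point result.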

We can now run the program from the command prompt like this:

You are asked to write similar programs as exercises.

Exercises
<ol> <li> Complete the following: <ul> <li>Start the pydoc server with the command <tt>pydoc -g</tt> at the command prompt.</li> <li>Click on the open browser button in the pydoc tk window.</li> <li>Find the <tt>calendar</tt> module and click on it.</li> <li> While looking at the Functions section, try out the following in a Python shell: </li> <li> Experiment with <tt>calendar.isleap</tt>. What does it expect as an argument? What does it return as a result? What kind of a function is this? </li></ul>

Make detailed notes about what you learned from this exercise.</li> <li> If you don't have <tt>Tkinter</tt> installed on your computer, then <tt>pydoc -g</tt> will return an error, since the graphics window that it opens requires <tt>Tkinter</tt>. An alternative is to start the web server directly with the command <tt>pydoc -p 7464</tt>. This starts the pydoc web server on port 7464. Now point your web browser at http://localhost:7464 and you will be able to browse the Python libraries installed on your system. Use this approach to start <tt>pydoc</tt> and take a look at the <tt>math</tt> module. <ol style="list-style-type: lower-alpha;"> <li>How many functions are in the <tt>math</tt> module?</li> <li>What does <tt>math.ceil</tt> do? What about <tt>math.floor</tt>? (Hint: both <tt>floor</tt> and <tt>ceil</tt> expect floating point arguments.)</li> <li>Describe how we have been computing the same value as <tt>math.sqrt</tt> without using the <tt>math</tt> module.</li> <li>What are the two data constants in the <tt>math</tt> module?</li></ol>

Record detailed notes of your investigation in this exercise.</li> <li>Use <tt>pydoc</tt> to investigate the <tt>copy</tt> module. What does <tt>deepcopy</tt> do? In which exercises from last chapter would <tt>deepcopy</tt> have come in handy?</li> <li> Create a module named <tt>mymodule1.py</tt>. Add attributes <tt>myage</tt> set to your current age, and <tt>year</tt> set to the current year. Create another module named <tt>mymodule2.py</tt>. Add attributes <tt>myage</tt> set to 0, and <tt>year</tt> set to the year you were born. Now create a file named <tt>namespace_test.py</tt>. Import both of the modules above and write the following statement:

When you run <tt>namespace_test.py</tt> you will see either <tt>True</tt> or <tt>False</tt> as output depending on whether or not you've already had your birthday this year.</li> <li> Add the following statement to <tt>mymodule1.py</tt>, <tt>mymodule2.py</tt>, and <tt>namespace_test.py</tt> from the previous exercise:

Run <tt>namespace_test.py</tt>. What happens? Why? Now add the following to the bottom of <tt>mymodule1.py</tt>:

Run <tt>mymodule1.py</tt> and <tt>namespace_test.py</tt> again. In which case do you see the new print statement?</li> <li> In a Python shell try the following:

What does Tim Peters have to say about namespaces?</li> <li>Use <tt>pydoc</tt> to find and test three other functions from the <tt>string</tt> module. Record your findings.</li> <li>Rewrite <tt>matrix_mult</tt> from the last chapter using what you have learned about list methods.</li> <li>The <tt>dir</tt> function, which we first saw in Chapter 7, returns a list of the attributes of an object passed to it as an argument. In other words, <tt>dir</tt> returns the contents of the namespace of its argument. Use <tt>dir(str)</tt> and <tt>dir(list)</tt> to find at least three string and list methods which have not been introduced in the examples in the chapter. You should ignore anything that begins with double underscore (__) for the time being. Be sure to make detailed notes of your findings, including names of the new methods and examples of their use. (Hint: Print the docstring of a function you want to explore. For example, to find out how <tt>str.join</tt> works, <tt>print str.join.__doc__</tt>)</li> <li> Give the Python interpreter's response to each of the following from a continuous interpreter session: <dl> <dt>a.</dt> <dd></dd> <dt>b.</dt> <dd></dd> <dt>c.</dt> <dd></dd> <dt>d.</dt> <dd></dd> <dt>e.</dt> <dd></dd></dl>

Be sure you understand why you get each result. Then apply what you have learned to fill in the body of the function below using the <tt>split</tt> and <tt>join</tt> methods of <tt>str</tt> objects:

Your solution should pass all doctests.</li> <li> Create a module named <tt>wordtools.py</tt> with the following at the bottom:

Explain how this statement makes both using and testing this module convenient. What will be the value of <tt>__name__</tt> when <tt>wordtools.py</tt> is imported from another module? What will it be when it is run as a main program? In which case will the doctests run? Now add bodies to each of the following functions to make the doctests pass:

Save this module so you can use the tools it contains in your programs.</li> <li>The file <tt>unsorted_fruits.txt</tt> contains a list of 26 fruits, each one with a name that begins with a different letter of the alphabet. Write a program named <tt>sort_fruits.py</tt> that reads in the fruits from <tt>unsorted_fruits.txt</tt> and writes them out in alphabetical order to a file named <tt>sorted_fruits.txt</tt>.</li> <li> Answer the following questions about <tt>countletters.py</tt>: <ol style="list-style-type: lower-alpha;"> <li> Explain in detail what the three lines do:

What would <tt>type(text)</tt> return after these lines have been executed?</li> <li>What does the expression <tt>128 * [0]</tt> evaluate to? Read about ASCII in Wikipedia and explain why you think the variable <tt>counts</tt> is assigned <tt>128 * [0]</tt> in light of what you read.</li> <li> What does

do to <tt>counts</tt>?</li> <li>Explain the purpose of the <tt>display</tt> function. Why does it check for values <tt>10</tt>, <tt>13</tt>, and <tt>32</tt>? What is special about those values?</li> <li> Describe in detail what the lines

do. What will be in <tt>alice_counts.dat</tt> when they finish executing?</li> <li> Finally, explain in detail what

does. What is the purpose of <tt>if counts[i]</tt>? </li></ol> </li> <li> Write a program named <tt>mean.py</tt> that takes a sequence of numbers on the command line and prints the mean of their values:
<pre>
$ python mean.py 3 4
3.5
$ python mean.py 3 4 5
4.0
$ python mean.py 11 15 94.5 22
35.625
</pre>
A session of your program running on the same input should produce the same output as the sample session above.</li> <li> Write a program named <tt>median.py</tt> that takes a sequence of numbers on the command line and prints the median of their values:
<pre>
$ python median.py 3 7 11
7
$ python median.py 19 85 121
85
$ python median.py 11 15 16 22
15.5
</pre>
A session of your program running on the same input should produce the same output as the sample session above.</li> <li> Modify the <tt>countletters.py</tt> program so that it takes the file to open as a command line argument. How will you handle the naming of the output file? </li></ol>