How to Think Like a Computer Scientist: Learning with Python 2nd Edition/Classes and objects

= Classes and objects =

Object-oriented programming
Python is an object-oriented programming language, which means that it provides features that support object-oriented programming (OOP).

Object-oriented programming has its roots in the 1960s, but it wasn't until the mid 1980s that it became the main programming paradigm used in the creation of new software. It was developed as a way to handle the rapidly increasing size and complexity of software systems, and to make it easier to modify these large and complex systems over time.

Up to now we have been writing programs using a procedural programming paradigm. In procedural programming the focus is on writing functions or procedures which operate on data. In object-oriented programming the focus is on the creation of objects which contain both data and functionality together.

User-defined compound types
A class in essence defines a new data type. We have been using several of Python's built-in types throughout this book; we are now ready to create our own user-defined type: the Point.

Consider the concept of a mathematical point. In two dimensions, a point is two numbers (coordinates) that are treated collectively as a single object. In mathematical notation, points are often written in parentheses with a comma separating the coordinates. For example, (0, 0) represents the origin, and (x, y) represents the point x units to the right and y units up from the origin.

A natural way to represent a point in Python is with two numeric values. The question, then, is how to group these two values into a compound object. The quick and dirty solution is to use a list or tuple, and for some applications that might be the best choice.

An alternative is to define a new user-defined compound type, also called a class. This approach involves a bit more effort, but it has advantages that will be apparent soon.

A class definition looks like this:
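A minimal sketch of such a definition, with an empty body as the next paragraphs describe:

```python
class Point:
    pass
```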

Class definitions can appear anywhere in a program, but they are usually near the beginning (after the import statements). The syntax rules for a class definition are the same as for other compound statements. There is a header which begins with the keyword, class, followed by the name of the class, and ending with a colon.

This definition creates a new class called Point. The pass statement has no effect; it is only necessary because a compound statement must have something in its body.

By creating the <tt>Point</tt> class, we created a new type, also called <tt>Point</tt>. The members of this type are called instances of the type or objects. Creating a new instance is called instantiation. To instantiate a <tt>Point</tt> object, we call a function named (you guessed it) <tt>Point</tt>:
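A short sketch of instantiation; the empty class is repeated only so that the snippet runs on its own:

```python
class Point:
    pass

# Calling the class creates a new instance of it.
p = Point()
print(isinstance(p, Point))  # True
```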

The variable <tt>p</tt> is assigned a reference to a new <tt>Point</tt> object. A function like <tt>Point</tt> that creates new objects is called a constructor.

Attributes
Like real world objects, object instances have both form and function. The form consists of data elements contained within the instance.

We can add new data elements to an instance using dot notation:
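For instance, the following sketch gives a point the coordinates 3 and 4, the values the later examples in this chapter rely on:

```python
class Point:
    pass

p = Point()
# Assigning to a new attribute name creates that attribute on this instance.
p.x = 3
p.y = 4
```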

This syntax is similar to the syntax for selecting a variable from a module, such as <tt>math.pi</tt> or <tt>string.uppercase</tt> (available in Python 2; Python 3 provides <tt>string.ascii_uppercase</tt> instead). Both modules and instances create their own namespaces, and the syntax for accessing names contained in each, called attributes, is the same. In this case the attribute we are selecting is a data item from an instance.

The following state diagram shows the result of these assignments:

The variable <tt>p</tt> refers to a Point object, which contains two attributes. Each attribute refers to a number.

We can read the value of an attribute using the same syntax:
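A small sketch of reading attributes, assuming a point set up with the coordinates 3 and 4:

```python
class Point:
    pass

p = Point()
p.x = 3
p.y = 4

print(p.y)  # 4
x = p.x     # the variable x and the attribute p.x are separate names
print(x)    # 3
```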

The expression <tt>p.x</tt> means, "Go to the object <tt>p</tt> refers to and get the value of <tt>x</tt>." In this case, we assign that value to a variable named <tt>x</tt>. There is no conflict between the variable <tt>x</tt> and the attribute <tt>x</tt>. The purpose of dot notation is to identify which variable you are referring to unambiguously.

You can use dot notation as part of any expression, so the following statements are legal:
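As a sketch, the last two statements below are the kind of expressions meant here; the setup lines before them only make the snippet self-contained, and the name <tt>distance_squared</tt> is chosen for illustration:

```python
class Point:
    pass

p = Point()
p.x = 3
p.y = 4

print('(%d, %d)' % (p.x, p.y))
distance_squared = p.x * p.x + p.y * p.y
```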

The first line outputs <tt>(3, 4)</tt>; the second line calculates the value 25.

The initialization method and <tt>self</tt>
Since our <tt>Point</tt> class is intended to represent two dimensional mathematical points, all point instances ought to have <tt>x</tt> and <tt>y</tt> attributes, but that is not yet so with our <tt>Point</tt> objects.

To solve this problem we add an initialization method to our class.
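One way to write such a method is sketched below. Python calls a method named <tt>__init__</tt> automatically whenever an instance is created, and passes the new instance as the first parameter, which by convention is named <tt>self</tt>. The default values of 0 are an assumption made for this sketch.

```python
class Point:
    def __init__(self, x=0, y=0):
        # self refers to the instance being initialized.
        self.x = x
        self.y = y

p = Point()       # falls back to the assumed defaults
q = Point(3, 4)   # supplies both coordinates
print('(%d, %d)' % (q.x, q.y))  # (3, 4)
```

With this definition every <tt>Point</tt> has <tt>x</tt> and <tt>y</tt> attributes from the moment it is created.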

Instances as parameters
You can pass an instance as a parameter in the usual way. For example:
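A sketch of such a function, with a point created beforehand so that the call can run:

```python
class Point:
    pass

def print_point(p):
    # Display the point in the usual (x, y) notation.
    print('(%s, %s)' % (str(p.x), str(p.y)))

p = Point()
p.x = 3
p.y = 4
print_point(p)  # (3, 4)
```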

<tt>print_point</tt> takes a point as an argument and displays it in the standard format. If you call <tt>print_point(p)</tt> with the point <tt>p</tt> defined earlier, the output is <tt>(3, 4)</tt>.

Sameness
The meaning of the word "same" seems perfectly clear until you give it some thought, and then you realize there is more to it than you expected.

For example, if you say, "Chris and I have the same car," you mean that his car and yours are the same make and model, but that they are two different cars. If you say, "Chris and I have the same mother," you mean that his mother and yours are the same person.

When you talk about objects, there is a similar ambiguity. For example, if two <tt>Point</tt>s are the same, does that mean they contain the same data (coordinates) or that they are actually the same object?

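To find out if two references refer to the same object, use the <tt>is</tt> operator. For a class such as <tt>Point</tt> that does not define its own equality test, the <tt>==</tt> operator gives the same answer. For example: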
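The sketch below builds two points with identical coordinates and compares them. It shows both <tt>==</tt> and the <tt>is</tt> operator, which tests object identity directly; for a class that defines no comparison of its own, the two agree.

```python
class Point:
    pass

p1 = Point()
p1.x = 3
p1.y = 4
p2 = Point()
p2.x = 3
p2.y = 4

print(p1 is p2)  # False: two separate objects
print(p1 == p2)  # False: without a custom equality test, == compares identity
```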

Even though <tt>p1</tt> and <tt>p2</tt> contain the same coordinates, they are not the same object. If we assign <tt>p1</tt> to <tt>p2</tt>, then the two variables are aliases of the same object:
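A sketch of the aliasing case, again with a self-contained setup:

```python
class Point:
    pass

p1 = Point()
p1.x = 3
p1.y = 4

p2 = p1          # p2 now refers to the very same object as p1
print(p1 is p2)  # True
print(p1 == p2)  # True
```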

This type of equality is called shallow equality because it compares only the references, not the contents of the objects.

To compare the contents of the objects (deep equality), we can write a function called <tt>same_point</tt>:
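A sketch of <tt>same_point</tt> that compares the two coordinates, followed by a short check on two distinct objects:

```python
class Point:
    pass

def same_point(p1, p2):
    # Deep equality: compare the contents, not the references.
    return (p1.x == p2.x) and (p1.y == p2.y)

p1 = Point()
p1.x = 3
p1.y = 4
p2 = Point()
p2.x = 3
p2.y = 4

print(same_point(p1, p2))  # True
print(p1 is p2)            # False
```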

Now if we create two different objects that contain the same data, we can use <tt>same_point</tt> to find out if they represent the same point.

Of course, if the two variables refer to the same object, they have both shallow and deep equality.

Rectangles
Let's say that we want a class to represent a rectangle. The question is, what information do we have to provide in order to specify a rectangle? To keep things simple, assume that the rectangle is oriented either vertically or horizontally, never at an angle.

There are a few possibilities: we could specify the center of the rectangle (two coordinates) and its size (width and height); or we could specify one of the corners and the size; or we could specify two opposing corners. A conventional choice is to specify the upper-left corner of the rectangle and the size.

Again, we'll define a new class:
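As with <tt>Point</tt>, a minimal sketch needs only an empty body:

```python
class Rectangle:
    pass
```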

And instantiate it:
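A sketch of the instantiation, using the attribute names <tt>width</tt> and <tt>height</tt> that appear later in the chapter; the values 100.0 and 200.0 are assumptions chosen for illustration:

```python
class Rectangle:
    pass

box = Rectangle()
box.width = 100.0    # illustrative value
box.height = 200.0   # illustrative value
```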

This code creates a new <tt>Rectangle</tt> object with two floating-point attributes. To specify the upper-left corner, we can embed an object within an object!
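Embedding can be sketched by storing a <tt>Point</tt> in an attribute of the <tt>Rectangle</tt>. The attribute name <tt>corner</tt> follows the text; the numeric values are assumptions.

```python
class Point:
    pass

class Rectangle:
    pass

box = Rectangle()
box.width = 100.0
box.height = 200.0

# The corner attribute refers to a separate Point object.
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
```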

The dot operator composes. The expression <tt>box.corner.x</tt> means, "Go to the object <tt>box</tt> refers to and select the attribute named <tt>corner</tt>; then go to that object and select the attribute named <tt>x</tt>."

The figure shows the state of this object:



Instances as return values
Functions can return instances. For example, <tt>find_center</tt> takes a <tt>Rectangle</tt> as an argument and returns a <tt>Point</tt> that contains the coordinates of the center of the <tt>Rectangle</tt>:

To call this function, pass <tt>box</tt> as an argument and assign the result to a variable:
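Both the function and a call to it can be sketched as follows. The sketch assumes that <tt>corner</tt> is the upper-left corner and that y increases upward, as in the description of points earlier, so the center lies half the height below the corner. The box dimensions are illustrative.

```python
class Point:
    pass

class Rectangle:
    pass

def find_center(box):
    # Build and return a new Point at the center of box.
    p = Point()
    p.x = box.corner.x + box.width / 2.0
    p.y = box.corner.y - box.height / 2.0
    return p

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

center = find_center(box)
print('(%s, %s)' % (center.x, center.y))  # (50.0, -100.0)
```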

Objects are mutable
We can change the state of an object by making an assignment to one of its attributes. For example, to change the size of a rectangle without changing its position, we could modify the values of <tt>width</tt> and <tt>height</tt>:
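A sketch of such a modification follows. The amounts are arbitrary, and <tt>grow_rect</tt> is written here as a hypothetical helper that wraps the same assignments in a function; its parameter names are assumptions.

```python
class Rectangle:
    pass

box = Rectangle()
box.width = 100.0
box.height = 200.0

# Change the size in place; the position is untouched.
box.width = box.width + 50
box.height = box.height + 100

def grow_rect(rect, dwidth, dheight):
    # Hypothetical helper: modifies rect in place and returns nothing.
    rect.width = rect.width + dwidth
    rect.height = rect.height + dheight

grow_rect(box, 10, 20)
print('%s x %s' % (box.width, box.height))  # 160.0 x 320.0
```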

Copying
Aliasing can make a program difficult to read because changes made in one place might have unexpected effects in another place. It is hard to keep track of all the variables that might refer to a given object.

Copying an object is often an alternative to aliasing. The <tt>copy</tt> module contains a function called <tt>copy</tt> that can duplicate any object:
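A sketch using <tt>copy.copy</tt> on a point:

```python
import copy

class Point:
    pass

p1 = Point()
p1.x = 3
p1.y = 4

p2 = copy.copy(p1)
print(p1 is p2)                          # False: a new object was created
print((p1.x, p1.y) == (p2.x, p2.y))      # True: same coordinates
```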

Once we import the <tt>copy</tt> module, we can use its <tt>copy</tt> function to make a new <tt>Point</tt>. <tt>p1</tt> and <tt>p2</tt> are not the same point, but they contain the same data.

To copy a simple object like a <tt>Point</tt>, which doesn't contain any embedded objects, <tt>copy</tt> is sufficient. This is called shallow copying.

For something like a <tt>Rectangle</tt>, which contains a reference to a <tt>Point</tt>, <tt>copy</tt> doesn't do quite the right thing. It copies the reference to the <tt>Point</tt> object, so both the old <tt>Rectangle</tt> and the new one refer to a single <tt>Point</tt>.

If we create a box, <tt>b1</tt>, in the usual way and then make a copy, <tt>b2</tt>, using <tt>copy</tt>, the resulting state diagram looks like this:

This is almost certainly not what we want. In this case, invoking <tt>grow_rect</tt> on one of the <tt>Rectangles</tt> would not affect the other, but invoking <tt>move_rect</tt> on either would affect both! This behavior is confusing and error-prone.
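The effect can be sketched with direct attribute assignments standing in for the two functions: changing a dimension touches only one rectangle, while changing the corner's coordinates touches the <tt>Point</tt> that both share. The numeric values are illustrative.

```python
import copy

class Point:
    pass

class Rectangle:
    pass

b1 = Rectangle()
b1.width = 100.0
b1.height = 200.0
b1.corner = Point()
b1.corner.x = 0.0
b1.corner.y = 0.0

b2 = copy.copy(b1)             # shallow copy
print(b1.corner is b2.corner)  # True: one Point, two references

b2.width = 50.0                # affects b2 only
b2.corner.x = 10.0             # changes the shared Point
print(b1.width)                # 100.0
print(b1.corner.x)             # 10.0
```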

Fortunately, the <tt>copy</tt> module contains a function named <tt>deepcopy</tt> that copies not only the object but also any embedded objects. You will not be surprised to learn that this operation is called a deep copy.
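A sketch of a deep copy of a rectangle, with illustrative values:

```python
import copy

class Point:
    pass

class Rectangle:
    pass

b1 = Rectangle()
b1.width = 100.0
b1.height = 200.0
b1.corner = Point()
b1.corner.x = 0.0
b1.corner.y = 0.0

b2 = copy.deepcopy(b1)
print(b1.corner is b2.corner)  # False: the embedded Point was copied too

b2.corner.x = 10.0
print(b1.corner.x)             # 0.0: b1 is unaffected
```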

Now <tt>b1</tt> and <tt>b2</tt> are completely separate objects.

We can use <tt>deepcopy</tt> to rewrite <tt>grow_rect</tt> so that instead of modifying an existing <tt>Rectangle</tt>, it creates a new <tt>Rectangle</tt> that has the same location as the old one but new dimensions:
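One possible version is sketched below. The parameter names <tt>dwidth</tt> and <tt>dheight</tt> are assumptions, as are the example values.

```python
import copy

class Point:
    pass

class Rectangle:
    pass

def grow_rect(box, dwidth, dheight):
    # Work on a deep copy so the original rectangle stays unchanged.
    new_box = copy.deepcopy(box)
    new_box.width = new_box.width + dwidth
    new_box.height = new_box.height + dheight
    return new_box

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

bigger = grow_rect(box, 50, 100)
print('%s x %s' % (bigger.width, bigger.height))  # 150.0 x 300.0
print('%s x %s' % (box.width, box.height))        # 100.0 x 200.0
```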

Exercises

 * 1) Create and print a <tt>Point</tt> object, and then use <tt>id</tt> to print the object's unique identifier. Translate the hexadecimal form into decimal and confirm that they match.
 * 2) Rewrite the <tt>distance</tt> function from chapter 5 so that it takes two <tt>Point</tt>s as parameters instead of four numbers.
 * 3) Write a function named <tt>move_rect</tt> that takes a <tt>Rectangle</tt> and two parameters named <tt>dx</tt> and <tt>dy</tt>. It should change the location of the rectangle by adding <tt>dx</tt> to the <tt>x</tt> coordinate of <tt>corner</tt> and adding <tt>dy</tt> to the <tt>y</tt> coordinate of <tt>corner</tt>.
 * 4) Rewrite <tt>move_rect</tt> so that it creates and returns a new <tt>Rectangle</tt> instead of modifying the old one.