Python Programming/Databases

Python has support for working with databases via a simple API. Modules included with Python include modules for SQLite and Berkeley DB. Modules for MySQL, PostgreSQL, FirebirdSQL and others are available as third-party modules. The latter have to be downloaded and installed before use. The MySQLdb package can be installed, for example, using the Debian package "python-mysqldb".

MySQL
An example with MySQL would look like this:
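A minimal sketch (the host, user, password and database names are placeholders, and a running MySQL server with a table sampletable is assumed):

```python
import MySQLdb
db = MySQLdb.connect(host="localhost", user="user", passwd="password", db="sampledb")
cursor = db.cursor()
query = "SELECT * FROM sampletable"
lines = cursor.execute(query)
data = cursor.fetchall()
db.close()
```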

On the first line, the module MySQLdb is imported. Then a connection to the database is set up, and on line 4 we store the actual SQL statement to be executed in the variable query. On line 5 we execute the query, and on line 6 we fetch all the data. After this piece of code has run, lines contains the number of rows fetched (e.g. the number of rows in the table sampletable). The variable data contains all the actual data, e.g. the contents of sampletable. Finally, the connection to the database is closed again. If the number of rows is large, it is better to use row = cursor.fetchone() and process the rows individually:
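A self-contained sketch of such a loop (same placeholder credentials and table as above):

```python
import MySQLdb

db = MySQLdb.connect(host="localhost", user="user", passwd="password", db="sampledb")
cursor = db.cursor()
cursor.execute("SELECT * FROM sampletable")
row = cursor.fetchone()
while row is not None:
    # process the row here; printing it is just a stand-in
    print(row)
    row = cursor.fetchone()
db.close()
```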

Obviously, some kind of data processing has to be applied to each row, otherwise the data will not be stored. The result of the fetchone() call is a tuple.

In order to make the initialization of the connection easier, a configuration file can be used:
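One possible sketch, using MySQLdb's read_default_file connect option to pull the connection settings from a configuration file (the database name is a placeholder):

```python
import MySQLdb

# host, user and password are read from ~/.my.cnf instead of
# being hard-coded in the script
db = MySQLdb.connect(db="sampledb", read_default_file="~/.my.cnf")
cursor = db.cursor()
```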

Here, the file .my.cnf in the home directory contains the necessary configuration information for MySQL.

SQLite
An example with SQLite is very similar to the one above, and the cursor provides many of the same functionalities.
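A sketch using the standard library sqlite3 module (the ":memory:" database and the table contents are illustrative; a filename would open or create a database file instead):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE sampletable (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO sampletable (name) VALUES ('Alice')")
cursor.execute("SELECT * FROM sampletable")
data = cursor.fetchall()
print(data)
db.close()
```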

When writing to the database, one has to remember to call db.commit(), otherwise the changes will not be saved:
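For example (table and contents are illustrative; for a file-backed database, closing without committing would discard the insert):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("CREATE TABLE sampletable (id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO sampletable (name) VALUES ('Alice')")
db.commit()  # without this, the insert is not saved
saved = cursor.execute("SELECT COUNT(*) FROM sampletable").fetchone()[0]
db.close()
```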

Parameter Quoting
You will frequently need to substitute dynamic data into a query string. It is important to ensure this is done correctly.
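Consider building the query with ordinary string formatting (the table and name are illustrative):

```python
name = "O'Brien"
# WRONG: plain string formatting does not escape the apostrophe in the value
query = "select * from employees where name = '%s'" % name
print(query)
```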

This example is wrong, because it doesn’t correctly deal with special characters, like apostrophes, in the string being substituted. If your code has to deal with potentially hostile users (like on a public-facing Web server), this could leave you open to an SQL injection attack.

For simple cases, use the automatic parameter substitution provided by the execute method, e.g.
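A sketch with the standard library sqlite3 module (table and values are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("create table employees (name text, company text)")
# the ? placeholders are filled in safely by the interface itself,
# even when the value contains an apostrophe
cursor.execute("insert into employees values (?, ?)", ("O'Brien", "Acme"))
cursor.execute("select name from employees where company = ?", ("Acme",))
rows = cursor.fetchall()
print(rows)
```

Note that the placeholder style varies between modules: sqlite3 uses ?, while MySQLdb uses %s.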

The DBMS interface itself will automatically convert the values you pass into the correct SQL syntax.

For more complex cases, the DBMS module should provide a quoting function that you can explicitly call. For example, MySQLdb provides the escape_string method, while APSW (for SQLite3) provides format_sql_value. This is necessary where the query structure takes a more dynamic form:
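A sketch of such dynamic construction. Here quote_sql_str is a hypothetical stand-in for the library-provided quoting routine (such as apsw.format_sql_value), and the table and field names are illustrative:

```python
def quote_sql_str(s):
    # hypothetical stand-in for a library quoting routine:
    # wrap in single quotes, doubling any embedded quotes
    return "'" + s.replace("'", "''") + "'"

def build_query(company=None, department=None):
    # assemble the WHERE clause from only the criteria the user filled in
    criteria = []
    if company is not None:
        criteria.append("company = " + quote_sql_str(company))
    if department is not None:
        criteria.append("department = " + quote_sql_str(department))
    return "select name from employees where " + " and ".join(criteria)
```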

This will dynamically construct queries like "select name from employees where company = 'some company'" or "select name from employees where company = 'some company' and department = 'some department'", depending on which fields have been filled in by the user.

Use Iterators
Python iterators are a natural fit for the problem of iterating over lots of database records. Here is an example of a function that performs a database query and returns an iterator for the results, instead of returning them all at once. It relies on the fact that, in APSW (the Python 3 interface library for SQLite), the execute method itself returns an iterator for the result records. The result is that you can write very concise code for doing complex database queries in Python.
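A sketch of such a function, here called db_iter (the name and the optional mapfn parameter are assumptions, not part of any library). It is shown with the standard library sqlite3 module, whose execute likewise returns the cursor, which is iterable:

```python
import sqlite3

def db_iter(db, cmd, mapfn=lambda row: row):
    # execute() returns the cursor itself, which iterates over the
    # result rows; this generator hands them to the caller one at a
    # time, optionally transformed by mapfn
    for row in db.cursor().execute(cmd):
        yield mapfn(row)
```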

Example uses of this function:
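One possible use, shown self-contained (the db_iter generator and the illustrative table are defined inline so the snippet runs on its own):

```python
import sqlite3

def db_iter(db, cmd, mapfn=lambda row: row):
    for row in db.cursor().execute(cmd):
        yield mapfn(row)

db = sqlite3.connect(":memory:")
db.execute("create table employees (name, company)")
db.execute("insert into employees values ('Alice', 'Acme')")

# each record is a tuple, unpacked directly into one variable per field
for name, company in db_iter(db, "select name, company from employees"):
    print(name, company)
```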

and
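a second use, again self-contained, this time passing a mapping function so the caller receives plain values rather than one-element tuples:

```python
import sqlite3

def db_iter(db, cmd, mapfn=lambda row: row):
    for row in db.cursor().execute(cmd):
        yield mapfn(row)

db = sqlite3.connect(":memory:")
db.execute("create table employees (name, company)")
db.execute("insert into employees values ('Alice', 'Acme')")
db.execute("insert into employees values ('Bob', 'Acme')")

# mapfn extracts the single element of each tuple
for company in db_iter(db, "select distinct company from employees",
                       lambda row: row[0]):
    print(company)
```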

In the first example, since  returns a tuple for each record, this can be directly assigned to individual variables for the record fields. In the second example, the tuple has only one element, so a custom  is used to extract this element and return it instead of the tuple.

Never Use “SELECT *” in a Script
Database table definitions are frequently subject to change. As application requirements evolve, fields and even entire tables are often added, or sometimes removed. Consider a statement like
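For instance (the table and its four fields are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table employees (name, company, department, salary)")
db.execute("insert into employees values ('Alice', 'Acme', 'Sales', 50000)")

# fragile: the shape of each row silently tracks the table definition
for row in db.execute("select * from employees"):
    name, company, department, salary = row  # breaks if a fifth field is added
```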

You may happen to know that the employees table currently contains, say, 4 fields. But tomorrow someone may add a fifth field. Did you remember to update your code to deal with this? If not, it's liable to crash, or even worse, produce an incorrect result!

Better to always list the specific fields you’re interested in, no matter how many there are:
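For instance (same illustrative table as above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table employees (name, company, department, salary)")
db.execute("insert into employees values ('Alice', 'Acme', 'Sales', 50000)")

# the row shape is fixed by the query, not by the table definition
rows = list(db.execute("select name, department from employees"))
```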

That way, any extra fields added will simply be ignored. And if any of the named fields are removed, the code will at least fail with a runtime error, which is a good reminder that you forgot to update it!

Looping on Field Breaks
Consider the following scenario: your sales company database has a table of employees, and also a table of sales made by each employee. You want to loop over these sale entries, and produce some per-employee statistics. A naïve approach might be:


 * Query the database to get a list of employees
 * For each employee, do a separate database query to get their list of sales.

If you have a lot of employees, then the first query may produce a large list, and the second step will involve a correspondingly large number of database queries.

In fact, the entire processing loop can run off a single database query, using the standard SQL construct called a join.

Here is what an example of such a loop could look like:
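A self-contained sketch using the standard library sqlite3 module (the table and field names, and the sample data, are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table employees (id integer primary key, name text)")
db.execute("create table sales (employee_id integer, sale_date text, amount real)")
db.executemany("insert into employees values (?, ?)",
               [(1, "Alice"), (2, "Bob"), (3, "Carol")])
db.executemany("insert into sales values (?, ?, ?)",
               [(1, "2024-01-05", 100.0), (1, "2024-03-02", 250.0),
                (2, "2024-02-11", 75.0)])  # Carol has no sales

cursor = db.cursor()
cursor.execute(
    "select employees.name, sales.sale_date, sales.amount"
    " from employees left outer join sales"
    " on employees.id = sales.employee_id"
    " order by employees.name, sales.sale_date"
)

stats = []  # collected per-employee results
prev_name = None
while True:
    row = cursor.fetchone()
    if row is not None:
        name, sale_date, amount = row
    if row is None or name != prev_name:
        # field break: write out the previous employee's statistics
        if prev_name is not None:
            stats.append((prev_name, nr_sales, total, earliest, latest))
        if row is None:
            break  # end of the query results
        # initialize processing for the new employee
        prev_name = name
        nr_sales = 0
        total = 0.0
        earliest = latest = None
    if sale_date is not None:  # null when the employee has no sales
        nr_sales += 1
        total += amount
        if earliest is None or sale_date < earliest:
            earliest = sale_date
        if latest is None or sale_date > latest:
            latest = sale_date

for entry in stats:
    print(entry)
```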

Here the statistics are quite simple (earliest and latest sale, and the number and total amount of sales) and could have been computed directly within the SQL query. But the same loop could compute more complex statistics (like standard deviation) that cannot be expressed directly in a simple SQL query.

Note how the statistics for each employee are written out under either of two conditions:
 * The employee name of the next record is different from the previous one
 * The end of the query results has been reached.

Both conditions are tested together in the same place; after writing out the employee statistics, a separate check for the second condition (the end of the results) is used to terminate the loop. If the loop doesn't terminate, then processing is initialized for the new employee.

Note also the use of a left outer join in this case: if an employee has had no sales, then the join will still return a single row for that employee, with SQL null values (represented by None in Python) for the fields from the sales table. This is why we need checks for such None values before processing those fields.

Alternatively, we could have used an inner join, which would have returned no results for an employee with no sales. Whether you want to omit such an employee from your report, or include them with totals of zero, is really up to your application.