MySQL/Optimization

Before Starting To Optimise
When the database seems to be "slow", first consider all of the following points. For example, making a certain query completely unnecessary by using a more sophisticated algorithm in the application is always the most elegant way of optimising it :)
 * 1) Finding the bottleneck (CPU, memory, I/O, which queries)
 * 2) Optimising the application (remove unnecessary queries or cache PHP generated web pages)
 * 3) Optimising the queries (using indices, temporary tables or different ways of joining)
 * 4) Optimising the database server (cache sizes etc)
 * 5) Optimising the system (different filesystem types, swap space and kernel versions)
 * 6) Optimising the hardware (sometimes indeed the cheapest and fastest way)

To find those bottlenecks the following tools have been found to be helpful:
 * vmstat
 * to quickly monitor CPU, memory and I/O usage and decide which of them is the bottleneck


 * top
 * to check the current memory and CPU usage of mysqld as well as of the applications


 * mytop
 * to figure out which queries cause trouble


 * mysql-admin (the GUI application, not to be confused with mysqladmin)
 * to monitor and tune mysql in a very convenient way


 * mysqlreport
 * whose output can be used as a kind of step-by-step checklist

Using these tools, most applications can also be categorised very broadly into the following groups:
 * I/O based and reading (blogs, news)
 * I/O based and writing (web access tracker, accounting data collection)
 * CPU based (complex content management systems, business apps)

Optimising the Tables
Use the OPTIMIZE TABLE command regularly to reorganise the disk space; this reduces the table size without deleting any record:

Moreover, when creating the tables, the smallest suitable column types are preferable. For example:
 * if a number is always positive, choose an UNSIGNED type to be able to store twice as large values in the same number of bytes.
 * to store contemporary dates (from 1970 to 2038), it is better to take a TIMESTAMP on four bytes than a DATETIME on 8.
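The factor of two in the first point is plain integer-range arithmetic, sketched here for a 4-byte integer column:

```python
# Value ranges of a 4-byte integer column, signed vs unsigned.
BITS = 32
signed_min, signed_max = -2 ** (BITS - 1), 2 ** (BITS - 1) - 1  # INT
unsigned_max = 2 ** BITS - 1                                    # INT UNSIGNED

print(signed_max)    # 2147483647
print(unsigned_max)  # 4294967295
```

The unsigned column stores values up to twice as large in the same four bytes, at the cost of rejecting negatives.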

Comparing functions with BENCHMARK
The BENCHMARK function can be used to compare the speed of MySQL functions or operators. For example:

mysql> SELECT BENCHMARK(100000000, CONCAT('a','b'));
+---------------------------------------+
| BENCHMARK(100000000, CONCAT('a','b')) |
+---------------------------------------+
|                                     0 |
+---------------------------------------+
1 row in set (21.30 sec)

However, this cannot be used to compare queries:

mysql> SELECT BENCHMARK(100, SELECT `id` FROM `lines`);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use near
'SELECT `id` FROM `lines`)' at line 1

As MySQL needs a fraction of a second just to parse the query, and the system is probably busy doing other things too, benchmarks with runtimes of less than 5-10 s can be considered totally meaningless, and runtime differences in that order of magnitude pure chance.

Analysing functions with EXPLAIN
When you precede a SELECT statement with the keyword EXPLAIN, MySQL explains how it would process the SELECT, providing information about how tables are joined and in which order. This makes it possible to see where indices or optimizer hints are needed.

Using and understanding EXPLAIN is essential when aiming for good performance; therefore the relevant chapters of the official documentation are mandatory reading!

A simple example
The join of two tables, neither of which has an index:

Now the second table gets an index, and EXPLAIN shows that MySQL now knows that only 2 of the 3 rows have to be examined.

Now the first table also gets an index so that the WHERE condition can use it, and MySQL knows that only 1 row from the first table is relevant before even looking it up in the data file.

Status and server variables
MySQL can be monitored and tuned by watching the status variables and setting the server variables, both of which can be global or per session. The status variables can be monitored with SHOW [GLOBAL|SESSION] STATUS [LIKE '%foo%'] or mysqladmin [extended-]status. The server variables can be set in the /etc/mysql/my.cnf file or via SET [GLOBAL|SESSION] foo = bar, and shown with mysqladmin variables or SHOW [GLOBAL|SESSION] VARIABLES [LIKE '%foo%'].

Generally status variables start with a capital letter and server variables with a lowercase one.

When dealing with the above-mentioned per-session system variables, it should always be considered that those have to be multiplied by max_connections to estimate the maximal memory consumption. Failing to do so can easily lead to server crashes at times of load peaks, when more clients than usual connect to the server! A quick-and-dirty estimation can be made with the following formula:

min_memory_needed = global_buffers + (thread_buffers * max_connections)

global_buffers: key_buffer innodb_buffer_pool innodb_log_buffer innodb_additional_mem_pool net_buffer

thread_buffers: sort_buffer myisam_sort_buffer read_buffer join_buffer read_rnd_buffer
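As a worked example of the formula (the buffer sizes below are illustrative values, not recommendations; read your actual settings from SHOW VARIABLES):

```python
MB = 1024 * 1024

def min_memory_needed(global_buffers, thread_buffers, max_connections):
    """Quick-and-dirty estimate of MySQL's maximal memory consumption:
    global buffers are allocated once, per-thread buffers up to once
    per connection."""
    return (sum(global_buffers.values())
            + sum(thread_buffers.values()) * max_connections)

# Illustrative values only.
global_buffers = {
    "key_buffer_size": 64 * MB,
    "innodb_buffer_pool_size": 256 * MB,
    "innodb_log_buffer_size": 8 * MB,
}
thread_buffers = {
    "sort_buffer_size": 2 * MB,
    "read_buffer_size": 128 * 1024,
    "join_buffer_size": 256 * 1024,
    "read_rnd_buffer_size": 256 * 1024,
}

estimate = min_memory_needed(global_buffers, thread_buffers, max_connections=100)
print(f"{estimate / MB} MB")  # 328 MB global + 2.625 MB per connection x 100
```

With these numbers the server may need almost 600 MB under full load, even though each individual thread buffer looks harmless.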

Note: Especially when dealing with server settings, all information should be verified in the respective chapters of the official documentation, as these are subject to change and the authors of this text lack confirmed knowledge about how the server works internally.

Index / Indices
Indices are a way to locate elements faster. This works for single elements as well as for ranges of elements.

Experiment
Note: when you make your timing tests, make sure the query cache is disabled (query_cache_type = 0 in my.cnf) to force MySQL to recompute your queries each time you type them instead of just taking the pre-computed results from the cache.

Let's run the following Perl program:

What does it do? It simulates the data feed from industrial lines that weigh stuff at regular intervals, so we can compute the average material usage. Over time, lots of records pile up.

How to use it? We can check the number of elements with: The resulting table is quite large: Incidentally, restoring from the dump is much faster, because it uses extended inserts!

This SQL command will scan all records to get a total sum: Let's say we need to compute the total material used during January 1st, 2008: MySQL will still need to browse the entire table, even for this tiny number of records. This is because matching records can be anywhere: at the beginning, at the end, in the middle; nothing guarantees that the records are ordered.

To improve this, we can add an index to the 'date' field. This means MySQL will create a new hidden table with all the dates sorted chronologically, storing their offsets (positions) in the 'weightin' table so the full records can be retrieved.

Because the index is sorted, it's way faster for MySQL to locate a single record (using a binary search algorithm) or even a range of data (find the first and last element, the range is in-between).
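The lookup on a sorted index can be sketched in a few lines of Python (a toy illustration of the principle, not MySQL's actual implementation):

```python
import bisect
from datetime import date

# A toy "index": (date, offset) pairs kept sorted by date, while the
# rows in the data file itself remain in insertion order.
index = sorted([
    (date(2008, 1, 3), 2), (date(2008, 1, 1), 0),
    (date(2008, 1, 1), 4), (date(2008, 1, 2), 1),
])

def offsets_for_day(idx, day):
    """Binary-search the first and last index entry for `day`; every
    matching row offset lies in between -- O(log n) instead of a full scan."""
    lo = bisect.bisect_left(idx, (day,))
    hi = bisect.bisect_right(idx, (day, float("inf")))
    return [offset for _, offset in idx[lo:hi]]

print(offsets_for_day(index, date(2008, 1, 1)))  # [0, 4]
```

Only the two binary searches touch the index; the data file is then read at exactly the returned offsets.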

To add the index: The index is not used if the query applies a computation to the field (e.g. WHERE YEAR(date) = 2008) but it is used for ranges (e.g. WHERE date BETWEEN '2008-01-01' AND '2008-01-02').

You can notice that the .MYI file grew: that's where MySQL stores the indices. Initially there was already an index for the 'id' field, which is the case for all primary keys.

Another example
Another example: let's say we want to optimise this query:

We can do so by adding an index on the 'line' field, in order to group the duplicates together, which saves the query from rescanning the whole table to locate them:

The index file grew:

General considerations
The first and foremost question for SELECT queries is always whether indices (aka "keys") are configured and, if they are, whether they are actually used by the database server.

Individual queries can be checked with the "EXPLAIN" command. For the whole server the "Sort_%" variables should be monitored as they indicate how often MySQL had to browse through the whole data file because there was no usable index available.
 * 1. Check if the indices are actually used

Keeping the indices in memory improves read performance a lot. The quotient of "Key_reads / Key_read_requests" tells how often MySQL actually accessed the index file on disk when it needed a key. Same goes for Key_writes, use mysqlreport to do the math for you here. If the percentage is too high, key_buffer_size for MyISAM and innodb_buffer_pool_size for InnoDB are the corresponding variables to tune.
 * 2. Are the indices buffered
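The Key_reads / Key_read_requests quotient is easy to compute yourself from SHOW GLOBAL STATUS output (the sample numbers below are invented):

```python
def key_miss_ratio(key_reads, key_read_requests):
    """Fraction of index lookups that had to read the index file from disk
    (Key_reads / Key_read_requests); a high value suggests raising
    key_buffer_size (MyISAM) or innodb_buffer_pool_size (InnoDB)."""
    if key_read_requests == 0:
        return 0.0
    return key_reads / key_read_requests

# Invented sample values, as read from SHOW GLOBAL STATUS:
print(f"{key_miss_ratio(4512, 1250000):.2%}")  # well below 1%, buffer is fine
```

mysqlreport prints the same quotient for you, for reads and writes alike.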

The Key_blocks_% variables can be used to see how much of the configured key buffer is actually used. The unit is 1 KB if not set otherwise in key_cache_block_size. As MySQL uses some blocks internally, Key_blocks_unused has to be checked. To estimate how big the buffer should be, the sizes of the relevant .MYI files can be summed up. For InnoDB there is innodb_buffer_pool_size, although in this case not only the indices but also the data gets buffered.


 * 3. Further settings

sort_buffer_size (per-thread) is the memory that is used for ORDER BY and GROUP BY. myisam_sort_buffer_size is something completely different and should not be altered.

read_buffer_size (per-thread) is the size of the memory chunks that are read from disk into memory at once when doing a full table scan, as big tables do not fit into memory completely. This seldom needs tuning.

Query cache
The main reason not to use any MySQL version below 4.0.1 if you have read-based applications is that beginning with that version, MySQL has the ability to store the result of SELECT queries until their tables are modified.

The Query Cache can be configured with the query_cache_% variables. Most important here are the global query_cache_size and query_cache_limit, which prevents single queries with unusually big results from using up the whole cache.

Note that the Query Cache blocks have a variable size whose minimum size is query_cache_min_res_unit, so after a complete cache flush the number of free blocks is ideally just one. A large value of Qcache_free_blocks just indicates a high fragmentation.

Worth monitoring are the following variables:
 * Qcache_free_blocks
 * If this value is high, it indicates a high fragmentation, which does not necessarily have to be a bad thing though.


 * Qcache_not_cached
 * If this value is high, there are either many uncachable queries (e.g. because they use functions like NOW()) or the value of query_cache_limit is too low.


 * Qcache_lowmem_prunes
 * This is the number of old results that have been purged because the cache was full and not because their underlying tables have been modified. query_cache_size must be increased to lower this variable.

Examples:

An empty cache:

mysql> SHOW VARIABLES LIKE 'query_cache_type';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| query_cache_type | ON    |
+------------------+-------+
1 row in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'query_cache_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| query_cache_size | 0     |
+------------------+-------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE 'Qcache%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.00 sec)

A used cache (savannah.gnu.org):

mysql> SHOW VARIABLES LIKE "query_cache_size";
+------------------+----------+
| Variable_name    | Value    |
+------------------+----------+
| query_cache_size | 33554432 |
+------------------+----------+
1 row in set (0.00 sec)

mysql> SHOW STATUS LIKE "Qcache%";
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 1409     |
| Qcache_free_memory      | 27629552 |
| Qcache_hits             | 7925191  |
| Qcache_inserts          | 3400435  |
| Qcache_lowmem_prunes    | 2946778  |
| Qcache_not_cached       | 71255    |
| Qcache_queries_in_cache | 4546     |
| Qcache_total_blocks     | 10575    |
+-------------------------+----------+
8 rows in set (0.00 sec)
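A little arithmetic on these status values shows how effective the cache is; the helper below uses the savannah.gnu.org numbers shown above:

```python
def qcache_stats(hits, inserts, not_cached, lowmem_prunes):
    """Hit rate over all SELECTs seen by the cache, and the fraction of
    cached results that were later evicted just to free memory."""
    selects = hits + inserts + not_cached
    return hits / selects, lowmem_prunes / inserts

hit_rate, prune_rate = qcache_stats(
    hits=7925191, inserts=3400435, not_cached=71255, lowmem_prunes=2946778)
print(f"hit rate {hit_rate:.1%}, prunes per insert {prune_rate:.1%}")
```

Here roughly 70 % of the SELECTs are answered from the cache, but the high prune rate suggests that query_cache_size could be increased, as described above.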

The matching my.cnf configuration parameter is: query_cache_size = 32M

To clear the cache (useful when testing a new query's efficiency): mysql> RESET QUERY CACHE; Query OK, 0 rows affected (0.00 sec)

Waiting for locks
The Table_locks_% variables show the number of queries that had to wait because the tables they tried to access were currently locked by other queries. These situations can be caused by "LOCK TABLE" statements, and also by e.g. simultaneous write accesses to the same table.

Table cache
MySQL needs a certain time just to "open" a table and read its meta data like column names etc.

If many threads are trying to access the same table, it is opened multiple times.

To speed this up the meta data can be cached in the table_cache (alias table_open_cache since MySQL 5.1.3).

A good value for this setting is max_connections multiplied by the number of tables usually used per SELECT.

Using mysqlreport, or by comparing the currently Open_tables with the cumulative Opened_tables and the Uptime, the number of necessary table opens per second can be calculated (consider the off-peak times like nights, though).
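That calculation is straightforward (the values below are invented; read the real ones from SHOW GLOBAL STATUS):

```python
def table_opens_per_second(opened_tables, uptime_seconds):
    """Average number of table opens per second since server start
    (Opened_tables / Uptime); a persistently high rate suggests a
    larger table_cache / table_open_cache."""
    return opened_tables / uptime_seconds

# Invented sample: 250,000 table opens over two weeks of uptime.
print(f"{table_opens_per_second(250000, 14 * 24 * 3600):.2f} opens/s")
```

Remember that a long uptime averages away load peaks, so also compare Open_tables against the configured cache size directly.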

Connections and threads
For every client connection (aka session) MySQL creates a separate thread under the main mysqld process. For big sites with several hundred new connections per second, creating the threads itself can consume a significant amount of time. To speed things up, idle threads can be cached after their client has disconnected. As a rule of thumb, not more than one thread per second should be newly created. Clients that send several queries to the server should use persistent connections, e.g. via PHP's mysql_pconnect function.

This cache can be configured with thread_cache_size and monitored with the Threads_% status variables.

To avoid overload, MySQL blocks new connections if more than max_connections are currently in use. Check Max_used_connections against that limit, and monitor the clients that disconnected improperly in Aborted_clients and the connection attempts that failed in Aborted_connects. Forgotten disconnects from clients that use persistent connections can easily lead to a denial-of-service situation, so be aware! Normally connections are closed after wait_timeout seconds of being idle.

Temporary tables
It is perfectly normal that MySQL creates temporary tables while sorting or grouping results. Those tables are either held in memory or, if too large, written to disk, which is naturally much slower. The number of disk tables among the Created_tmp_% variables should be negligible; otherwise the settings max_heap_table_size and tmp_table_size should be reconsidered.

Delayed writes
In situations like writing webserver access log files to a database, with many subsequent INSERT queries for rather unimportant data into the same table, the performance can be improved by advising the server to cache the write requests a little while and then send a whole batch of data to disk.

Be aware though that all the methods mentioned here contradict ACID compliance, because INSERT queries are acknowledged with OK to the client before the data has actually been written to disk and can thus still get lost in case of a power outage or server crash. Additionally, the side effects mentioned in the documentation often read like the patient information leaflet of a modern medicament...

MyISAM tables can be given the DELAY_KEY_WRITE option using CREATE or ALTER TABLE. The drawback is that after a crash the table is automatically marked as corrupt and has to be checked/repaired which can take some time.

InnoDB can be told with innodb_flush_log_at_trx_commit to delay writing the data a bit. In case of a server crash the data itself is supposed to be still consistent, just the indices have to be rebuilt.

INSERT DELAYED works on the main storage engines on a per-query basis.