How Db2 Handles SQL
SQL is a powerful language in the hands of the right RDBMS. One of SQL’s strengths is that you specify what data you want returned without specifying what file, disk, san, or sector that data is stored on. Because we describe the result set we want, Db2 has a powerful cost-based optimizer to translate that into how exactly to retrieve, format, and combine data to produce that result set. There are a number of things that can make that process go more slowly. Avoiding these common issues can make a drastic difference in overall Db2 database performance.
Selecting Too Much Data
SELECT *
is the bane of a DBA’s existence. This is true across platforms. Naming only the specific columns needed can vastly improve query performance because all the interim steps of the query have less data to work with, requiring less memory and fewer CPU cycles. This is even more true in BLU environments.
Locking More Than You Need
Db2 is more likely to lock than some other RDBMSes. We see this as an advantage – we value data integrity above all else. I’m not sure that developers or DBAs from other platforms agree with this. This means that when working with Db2, developers and DBAs need to be even more aware of isolation levels and the impact that they have.
Whenever your use of the data allows it, use the Uncommitted Read (UR) isolation level. This isolation level incurs the least possible locking, and is compatible with the vast majority of other locks. You can do this by tacking with ur
on to the end of your statements or by the way an application is bound to the database.
Even if UR does not work for you, be aware of the isolation level in use. Often administrative GUIs default to RR, which can cause serious concurrency issues and negative locking phenomena. Some applications are written for CS but then default to RS. Look into what isolation levels your application supports and verify what isolation levels are actually in use.
Be sure to monitor your database for negative locking phenomena like lock escalations, deadlocks, lock timeouts, and even excessive lock wait times. Consider forcing off connections that are long-idle but holding on to locks or log files to prevent database outages and problems.
Applying Functions to Column Data
Most of the time, when a function is applied to column data, it prevents Db2 from being able to use an index. If an index is used, and it is not a function-based index, it is likely that Db2 is doing a leaf-page scan instead of quick and efficient index access. But some SQL can be rewritten to apply some functions differently to avoid applying the function to the column data.
Here’s a real-world example. This condition:
(DAYS(CURRENT TIMESTAMP) - DAYS(lastupdate)) >= 30
can be functionally re-written like this for most scenarios:
lastupdate <= CURRENT TIMESTAMP - 30 DAYS
The way I think of this is that columns go on the left while functions go on the right. That’s an over-simplification. Not every condition can be re-written this way, but when they can, the performance impact is astonishing.
Avoid Data Type Conversions
Data type conversions are often automatically handled by Db2 these days. However, they are still expensive. Db2 will properly compare ‘5’ to 5 and determine that they are the same thing, however, it may take twice as long as comparing 5 to 5 or ‘5’ to ‘5’.
In one situation, a client had a column that was defined in some tables as a SMALLINT and in other tables as char(1). These tables were frequently joined on this column. Some complex queries went from minutes of elapsed execution time down to seconds or sub-second just by changing the tables to make the column all the same data type.
Keep in mind data type conversions when considering what data type to use for a column.
Excessive Sorting
There are certainly queries where sorting is required, but sometimes it just doesn’t matter. Maybe the application sorts the data after retrieving it anyway. Maybe the “ORDER BY” on one query was just copied because it was on another query.
Sorts consume memory and CPU cycles, and in extreme cases, temporary storage. For Db2 performance, sort is a four-letter word. If an “ORDER BY” clause can be left off, it should be.
Inadvertent Cartesian Products
Steps should be taken to avoid Cartesian products. Cartesian products can really blow out memory areas and even temp space. Most of the time, a Cartesian product happens by mistake when a join condition is left off. One way to avoid that kind of mistake is by using explicit join syntax. Instead of:
FROM table1, table2
WHERE table1.cola=table2.colb
It is safer to use:
FROM table1
JOIN table2 ON table1.cola=table2.colb
This makes it much harder to accidentally delete a join condition when working on a query and helps properly define each table’s relationship when it is added into the FROM clause.
Proper use of Parameter Markers
Parameter markers, when used properly, can make a huge difference in database performance. For very frequently executed SQL, using parameter markers means that the access plan can be calculated only once and reused for different sets of values. This puts place-holders in the SQL for the variables you choose to use parameter markers for. This saves SQL compilation time and memory for storing access plans. Parameter markers show up in queries as question marks. Within the same query, it may make sense to use parameter markers for some variables and not for others. Not only does proper use of parameter markers improve SQL performance, but it may reduce the hardware you need to run your database workload.