Sometimes a client calls for help with a performance problem. On one particular Tuesday, a client called about a long-running query. Actually they basically said “What’s wrong with DB2? my query is running long! Are there lock timeouts happening?”
The query came from a new application implemented just the weekend before. When I asked how long the query was running, the answer was “more than the three-minute time-out”. This is a transaction processing database, so three minutes is generally not acceptable.
The Query
The query in this case was amazingly simple – joining only three tables:
select distinct(driver_id),
(select max(pos_timestamp) from schema1.position as pos2
where pos2.driver_id = pos.driver_id) as pos_timestamp,
(select max(pos_timestamp) from schema1.breadcrumb as bc
where bc.driver_id = pos.driver_id) as last_breadcrumb_ts
from schema1.position as pos where pos_timestamp > current_timestamp - 6 hours
order by pos_timestamp
Explaining this statement gives a fairly ugly explain plan, although still simple:
Access Plan:
-----------
Total Cost: 1.16818e+006
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
5034
TBSCAN
( 2)
1.16818e+006
125360
|
5034
SORT
( 3)
1.16818e+006
125360
|
27300.6
NLJOIN
( 4)
1.16817e+006
125360
/----------+----------
27300.6 1
NLJOIN GRPBY
( 5) ( 12)
1.08987e+006 529.777
125115 245
/-------+-------- |
27300.6 1 1.42729
FETCH GRPBY TBSCAN
( 6) ( 10) ( 13)
2612.27 39.9434 529.777
871.255 4.55094 245
/----+---- | |
27300.6 9.61044e+007 19091.1 7185
RIDSCN TABLE: SCHEMA1 IXSCAN TABLE: SCHEMA1
( 7) POSITION ( 11) BREADCRUMB
905.859 Q7 1534 Q1
102.546 174.777
| |
27300.6 9.61044e+007
SORT INDEX: SCHEMA1
( 8) POS_DRIVER_NDX
905.859 Q4
102.546
|
27300.6
IXSCAN
( 9)
899.457
102.546
|
9.61044e+007
INDEX: SCHEMA1
IDX_POSITION__POS_TIMESTAMP_03062015
Q7
Looking at this explain plan, we can see that most of the expense comes in with operator #5 – an NLJOIN that is joining the POSITION table to ITSELF.
Rewriting
I immediately thought that rewriting might help this particular query significantly. If I could just make that join more efficient somehow. Both accesses to the table were through indexes, and one of them through index-only access.
I first tried to break out the correlated subqueries to Common Table Expressions(CTE). This bumped up my timeron count to over 5 million – 5 times worse than the original. We can’t all come up with the perfect answer the first time. Then as I was continuing to look at the query, I realized that I could write the distinct as a group-by instead. I rewrote the query to this:
select pos.driver_id
, max(pos_timestamp) as max_pos_timestamp
,(select max(pos_timestamp) from schema1.breadcrumb as bc
where bc.driver_id = pos.driver_id) as last_breadcrumb_ts
from schema1.position as pos
where pos_timestamp > current_timestamp - 6 hours
group by pos.driver_id
order by max_pos_timestamp
In every scenario that I could come up with, the results from the two queries were the same. I asked the application owner to verify that the results of this other way of writing the query were indeed what he needed.
The cost of the second query was just 17,489 Timerons! That was a 98.5% reduction in the cost of the query. Here’s what the explain plan looks like for the rewritten query:
Access Plan:
-----------
Total Cost: 17489.1
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
5034
NLJOIN
( 2)
17489.1
1116.25
/----+-----
5034 1
TBSCAN GRPBY
( 3) ( 12)
2622.28 529.777
871.255 245
| |
5034 1.42729
SORT TBSCAN
( 4) ( 13)
2622.12 529.777
871.255 245
| |
5034 7185
GRPBY TABLE: SCHEMA1
( 5) BREADCRUMB
2620.41 Q1
871.255
|
5034
TBSCAN
( 6)
2620.24
871.255
|
5034
SORT
( 7)
2620.07
871.255
|
27300.6
FETCH
( 8)
2612.27
871.255
/----+----
27300.6 9.61044e+007
RIDSCN TABLE: SCHEMA1
( 9) POSITION
905.859 Q4
102.546
|
27300.6
SORT
( 10)
905.859
102.546
|
27300.6
IXSCAN
( 11)
899.457
102.546
|
9.61044e+007
INDEX: SCHEMA1
IDX_POSITION__POS_TIMESTAMP_03062015
Q4
According to the DB2 index advisor, there are also indexes that I can add to reduce each side of the explain to index-only access, further reducing the cost by about 80%.
Summary
Some say that the DB2 optimizer is so good that you don’t have to rewrite queries. A significant portion of the time I find that to be true, but there are some edge cases like this type of query that are the exception. An SQL code review with a talented DB2 DBA can cut minutes or hours off of query execution time.