2020-02 Db2 UPDATE column: UPDATEs for nothing and CPU ain’t free!
A bad misquote of a great Dire Straits song, but it is one great thing I saw last year!
What is an Update?
We all know what an update is, right? You have a column containing some value and you wish to update it to a new – different – value. You code an UPDATE with SET and all is done.
But, what happens “under the covers” when the column value you are updating is exactly the same as the new value?
Suspected real-time abuse
The problem surfaced gradually… as all good problems do… The DBAs were wondering why an SQL that was executed half a million times per day was waiting for other threads for so long. (This was discovered using our WorkLoadExpert (WLX)). The wait times were frighteningly high, and so it was decided that this SQL should be the target of some sort of tuning effort.
It then came out, while gathering basic tablespace statistical data, that:
the related tablespace had not been REORGed for over six months when, according to SEGs WorkLoadExpert, there were half a million updates every day against it!
RTS or SEG Bug?
Naturally the first idea is it must be a bug.
- Either the Real-time Statistics (Not incrementing the REORGUPDATES counter – It could even be NULL for example) or,
- heaven forbid, a bug in our WorkLoadExpert.
I took a closer look at the SQL:
UPDATE aaa.bbb SET COL1 = ? WHERE COL1 = ? ;
and just sort of wondered out loud,
“They are not using the same value in both parameter markers are they?”
The “they” in this case was the developer of course…
Oh My Word!
After a quick e-mail discussion, it then came out that, that was indeed the case! Db2 is clever but sometimes not that brilliant! The developer had had the idea of executing this SQL to “see” if the value existed or not… He did not think about what Db2 then actually does…
Under the Covers
Db2 does not “know” what the current value of COL1 is. It used, in this case, Index access to get and obtain an X lock on the target page – (this was then the reason for the very large wait times on the other threads!). Once the lock was held, it could then discover that there was *nothing* to do, and so it did nothing! Then it happily released the lock(s) after doing nothing and returned SQLCODE 0.
No Log data was written as nothing was done, and REORGUPDATES was not incremented as nothing was done, but the CPU/Elapsed overhead was enormous!
The right way
The head DBA has said the SQL should look like:
SELECT ‘A’ FROM aaa.bbb WHERE COL1 = ? FETCH FIRST 1 ROW ONLY WITH UR ;
This is now on its way through change management! Naturally, it is the way the developer should have coded it from the get go!
What can you do?
Now this, of course, caused alarm bells to ring as “cut-and-paste” is your friend. If there is bad code in one place it is probably being copied further, even as you read this! Using SEG’s WorkLoadExpert and the Real-time Statistics, you can easily pull out and analyze any “bad guys”.
Put simply, use the UPDATE count from WLX and correlate it to the REORGUPDATES counter. If they are wildly different, taking into account REORGLASTTIME and WLX_TIMESTAMP, then you have a candidate to further track down!
Now, where are those refrigerators we have to move?
As always, I would be pleased to hear from you and any war stories you have!