2020-01 How RUNSTATS causes an error SQLCODE

Impossible! I hear you all say… How can a RUNSTATS *cause* an SQL error code? Well, my gentle readers, read on.


Two examples in SPUFI at the end of this newsletter

What we know

We are all, I hope, aware that a badly timed RUNSTATS can cripple your SQL Performance. Just think of a see-sawing, or volatile table, and it gets a RUNSTATS at the zero point… Tablespace scan is then a good access choice… After a couple of million inserts that is probably not the best!

Awful, Implicit Casting

Implicit casting came along a few Db2 releases ago and either made you very happy: “I never have to worry about using apostrophes again!” or very angry: “Developers must know what they are coding!” The thing is, we got it and you cannot *not* do it!

It goes both ways…

The idea behind Implicit casting, in case you don’t know, is that:

The predicate COL1 = 1 and COL1 = ‘1’ are the same to Db2.

It will take that ‘1’ and “cast” it to a variable type that will then be able to be compared to whatever type COL1 is.

So,

  • if COL1 contains a numeric representation of data everything is ok.
  • But if COL1 is CHAR(1) and contains a ’Y’ you then get an SQLCODE -420

– Nasty, nasty business…

Access path is also sub-optimal

When Db2 does implicit casting, it casts to DECFLOAT to then do the comparison etc. (See my DECFLOAT newsletter about what I think about that data type!) Anyway, it is *not* good for performance. In fact, it got so annoying that a few users actually asked for a ZPARM to switch off implicit casting! They actually wanted an error whenever they compared mismatched data types. This request was, of course, turned down.

So where’s the beef?

So, what happened in production was quite simple really: a query had been running for three years with never a problem. Then one day, after a RUNSTATS, it started returning SQLCODE -420. This was due to the fact that the table processing order had switched, due to the RUNSTATS running at a “bad” time.

This in turn exposed the buggy SQL WHERE predicate that previously had never seen the bad data as it was removed in an earlier branch! This could also happen when the column in question actually contains non-numeric data perhaps dues to a code bug:

See the example SQL at the end.

RUNSTATS Rescue for Db2 Zos

This user site has our software RUNSTATS Rescue so they quickly got the query up and running, without doing a code change, in a matter of seconds. Of course, this bad code was discovered in the middle of the year end production freeze so they could not simply change the application code! This code change has been scheduled and will be done in the new year.

Bottom Line

RUNSTATS can cause negative SQLCODEs to be returned and RUNSTATS Rescue buys you the needed time to continue running – even with buggy SQL code!

As always, I would be pleased to hear from you and any war stories you have!

TTFN,
Roy Boxwell
Senior Architect


PS: Just for the record, here are two examples in SPUFI:


 SELECT *                                                               
 FROM SYSIBM.SYSDUMMY1                                                  
 WHERE IBMREQD = '1'                                                    
 ;                                                                      
 ---------+---------+---------+---------+---------+---------+---------+-
 IBMREQD                                                                
 ---------+---------+---------+---------+---------+---------+---------+-
 DSNE610I NUMBER OF ROWS DISPLAYED IS 0                                 
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100            
 ---------+---------+---------+---------+---------+---------+---------+-
 SELECT *                                                               
 FROM SYSIBM.SYSDUMMY1                                                  
 WHERE IBMREQD = 1                                                      
 ;                                                                      
 ---------+---------+---------+---------+---------+---------+---------+-
 IBMREQD                                                                
 ---------+---------+---------+---------+---------+---------+---------+-
 DSNE610I NUMBER OF ROWS DISPLAYED IS 0                                 
 DSNT408I SQLCODE = -420, ERROR:  THE VALUE OF A STRING ARGUMENT WAS NOT
          ACCEPTABLE TO THE DECFLOAT FUNCTION     
 DSNT418I SQLSTATE   = 22018 SQLSTATE RETURN CODE                        
 DSNT415I SQLERRP    = DSNXRNUM SQL PROCEDURE DETECTING ERROR            
 DSNT416I SQLERRD    = -245 0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION    
 DSNT416I SQLERRD    = X'FFFFFF0B'  X'00000000'  X'00000000'  X'FFFFFFFF'
          X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION            
And the result when the column actually contains a numeric valid value and not “Y”:
CREATE TABLE BOXWELL.SYSDUMMY1 LIKE SYSIBM.SYSDUMMY1       
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0  
 ---------+---------+---------+---------+---------+---------
 INSERT INTO  BOXWELL.SYSDUMMY1 VALUES ('1')                
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 DSNE615I NUMBER OF ROWS AFFECTED IS 1                      
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0  
 ---------+---------+---------+---------+---------+---------
 SELECT *                                                   
 FROM BOXWELL.SYSDUMMY1                                     
 WHERE IBMREQD = '1'                                        
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 IBMREQD                                                    
 ---------+---------+---------+---------+---------+---------
 1                                                          
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                     
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
 ---------+---------+---------+---------+---------+---------
 SELECT *                                                   
 FROM BOXWELL.SYSDUMMY1                                     
 WHERE IBMREQD = 1                                          
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 IBMREQD                                                    
 ---------+---------+---------+---------+---------+---------
 1                                                          
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                     
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
Cool and dangerous! Just like me! LoL ! ! !

2015-08 Overloaded Logs

 

Have all your SQL members an “equivalent” elapsed time?

 

Hey, just a quick newsletter this month to highlight an interesting observation I had recently…

 

The high-water mark for elapsed time for some SQL was inexplicably high…

At a customer site, the Lead DBA and I were analysing SQL performance—using our  tool—and we saw in the data-sharing aggregated view of the executed SQLs that the high-water mark for elapsed time for some SQL was inexplicably high.

We then viewed the data at the Member level for these SQLs and what we saw practically jumped right out of the data at us. The same SQL on one Member was taking up to 45% longer to execute when compared against another member of the Data-sharing Group. Time to see what was going on…

 

High elapsed times and lots of Wait times on one member

On the member in question there were very high elapsed times and lots of Wait times—all well in excess of the other members. Viewing the SQLs that ran on the “problem” Member, it was quickly apparent that data change SQLs, (Insert, Delete, Update and Merge), were happening orders of magnitude more often than on any other member in the group. This started giving the Lead DBA a good idea about where the Problem could lie.

 

The problem was Logging

On this member, the OUTBUFF was increased in size, both the Active and Archive Logs were increased in size, and some traffic was switched to another Member in the group.

Tra la! The problem was fixed! All members now have an “equivalent” elapsed time for the same SQL.

 

When was the last time that you took a look to see how your workload for SQL Updates is “balanced” across your machine? Just saying.

 

As usual, any comments or questions are welcome!

TTFN

Roy Boxwell