Rotten Results from RUNSTATS Require Rescue

Do you know the basic rules to ensure access path stability when using RUNSTATS?

Time for another of my “I noticed something strange at a customer site recently” newsletters. Enjoy!


RUNSTATS are good aren’t they?

At this particular site, the RUNSTATS methodology of RUNSTATS was, shall we say, “sub-optimal.” They use an ancient system to decide when to RUNSTATS, and they do tablespace’s and index’s *never* at the same time. Just to complicate matters even more, they never use inline RUNSTATS because “if the REORG abends, the statistics in the DB2 catalog are dead”. Now you are all probably well aware of the scale of the disaster at this site?


Daily fire fighting

Nearly every day, some access path somewhere goes horribly wrong… the under- manned and over-worked DBA group are tasked to find and fix ASAP. Cures range from a quick INDEX create or change, or perhaps even a really needed RUNSTATS or REORG.


Why do the Access Paths go “wrong”?

The real goal is to stop firefighting and to investigate the root cause. Why do so many access paths go wrong on such a regular basis? The answer is the systemic horribleness of RUNSTATS collection. Dynamic SQL is, obviously, very very sensitive to RUNSTATS. For one thing, the statements are kicked out of the cache! The very next time they come back, the DB2 Optimizer redrives the cost calculations and “Hey Presto!” you have a bad access path. Terry Purcell and Pat Bossmann have often said that about 90% of DB2 performance problems stem from bad RUNSTATS. The old adage “garbage in – garbage out” is still true!


Timing is everything

The timing of the RUNSTATS is critical for stable access paths.

Basic rules are:

 1 Only do a RUNSTATS if you really really need to!
a. RUNSTATS are not cheap!
b. The Dynamic Statement cache gets wiped
c. Locks on the Catalog can occur
2Avoid doing RUNSTATS even if RTS says to run one!
a. Lots of people use the incorrect counters to trigger a RUNSTATS. Use the correct ones for the correct Object type
b. Never RUNSTATS LOB spaces – completely pointless work!
c. Even if a MASSDELETE has occurred do you really want to “reset” the DB2 catalog statistics?
d. VOLATILE tables must be handled with *extreme* care!
3Choose your RUNSTATS parameters wisely!
a. Doing a blind “RUNSTATS the world” is just as bad as running an empty RUNSTATS!
b. HISTOGRAM should be used with caution
c. More than a hundred COLGROUPs should start alarm bells ringing

Quite a list here, and it really only shows some “Rules of Thumb”. I’ll bet you all have you own?

Is there a way back from the abyss?

But what happens if you have 1000’s of partitions with terabytes of data and the RUNSTATS was, shall we say, ill-advised or badly timed? Can you go back in time? Hands up those who wants to do a PiT recovery on the production catalog! No takers???

Yes! There is a way back from the abyss

I’ll bet you are all well ahead of me here, but the way to do this is pretty straightforward. You simply acquire our latest tool, RUNSTATS Rescue, to handle it all for you. Or, you could try and reset the data in the DB2 catalog from off-line backups that you happen to have taken before the RUNSTATS that is now killing you. …You did do that, right?

Why a tool?

Apart from the fact that this tool is from us, my firm, just trying to “roll your own” can be a real nightmare. Why?

  • Because you must first find out all of the objects that were touched by the badly performing SQL.
  • Then you must get all of the DB2 Optimizer relevant data back from a point in time before the RUNSTATS executed, and/or the last REBIND(s),
  • and then you must flush the dynamic statement cache and REBIND any static SQL.

Sounds like a lot of work.

What else must you do?

You also have to be transparent and so log what you do. You must allow for the ability to back-out your changes as perhaps you make another access path even worse. And it would be really cool if you could do “on the fly” explains to check that the RUNSTATS really *is* the root of all that evil. Remember that ZPARMS and BUFFERPOOLS also have a major influence on access paths. Even the speed of your machine! It is also a must to then be able to go even further back in time – perhaps as much as a year?

Hang on – What about PLAN STABILITY?

Doesn’t plan stability save you? I hear you all cry. Well, “No” is the short answer! If your package is invalidated by a Schema change (the classics are index drop and recreate or VIEW change), then plan stability does not work anymore. Further, in DB2 12, Dynamic Plan Stability has been announced. Sadly it *also* fails right here as there is no SWITCH PREVIOUS/ORIGINAL support!

It all works together

So, for the static SQL case where the package is not invalided, Plan Stability is good. If not: – RUNSTATS Rescue to the rescue. For Dynamic SQL – RUNSTATS Rescue is the answer.

As always, any questions or comments would be most welcome!


Roy Boxwell


2015-07 Bad Data Day

A Good time to check your DB2 Catalog Statistics !

One of my favorite topics is STATISTICS and RUNSTATS.

This month I have a short newsletter involving both of them!


Something jumped right out…

Some time ago we were helping one of our customers to perform an Early Precheck (Going from DB2 9 to 10). To do so, we requested a copy of their entire DB2 production statistics so the optimizer could work here at our labs in Dusseldorf – just like at the customer site.


…as we loaded up DB2 production statistics of a customer

We loaded up the data and I noticed something that just jumped right out…

       WHERE TBOWNER = 'aaaaaaa' 
         AND TBNAME  = 'bbbbbb' 
         AND NAME    = 'cccccc' 
ORDER BY 1 , 3 , 2 
F    +0.6066539624818615E-02            1 . xxxxxxxxxx .. 
F    +0.6066539624818615E-02            1 . xxxxxxxxxx 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx .. 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx .. 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx .. 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx .. 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx .. 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx 


There were “duplicates”!

This is just one of many examples. Also above, you can see the problem – Notice the data in COLGROUPCOLNO? Sometimes there is a hex value and sometimes not!

Now, believe this or not, all of this goes back to a bug in RUNSTATS in DB2 V8 which got this APAR:



What’s interesting here, is this APAR is marked as FIN so the “bug” disappeared in DB2 9!

The bug was fixed in DB2 9, but the “bad” data was not automatically cleaned up

As you can see, the bug caused an erroneous value in the COLGROUPCOLNO to be set for single column frequency rows.

The bug was then fixed in DB2 9, but the “bad” data that had been inserted was not automatically cleaned up and, as I hope you all know, the SYSCOLDIST data is never automatically deleted – it is only ever updated. So when the bug was fixed, the low-value or hexadecimal column number in that field was no longer EQUAL to a zero length field, and so an insert was done. Since then these rows have just stayed there…


Query to count this bad rows in the SYSCOLDIST

I wrote a little query just to show the count of how many of these bad rows existed in the SYSCOLDIST:

WHERE NUMCOLUMNS                = 1 
  AND TYPE                      = 'F' 

All of these entries should be deleted from SYSCOLDIST to help the optimizer pick the right access paths!

Check your stats !

I first noticed this problem during a test of our Statistics HealthCheck product, which flagged over 1,200 critical problems in the DB2 catalog and, thinking I had broken something, I checked all of the checks and found the above bad data. Now is as good a time as any to check your stats *and* download our Statistics Healthcheck Freeware!!


As usual any questions or  comments are welcome,

TTFN Roy Boxwell

Senior Software Architect

2015-05 Top 10 Things to Ignore for DB2 z/OS


This newsletter was inspired by a recent article I read in the “Enterprise Systems Magazine” called “Top 10 Ways to Waste Money on CPU”. Why not the Top 10 things to ignore?


DB2 z/OS things you could ignore but most definitely should not!

So here’s my little list, in no particular order, of things you could ignore but most definitely should not!

  1SQL DELETE statements in mega-million  tablesSQL DELETE statements in mega-million  tables when a REORG DISCARD would kill two birds with one stone. (I love that phrase) Anyway, after 500,000 singleton deletes the tablespace probably needs a REORG anyway and so why not do two in one? A bit of a no-brainer really.
  2LOB columnsLOB columns, whose size would *easily* fit inside an inline LOB or even a VARCHAR. LOBs are still slow and cumbersome to use, but inline LOBs are great. If you can use ‘em – do so!
  3BP0 being used for *everything* by default…BP0 being used for *everything* by default… Please split the BP s into groups!!! BP0 is only, and I mean ONLY, for the Catalog and Directory. That way you can actually keep the size low and spare some memory for other BPs. LOB and XML tablespaces get their own BP. Tables and Indexes are split. Sort gets its own. You get the idea ?
  4Utility jobs still based on 1990’s ideasUtility jobs still based on 1990’s ideas. Are you still running a RUNSTATS to see if a REORG is needed? Are you running REORGs without inline RUNSTATS? Are your RUNSTATS using FREQVAL and, if required, HISTOGRAM?
  5Death by “indexiphication”.Death by “indexiphication”. Do you have tables with more than three indexes? Do you have ten or more indexes? Time to look for INCLUDE usage and LASTUSED Timestamps here!
  6PLAN_TABLE explosionPLAN_TABLE explosion. Do you have multiple PLAN_TABLEs in production? Are you REORGing, RUNSTATSing and Image Copying them on a regular basis? Are you purging them of rubbish data on a regular basis?
  7Are your ZPARMs up to date?Have you checked the Rules of Thumb in regard to ZPARMS since they were last set back in the 80’s? Now is the time to do a review of all the ZPARMS to see where you can really get performance boosts. (For example the default SRTPOOL In DB2 10 is now 10,000k but in DB2 V8 and 9 it was just 2,000k)
 8Are you removing garbage from the DB2 Catalog and Directory ?Are you removing garbage from the DB2 Catalog and Directory ? Do you really need all the packages and versions of those packages from 1989 these days? If a table gets RUNSTATSed that these ancient, never executed, packages uses then it should trigger a review of the access paths, which could, of course, flag up problems where no real problem exists.
  9 COMMIT frequency.You never need to check or change this do you…
 10 TrainingIDUG, Insight, and RUGs etc. you can never ever get enough info about how things work and how to make things better.


One thing you should certainly NOT ignore, is my newsletter! I have lots of exciting topics coming up in 2015 and I’ll also let you know about our webinars.

Upcoming Newsletters

  • SOUNDEX and other cool features part 4 – update for DB2 10 & all new for DB2 11
  • BAD Data Day
  • Overloaded Log
  • A real CLUSTER Buster


As usual, any comments or questions are welcome!


Roy Boxwell