2018-02 Db2 Catalog Statistics revisited

Db2 Optimizer & access path selection for Db2 11 & Db2 12 :

Db2 Catalog dataProblematic default values | Correlations in the Db2 Catalog 

It has been six years since the last update so I thought, after Terry Purcell’s excellent presentation in January 2018, it would be a good point in time to go over and rake the coals again—especially as a couple of things have changed for Db2 12!


Terry Purcell – Db2 12 for z/OS Optimizer and RUNSTATS improvements
Webcast replay          Abstract

Are you a RUNSTATS Master?

Every now and again, I hold a little presentation called “Are you a RUNSTATS Master?” Actually these days it’s called “Db2 z/OS Lies, Damn Lies, and Statistics…” where I describe in detail, what the Db2 Optimizer uses for access path selection in relation to the Db2 Catalog data.

Surprised? You will be!

Personally, I am always surprised at how often people say “just that data?” or “is that it?” (the various other reasons for access path selection like CP speed, RID Pool size, Sort Pool size, Max data caching size, and, of course, the 80 bufferpools are also mentioned, but these have nothing to do with RUNSTATS).

So generally the answer is “Yes.” However, the permutations and combinations make the devil in the detail – The Db2 Optimizer’s algorithms are top secret, but the input data it uses is fully described in the documentation.

Just the facts ma’am

What I want to do, is show :

  • the Db2 Catalog data that is used
  • the default values that can cause surprising things to happen
  • the problem of correlations in the Db2 Catalog

Which data are used by the Db2 Optimizer and which are updated by RUNSTATS?

Here is a complete list of the eleven tables used by the Db2 Optimizer:

  1. SYSIBM.SYSCOLDIST
  2. SYSIBM.SYSCOLSTATS *
  3. SYSIBM.SYSCOLUMNS
  4. SYSIBM.SYSINDEXES
  5. SYSIBM.SYSINDEXPART
  6. SYSIBM.SYSKEYTARGETS (same as SYSCOLUMNS)
  7. SYSIBM.SYSKEYTGTDIST (same as SYSCOLDIST)
  8. SYSIBM.SYSROUTINES
  9. SYSIBM.SYSTABLES
  10. SYSIBM.SYSTABLESPACE
  11. SYSIBM.SYSTABSTATS

* degree of parallelism only and, after APAR PK62804, also „sometimes“ used to bound filter factor estimates…

By the Columns

Now we can also list out all of the columns (obviously not including the key columns) which are used by the Db2 Optimizer:


SYSCOLDIST
CARDF, COLGROUPCOLNO, COLVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMCOLUMNS, QUANTILENO, STATSTIME


SYSCOLSTATS
COLCARD, HIGHKEY, LOWKEY


SYSCOLUMNS
COLCARDF, HIGH2KEY, LOW2KEY


SYSINDEXES
CLUSTERING*, CLUSTERRATIO, CLUSTERRATIOF, DATAREPEATFACTORF, FIRSTKEYCARDF, FULLKEYCARDF, NLEAF, NLEVELS


SYSINDEXPART
LIMITKEY*


SYSKEYTARGETS
CARDF, HIGH2KEY, LOW2KEY, STATS_FORMAT


SYSKEYTGTDIST
CARDF, KEYGROUPKEYNO, KEYVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMKEYS, QUANTILENO, STATSTIME


SYSROUTINES
CARDINALITY*, INITIAL_INSTS*, INITIAL_IOS*, INSTS_PER_INVOC*, IOS_PER_INVOC*


SYSTABLES
CARDF, EDPROC*, NPAGES, NPAGESF, PCTROWCOMP


SYSTABLESPACE
NACTIVE, NACTIVEF


SYSTABSTATS
CARD, CARDF, NPAGES


Notes: * Columns are not updated by RUNSTATS and _ Columns are not updatable at all. The column STATSTIME is used only if there are duplicates in the SYSCOLDIST table, and then the Db2 Optimizer will use the “newer” data that was probably inserted by a User.

Know your defaults

Which default column values trigger the Db2 Optimizer to use its own internal default values?


SYSCOLUMNS


If COLCARDF                       = -1 then use 25


SYSINDEXES


If CLUSTERRATIOF            = 0 then use CLUSTERRATIO


If CLUSTERRATIO              = 0 then use 0.95 if the index is CLUSTERing = ‘Y’ otherwise 0.00


DATAREPEATFACTORF    = -1 then is ignored


If FIRSTKEYCARDF            = -1 then use 25


If FULLKEYCARDF             = -1 then use 25


If NLEAF                              = -1 then use 33 (Which is SYSTABLES.CARDF / 300)


If NLEVELS                         = -1 then use 2


SYSROUTINES


If CARDINALITY                  = -1 then use 10,000  


If INITIAL_INSTS                 = -1 then use 40,000


If INITIAL_IOS                     = -1 then use 0


If INSTS_PER_INVOC        = -1 then use 4,000


If IOS_PER_INVOC            = -1 then use 0


If IOS_PER_INVOC            = -1 then use 0


SYSTABLES


If CARDF                             = -1 then use 10,000


If NPAGESF                      <= 0 then use NPAGES


If NPAGES                          = -1 then use 501 (Which is CEILING (1 + SYSTABLES.CARDF / 20))

Here you must be very careful if using NPGTHRSH ZPARM as 501 is more than the default value in most shops. This is one of the little changes in Db2 12 where the value -1 is treated as -1 for the NPGTHRSH check.


SYSTABLESPACE


If NACTIVEF                     = 0 then use NACTIVE


If NACTIVE                       = 0 then use 501 (Which is CEILING (1 + SYSTABLES.CARDF / 20))


SYSTABSTATS


If CARDF                         = -1 then use 10,000


If NPAGES                       = -1 then use 501 (Which is CEILING (1 + SYSTABSTATS.CARDF / 20))


So now you can see that non-floating point “old” data, may still be used today and this may cause access path headaches!

Never ever say never

Now to top it all, the data in the SYSCOLDIST and SYSKEYTGTDIST never gets simply “deleted”.

Well, actually, in Db2 12 you can now do a RUNSTATS xxx.yyy RESET ACCESSPATH to delete all SYSCOLDIST and SYSKEYTGTDIST data and set all other relevant columns to their respective defaults, but you must time this RUNSTATS very wisely! If you run it and then forget to do a normal full RUNSTATS…

Oldie but a goldie

Once the data are inserted, they stay there, until they are overwritten by new data, a RUNSTATS RESET, or the object is dropped. This all leads to some very old data in these two tables that can and do cause the Db2 Optimizer a ton of grief! One of the first things I do is to simply select the MIN(STATSTIME) from these tables just to see how old the data really is. Do it yourself and be surprised! I have seen sites with eight-year old data in the SYSCOLDIST and that cannot be good!

Correlate the world

Now onto correlations… There are lots of little tricks that DBAs use to “massage” access path choice. One of these, is to just set NLEVELS to 15 for a given index. Then lots of queries simply refuse to touch it as it would appear to be HUGE. Now, just simply updating columns can cause the Db2 Optimizer, in the best case, to ignore the updates or, perhaps, makes things even worse! So here is a list of the correlations (In other words, if you change xxx remember to change yyy and zzz as well):

Relationships exist among certain columns of certain tables:

  •       Columns within SYSCOLUMNS
  •       Columns in the tables SYSCOLUMNS and SYSINDEXES
  •       Columns in the tables SYSCOLUMNS and SYSCOLDIST
  •       Columns in the tables SYSCOLUMNS, SYSCOLDIST, and SYSINDEXES

 If you plan to update some values, keep in mind the following correlations:

  •  COLCARDF and FIRSTKEYCARDF/FULLKEYCARDF (They must be equal for the 1st column and full, if a single column index)
  •  COLCARDF, LOW2KEY and HIGH2KEY. (For non-default COLCARDF LOW2KEY and HIGH2KEY key must be filled with data) and if the COLCARDF is 1 or 2 Db2 uses LOW2KEY and HIGH2KEY as domain statistics to generate frequencies.
  • CARDF in SYSCOLDIST.  CARDF is related to COLCARDF and FIRSTKEYCARDF and FULLKEYCARDF. It must be at a minimum
  • A value between FIRSTKEYCARDF and FULLKEYCARDF if the index contains the same set of columns
  • A value between MAX(colcardf of each col) and the product of all the columns COLCARDFs in the group
  • CARDF in SYSTABLES. CARDF must be equal or larger than any other cardinalities, such as COLCARDF, FIRSTKEYCARDF, FULLKEYCARDF, and CARDF in SYSCOLDIST
  • FREQUENCYF and COLCARDF or CARDF. The number of frequencies collected must be less than or equal to COLCARDF for the column or CARDF for the column group
  • FREQUENCYF. The sum of frequencies collected for a column or column group must be less than or equal to 1

New in Db2 11

In Db2 11, the table SYSSTATFEEDBACK was introduced giving us the first chance to see what the optimizer thinks is missing. This is truly awesome, as then we can tailor our RUNSTATS to generate exactly what the optimizer needs to really validate and generate a good, stable access path. Of course, you should be a little bit careful with this data as too much of a good thing can be bad for you!

New in Db2 12

(Not just the lowercase b!)

In Db2 12, the SYSSTATFEEDBACK was made even more interesting by now externalizing the required RUNSTATS options *directly* into the already existing RUNSTATS profile or, indeed, actually creating a RUNSTATS profile for you.

I think that is really dangerous, as then you could easily flood your system with bogus stats for end user QMF/SPUFI queries that were run “by accident,” or so called “boss queries” where someone with *no* idea of SQL clicks together a highly complex and badly written SQL before letting it run for a weekend. Naturally the SQL gets rewritten by a helpful ever present DBA, but the statistics recommendations have now landed in the profile and will be updated and kept from this point on.

My personal recommendation is to switch off this feature as it is sadly *on* by default!

Here are the ZPARMs of interest

ZPARM STATFDBK_SCOPE set to ALL by default.
ZPARM STATFDBK_PROFILE set to YES by default.
Plus, in table SYSIBM.SYSTABLES column STATS_FEEDBACK is set to Y by default.

Out-of-the box it starts automatically creating (for TYPE=’C’ with NUMCOLS > 1 and TYPE=’F’ or ‘H‘) profiles and updating existing profiles…Here you must manually check the size of your profiles every now and again just to make sure everything is ok!

One other new thing in Db2 12, is that XML columns can get statistics now to help XMLEXISTS get a better access path.

and finally

Do not forget that our little Freeware tool StatisticsHealthCheck will find all bad correlations, old data and badly updated data for you and it is FREE!

So I hope this little round-up of Db2 Catalog Statistics data was interesting, and, as usual, if you have any comments or questions, then please, feel free to mail me!

TTFN

Roy Boxwell

 

 

Southwest Db2 Users Group – February 2018

Db2 Forum.  Southwest Db2 Users Group – February 2018 – Grapevine (Dallas), TX, USA

SEGUS & SOFTWARE ENGINEERING proudly sponsor this event & present

1 – Pdf Präsentation Compliance with compliments! Viable Db2 z/OS workload tracking.

2 – Pdf Präsentation Db2 12 Continuous Delivery – New challenges for deployment.

3 – Pdf Präsentation –  Db2 z/OS Lies, Damn Lies, and Statistics… 


1 – Db2 z/OS Security Audit: Compliance with compliments! Viable Db2 z/OS workload tracking.

Audit and Compliance is a need that many companies want and have to fulfill.

There’s different ways and tools that promise to be able to do it, but what can they really do and what are the associated costs? This presentation introduces Db2 10/11 technology exploitation that delivers any DML, DDL, DCL being executed in a Db2 environment along with identification details. Learn how you can run Audit analytics against a long‐term repository, pinpointing who executed a query, when and from where. Analyze your entire workload to understand access patterns and abnormalities.


More about Db2 Audit

Presentation Outline

  • Audit needs and musts Take a journey to GLB HIPAA PCI‐DSS Basel III Sarbanes‐Oxley CA SB1386 Federal Information Security Management Act “ed Flag”Rules (FRCA)5.
  • Solution overview and their Pros/Cons Get an overview about the existing solutions and understand how they work.
  • The viable way – let Db2 do the magic! Learn about Db2 enhancements in Db2 10/11 that deliver the Db2 workload being processed and understand why it’s so efficient.
  • Customer results from the banking industry Receive some experience from a large banking company and how they successfully replaced their Db2 Audit feature based reporting by a modern SQL tracking and analytics process.

 


2 Db2 12 Continuous Delivery – New challenges for deployment.

Fundamental changes in the Db2 z world often lead to concerns. Let’s face it – some changes force us to change! While a Db2 version migration usually took months, or even years, there will be no new Db2 version after 12, but continuous code drops.

This will have a tremendous impact on migration strategies, because we have to find a reliable way to test these code deliveries in a fraction of time. If we make it, Business Divisions will become enthused at how quickly new technology becomes available for new applications. This presentation will describe the difference between Code, Catalog, Function and Application Levels, how you can control them and how you can fallback in case of anomalies. It also illustrates how we still can be pro-active in testing without burning weeks and months.
Learn how to choose from four different levels of testing and a new way of automation. CD-Screening allows you to pick and choose from KPI based test automation. The levels include simple anomaly alerting, access path verification, clone Pre-apply and even workload capture/replay to easily discover different behaviour resulting from a new code Level.


More about Db2 Continuous Delivery – CD

Presentation Outline

Joining this presentation, you’ll learn how to align Continuous Delivery to your Continuous Availability.

  • Agile, Continuous Delivery, DevOps – just buzz words, or new methodologies?
  • Db2 Code, Catalog, Function and Application Levels – differences and dependencies.
  • Activation/Deactivation of new code and how to fallback and when you can’t.
  • Different flavors of (pro-active) CD-Screening and how it can be automated:

* Anomaly alerting based on Incompatibility Change Indicators (ICIs)
* Dyn./Stat.Access Path Change Detection e.g.via Plan Management
* Clone based code change pre-apply exploiting Backup System
* Workload-KPI verification using SQL replay and KPI comparison

Audience Experience:   Intermediate Advanced
Platform:                        Db2 z/OS
Presentation Length:     60 minutes
Presentation Category:  Database Administration Performance Management Db2 Migration

 


3 Db2 z/OS Lies, Damn Lies, and Statistics…

– Benjamin Disraeli, Prime Minister of England (1868, 1874-1880)

The above line may, or may not, have been spoken well over 100 years ago, but the need for statistics and, above all else, accurate statistics is more important than ever in the Db2 world of today.


More about Db2 RUNSTATS

Presentation Outline

  • Db2 RUNSTATS basics & catalog tables and Columns used for access path
  • IBM recommendations through the ages : from Db2 V3 to Db2 12
  • Db2 RUNSTATS advanced
  • SYSCOLDIST explained
  • RUNSTATS real world Q&A :
    use of SAMPLE, COLGROUP, PROFILE, REOPT (ONCE), TABLESAMPLE SYSTEM, HISTOGRAM, …
  • RUNSTATS reversal

Speaker biography

Roy Boxwell has more than 32 years of experience in MVS, OS/390, and z/OS environments – 30 of those in Db2. He specializes in installation, migration, and performance monitoring and tuning. Roy leads the SEG development team responsible for the real time database maintenance solutions. He is also an active participant, speaker and contributor on the IDUG Db2 Listserv and sends out a monthly Db2 z/OS Newsletter.

Heart of Texas Db2 Users Group – February 2018

HOTDUG – Heart of Texas Db2 User Group – February 2018 –  Austin, TX, USA

SEGUS & SOFTWARE ENGINEERING proudly sponsor this event & present

1 – Pdf PresentationCompliance with compliments! Viable Db2 z/OS workload tracking.

2 Pdf PresentationDb2 12 Continuous Delivery – New challenges for deployment.

3 Pdf PresentationDb2 z/OS Lies, Damn lies, and Statistics… 


1 – Db2 z/OS Security Audit: Compliance with compliments! Viable Db2 z/OS workload tracking.

Audit and Compliance is a need that many companies want and have to fulfill.

There’s different ways and tools that promise to be able to do it, but what can they really do and what are the associated costs? This presentation introduces Db2 10/11 technology exploitation that delivers any DML, DDL, DCL being executed in a Db2 environment along with identification details. Learn how you can run Audit analytics against a long‐term repository, pinpointing who executed a query, when and from where. Analyze your entire workload to understand access patterns and abnormalities.


More about Db2 Audit

Presentation Outline

  • Audit needs and musts Take a journey to GLB HIPAA PCI‐DSS Basel III Sarbanes‐Oxley CA SB1386 Federal Information Security Management Act “ed Flag”Rules (FRCA)5.
  • Solution overview and their Pros/Cons Get an overview about the existing solutions and understand how they work.
  • The viable way – let Db2 do the magic! Learn about Db2 enhancements in Db2 10/11 that deliver the Db2 workload being processed and understand why it’s so efficient.
  • Customer results from the banking industry Receive some experience from a large banking company and how they successfully replaced their Db2 Audit feature based reporting by a modern SQL tracking and analytics process.

 


2Db2 12 Continuous Delivery – New challenges for deployment.

Fundamental changes in the Db2 z world often lead to concerns. Let’s face it – some changes force us to change! While a Db2 version migration usually took months, or even years, there will be no new Db2 version after 12, but continuous code drops.

This will have a tremendous impact on migration strategies, because we have to find a reliable way to test these code deliveries in a fraction of time. If we make it, Business Divisions will become enthused at how quickly new technology becomes available for new applications. This presentation will describe the difference between Code, Catalog, Function and Application Levels, how you can control them and how you can fallback in case of anomalies. It also illustrates how we still can be pro-active in testing without burning weeks and months.
Learn how to choose from four different levels of testing and a new way of automation. CD-Screening allows you to pick and choose from KPI based test automation. The levels include simple anomaly alerting, access path verification, clone Pre-apply and even workload capture/replay to easily discover different behaviour resulting from a new code Level.


More about Db2 Continuous Delivery – CD

Presentation Outline

Joining this presentation, you’ll learn how to align Continuous Delivery to your Continuous Availability.

  • Agile, Continuous Delivery, DevOps – just buzz words, or new methodologies?
  • Db2 Code, Catalog, Function and Application Levels – differences and dependencies.
  • Activation/Deactivation of new code and how to fallback and when you can’t.
  • Different flavors of (pro-active) CD-Screening and how it can be automated:

* Anomaly alerting based on Incompatibility Change Indicators (ICIs)
* Dyn./Stat.Access Path Change Detection e.g.via Plan Management
* Clone based code change pre-apply exploiting Backup System
* Workload-KPI verification using SQL replay and KPI comparison

Audience Experience:   Intermediate Advanced
Platform:                        Db2 z/OS
Presentation Length:     60 minutes
Presentation Category:  Database Administration Performance Management Db2 Migration

 


3Db2 z/OS Lies, Damn lies, and Statistics…

– Benjamin Disraeli, Prime Minister of England (1868, 1874-1880)

The above line may, or may not, have been spoken well over 100 years ago, but the need for statistics and, above all else, accurate statistics is more important than ever in the Db2 world of today.


More about Db2 RUNSTATS

Presentation Outline

  • Db2 RUNSTATS basics & catalog tables and Columns used for access path
  • IBM recommendations through the ages : from Db2 V3 to Db2 12
  • Db2 RUNSTATS advanced
  • SYSCOLDIST explained
  • RUNSTATS real world Q&A :
    use of SAMPLE, COLGROUP, PROFILE, REOPT (ONCE), TABLESAMPLE SYSTEM, HISTOGRAM, …
  • RUNSTATS reversal

Speaker biography

Roy Boxwell has more than 32 years of experience in MVS, OS/390, and z/OS environments – 30 of those in Db2. He specializes in installation, migration, and performance monitoring and tuning. Roy leads the SEG development team responsible for the real time database maintenance solutions. He is also an active participant, speaker and contributor on the IDUG Db2 Listserv and sends out a monthly Db2 z/OS Newsletter.

Northeast Ohio Database Users Group – December 2017

NEODBUG – Northeast Ohio Database Users Group – December 2017

December 07th, 2017

SEGUS & SOFTWARE ENGINEERING present

1- 25 Years of Missed Opportunities? SQL Tuning Revisited

2- Db2 z/OS Lies, Damn lies, and Statistics


1- 25 Years of Missed Opportunities? SQL Tuning Revisited

“But that’s the way we’ve always done SQL Tuning” – sound familiar?

The definition of madness is to keep doing the same thing and expecting a different outcome.
Find out how to stop the madness.

Nothing is more expensive than a missed opportunity” (H. Jackson Brown, Jr).

This is especially true when it comes to SQL Tuning!

There are many low-hanging fruits within easy grasp of anyone tasked with tuning their systems. But how do you know what is dangling within reach if you can’t see it?

Find out how a modern DB2 z/OS SQL warehouse can collect and store all executed static and dynamic SQL (plex-wide) with basically no overhead.

By comparing SQL statements side-by-side, the “easy pickings” will immediately become apparent. But why stop there? Experts can effortlessly dig deeper and find the totally hidden gems: Disc Problem Detection, Delay Detection, Never executed SQL, SQL Timeline. Precisely pinpoint specific areas to target and get the most out of your DB2 system – while reducing costs.

Outlines

1 – Tuning SQL – how we always done it

2 – Single SQL, package, application

3 – Tuning SQL – year 2004 – ACCESS PATH comparison and simulation

4 – Tuning SQL Revisited – A new methodology

5 – Harvesting the low hanging fruit


2- Db2 z/OS Lies, Damn lies, and Statistics

..Lies, damn lies, and statistics… – Benjamin Disraeli, Prime Minister of England (1868, 1874-1880)

The above line may, or may not, have been spoken well over 100 years ago, but the need for statistics and, above all else, accurate statistics is more important than ever in the Db2 world of today.

Presentation Outline

  • Db2 RUNSTATS basics & catalog tables and Columns used for access path
  • IBM recommendations through the ages : from Db2 V3 to Db2 12
  • Db2 RUNSTATS advanced
  • SYSCOLDIST explained
  • RUNSTATS real world Q&A :
    use of SAMPLE, COLGROUP, PROFILE, REOPT (ONCE), TABLESAMPLE SYSTEM, HISTOGRAM, …
  • RUNSTATS reversal

Speaker Biography

Ulf Heinrich is the Director of Solutions Delivery at SOFTWARE ENGINEERING GmbH. He specializes in Db2 recovery issues and database maintenance, focusing on the growing requirement for cost reduction and 24×7 operations. As a consultant at large customer sites, he has implemented database maintenance procedures and recovery strategies, and also experienced the pitfalls of recovery scenarios under real‐world recovery pressure. His activities cover EMEA, as well as North America through SE’ U.S. subsidiary, SEGUS Inc. He’s involved in the development of SE’ maintenance and recovery Solutions.

Michigan Db2 Users Group – December 2017

MDUG – Michigan Db2 Users Group

December 6th, 2017

SEGUS & SOFTWARE ENGINEERING present

1- 25 Years of Missed Opportunities? SQL Tuning Revisited

2- Db2 z/OS Lies, Damn Lies, and Statistics


1- 25 Years of Missed Opportunities? SQL Tuning Revisited

“But that’s the way we’ve always done SQL Tuning” – sound familiar?

The definition of madness is to keep doing the same thing and expecting a different outcome.
Find out how to stop the madness.

Nothing is more expensive than a missed opportunity” (H. Jackson Brown, Jr).

This is especially true when it comes to SQL Tuning!

There are many low-hanging fruits within easy grasp of anyone tasked with tuning their systems. But how do you know what is dangling within reach if you can’t see it?

Find out how a modern DB2 z/OS SQL warehouse can collect and store all executed static and dynamic SQL (plex-wide) with basically no overhead.

By comparing SQL statements side-by-side, the “easy pickings” will immediately become apparent. But why stop there? Experts can effortlessly dig deeper and find the totally hidden gems: Disc Problem Detection, Delay Detection, Never executed SQL, SQL Timeline. Precisely pinpoint specific areas to target and get the most out of your DB2 system – while reducing costs.

Outlines

1 – Tuning SQL – how we always done it

2 – Single SQL, package, application

3 – Tuning SQL – year 2004 – ACCESS PATH comparison and simulation

4 – Tuning SQL Revisited – A new methodology

5 – Harvesting the low hanging fruit


2- Db2 z/OS Lies, Damn Lies, and Statistics

..Lies, damn lies, and statistics… – Benjamin Disraeli, Prime Minister of England (1868, 1874-1880)

The above line may, or may not, have been spoken well over 100 years ago, but the need for statistics and, above all else, accurate statistics is more important than ever in the Db2 world of today.

Presentation Outline

  • Db2 RUNSTATS basics & catalog tables and Columns used for access path
  • IBM recommendations through the ages : from Db2 V3 to Db2 12
  • Db2 RUNSTATS advanced
  • SYSCOLDIST explained
  • RUNSTATS real world Q&A :
    use of SAMPLE, COLGROUP, PROFILE, REOPT (ONCE), TABLESAMPLE SYSTEM, HISTOGRAM, …
  • RUNSTATS reversal

Speaker Biography

Ulf Heinrich is the Director of Solutions Delivery at SOFTWARE ENGINEERING GmbH. He specializes in Db2 recovery issues and database maintenance, focusing on the growing requirement for cost reduction and 24×7 operations. As a consultant at large customer sites, he has implemented database maintenance procedures and recovery strategies, and also experienced the pitfalls of recovery scenarios under real‐world recovery pressure. His activities cover EMEA, as well as North America through SE’ U.S. subsidiary, SEGUS Inc. He’s involved in the development of SE’ maintenance and recovery Solutions.

Central Ohio Db2 Users Group – December 2017

CODUG – Central Ohio Db2 Users Group – December 2017

December 05th, 2017

SEGUS & SOFTWARE ENGINEERING present

Db2 z/OS Lies, Damn Lies, and Statistics

..Lies, damn lies, and statistics… – Benjamin Disraeli, Prime Minister of England (1868, 1874-1880)

The above line may, or may not, have been spoken well over 100 years ago, but the need for statistics and, above all else, accurate statistics is more important than ever in the Db2 world of today.

Presentation Outline

  • Db2 RUNSTATS basics & catalog tables and Columns used for access path
  • IBM recommendations through the ages : from Db2 V3 to Db2 12
  • Db2 RUNSTATS advanced
  • SYSCOLDIST explained
  • RUNSTATS real world Q&A :
    use of SAMPLE, COLGROUP, PROFILE, REOPT (ONCE), TABLESAMPLE SYSTEM, HISTOGRAM, …
  • RUNSTATS reversal

Speaker Biography

Ulf Heinrich is the Director of Solutions Delivery at SOFTWARE ENGINEERING GmbH. He specializes in Db2 recovery issues and database maintenance, focusing on the growing requirement for cost reduction and 24×7 operations. As a consultant at large customer sites, he has implemented database maintenance procedures and recovery strategies, and also experienced the pitfalls of recovery scenarios under real‐world recovery pressure. His activities cover EMEA, as well as North America through SE’ U.S. subsidiary, SEGUS Inc. He’s involved in the development of SE’ maintenance and recovery Solutions.

2015-07 Bad Data Day

A Good time to check your DB2 Catalog Statistics !

One of my favorite topics is STATISTICS and RUNSTATS.

This month I have a short newsletter involving both of them!

 

Something jumped right out…

Some time ago we were helping one of our customers to perform an Early Precheck (Going from DB2 9 to 10). To do so, we requested a copy of their entire DB2 production statistics so the optimizer could work here at our labs in Dusseldorf – just like at the customer site.

 

…as we loaded up DB2 production statistics of a customer

We loaded up the data and I noticed something that just jumped right out…

SELECT TYPE 
      ,FREQUENCYF 
      ,NUMCOLUMNS 
      ,SUBSTR(COLVALUE , 1 , 11) AS COLVALUE 
      ,COLGROUPCOLNO 
       FROM SYSIBM.SYSCOLDIST 
       WHERE TBOWNER = 'aaaaaaa' 
         AND TBNAME  = 'bbbbbb' 
         AND NAME    = 'cccccc' 
ORDER BY 1 , 3 , 2 
; 
---------+---------+---------+---------+---------+---------+---------
TYPE             FREQUENCYF   NUMCOLUMNS   COLVALUE      COLGROUPCOLNO
---------+---------+---------+---------+---------+---------+---------
F    +0.6066539624818615E-02            1 . xxxxxxxxxx .. 
F    +0.6066539624818615E-02            1 . xxxxxxxxxx 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx .. 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx .. 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx .. 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx .. 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx .. 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx 
DSNE610I NUMBER OF ROWS DISPLAYED IS 20 
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100 

 

There were “duplicates”!

This is just one of many examples. Also above, you can see the problem – Notice the data in COLGROUPCOLNO? Sometimes there is a hex value and sometimes not!

Now, believe this or not, all of this goes back to a bug in RUNSTATS in DB2 V8 which got this APAR:

PK33517:

COLGROUPCOLNO ASSOCIATED WITH SINGLE COLUMN CONTAINS A NUMERIC VALUE INSTEAD OF ZERO LENGTH FIELD ACCORDING TO SQL REF

What’s interesting here, is this APAR is marked as FIN so the “bug” disappeared in DB2 9!

The bug was fixed in DB2 9, but the “bad” data was not automatically cleaned up

As you can see, the bug caused an erroneous value in the COLGROUPCOLNO to be set for single column frequency rows.

The bug was then fixed in DB2 9, but the “bad” data that had been inserted was not automatically cleaned up and, as I hope you all know, the SYSCOLDIST data is never automatically deleted – it is only ever updated. So when the bug was fixed, the low-value or hexadecimal column number in that field was no longer EQUAL to a zero length field, and so an insert was done. Since then these rows have just stayed there…

 

Query to count this bad rows in the SYSCOLDIST

I wrote a little query just to show the count of how many of these bad rows existed in the SYSCOLDIST:

SELECT COUNT(*) AS BAD_GUYS 
FROM SYSIBM.SYSCOLDIST 
WHERE NUMCOLUMNS                = 1 
  AND TYPE                      = 'F' 
  AND NOT LENGTH(COLGROUPCOLNO) = 0 
; 
---------+---------+---------+---------+------
    BAD_GUYS 
---------+---------+---------+---------+------
       8680 
DSNE610I NUMBER OF ROWS DISPLAYED IS 1


All of these entries should be deleted from SYSCOLDIST to help the optimizer pick the right access paths!

Check your stats !

I first noticed this problem during a test of our Statistics HealthCheck product, which flagged over 1,200 critical problems in the DB2 catalog and, thinking I had broken something, I checked all of the checks and found the above bad data. Now is as good a time as any to check your stats *and* download our Statistics Healthcheck Freeware!!

 

As usual any questions or  comments are welcome,

TTFN Roy Boxwell

Senior Software Architect