Db2 Continuous Delivery Archives - Page 2 of 3

May 20, 2019January 14, 2021

Machine Learning for Db2 z/OS: Artificial Intelligence – Hype or Reality?

Now that the Db2 Optimizer has gone all Artificial Intelligent (AI from now on) on us, I thought it would be interesting to review where we are in this brave new world!

Keeps you out of a BIND

My firm created our BindImpactExpert (BIX) software many moons ago to compare access paths, typically before a program non SQL change or a REBIND after RUNSTATS etc. Just to warn you not to do the REBIND or whatever you were going to do, as it would change the access plan to something you may well not really want to have. This means not just no-brainers like index access to tablespace scan, but also different index usage or table access order changes.

Apples and Oranges

So, what has that got to do with AI?
Well, we used a marketing slogan of trying to compare oranges and apples to try and make it clear :

how tough it really is to try and match SQL statements across different program releases and to compare the results.

ML and AI – The new buzzwords

Now in Db2 12, the optimizer has got machine learning (ML from now on) to help it study which host variables are used the most and how many rows are usually returned by a given query.

Do you “know” your data?

When the optimizer “knows” this info it can of course make much better “guesses” at good repeatable access plans that help everyone! Now the usage of AI and ML always gets me interested because of all the fun you can have with them.

There was recently a great article in my favourite science magazine “New Scientist” all about how to fool image recognition AI:

https://www.newscientist.com/article/mg24232270-200-machine-mind-hack-the-new-threat-that-could-scupper-the-ai-revolution/
(subscribe to read this article)

Now you should see the connection back to our BindImpactExpert slogan. So much for Oranges and Apples – AI cannot even see that an Orange is an Orange…

this is not good or even perhaps “suboptimal”…

Adversarial Images and the fun you can have with them…

From 2018 this article

https://www.wired.co.uk/article/artificial-intelligence-hacking-machine-learning-adversarial

Within this article is a great paper all about graffiti changed road signs! Here is a direct link to the article https://arxiv.org/pdf/1707.08945.pdf

Fooling Deep Learning “

Another great paper where Deep Learning AIs jump off the deep end:

http://www.evolvingai.org/fooling

Disappearing Humans

and finally this joyous article and a link to a You Tube video:

https://www.theverge.com/2019/4/23/18512472/fool-ai-surveillance-adversarial-example-yolov2-person-detection

But do not wear this picture as a T-Shirt down the street anytime soon!!!

Would you sit in a driverless car?

What this all boils down to is that I, for sure, will never sit in a driverless car within the next 40 years!
Why?

Firstly, I believe a true AI does not exist yet and will not exist for at least another 40 years.
Secondly, when I see an Orange I see an Orange because I “know” what an Orange looks like! As you have seen AIs have no concept of what an Orange *is* and so can easily see a drill instead or mistake skiers for dogs…
Thirdly, I am a programmer and someone somewhere is writing and testing – I hope! – the code… need I say more? Don’t even get me started with Agile or KANBAN.

extract from © 2019 – „We need to talk, AI” – Dr. Julia Schneider und Lena Kadriye Ziyal
A Comic Essay on Artificial Intelligence

https://weneedtotalk.ai

Room full of CTO’s

Have you heard about the hopefully apocryphal story of a room full of CTO’s at a software conference when they were asked “Would you fly in a computer controlled aircraft?” All their hands went up. Then they were asked “Would you fly if it was your software teams that had written the flight control software?” all their hands went down!

Flying is strangely different

While talking about flying my answer would be different?

Why?
Because in the air you have no-one else apart from other highly automated devices anywhere near you! This is the one hope that I can see for the whole driverless car stuff – Driving on the Interstate, Motorway or Autobahn. In the air, the environment is much more rigid. No lights, no crossings, no opposite traffic, no foot traffic, no children chasing a ball across the road etc.

They will come…

I am sure that driverless cars will come soon and I am equally sure that people will die because of them – However, I am also sure that lots of accidents will be avoided as well. After all the AI will always react faster than a human.

With this comes the ethics of AI and ML of course which will also play a major role. If the car has to decide in a crash scenario :

who gets hurt, should it protect itself and its cargo above all others?
What about a group of playing children or a bus full of nuns?
How can it decide who lives and who dies?

Looking on the bright side at least it will not get drunk, or high, and drive! I would also seriously hope that AI does not get road rage!

But even with these “bonus” points, I still have massive doubts and serious worries about the quality of the software and the true image recognition potential of current AIs.

AI everywhere

We were recently asked by a customer to add AI and ML to our SpaceAssuranceExpert (SAX) software. This has been happily running for years, capturing dataset allocations and extent usage in Db2 and making sure they never ran out of space or extents without getting told about it well beforehand.

I had to do a real double take and think “How on earth could AI help here?”

You are approaching 32 Linear Datasets with your segmented tablespace. You have to ALTER and REORG before you use up all the space in the 32^nd. Where can AI help here?
We already have extrapolation from current usage…

Don’t get me wrong

I am not against AI and ML…
but I think we *all* need to get a bit more skeptical about what can actually be delivered now.

On the one hand host variable contents and row counts – Yes indeed, but even here outliers will cause grief and then on the other hand the complete and utter science fiction of self-driving cars in our towns and cities.

The money being spent is also amazing, in 2019 it is estimated, by Syncsort,

that the AI budget will be $35.8 Billion 44% more than in 2018.

What do you think?

I am sure that autonomous cars will be a plague upon us – are you?

TTFN,
Roy Boxwell
Senior Architect

PS: I am looking forward to the first Adversarial Images for Db2 z/OS !!!

April 10, 2019January 14, 2021

2019-04 ZOWE for Continuous Delivery – It’s worse than that – he’s tested Jim!

Terrible quote usage I know… But now, with the all new and Agile Db2 12 being picked up and used globally, I think it is time to review a few things that companies and individuals must do, or at least plan, in this “Brave New World” of ours.

To wrap up at the end, I will delve into the idea of using ZOWE to help you test!

This Db2 12 testing checklist, helps to navigate into the Db2 Continuous Delivery world, well supported by Zowe, the modern IBM z/OS GUI

Table of contents – Db2 12-testing-checklist

The Db2 12 Testing Checklist

1. Do not be afraid of the clones!

To even begin to start testing Db2 12, you’ll need to start with a complete production clone. I know a lot of people get suddenly scared about this due to audit requirements or, more commonly, space requirements. I mean, who can possibly support a complete 100% data clone of their productive system?

A real clone or a partial clone?

Of course you do not actually need all of your data! All you must have is the Db2 Catalog and Directory, all of the user objects should be DEFINE NO style empty shells. After all, do you really need TBs of productive data to do a test run? Nope! So now we have removed two major problems (Space and Audit) from the list of testing problems.

2. Collect as much workload as you can

You’ll need to gather as much of your current executed SQL as you possibly can. At best 13 months is a good target. “Not possible” I hear you all scream – “Oh yes it is!” say I.

3. Dynamic & Static SQL: Store it cleverly away

You do not need the same SQL statement 12,000,000 times. Just once, but with an execution count. Store the dynamic SQLs and the retrieved Static SQLs away somewhere nice and safe with a COMPRESS YES to save room, and then you can easily get 13 months of data. Another major problem gone from the list of testing problems.

4. Gather your friends close but your enemies closer – IFCID376

Make sure you are running with IFCID 376 always on and everywhere! Test, QA and Prod. The moment it starts spitting out possible “problems”, start trying to track down the causing event (SQL) and verify that it is OK, or get a code change implemented. This only has a little to do with agile Db2 12 but should be on at every shop anyway!

But I don’t get the Dynamic SQL!

You do if you are fast and good enough! If you are permanently snapping the DSC and catching the flushed statements, then you can indeed find out the SQL that caused the IFCID. Another major problem disappears from the list of testing problems.

5. Access Paths a go go

Now it really makes a lot of sense to EXPLAIN all of the SQL that you have before and after the Db2 12 Functional Level (FL) change to see if just flipping to, for example, FL 504 caused access paths to head south… This is something you should all be doing now as well of course. Always validate the access path before being forced to do a REBIND or PREPARE. This way you will not get any nasty surprises… Another major problem vanishes from the list of testing problems.

6. Execute the SQLs

I have one last thing for you…

Take all of the SQL that you have just processed and transform it so it can actually execute.

Logarithmically reduce the execution count and execute all of the SQL while monitoring all of the Db2 KPIs – On the Clone, of course.
Then reset the world back to your starting Clone, do the FL change and then re-execute all the SQL again.
Once finished, compare all of the KPIs – looking for outliers as there has only been the FL change then.

Any differences you see should be just “background noise” but there could also be some nasties in there. This will light them up so that the root problem(s) can be found and corrected way before you actually do the FL change in production.

Well done, you’ve made it!

That’s it! With all these processes set up and fully automated you can actually do repetitive pipeline tests within hours instead of weeks or months or never!

7 – Db2 12 test-Review:

What does this mean for me?

Well,…

You need to get a fast, good, automated Cloning system in place.
You need to get a fast, good, automated IFCID 376 system in place.
You need to get a fast, good, automated SQL workload collection system in place.
You need to get a fast, good, automated Access Path comparison system in place.
You need to get a fast, good, automated SQL replay system in place.

… this means :

” Continuous testing in a continuous development world “…

You might, in fact, need to actually buy some software that does all this for you…

May I introduce you to our newest member of the SEGUS/SEG Family:

Db2 Continuous Delivery Deployment Check

is the software that does all of this, and more!

Check out our website for details of how this software really helps in :

Reducing the time to test

Enabling you to actually go forward with a good, clear conscience!

Visit our CDDC

CDDC for Db2 z/OS - Continuous Delivery Deployment Check - Agile & Environment simulation, Zowe IBM mainframe GUI

8 – ZOWE – Open Source zUI – to the rescue?

You have probably heard of ZOWE by now, the first open source software on z/OS from IBM, Rocket Software and CA Technologies (A Broadcom company). It was launched last year and is growing very quickly.

Introduction to Zowe:

Zowe – The zGUI (r)evolution – First hands-on experience and best practices
This presentation shows you the basics you need to understand ZOWE, summarizes the steps you need to use it and illustrates how ISPF online and batch applications are supported from the GUI world. Overview

https://www.youtube.com/watch?v=NX20ZMRoTtk

My firm is also using it now and all future developments will also be ZOWE enabled.

In a nutshell,

it allows users to interact with the Mainframe using a modern GUI.

This is based upon z/OSMF and uses work flows to actually do stuff. You define a “micro service” to do one thing and then you can string as many of these together as you like to get something done like, e.g. Provisioning a Db2 system or Cloning a Db2 subsystem etc.

Zowe at SEGUS and

SOFTWARE ENGINEERING?

We will be using ZOWE for two things:

1 – For the Installation and Maintenance of our software at customer sites

ZOWE enables people to do things that normally require “green screen” and we are keenly aware that green screen people are disappearing. If we want the z/OS platform to survive, it must be dragged, kicking and screaming, into the modern world.

2 – For using ZOWE as the front end for the users of our products

The CDDC product that I described in this newsletter will be built around ZOWE, thus expanding its usability in the market of tomorrow.

We are convinced that ZOWE is the way to go – are you?

TTFN,
Roy Boxwell
Senior Architect

April 4, 2019January 14, 2021

2019-03 EXPLAIN data review

A quick history of EXPLAIN and a new standard Db2 EXPLAIN Query to bring out the best and most underused columns.

EXPLAIN has been with us from nearly the start of Db2 (DB2 V2R1 when the b was B!) and, over the years, the number of Explain tables has mushroomed up to 20 in Db2 12. Not *all* of this data is really useful but there are bits and pieces that are well worth adding to your standard repertoire of explain queries!

Table of contents

PLAN_TABLE

This can be used to see in which release the PLAN_TABLE was created:

V2.1 – 25 column format
V2.2 – 28 column format
V2.3 – 30 column format
V3.1 – 34 column format
V4.1 – 43 column format
V5.1 – 46 column format
V6.1 – 49 column format
V7.1 – 51 column format
V8.1 – 58 column format
V9.1 – 59 column format
10 – 64 column format ending with MERGN
11 – 66 Column format ending with EXPANSION_REASON
12 – 67 column format ending with PER_STMT_ID

What could be interesting here?? Well what about:

PREFETCH D, S, L, U, or Blank?

D for Dynamic
S for pure Sequential
L for through a page list
U for unsorted RID list
Blank for Unknown or no prefetch

COLUMN_FN_EVAL R, S, X, Y, or Blank?

R for while data is read
S for while performing a sort
X for while data is read but using OFFSET
Y for while performing a sort but using OFFSET
Blank for after data retrieval and any sort

PAGE_RANGE Y or Blank.

Y for yes the table qualifies for page range screening
so that only the needed partitions are scanned
Blank for no

PRIMARY_ACCESSTYPE D, P, S, T, Blank. Is direct row access attempted first:

D it tries to use direct row access with a rowid column. If it cannot do this it uses the access path that is described in the ACCESSTYPE column
P it uses a DPSI and a part-level operation to access the data
S it uses sparse index access for a sideways table reference
T the base table or result file is materialized into a work file, and the work file is accessed via sparse index access. If a base table is involved, then ACCESSTYPE indicates how the base table is accessed
Blank it does not try to use direct row access by using a rowid column or sparse index access for a work file.

DSN_DETCOST_TABLE

There are a ton of interesting columns here but most are “IBM internal only”, however, these are available for our enjoyment:

ONECOMPROWS	The number of rows qualified after applying local predicates.
IMLEAF	The number of index leaf pages scanned by Data Manager.
IMFF	The filter factor of matching predicates only.
IMFFADJ	The filter factor of matching and screening
DMROWS	The number of data manager rows returned (after all stage 1 predicates are applied).
RDSROW	The number of data manager rows returned (after all stage 1 predicates are applied).
IXSCAN_SKIP_DUPS	Whether duplicate index key values are skipped during an index scan. Y Duplicate key values are skipped N Duplicate key values are not skipped
IXCAN_SKIP_SCREEN	Whether key ranges that are disqualified by index screening predicates are skipped during an index scan. Y Disqualified key ranges are skipped N Key ranges are not skipped
EARLY_OUT	Whether fetching from the table stops after the first qualified row. Y Internal fetching stops after the first qualified row N Internal fetching continues after the first qualified row
BLOCK_FETCH	Y or N Was block fetch used?

DSN_FILTER_TABLE

Sometimes it is really interesting to see when the predicate is applied and whether or not it could be pushed down.

STAGE	The stage that the predicate was evaluated. MATCHING, SCREENING, PAGERANGE, STAGE1, or STAGE2.
PUSHDOWN	Whether the predicate was pushed down. I for the Index Manager evaluates it D for the Data Manager evaluates it Blank means no push down was used

DSN_PREDICAT_TABLE

Here lives the really good stuff that most people do not use! The Bubble Up is normally an SQL coding error and the T is “forgetfulness”…

ADDED_PRED	If this column is non-blank it means that Db2 has rewritten the query to some extent. This is not good as it adds CPU to the process. In my opinion any “added predicate” should *already* be coded correctly in the SQL! Blank Db2 did not add the predicate B for bubble up C for correlation J for join K for LIKE for expression-based index L for localization P for push down R for page range S for simplification T for transitive closure
CLAUSE	Is this a SELECT, HAVING, ON, or WHERE clause?
ORIGIN	Where did it come from? C for a column mask R for a Row Permission U specified by User Blank generated by Db2

New Standard Explain SQL

Just adding these to your standard Explain query like this:

SET CURRENT SQLID = 'BOXWELL' ;                                   
 SELECT SUBSTR(P.PROGNAME, 1 , 8 )        AS PROGNAME              
       ,SUBSTR(DIGITS(P.QUERYNO), 6, 5)   AS LINE                  
       ,SUBSTR(DIGITS(P.QBLOCKNO), 4, 2)  AS QNO                   
       ,SUBSTR(DIGITS(P.PLANNO), 4, 2)    AS PNO                   
       ,SUBSTR(DIGITS(P.MIXOPSEQ), 4, 2)  AS SQ                    
       ,SUBSTR(DIGITS(P.METHOD), 5, 1)    AS M                     
       ,SUBSTR(P.TNAME, 1, 18)            AS TABLE_NAME            
       ,P.ACCESSTYPE                      AS A                     
       ,P.PRIMARY_ACCESSTYPE              AS PA                    
       ,SUBSTR(DIGITS(P.MATCHCOLS), 4, 2) AS CS                    
       ,SUBSTR(P.ACCESSNAME, 1, 12)       AS INDEX                 
       ,P.INDEXONLY                       AS IO                    
       ,SUBSTR(CASE SORTN_UNIQ    WHEN 'N' THEN '-' ELSE 'Y' END ||
               CASE SORTN_JOIN    WHEN 'N' THEN '-' ELSE 'Y' END ||
               CASE SORTN_ORDERBY WHEN 'N' THEN '-' ELSE 'Y' END ||
               CASE SORTN_GROUPBY WHEN 'N' THEN '-' ELSE 'Y' END   
                                 , 1 , 4) AS UJOG                  
       ,SUBSTR(CASE SORTC_UNIQ    WHEN 'N' THEN '-' ELSE 'Y' END ||
               CASE SORTC_JOIN    WHEN 'N' THEN '-' ELSE 'Y' END ||
               CASE SORTC_ORDERBY WHEN 'N' THEN '-' ELSE 'Y' END ||
               CASE SORTC_GROUPBY WHEN 'N' THEN '-' ELSE 'Y' END   
                                 , 1 , 4) AS UJOGC                 
       ,P.PREFETCH                        AS P                     
       ,P.COLUMN_FN_EVAL                  AS CE                    
       ,P.PAGE_RANGE                      AS PR                    
       ,P.QBLOCK_TYPE                     AS TYPE                  
       ,P.MERGE_JOIN_COLS                 AS MJC                   
       ,S.PROCMS                          AS MS                    
       ,S.PROCSU                          AS SU                    
       ,D.EARLY_OUT                       AS EO                    
       ,D.BLOCK_FETCH                     AS BF                    
       ,F.ORDERNO                         AS ON                    
       ,F.PREDNO                          AS PN                    
       ,F.STAGE                           AS STAGE                 
       ,F.PUSHDOWN                        AS PD                    
       ,R.TYPE                            AS TYPE                  
       ,R.ADDED_PRED                      AS AP                    
       ,R.CLAUSE                          AS CLAUSE                
       ,R.ORIGIN                          AS OR                    
       ,R.REDUNDANT_PRED                  AS RP                    
       ,R.TEXT                            AS TRANSFORMED_PREDICATE 
 FROM PLAN_TABLE         P                                         
 FULL OUTER JOIN                                                   
      DSN_STATEMNT_TABLE S               
 ON    P.QUERYNO      = S.QUERYNO        
   AND P.APPLNAME     = S.APPLNAME       
   AND P.PROGNAME     = S.PROGNAME       
   AND P.COLLID       = S.COLLID         
   AND P.GROUP_MEMBER = S.GROUP_MEMBER   
   AND P.SECTNOI      = S.SECTNOI        
   AND P.VERSION      = S.VERSION        
   AND P.EXPLAIN_TIME = S.EXPLAIN_TIME   
 FULL OUTER JOIN                         
      DSN_DETCOST_TABLE  D               
 ON    P.QUERYNO      = D.QUERYNO        
   AND P.APPLNAME     = D.APPLNAME       
   AND P.PROGNAME     = D.PROGNAME       
   AND P.COLLID       = D.COLLID         
   AND P.GROUP_MEMBER = D.GROUP_MEMBER   
   AND P.SECTNOI      = D.SECTNOI        
   AND P.VERSION      = D.VERSION        
   AND P.EXPLAIN_TIME = D.EXPLAIN_TIME   
   AND P.QBLOCKNO     = D.QBLOCKNO       
   AND P.PLANNO       = D.PLANNO         
 FULL OUTER JOIN                         
      DSN_FILTER_TABLE   F               
 ON    P.QUERYNO      = F.QUERYNO        
   AND P.APPLNAME     = F.APPLNAME       
   AND P.PROGNAME     = F.PROGNAME       
   AND P.COLLID       = F.COLLID         
   AND P.GROUP_MEMBER = F.GROUP_MEMBER   
   AND P.SECTNOI      = F.SECTNOI        
   AND P.VERSION      = F.VERSION        
   AND P.EXPLAIN_TIME = F.EXPLAIN_TIME   
   AND P.QBLOCKNO     = F.QBLOCKNO       
   AND P.PLANNO       = F.PLANNO         
 FULL OUTER JOIN                         
      DSN_PREDICAT_TABLE R               
 ON    F.QUERYNO      = R.QUERYNO        
   AND F.APPLNAME     = R.APPLNAME       
   AND F.PROGNAME     = R.PROGNAME       
   AND F.COLLID       = R.COLLID         
   AND F.GROUP_MEMBER = R.GROUP_MEMBER   
   AND F.SECTNOI      = R.SECTNOI        
   AND F.VERSION      = R.VERSION        
   AND F.EXPLAIN_TIME = R.EXPLAIN_TIME   
   AND F.QBLOCKNO     = R.QBLOCKNO       
   AND F.PREDNO       = R.PREDNO         
 WHERE 1 = 1                             
   AND P.QUERYNO  IN (1 , 2 )                               
 ORDER BY 1 , 2 , 3 , 4 , 5 , 24 , 25                       
 ;

Here I limit it to just the QUERYNO 1 and 2 as these were the numbers used for the EXPLAIN command:

EXPLAIN ALL SET QUERYNO = 1 FOR              
 SELECT INSTANCE, CLONE                       
 FROM SYSIBM.SYSTABLESPACE A                  
 WHERE ( SELECT B.TSNAME                      
         FROM SYSIBM.SYSTABLES  B             
             ,SYSIBM.SYSINDEXES C             
         WHERE C.CREATOR   = ?                
           AND C.NAME      = ?                
           AND C.TBCREATOR = B.CREATOR        
           AND C.TBNAME    = B.NAME ) = A.NAME
 AND A.DBNAME = ?                             
 ;                                            
 EXPLAIN ALL SET QUERYNO = 2 FOR              
 SELECT A.INSTANCE, A.CLONE                   
 FROM SYSIBM.SYSTABLESPACE A                  
     ,SYSIBM.SYSTABLES     B                  
     ,SYSIBM.SYSINDEXES    C                  
 WHERE C.CREATOR   = ?                        
   AND C.NAME      = ?                        
   AND C.TBCREATOR = B.CREATOR                
   AND C.TBNAME    = B.NAME                   
   AND A.DBNAME    = ?                        
   AND A.NAME      = B.TSNAME                 
   AND A.DBNAME    = B.DBNAME                 
 WITH UR                                      
 ;

--------+--------+------+--------+---------+---------+--------+------+---
PROGNAME LINE QNO PNO SQ M  TABLE_NAME    A PA CS INDEX    IO UJOG  UJOGC
--------+--------+------+--------+---------+---------+--------+------+---
DSNESM68 00001 01 01  00 0  SYSTABLESPACE I    02  DSNDSX01 N  ----  ---- 
DSNESM68 00001 01 01  00 0  SYSTABLESPACE I    02  DSNDSX01 N  ----  ---- 
DSNESM68 00001 02 01  00 0  SYSINDEXES    I    02  DSNDXX01 N  ----  ---- 
DSNESM68 00001 02 01  00 0  SYSINDEXES    I    02  DSNDXX01 N  ----  ---- 
DSNESM68 00001 02 02  00 1  SYSTABLES     I    02  DSNDTX01 N  ----  ---- 
DSNESM68 00001 02 02  00 1  SYSTABLES     I    02  DSNDTX01 N  ----  ---- 
DSNESM68 00002 01 01  00 0  SYSINDEXES    I    02  DSNDXX01 N  ----  ---- 
DSNESM68 00002 01 01  00 0  SYSINDEXES    I    02  DSNDXX01 N  ----  ---- 
DSNESM68 00002 01 02  00 1  SYSTABLES     I    02  DSNDTX01 N  ----  ---- 
DSNESM68 00002 01 02  00 1  SYSTABLES     I    02  DSNDTX01 N  ----  ---- 
DSNESM68 00002 01 02  00 1  SYSTABLES     I    02  DSNDTX01 N  ----  ---- 
DSNESM68 00002 01 03  00 1  SYSTABLESPACE I    02  DSNDSX01 N  ----  ---- 
DSNESM68 00002 01 03  00 1  SYSTABLESPACE I    02  DSNDSX01 N  ----  ---- 
DSNE610I NUMBER OF ROWS DISPLAYED IS 13

--+---------+---------+---------+---------+---------+---------+--------
 P  CE  PR  TYPE      MJC            MS           SU  EO  BF        ON
--+---------+---------+--------+---------+---------+---------+---------
            SELECT  ------            1            2  N   N          1
            SELECT  ------            1            2  N   N          2
            NCOSUB  ------            1            2  N   N          1
            NCOSUB  ------            1            2  N   N          2
            NCOSUB  ------            1            2  N   N          1
            NCOSUB  ------            1            2  N   N          2
            SELECT  ------            1            2  N   N          1
            SELECT  ------            1            2  N   N          2
            SELECT  ------            1            2  N   N          1
            SELECT  ------            1            2  N   N          2
            SELECT  ------            1            2  N   N          3
            SELECT  ------            1            2  N   N          1
            SELECT  ------            1            2  N   N          2

---------+---------+---------+---------+---------+---------+---------+-----
PN  STAGE  PD TYPE  AP  CLAUSE    OR  RP  TRANSFORMED_PREDICATE 
---------+---------+---------+---------+---------+---------+---------+-----
3  MATCHING   EQUAL     WHERE     U   N   "A"."DBNAME"=(EXPR) 
2  MATCHING   EQUAL     WHERE     U   N   "A"."NAME"=(SELECT "B"."TSNAME"
5  MATCHING   EQUAL     WHERE     U   N   "C"."CREATOR"=(EXPR)           
6  MATCHING   EQUAL     WHERE     U   N   "C"."NAME"=(EXPR)              
7  MATCHING   EQUAL     WHERE     U   N   "C"."TBCREATOR"="B"."CREATOR"
8  MATCHING   EQUAL     WHERE     U   N   "C"."TBNAME"="B"."NAME"        
2  MATCHING   EQUAL     WHERE     U   N   "C"."CREATOR"=(EXPR)           
3  MATCH      EQUAL     WHERE     U   N   "C"."NAME"=(EXPR)              
4  MATCH      EQUAL     WHERE     U   N   "C"."TBCREATOR"="B"."CREATOR"  
5  MATCHING   EQUAL     WHERE     U   N   "C"."TBNAME"="B"."NAME"        
9  STAGE1     EQUAL  T  WHERE         N   "B"."DBNAME"=(EXPR)            
6  MATCHING   EQUAL     WHERE     U   N   "A"."DBNAME"=(EXPR)            
7  MATCHING   EQUAL     WHERE     U   N   "A"."NAME"="B"."TSNAME"

It shows you a lot more data than the normal bare bones Explain PLAN_TABLE data especially here the TYPE, STAGE and ADDED_PREDICATE (AP Column).

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

February 20, 2019January 14, 2021

2019-02 Global Temporary Confusion

Global Temporary Table usage in Db2 z/OS

Learn, through examples, why CGTT (CREATEd Global Temporary Tables) are possibly better than DGTT (DECLAREd Global Temporary Tables) and how to use them to speed up performance.

When Global Temporary Tables were first introduced (DB2 V5.1) it was all pretty clear! You used normal DDL to CREATE a GLOBAL TEMPORARY Table that existed in the Db2 Catalog, but every process got “their” own copy. There were limitations like No UPDATE, No Single DELETE, no Default Values and ROLLBACK / COMMIT and data reuse was not simple but it all sort of worked…

Global – but not as you know it!

Then along came the brand new Global Temporary Table (DB2 V6.1) which, just to make matters clearer, they prefixed with DECLARE so that it was 100% obvious that it had *nothing* to do with the “other” temporary table… I tell ya, if I ever meet the developer who came up with these names in a dark alley one night…. But I digress… So with the new DECLARE version you suddenly *could* do Update and Delete and it came with an ON COMMIT clause to make it simpler to handle data with COMMIT – Hoorah!

Global Temporary Table in Db2 12 – All new and improved

As of Db2 12, the full list of things you can do with a CREATE GLOBAL TEMPORARY TABLE (from now on simply CGTT) as opposed to a DECLARE GLOBAL TEMPORARY TABLE (from now on simply DGTT) is:

 What            CGTT           DGTT
 MERGE           No             Yes
 UPDATE          No             Yes
 Single DELETE   No             Yes
 DEFAULTs        No             Yes
 ROLLBACK        Deletes all    Yes (Plus NOT LOGGED ON ROLLBACK)
 COMMIT          Deletes all    Depends on ON COMMIT
                 (unless WITH HOLD)
 Indexes?        No             Yes
 WITH RETURN     Yes            Yes <- This is very good news
 TO CLIENT

CGTT why?

So, looking at the list, you have to wonder – “Why would anyone be using CGTTs?”
Well the reason is “Performance!” –
DGTTs might do all the tricks but they also run like a dog!

Belief is ok, test is trust!

I wrote a couple of small COBOL programs that did the same thing: one using a DGTT and one using a CGTT. All the programs did, was call a section 1000 times that DECLARED the table (or obviously not!) and then called another section that inserted 10 rows and opened a cursor to then fetch these rows right back with an ORDER BY. I chose this test as most DGTT/CGTT usage is low-volume but high called rate (Think stored procedures here!) so I wanted to see what sort of overhead 1000 DGTTs caused.

No EDMPOOL here!

Monitoring is not easy as DGTT usage is not in the EDMPOOL, so I did it the “good old way” by running my tests when I was alone on the system and five times each. I was a bit shocked by the results…

What               CGTT                   DGTT
DBM1 EXCPs          0                      8,817
MSTR EXCPs          0                     59,653
Db2 CPU             0.140 secs             3.546 secs
Program cpu         4.506 secs           101.124 secs
Program elapsed     4.958 secs           118.730 secs

I then changed the DGTT to also be NOT LOGGED

What                   DGTT 
DBM1 EXCPs               66
MSTR EXCPs               47
Db2 CPU               2.446 secs
Program cpu         100.818 secs
Program elapsed     116.426 secs

Not that much better! But remember, I did not do any DELETE or UPDATE processing so your performance data may be better than mine.

Impressive!

Now the reason for all of the I/O and CPU is
all of the internals that Db2 must do for the DGTT, remember the CGTT has *already* been declared and exists in the catalog. The DGTT naturally not…
This can, as be seen, add a lot of overhead!

ROT still correct

Now the good old Rule of Thumb still holds true:

A CGTT is great for doing zero updates and sequential access!

Db2 10 improvement

When a cursor is declared with the attribute RETURN TO CLIENT then this cursor is available from the originating calling client even if the current stored procedure is down line. This is very neat as the cursor is “invisible” to all the intermediate procedures.

Death by DGTT

The problem in the past was the “hand shaking” of the final stored procedure result set. It had to be read out and put into another DGTT and then this handling was repeated all the way back up the calling chain. Now, if you have a “final cursor”, you can simply declare it as RETURN TO CLIENT – This saves tons of CPU and elapsed time!

Final tip!

Remember if you use STATIC SCROLLABLE cursors then they *must* use DGTTs in the background to actually work!

Even more work…

Then a colleague who proof-reads all my newsletters, a nasty job but someone must do it, asked me:

“I wonder what the numbers would be if you declared the GTT only once, then INSERTED/SELECTED/DELETED a bunch of rows 1000 times? Because that’s how I’m using it …“

So then I changed the programs to only DECLARE once, insert a 1000 rows, Select a 1000 rows and then do a MASS DELETE of the data.

What             CGTT           DGTT            DGTT NOT LOGGED
DBM1 EXCPs       7,572          7,720           7,769
MSTR EXCPs       0                672             238
Db2 CPU          0.220 secs     0.310 secs      0.410 secs
Program cpu      0.346 secs     0.482 secs      0.468 secs
Program elapsed  0.704 secs     0.836 secs      0.818 secs

Much closer, in terms of performance, but still the CGTT is quicker – Time to check your usage of these Temporary Tables if you ask me!

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

August 29, 2018January 14, 2021

2018-08 SOUNDEX and other “cool” features – Part six All new for Db2 12

Part six of my – everlasting – walk through of new Aggregate and Scalar functions.
Following with : HASH & WRAP

Previous “SOUNDEX” Newsletters

Part 5: 2018-07: SOUNDEX and other „cool“ features – Part five All new for Db2 12
Part 4: 2015-06: SOUNDEX and other „cool” features – Part four for Db2 10 & DB2 11
Part 3: 2012-06: SOUNDEX and other „cool“ features – Part three for Db2 10
Part 2: 2012-05: SOUNDEX and other „cool“ features – Part two for Db2 V9
Part 1: 2012-04: SOUNDEX and other „cool“ features – Part one for Db2 V8

SCALARS

Now I move on to the new scalar functions, really only two new ones came with Db2 12, HASH and WRAP.

HASH it

Four new HASH Scalar functions are supplied. The first runs every time:

SELECT HEX(HASH_CRC32  ('ROY LIKES BEER A LOT!')) 
FROM SYSIBM.SYSDUMMYU;                            
---------+---------+---------+---------+---------+----------+---------+
4FA13CD
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Then I tried MD5

SELECT HEX(HASH_MD5    ('ROY LIKES BEER A LOT!'))
FROM SYSIBM.SYSDUMMYU;
---------+---------+---------+---------+---------+---------+---------+--
DSNE610I NUMBER OF ROWS DISPLAYED IS 0
DSNT408I SQLCODE = -20223, ERROR:  THE ENCRYPT_TDES OR DECRYPT FUNCTION
FAILED.  ENCRYPTION FACILITY NOT AVAILABLE 12 0
DSNT418I SQLSTATE   = 560BF SQLSTATE RETURN CODE
DSNT415I SQLERRP    = DSNXRBIN SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD    = 1032 0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD    = X'00000408'  X'00000000'  X'00000000'  X'FFFFFFFF'
X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION

Oops! What is that?

– 20223 THE OPERATION FAILED. ENCRYPTION FACILITY NOT AVAILABLE return-code, reason-code

Explanation: The encryption facility is not available, or not able to service the encryption or decryption request.

System action: The statement cannot be processed.

Programmer response: If the encryption facility is not installed, install it before you use the ENCRYPT_TDES, DECRYPT, HASH_CRC32, HASH_MD5, HASH_SHA1, HASH_SHA256 functions or data set encryption. If the encryption facility is installed, verify that it is working correctly. The returncode and reason-code might give further information about why this message was issued.

Digging deeper into the “ICSF and cryptographic coprocessor return and reason codes” docu

Reason codes for return code C (12)
0 (0)	ICSF is not available. One of the following situations is possible: ICSF is not started

Reason codes for return code C (12)

Reason Code Hex (Decimal)

Description

0 (0)

ICSF is not available. One of the following situations is possible:

ICSF is not started

Yep – We have no crypto here in the labs, so no big surprise! But hey, at least the CRC32 worked!

Let’s WRAP it up

WRAP is useful for people like me that write software that other people buy! If you are using any of these:

```
CREATE FUNCTION (compiled SQL scalar)
```
```
CREATE FUNCTION (inlined SQL scalar)
```
```
CREATE PROCEDURE (SQL - native)
```
```
CREATE PROCEDURE (SQL table)
```
```
CREATE TRIGGER (basic)
```
```
CREATE TRIGGER (advanced)
```

Then you are aware that the code is in clear text and easy to read – Not good for my intellectual property rights! WRAP solves this dilemma by letting me obfuscate the code completely.

Here are two examples with the problems they can pose:

SELECT WRAP ('CREATE FUNCTION SALARY(WAGE DECFLOAT) RETURNS DECFLOAT RETURN WAGE * 40 * 52')
FROM SYSIBM.SYSDUMMY1
;
SELECT WRAP('
CREATE TRIGGER BOXWEL2.TESTTRIGH
   AFTER
   UPDATE OF COL1 ON BOXWEL2.TESTTRIG
   REFERENCING NEW AS NTABLE
   FOR EACH ROW
   MODE DB2SQL
   WHEN (NTABLE.COL1 = ''2'')
   UPDATE MVNXTEST.MVNXT80 A
   SET UT_STATUS = ''X''
')
FROM SYSIBM.SYSDUMMY1
;

The output looks like this:

CREATE FUNCTION SALARY(WAGE DECFLOAT) WRAPPED DSN12100 
ablGWmdiWmtGTmdCTmtaTmtmUntqUmJeUmZi2mdKZidaWmdaWmdaWmZG1mIaGicaGi6TRm8Eu4e30 AmlGUcEdToUnWtd2:hIfdEN_bBCpEmtl1lKH:UAzaqaa

One very long line that you must then “reverse” parse to get it back into, e.g. 72 byte chunks for SPUFI use.

The second example is even better, because of the whitespace being not encoded/encoded as well:

                                                            CREATE TRIGGER
BOXWEL2.TESTTRIGH                                           WRAPPED DSN1210
ablGWmdiWmtGTmdCTmtaTmtmUntqUmJeUmZm0otKXidaWmdaWmda1nJKGicaGicaGQ:TRO87Axb:VP8p_d8E9N887FL:EnRZ:8ltM:4sMnV5iyw3QGPUu0hDP4uwUK4lkDz0xvk3PU6lCMBGOCZar_sbJSQyaHYumRRSkATFXus8DesyjqaYmOLMD2HWdLt6GFDTxkNr4g8ht874tPZXN5ZIpJW4Xx15CI:VZ4f7ENGxV6_jxL4tVN0MFF2:tFR6EcD:g3nZurpPzOd2PYTuslWXKWXEWsWk2Q1KuT1VjPlY_MeNhCnpppEFEoQgp:dudFUcPhFyesEPIrPYTZxKAHlj1sDOKQTp1fYWoQ5nQBXOZGGtBreN2j9oSdSJNgf3roCpVJS8EFxdZ5DuoI_PDP5t7d1DPTL7Vtlp2EMsESTnl6s80KH4O13Wr72s4y56iQwBwB0KuOUYKxUZb:zNdchpguBUNZom2p2yyTpOXXkcHhKGnFnbBaUd2rlMbRmKZdgDiNwi7rEKHTLYulpLOa7kqrHKLymXZKurj64TGpF2IjXUmYkMbgbn4CFnOLk20rArSOIWIzbpajemWcmyHWmWdWfhyr8dJWZ8ghXJ5mF_nKr3ZyNUNw7Co9OqUxuEPuocFy4vUswVtOmonhr23d4VgBQz3Zf8nV2p5AVOEYLsJ7QDnNq7Bb132t3R7Rn603gF0PUi_PdCH6ef8Kmk4a8uWr2hkicHj8apO7hQf5w3Tc2tsvOCvOe5RaopCLKgg5BmnnYYzAl7lLiNGAvob1_Dd7PkoX51LDZ3QlBJUugQm5WsPJKKtwDjM81Wa

Notice that I had to double up the apostrophes due to this being passed as input to WRAP now – so you do have to watch out for that.
Remember too, that this is not encryption, it is just encoding and can be broken – It just makes it a tick harder to read your code, that is all. IBM writes:
Important
The encoding of the statement is meant to obfuscate the content and is not considered a form of strong encryption.

Well that wraps up this newsletter! Sorry….

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

July 16, 2018January 14, 2021

2018-07 – SOUNDEX and other “cool” features – Part five All new for Db2 12

Part five of my – everlasting – walk through of new Aggregate and Scalar functions. This time I will start with the new AGGREGATE functions introduced in Db2 12.

LISTAGG

This is mainly of interest due to the fact that it was the very first Agile Function Level feature in Db2. You must be at FL501 to use this in SQL otherwise you get a

DSNT408I SQLCODE = -4700 ATTEMPT TO USE NEW FUNCTION BEFORE FUNCTION LEVEL IS ACTIVATED

Or a

DSNT408I SQLCODE = -4743, ERROR:  ATTEMPT TO USE A FUNCTION WHEN THE APPLICATION COMPATIBILITY SETTING IS SET FOR A PREVIOUS LEVEL

What this function does, is basically a horizontal recursive join which can be sorted in a different sequence. Here’s the SQL example:

SELECT WORKDEPT,                                       
LISTAGG(LASTNAME, ', ') WITHIN GROUP(ORDER BY LASTNAME)
AS EMPLOYEES                                           
FROM SAMPLE.EMP                                        
GROUP BY WORKDEPT;

Which returns:

---------+---------+---------+---------+---------+---------+---------+-----+
WORKDEPT  EMPLOYEES                                                        
---------+---------+---------+---------+---------+---------+---------+-----+
A00       HAAS, LUCCHESI, O'CONNELL 
B01       THOMPSON 
C01       KWAN, NICHOLLS, QUINTANA    
D11       ADAMSON, BROWN, JONES, LUTZ, PIANKA, SCOUTTEN, STERN, WALKER,
          YOSHIMURA
D21       JEFFERSON, JOHNSON, MARINO, PEREZ, PULASKI, SMITH
E01       GEYER  
E11       HENDERSON, PARKER, SCHNEIDER, SETRIGHT, SMITH  
E21       GOUNOT, LEE, MEHTA, SPENSER 
DSNE610I NUMBER OF ROWS DISPLAYED IS 8

So you can see it adds the LASTNAME and a comma repeatedly for each WORKDEPT value.

Db2 Catalog Docu

This function is actually very handy for one of the things I create for our programmers here in the labs. It is always a good idea to “know the catalog” and so I created a little word document years ago with all the tablespaces, tables, indexes and index columns with sort order so that they did not have to look in three separate books to find the data. Here is the SQL for listing out the tables with indexes:

SELECT SUBSTR(A.DBNAME, 1, 8) AS DB      
      ,SUBSTR(A.NAME, 1, 8 ) AS TS       
      ,SUBSTR(B.NAME, 1, 18) AS TABLE    
      ,SUBSTR(C.NAME, 1, 8 ) AS INDEX    
      ,B.TYPE                            
      ,CASE C.UNIQUERULE                 
        WHEN 'D' THEN '-'                
        ELSE 'Y'                         
       END AS U                          
      ,D.COLSEQ                          
      ,SUBSTR(D.COLNAME, 1, 18) AS COL   
      ,CASE D.ORDERING                   
        WHEN ' ' THEN 'I'                
        WHEN 'A' THEN '-'                
        WHEN 'D' THEN 'D'                
        WHEN 'R' THEN 'R'                
       END AS ORDER                      
FROM SYSIBM.SYSTABLESPACE A              
    ,SYSIBM.SYSTABLES     B              
    ,SYSIBM.SYSINDEXES    C              
    ,SYSIBM.SYSKEYS       D              
WHERE A.DBNAME IN ('DSNDB01','DSNDB06')  
  AND A.DBNAME  = B.DBNAME               
  AND A.NAME    = B.TSNAME               
  AND B.CREATOR = 'SYSIBM'               
  AND B.NAME    = C.TBNAME               
  AND B.CREATOR = C.TBCREATOR            
  AND C.NAME    = D.IXNAME                
  AND C.CREATOR = D.IXCREATOR            
ORDER BY DB, TS , TABLE , INDEX, D.COLSEQ ;

It returns 855 rows of data on my test Db2 12 Fl501 system:

---------+-------+-------+---+-------+------+--------+------------------+--
DB        TS      TABLE         INDEX     TYPE  U  COLSEQ  COL       ORDER
---------+-------+-------+---+-------+------+--------+------------------+--
DSNDB01   DBD01   DBDR          DSNDB01X   T   Y      1   DBID         -
DSNDB01   DBD01   DBDR          DSNDB01X   T   Y      2   SECTION      -
DSNDB01   SCT02   SCTR          DSNSCT02   T   Y      1   SCTNAME      -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      1   SPTLOCID     -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      2   SPTCOLID     -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      3   SPTNAME      -

And ending with

DSNDB06  SYSXML   SYSXMLRELS    DSNXRX02   T   -      1  XMLTBOWNER    -
DSNDB06  SYSXML   SYSXMLRELS    DSNXRX02   T   -      2  XMLTBNAME     -
DSNDB06  SYSXML   SYSXMLSTRINGS DSNXSX01   T   Y      1  STRINGID      -
DSNDB06  SYSXML   SYSXMLSTRINGS DSNXSX02   T   Y      1  STRING        -
DSNE610I NUMBER OF ROWS DISPLAYED IS 855

Then I used my amazing WORD skills to move all the COL columns after each other to get a DOC file that looks like this:

DSNDB0n.     SYSIBM.       SYSIBM.      U  INDEX FIELD
DBD01        DBDR          DSNDB01X     Y  DBID.SECTION
SCT02        SCTR          DSNSCT02     Y  SCTNAME
SPT01        SPTR          DSNSPT01     Y  SPTLOCID.SPTCOLID.SPTNAME.etc.

Which, of course, the programmers loved, as it enabled them to quickly find which columns, in which sort order, (I convert the ORDER D COL column to be in bold by the way), are available on the catalog and directory tables. I started this with DB2 V5…

Now with LISTAGG the query looks like:

SELECT SUBSTR(A.DBNAME, 1, 8)                                 
      ,SUBSTR(A.NAME, 1, 8 )                                  
      ,SUBSTR(B.NAME, 1, 18)                                  
      ,SUBSTR(C.NAME, 1, 8 )                                  
      ,B.TYPE AS T                                            
      ,CASE C.UNIQUERULE                                      
        WHEN 'D' THEN '-'                                     
        ELSE 'Y'                                              
       END AS U                                               
      ,LISTAGG( CHAR(SUBSTR(D.COLNAME, 1, 18) CONCAT           
                     CASE D.ORDERING                          
                     WHEN ' ' THEN 'I'                        
                     WHEN 'A' THEN '-'                        
                     WHEN 'D' THEN 'D'                        
                     WHEN 'R' THEN 'R'                        
                     END                                      
                    )                                         
               , ' ') WITHIN GROUP (ORDER BY D.COLSEQ ASC)    
FROM SYSIBM.SYSTABLESPACE A                                   
    ,SYSIBM.SYSTABLES     B                                   
    ,SYSIBM.SYSINDEXES    C                                   
    ,SYSIBM.SYSKEYS       D                                   
WHERE A.DBNAME IN ('DSNDB01','DSNDB06')                       
  AND A.DBNAME  = B.DBNAME                                    
  AND A.NAME    = B.TSNAME                                    
  AND B.CREATOR = 'SYSIBM'                                    
  AND B.NAME    = C.TBNAME                                    
  AND B.CREATOR = C.TBCREATOR                                 
  AND C.NAME    = D.IXNAME
  AND C.CREATOR = D.IXCREATOR                                 
GROUP BY A.DBNAME, A.NAME, B.NAME, C.NAME, B.TYPE, C.UNIQUERULE
;

And the output:

---------+---------+-------+-----------+---------------------------+---
                                            T  U 
---------+---------+-------+-----------+---------------------------+---
DSNDB01   DBD01   DBDR            DSNDB01X  T  Y  DBID     - SECTION  -     
DSNDB01   SCT02   SCTR            DSNSCT02  T  Y  SCTNAME  -  
DSNDB01   SPT01   SPTR            DSNSPT01  T  Y  SPTLOCID - SPTCOLID - SPTNAME

And ends

DSNDB06   SYSXML  SYSXMLRELS      DSNXRX02  T  -  XMLTBOWNER - XMLTBNAME
DSNDB06   SYSXML  SYSXMLSTRINGS   DSNXSX01  T  Y  STRINGID   -         
DSNDB06   SYSXML  SYSXMLSTRINGS   DSNXSX02  T  Y  STRING     -         
DSNE610I NUMBER OF ROWS DISPLAYED IS 287

Which is a lot less data for my poor little fingers to work with!

Of course it is not perfect… the really big GOTCHA, is ORDER BY is *not* allowed!

PERCENTILES

For the next couple of examples, I will be using the table SAMPLE.EMP that contains these rows and columns of interest:

---------+---------+---------+---------+---------+-----
EMPNO   WORKDEPT       SALARY        BONUS         COMM
---------+---------+---------+---------+---------+-----
000210  D11          18270.00       400.00      1462.00
000190  D11          20450.00       400.00      1636.00
000180  D11          21340.00       500.00      1707.00
000160  D11          22250.00       400.00      1780.00
000170  D11          24680.00       500.00      1974.00
000150  D11          25280.00       500.00      2022.00
000200  D11          27740.00       600.00      2217.00
000220  D11          29840.00       600.00      2387.00
000060  D11          32250.00       600.00      2580.
DSNE610I NUMBER OF ROWS DISPLAYED IS 9

PERCENTILE_CONT

Calculates the requested percentile as a continuous value. Use this if you want a calculated value based upon your input. E.g.:

SELECT PERCENTILE_CONT(0,50)
WITHIN GROUP (ORDER BY SALARY ASC )
FROM SAMPLE.EMP
WHERE WORKDEPT = 'D11'
;
---------+---------+---------+---------+---------+---------

---------+---------+---------+---------+---------+---------
+0.2468000000000000E+05                    (EMPLOYEE 150 row 5)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Here you can see that I require the 50th percentile, and as the data happens to have nine rows it would be the 5th row If you look at the data you will see that that is indeed the case.

Now, reversing the direction of the percentile, I want the 90th descending value. This row does not exist in the input, so the function computes the value that will probably best fit

DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
---------+---------+---------+---------+---------+---------
SELECT PERCENTILE_CONT(0,90)                              
       WITHIN GROUP (ORDER BY SALARY DESC)                
FROM SAMPLE.EMP                                           
WHERE WORKDEPT = 'D11'                                    
;                                                         
---------+---------+---------+---------+---------+---------
                                                           
---------+---------+---------+---------+---------+---------
+0.2001400000000000E+05     (value between first and second rows)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

PERCENTILE_DISC

Calculates the requested percentile as a discrete value. Use this if you do not want a calculated value based upon your input. E.g.:

SELECT PERCENTILE_DISC(0,50)             
       WITHIN GROUP (ORDER BY SALARY ASC )
FROM SAMPLE.EMP                          
WHERE WORKDEPT = 'D11'                   
;                                        
---------+---------+---------+---------+--
                                        
---------+---------+---------+---------+--
   24680.00                 (EMPLOYEE 150 row 5)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Here you can see that I still require the 50th percentile, and, as the data happens to still have nine rows, it would be the 5th row again.

Now, reversing the direction of the percentile, I want the 90th descending value. This row does not exist in the input so the function returns the nearest input value.

DSNE616I STATEMENT EXECUTION WAS SUCCESSFU
---------+---------+---------+---------+--
SELECT PERCENTILE_DISC(0,90)             
       WITHIN GROUP (ORDER BY SALARY DESC)
FROM SAMPLE.EMP                          
WHERE WORKDEPT = 'D11'                   
;                                        
---------+---------+---------+---------+--
                                       
---------+---------+---------+---------+--
   18270.00        (90th is not in the input data but this is the nearest)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

March 22, 2018January 14, 2021

2018-03 RTS RUNSTATS

„Breaking the rules is often fun, although we cannot condone it. But breaking the rules of Real Time Statistics (RTS) in Db2 12 can really land you in hot water.

We provide two queries that may give you a Get Out of Jail Free Card – at least as far as RTS is concerned…”

In this short newsletter, I wish to briefly discuss a change in behavior of the Real Time Statistics (RTS) tables that I have now seen at customer sites. I am not too happy about it! Before we start, a quick resumé of the RTS is required.

The Arrival of RTS

The Real Time Statistics were basically created by Dr Jim Teng in Db2 V7 and have always obeyed a few quite simple rules.

Real Time Statistics (RTS) rules:

Rule Number one
If in doubt set to NULL.
This might read a bit odd but the idea from Jim was that if any value is not 100% known then the column must be set to the NULL value.
Rule Number two
Externalize when asked, or by timer.
Rule Number three
No DEFINE NO data.
Rule Number four
Utilities will always update the RTS unless it cannot – see Rule Number one.
Rule Number five
SQL updates the counters unless they cannot – Think Mass Delete in a multi-table tablespace. Totalrows cannot be updated.

Rules are made to be broken

A couple of years into RTS usage and the clamor for changing various insert values got too big, so IBM enhanced the RTS so that on creation the REORGLASTTIME got set to the created timestamp and all the counter columns got set to zero instead of NULL. Now this was a good change as adding 1,000,000 to NULL you get NULL, whereas adding 1,000,000 to zero means you get 1,000,000 which is obviously better for working out whether or not you need a utility to run. The REORGLASTTIME was also accepted as when you create a nice empty object or you insert one row into an object, then by definition, it is in a perfectly reorganized state!

RTS rule one broken

Mass delete always caused problems, as mentioned earlier, so IBM then made a “half way” fix for the INDEX statistics in Db2 9 by zeroing the TOTALENTRIES when there is a Mass Delete as Db2 knows that the index is now empty.

(PM34730: RTS SYSINDEXSPACESTATS TOTALENTRIES INCORRECT AFTER MASS DELETE.)

Of course it did not update the TOTALROWS, as it didn’t “know” how many rows were mass deleted or truncated. This causes “drift” between the TS and IX statistics, but is only a minor annoyance.

Db2 12 breaks rule one

Now in Db2 12 Rule one has been broken again. Not that much of a break, but still not brilliant! What they have done, is change the STATSLASTTIME to now also be, nearly, the created TS – as if creating an object sets the runstats columns to valid data!

Naturally, the Db2 Catalog is still all -1’s. This makes generating utilities based on the RTS a little bit tricky, as you cannot trust the STATSLASTTIME to now ever actually be the time a RUNSTATS utility really ran – which was the *whole*, and only, purpose of this column! If only they had set the STATSLASTTIME to ‘0001-01-01-00.00.00.00.000000’ then all would be well…

Who woke the dogs up? (Or: Who let the dogs out? I.e. American jargon)

PI79234: SYSIBM.SYSTABLESPACESTATS.STATSINSERTS IS NOT UPDATED BY RTS EXTERNALIZATION SINCE OBJECT IS CREATED.
Is the APAR (PTF UI48494) that caused me all the trouble.

Where’s the beef?

Well, the problem is, if you are generating RUNSTATS based on the RTS – and who isn’t these days? – Then you are probably using this column. Now it *looks* like a RUNSTATS has been run at least once.

This is false and can lead to the not running of RUNSTATS when it most definitely should have been run.

Check all your home-grown RUNSTATS checkers. Remember DSNACCOX is also a little bit broken, as it uses these predicates:

(STATSLASTTIME IS NULL OR
 STATSLASTTIME<LOADRLASTTIME OR
 STATSLASTTIME<REORGLASTTIME OR
 STATSLASTTIME< latest PROFILE_UPDATE for the table space1 OR

Make sure you do not use the STATSLASTTIME, but instead join across to the SYSINDEXPART or SYSTABLEPART and pull out the STATSTIME column. This data is still ok and not fake news!

The scope of the problem

To find out the scope of the problem at your site, or even if you have this problem, you can run these two queries which simply list out all the objects that have, according to the Db2 Catalog, never been RUNSTATted, but according to the RTS they have been RUNSTATted:

Query 1

-- SELECT LIST OF TABLESPACES THAT ACCORDING TO DB2 CATALOG HAVE NOT 
-- BEEN RUNSTATTED BUT ACCORDING TO RTS HAVE BEEN.  
-- IGNORE DSNDB01 AS NO RUNSTATS FOR DIRECTORY OBJECTS 
-- IGNORE WORK DEFINED DATABASES AS NO RUNSTATS FOR THESE 
-- ONLY CHECK FOR TABLE TYPES H,M,P,R AND T 
-- DISTINCT IT FOR MULTI TABLE TABLESPACES 
SELECT DISTINCT A.DBNAME, A.NAME, A.PARTITION 
      ,A.REORGLASTTIME, A.STATSLASTTIME 
      ,B.STATSTIME, B.CREATEDTS 
FROM SYSIBM.SYSTABLESPACESTATS A 
   ,SYSIBM.SYSTABLEPART        B 
   ,SYSIBM.SYSDATABASE         C 
   ,SYSIBM.SYSTABLES           D 
WHERE B.STATSTIME = TIMESTAMP('0001-01-01-00.00.00.000000') 
  AND NOT COALESCE(A.STATSLASTTIME 
                   ,TIMESTAMP('0001-01-01-00.00.00.000000')) 
                  = TIMESTAMP('0001-01-01-00.00.00.000000') 
  AND NOT A.DBNAME = 'DSNDB01' 
  AND A.DBNAME = C.NAME 
  AND NOT C.TYPE = 'W' 
  AND A.DBNAME = B.DBNAME 
  AND A.NAME = B.TSNAME 
  AND A.PARTITION = B.PARTITION 
  AND A.DBNAME = D.DBNAME 
  AND A.NAME = D.TSNAME 
  AND D.TYPE IN ('H' , 'M' , 'P' , 'R' , 'T') 
ORDER BY 1 , 2 , 3 
;

Query 2

-- SELECT LIST OF INDEXSPACES THAT ACCORDING TO DB2 CATALOG HAVE NOT 
-- BEEN RUNSTATTED BUT ACCORDING TO RTS HAVE BEEN. 
-- IGNORE DSNDB01 AS NO RUNSTATS FOR DIRECTORY OBJECTS 
-- IGNORE HASH INDEXES AS NO RUNSTATS ALLOWED 
-- ONLY CHECK FOR TABLE TYPES H,M,P,R AND T 
SELECT A.DBNAME, A.INDEXSPACE, A.PARTITION 
      ,A.REORGLASTTIME, A.STATSLASTTIME 
      ,B.STATSTIME, B.CREATEDTS 
FROM SYSIBM.SYSINDEXSPACESTATS A 
    ,SYSIBM.SYSINDEXPART       B 
    ,SYSIBM.SYSINDEXES         C 
    ,SYSIBM.SYSTABLES          D 
WHERE B.STATSTIME = TIMESTAMP('0001-01-01-00.00.00.000000') 
  AND NOT COALESCE(A.STATSLASTTIME 
                  ,TIMESTAMP('0001-01-01-00.00.00.000000')) 
                  = TIMESTAMP('0001-01-01-00.00.00.000000') 
  AND NOT A.DBNAME = 'DSNDB01' 
  AND A.CREATOR = B.IXCREATOR 
  AND A.NAME = B.IXNAME 
  AND A.PARTITION = B.PARTITION 
  AND A.CREATOR = C.CREATOR 
  AND A.NAME = C.NAME 
  AND NOT C.HASH = 'Y' 
  AND C.TBCREATOR = D.CREATOR 
  AND C.TBNAME = D.NAME 
  AND D.TYPE IN ('H' , 'M' , 'P' , 'R' , 'T') 
ORDER BY 1 , 2 , 3 -
;

You can run these in any version of Db2 you like, but you will only get results from a Db2 12 system with this APAR applied and you have created an index or a tablespace. When I run them here on my Db2 10 and 11 systems I get zero rows back and on my test Db2 12 FL501 system I get 172 rows.

The good news is…

Our product for helping you all get “agile” in Db2 12, CDDC – ContinuousDelivery DeploymentCheck, detects and reports these data constellations as does our Statistics Health Check, of course.

I would be very interested to hear your opinions about this “little change in behavior”. Have you come across this at your site?

As always, any questions or comments would be most welcome!

TTFN,

Roy Boxwell

February 20, 2018January 14, 2021

2018-02 Db2 Catalog Statistics revisited

Db2 Optimizer & access path selection for Db2 11 & Db2 12 :

Db2 Catalog data | Problematic default values | Correlations in the Db2 Catalog

It has been six years since the last update so I thought, after Terry Purcell’s excellent presentation in January 2018, it would be a good point in time to go over and rake the coals again—especially as a couple of things have changed for Db2 12!

Terry Purcell – Db2 12 for z/OS Optimizer and RUNSTATS improvements

Webcast replay Abstract

Are you a RUNSTATS Master?

Every now and again, I hold a little presentation called “Are you a RUNSTATS Master?” Actually these days it’s called “Db2 z/OS Lies, Damn Lies, and Statistics…” where I describe in detail, what the Db2 Optimizer uses for access path selection in relation to the Db2 Catalog data.

Surprised? You will be!

Personally, I am always surprised at how often people say “just that data?” or “is that it?” (the various other reasons for access path selection like CP speed, RID Pool size, Sort Pool size, Max data caching size, and, of course, the 80 bufferpools are also mentioned, but these have nothing to do with RUNSTATS).

So generally the answer is “Yes.” However, the permutations and combinations make the devil in the detail – The Db2 Optimizer’s algorithms are top secret, but the input data it uses is fully described in the documentation.

Just the facts ma’am

What I want to do, is show :
the Db2 Catalog data that is used
the default values that can cause surprising things to happen
the problem of correlations in the Db2 Catalog

Which data are used by the Db2 Optimizer and which are updated by RUNSTATS?

Here is a complete list of the eleven tables used by the Db2 Optimizer:

SYSIBM.SYSCOLDIST
SYSIBM.SYSCOLSTATS *
SYSIBM.SYSCOLUMNS
SYSIBM.SYSINDEXES
SYSIBM.SYSINDEXPART
SYSIBM.SYSKEYTARGETS (same as SYSCOLUMNS)
SYSIBM.SYSKEYTGTDIST (same as SYSCOLDIST)
SYSIBM.SYSROUTINES
SYSIBM.SYSTABLES
SYSIBM.SYSTABLESPACE
SYSIBM.SYSTABSTATS

* degree of parallelism only and, after APAR PK62804, also „sometimes“ used to bound filter factor estimates…

By the Columns

Now we can also list out all of the columns (obviously not including the key columns) which are used by the Db2 Optimizer:

SYSCOLDIST
CARDF, COLGROUPCOLNO, COLVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMCOLUMNS, QUANTILENO, STATSTIME

SYSCOLSTATS
COLCARD, HIGHKEY, LOWKEY

SYSCOLUMNS
COLCARDF, HIGH2KEY, LOW2KEY

SYSINDEXES
CLUSTERING*, CLUSTERRATIO, CLUSTERRATIOF, DATAREPEATFACTORF, FIRSTKEYCARDF, FULLKEYCARDF, NLEAF, NLEVELS

SYSINDEXPART
LIMITKEY*

SYSKEYTARGETS
CARDF, HIGH2KEY, LOW2KEY, STATS_FORMAT

SYSKEYTGTDIST
CARDF, KEYGROUPKEYNO, KEYVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMKEYS, QUANTILENO, STATSTIME

SYSROUTINES
CARDINALITY*, INITIAL_INSTS*, INITIAL_IOS*, INSTS_PER_INVOC*, IOS_PER_INVOC*

SYSTABLES
CARDF, EDPROC*, NPAGES, NPAGESF, PCTROWCOMP

SYSTABLESPACE
NACTIVE, NACTIVEF

SYSTABSTATS
CARD, CARDF, NPAGES

Notes: * Columns are not updated by RUNSTATS and _ Columns are not updatable at all. The column STATSTIME is used only if there are duplicates in the SYSCOLDIST table, and then the Db2 Optimizer will use the “newer” data that was probably inserted by a User.

Know your defaults

Which default column values trigger the Db2 Optimizer to use its own internal default values?

SYSCOLUMNS

If COLCARDF = -1 then use 25

SYSINDEXES

If CLUSTERRATIOF = 0 then use CLUSTERRATIO

If CLUSTERRATIO = 0 then use 0.95 if the index is CLUSTERing = ‘Y’ otherwise 0.00

DATAREPEATFACTORF = -1 then is ignored

If FIRSTKEYCARDF = -1 then use 25

If FULLKEYCARDF = -1 then use 25

If NLEAF = -1 then use 33 (Which is SYSTABLES.CARDF / 300)

If NLEVELS = -1 then use 2

SYSROUTINES

If CARDINALITY = -1 then use 10,000

If INITIAL_INSTS = -1 then use 40,000

If INITIAL_IOS = -1 then use 0

If INSTS_PER_INVOC = -1 then use 4,000

If IOS_PER_INVOC = -1 then use 0

SYSTABLES

If CARDF = -1 then use 10,000

If NPAGESF <= 0 then use NPAGES

If NPAGES = -1 then use 501 (Which is CEILING (1 + SYSTABLES.CARDF / 20))

Here you must be very careful if using NPGTHRSH ZPARM as 501 is more than the default value in most shops. This is one of the little changes in Db2 12 where the value -1 is treated as -1 for the NPGTHRSH check.

SYSTABLESPACE

If NACTIVEF = 0 then use NACTIVE

If NACTIVE = 0 then use 501 (Which is CEILING (1 + SYSTABLES.CARDF / 20))

SYSTABSTATS

If CARDF = -1 then use 10,000

If NPAGES = -1 then use 501 (Which is CEILING (1 + SYSTABSTATS.CARDF / 20))

So now you can see that non-floating point “old” data, may still be used today and this may cause access path headaches!

Never ever say never

Now to top it all, the data in the SYSCOLDIST and SYSKEYTGTDIST never gets simply “deleted”.

Well, actually, in Db2 12 you can now do a RUNSTATS xxx.yyy RESET ACCESSPATH to delete all SYSCOLDIST and SYSKEYTGTDIST data and set all other relevant columns to their respective defaults, but you must time this RUNSTATS very wisely! If you run it and then forget to do a normal full RUNSTATS…

Oldie but a goldie

Once the data are inserted, they stay there, until they are overwritten by new data, a RUNSTATS RESET, or the object is dropped. This all leads to some very old data in these two tables that can and do cause the Db2 Optimizer a ton of grief! One of the first things I do is to simply select the MIN(STATSTIME) from these tables just to see how old the data really is. Do it yourself and be surprised! I have seen sites with eight-year old data in the SYSCOLDIST and that cannot be good!

Correlate the world

Now onto correlations… There are lots of little tricks that DBAs use to “massage” access path choice. One of these, is to just set NLEVELS to 15 for a given index. Then lots of queries simply refuse to touch it as it would appear to be HUGE. Now, just simply updating columns can cause the Db2 Optimizer, in the best case, to ignore the updates or, perhaps, makes things even worse! So here is a list of the correlations (In other words, if you change xxx remember to change yyy and zzz as well):

Relationships exist among certain columns of certain tables:

Columns within SYSCOLUMNS
Columns in the tables SYSCOLUMNS and SYSINDEXES
Columns in the tables SYSCOLUMNS and SYSCOLDIST
Columns in the tables SYSCOLUMNS, SYSCOLDIST, and SYSINDEXES

If you plan to update some values, keep in mind the following correlations:

COLCARDF and FIRSTKEYCARDF/FULLKEYCARDF (They must be equal for the 1st column and full, if a single column index)
COLCARDF, LOW2KEY and HIGH2KEY. (For non-default COLCARDF LOW2KEY and HIGH2KEY key must be filled with data) and if the COLCARDF is 1 or 2 Db2 uses LOW2KEY and HIGH2KEY as domain statistics to generate frequencies.
CARDF in SYSCOLDIST. CARDF is related to COLCARDF and FIRSTKEYCARDF and FULLKEYCARDF. It must be at a minimum
A value between FIRSTKEYCARDF and FULLKEYCARDF if the index contains the same set of columns
A value between MAX(colcardf of each col) and the product of all the columns COLCARDFs in the group
CARDF in SYSTABLES. CARDF must be equal or larger than any other cardinalities, such as COLCARDF, FIRSTKEYCARDF, FULLKEYCARDF, and CARDF in SYSCOLDIST
FREQUENCYF and COLCARDF or CARDF. The number of frequencies collected must be less than or equal to COLCARDF for the column or CARDF for the column group
FREQUENCYF. The sum of frequencies collected for a column or column group must be less than or equal to 1

New in Db2 11

In Db2 11, the table SYSSTATFEEDBACK was introduced giving us the first chance to see what the optimizer thinks is missing. This is truly awesome, as then we can tailor our RUNSTATS to generate exactly what the optimizer needs to really validate and generate a good, stable access path. Of course, you should be a little bit careful with this data as too much of a good thing can be bad for you!

New in Db2 12

(Not just the lowercase b!)

In Db2 12, the SYSSTATFEEDBACK was made even more interesting by now externalizing the required RUNSTATS options *directly* into the already existing RUNSTATS profile or, indeed, actually creating a RUNSTATS profile for you.

I think that is really dangerous, as then you could easily flood your system with bogus stats for end user QMF/SPUFI queries that were run “by accident,” or so called “boss queries” where someone with *no* idea of SQL clicks together a highly complex and badly written SQL before letting it run for a weekend. Naturally the SQL gets rewritten by a helpful ever present DBA, but the statistics recommendations have now landed in the profile and will be updated and kept from this point on.

My personal recommendation is to switch off this feature as it is sadly *on* by default!

Here are the ZPARMs of interest

ZPARM STATFDBK_SCOPE set to ALL by default.
ZPARM STATFDBK_PROFILE set to YES by default.
Plus, in table SYSIBM.SYSTABLES column STATS_FEEDBACK is set to Y by default.

Out-of-the box it starts automatically creating (for TYPE=’C’ with NUMCOLS > 1 and TYPE=’F’ or ‘H‘) profiles and updating existing profiles…Here you must manually check the size of your profiles every now and again just to make sure everything is ok!

One other new thing in Db2 12, is that XML columns can get statistics now to help XMLEXISTS get a better access path.

and finally

Do not forget that our little Freeware tool StatisticsHealthCheck will find all bad correlations, old data and badly updated data for you and it is FREE!

So I hope this little round-up of Db2 Catalog Statistics data was interesting, and, as usual, if you have any comments or questions, then please, feel free to mail me!

TTFN

Roy Boxwell

November 27, 2017January 14, 2021

2017-11 Db2 APAR list: An APAR a day keeps the bugs at bay

Db2 12 APAR – time saving list

With all the talk about “agile” going on, and referring to one of my older 2017-09 newsletters* on APARS, I think it is time to tell you all about another little service that our company offers.

(completely free, simple and no marketing spam)

Db2 12 Agile & APAR previous newsletter:

2017-09: Db2 12 SQL Access path: Death by APAR : How many APARs really can affect access paths?

APAR Database

The APAR database can be accessed by anyone with an IBM Userid and you can merrily search to see if the problem that you have hit is already found and fixed, or a fix is in the works.

The problem is: What about the bugs that you do not *know* you have hit?

Get someone else to do the donkey work

The answer is to get someone else to do all the research for you, and deliver the answer by e-mail every month for all current releases of Db2 going back two years.

Who is that donkey? You will never guess… 😉

Three is the key

There are three Excel spread sheets here.

SQL Performance
The first is RTS. This lists any and all APARs to do with the Real-Time Statistics tables. If you rely on these tables to decide when to run REORG, COPY and RUNSTATS, then you want to make sure that they are being correctly updated, don’t you?
SQL Access Path
The second is RUNSTATS. RUNSTATS is a critical utility program and contains bugs like any other non-trivial program. I count RUNSTATS bugs as *always* a personal HIPER. I rely on statistics and so does the Db2 Optimizer. If there is bad data here, then your access paths have no chance!
Performance PTFs
Finally, and the biggest list, is those APARs that have anything to do with SQL Performance and SQL Access paths (If not already in the RUNSTATS list of course!) Performance PTFs are pretty important!

Going hyper over HIPER

All the APARs have their related PTF. A handy little HIPER column tells you whether or not this is a really important fix. As I mentioned, I treat the RUNSTATS ones as personal HIPERS. Also added is a PE indicator when a PTF goes bad on you (PTF in Error) so you can see if you introduced a problem by correcting another.

One Excel Example: The SQL Performance spread sheet

Db2 12 APAR list free for Db2 z/OS: SQL Performance (RTS) - Performance PTFs - SQL Access Path (RUNSTATS) -

This is from the SQL Performance spread sheet and you can see how it looks

The same table in HTML:

APAR	CLOSED	STATUS	Db2 10	Db2 11	Db2 12	HYPER	Description
PI85305	2017-11-01	Closed	N/A	UI51606	UI51601		INEFFICIENT INDEX CHOSEN WHEN INDEX CAN DO INDEX SKIPPING…
PI85418	2017-19-25	Modified	UI50098	UI50099	UI50582		PREPARE TAKES LONG TIME AND HIGH CPU IF THE QUERY CONTAINS MA…
PI85463		New & Closed	N/A	N/A	UI51342		TABLESPACE SCAN INSTEAD OF INDEX ACCESS – WITH INDEX HAVIN…
…	…	…	…	…	…	…	…

Interested in making your life a little bit easier and safer?

If you are interested in Db2 APARs and you have *no* time to go and scrabble around the internet trying to find out which APARs are needed, worthwhile, or dangerous, then

just email our technical support techsupport@seg.de and ask to be added to the APAR Mailing list.
It is completely free, you will not get marketing spam and it really is as simple as that!

Get AGILE now!

With these handy lists, you can quickly and easily review the state of your Db2, and react quickly and in a timely manner when you have to.

As usual, if you have any comments or queries please feel free to drop me a line!

TTFN

Roy Boxwell

November 13, 2017January 14, 2021

2017-10 Db2 log size: How big is your LOG?

How to validate that your Db2 LOGs are OK ?

Sizing of LOGs changed quite a bit in Db2 11 and Db2 12, so I thought a little recap about the correct sizing and ZPARMS was in order this month.

– Db2 Active logs – how many?

The perennial favourite “just enough” is of course the correct answer! Naturally you want to make sure that about 24 hours’ worth of data is available on your active logs as reading the archives, especially if they have been migrated to tape, is a real performance killer during restart and rollback processing!

The range of active logs you can have is from 2 to 93 with 3 being the default.

– Db2 Archive logs – how many?

Another perennial favourite “just enough” is of course the correct answer! (See more details later in this newsletter.)

The range of archive logs you can have is from 10 to 10,000, with 10,000 being the default.

Where am I?

How can you find out what your system is doing? Simply run the DSNJU004 BSDS stand-alone print routine to look at the current data. Here is an example JCL to do it all for you:

//BSDS     EXEC PGM=DSNJU004
//STEPLIB  DD DSN=<your.exit.lib>,DISP=SHR
//         DD DSN=<your.load.lib>,DISP=SHR
//SYSUT1   DD DSN=<your.BSDS 01>,DISP=SHR
//SYSUT2   DD DSN=<your.BSDS 02>,DISP=SHR
//SYSPRINT DD SYSOUT=*

And here is the output from my little Db2 12 subsystem, cut down to show just the bits I want to use!

ACTIVE LOG COPY 1 DATA SETS
START RBA/TIME       END RBA/TIME          DATE/LTIME DATA SET INFORMATION
-------------------- --------------------- ---------- --------------------
000000000013516C0000  0000000000135387FFFF  2016.068 DSN=DC10.LOGCOPY1.DS01
2017.307  12:32:08.6  2017.308  20:54:21.1  17:27    STATUS=REUSABLE
00000000001353880000  00000000001355A3FFFF  2016.068 DSN=DC10.LOGCOPY1.DS02
2017.308  20:54:21.1  2017.310  07:31:00.0  17:27    STATUS=REUSABLE
00000000001355A40000  00000000001357BFFFFF  2016.068 DSN=DC10.LOGCOPY1.DS03
2017.310  07:31:00.0  ........  ..........  17:27    STATUS=NOTREUSABLE

ARCHIVE LOG COPY 1 DATA SETS
START RBA/TIME        END RBA/TIME         DATE/LTIME DATA SET INFORMATION
--------------------  -------------------- ---------- --------------------
00000000000000000000  000000000000021BFFFF  2016.068  DSN=DC10.ARCHLOG1.D16068.T1743011.A0000001
2016.068  18:37:21.8  2016.068  18:43:00.9  17:43     VOL=SE121D UNIT=SYSALLDA
                                                                 CATALOGUED
000000000000021C0000  0000000000000437FFFF  2016.068  DSN=DC10.ARCHLOG1.D16068.T1743391.A0000002
2016.068  18:43:00.9  2016.068  18:43:39.1  17:43     VOL=SE121D UNIT=SYSALLD
                                                                 CATALOGUED
00000000000004380000  0000000000000653FFFF  2016.069  DSN=DC10.ARCHLOG1.D16069.T0756574.A0000003
2016.068  18:43:39.1  2016.069  08:56:57.3  7:56      VOL=SE121D UNIT=SYSALLD
                                                                 CATALOGUED

From this you can see that I have three active logs that last about one to two days, and the complete archives from the first install are still available – that will probably not be true in your case!

Now, much later in the output, you can see where I am today:

000000000013516C0000  0000000000135387FFFF  2017.308   
DSN=DC10.ARCHLOG1.D17308.T1954212.A0002638
2017.307  12:32:08.6   2017.308  20:54:21.1 19:54 VOL=SE123 UNIT=SYSALLD
                                                            CATALOGUED
00000000001353880000 00000000001355A3FFFF   2017.310
DSN=DC10.ARCHLOG1.D17310.T0631002.A0002639
2017.308 20:54:21.1    2017.310  07:31:00.0 6:31 VOL=SE123F UNIT=SYSALLD
                                                            CATALOGUED

Then the COPY 2 data where you can see that I do not even archive the COPY 2 logs. Again this should not be the case in your shop.

ACTIVE LOG COPY 2 DATA SETS
START RBA/TIME        END RBA/TIME         DATE/LTIME  DATA SET INFORMATION
--------------------  -------------------- ----------  -------------------- 
000000000013516C0000  0000000000135387FFFF  2016.068  DSN=DC10.LOGCOPY2.DS01
2017.307  12:32:08.6  2017.308  20:54:21.1  17:27     STATUS=REUSABLE
00000000001353880000  00000000001355A3FFFF  2016.068  DSN=DC10.LOGCOPY2.DS02
2017.308  20:54:21.1  2017.310  07:31:00.0  17:27     STATUS=REUSABLE
00000000001355A40000  00000000001357BFFFFF  2016.068  DSN=DC10.LOGCOPY2.DS03
2017.310  07:31:00.0    ........  ........  17:27     STATUS=NOTREUSABLE
ARCHIVE LOG COPY 2 DATA SETS
NO ARCHIVE DATA SETS DEFINED FOR THIS COPY

From this data you can work out a few simple things:

1) How often do you fill an active log?

2) How many archive logs do you have in the BSDS, and do they actually still exist?

You can now also check that all your active logs have the same dataset characteristics (Primary Quantity, Secondary, etc.).
If you are filling up more than one log per 10 minutes, you may want to check the size of your logs. The size of the active log can be set to an absolute physical maximum of 5,825 cylinders (87,375 TRKs) for non-EAV and 5,817 cylinders (87,255 TRKs) for EAV. Why the strange difference? Because EAV can only allocate in blocks of 21 cylinders due to the way the EAV volumes work. (See later for more information about using more than 65,535 TRKs for an archive log though!)
Space allocation:
The initial allocation of space is done through the Install CLIST and the tailored sample JCL in the DSNTIJIN member. That is where you can override whatever values are generated by Db2 into “better” ones for your system. In the bad old days the allocation was in RECORDS and today I see KILOBYTES most of the time – which is still a bit of a headache when you want to allocate in CYLINDERS but it is all doable, as the DEFINE CLUSTER syntax allows these size definitions:

CYLINDERS(primary)
KILOBYTES(primary)
MEGABYTES(primary)
RECORDS(primary)
TRACKS(primary)

Log ZPARM of Interest

For the log, there is really only one ZPARM of any real importance when it comes to performance and that is OUTBUFF. Set this to be the biggest you can “afford” in terms of real memory, as it takes the memory as fixed! 400,000 KB is the current max, and this is the IBM recommended value. Reading from memory for rollback is a lot quicker than from active log, which is a lot faster than reading the archive log.

From the documentation:

The larger the output buffer, the more likely that a log read request can return the required log records without needing I/O operations. This is a particularly important consideration if the DB2 subsystem is to be used with a data replication product that uses IFCID 306. In DB2 12, log buffers are permanently page-fixed. When you estimate real storage usage, you must use the entire size that you specify for the OUTBUFF parameter. To avoid page-fixing more storage than necessary, carefully choose the setting for OUTBUFF.

Choose the largest size that your system can tolerate for the log output buffer. Because the pages for the log output buffer are permanently fixed in real storage, choose the largest size that you can dedicate in real storage. A larger size for the log output buffer might decrease the number of forced I/O operations that occur because additional buffers are unavailable, and can also reduce the number of wait conditions.

Archive logs are different

The size of the archive logs and how many are recorded in the BSDS are input parameters on installation panel DSNTIPA:

DSNTIPA INSTALL DB2 - ARCHIVE LOG DATA SET PARAMETERS
===> _
Enter data below:
1 PRIMARY QUANTITY ===> 125    Primary space allocation in cylinders
2 SECONDARY QTY    ===> 15     Secondary space allocation in cylinders
3 DEVICE TYPE 1    ===> TAPE   Unit name for COPY1 archive logs
4 DEVICE TYPE 2    ===>        Unit name for COPY2 archive logs
5 BLOCK SIZE       ===> 24576  Rounded up to 4096 multiple
6 READ TAPE UNITS  ===> 2      Number of allocated read tape units
7 DEALLOC PERIOD   ===> 0      Time interval to deallocate tape units
8 RECORDING MAX    ===> 10000  Number of data sets recorded in BSDS

In Db2 10 and Db2 11 you also had an allocation unit (ALCUNIT ZPARM) input field with valid values of BLK, TRK or CYL with a default of BLK.

The PRIQTY now has a range from 1 to 4,369 and a default of 125. (Remember the absolute physical maximum earlier of 5,825/5,817 CYLs? That is why you can/should edit the DSNTIJIN member!)

If migrating from Db2 11, the ALCUNIT value is used to convert from BLK or TRK to CYLs in Db2 12. If not yet in Db2 12 then the ALCUNIT tells you what the allocation unit is for the PRIQTY in either BLK, TRK or CYL with a range of 1 to 999,999.

Some further info about the size of the archive logs here. Remember that they are flat files *not* VSAM Linear Datasets:

By default, DFSMS Direct Access Device Space Management (DADSM) limits the space allocation on a single volume to less than 65535 tracks. Therefore, if the archive log data set size can be greater than or equal to 65535 tracks, you need to specify a primary space quantity of less than 65535 tracks. This action forces the archive log data set to extend to a second volume.

Alternatively, the archive log data sets can be allocated by a DFSMS data class that has a DSNMTYP (data set name type) setting of LARGE or EXT. In this case, you can specify a primary space quantity larger than 65535 tracks without extending to a second volume. LARGE indicates that data sets in the data class are to be allocated in large physical sequential format. EXT indicates that data sets in the data class are to be allocated in extended physical sequential format. A setting of EXT is recommended, and it is required for striping of data sets. If you specify EXT, also set the IFEXT (if extended) parameter to R (required) rather than P (preferred).

The SECQTY has a range from 1 to 4,369 and a default of 15. Same rules for ALCUNIT and switching to CYL in Db2 12 apply here as per the PRIQTY.

The recording max just tells you how many archive log dataset names the BSDS can remember. The documentation mentions two very important facts about this number:

When this number is exceeded, recording resumes at the beginning of the BSDS.
You must create image copies of all DB2 objects, probably several times, before the archive log data sets are discarded. If you fail to retain an adequate number of archive log data sets for all the image copies, you might need to cold start or reinstall DB2. In both cases, data is lost.

So there are two top tips for you! Check if you have had a wrap around and check if you have datasets hanging around that Db2 can no longer allocate and use.

Striped or not?

Finally, for logs, the striping of the active logs is nowadays not recommended, as devices are so much faster, but striping and/or compression of the archive logs is still recommended if space is an issue.

From the Db2 Documentation:

In most cases, do not stripe active log data sets. You can use DFSMS to the stripe the logs, but striping is generally unnecessary with the latest devices. Striping increases the number of I/Os, which can increase CPU time and lead to potentially greater DB2 commit times. Striping might improve the performance of batch insert jobs, but it might also harm the performance of online transaction processing. Striping is especially risky for performance if you replicate the logs over long distances.
Consider striping and compressing archive log data sets by using DFSMS. Doing so might speed up the time to offload the logs and the time to recover by using archive logs. However, the performance of DFSMS striping and compression depends on the z/OS release and the types of hardware that you use.

Check your CHECKPOINTS

Getting back to the output of the DSNJU004 job, a few more things like conditional restart recs etc. are output and then the Checkpoint Queue:

                    CHECKPOINT QUEUE                
                08:00:31 NOVEMBER 06, 2017          
TIME OF CHECKPOINT       07:31:03 NOVEMBER 06, 2017 
BEGIN CHECKPOINT RBA            00000000001355B750DC
END CHECKPOINT RBA              00000000001355B81B18
END CHECKPOINT STCK             00D365BB89179A529200
TIME OF CHECKPOINT       20:54:23 NOVEMBER 04, 2017 
BEGIN CHECKPOINT RBA            000000000013538813C5
END CHECKPOINT RBA              00000000001353889D76
END CHECKPOINT STCK             00D363EB5D1777CFB000
TIME OF CHECKPOINT       12:32:09 NOVEMBER 03, 2017 
BEGIN CHECKPOINT RBA            00000000001351723326
END CHECKPOINT RBA              0000000000135172D899
END CHECKPOINT STCK             00D362393D4E6F748C00
TIME OF CHECKPOINT       07:30:11 NOVEMBER 03, 2017 
BEGIN CHECKPOINT RBA            0000000000134F5915DD
END CHECKPOINT RBA              0000000000134F59C445
END CHECKPOINT STCK             00D361F5BEB5A941CA00
TIME OF CHECKPOINT       07:42:59 NOVEMBER 01, 2017 
BEGIN CHECKPOINT RBA            0000000000134D34EB53
END CHECKPOINT RBA              0000000000134D357022
END CHECKPOINT STCK             00D35F74E054AE870800

You can see how busy my Db2 system is, anyway your data will be a lot different I am sure!

Checkpoint ZPARMS of interest

CHKTYPE

LOGRECS/MINUTES/BOTH MINUTES is default and is what I recommend. This parameter can still have the old value “SINGLE” which then respects the CHKFREQ value as if it were minutes or log records depending on the size of the Number. Change this to Minutes!

CHKFREQ

If LOGRECS how many log records before a checkpoint 1,000 – 16,000,000
If MINUTES how many minutes before a checkpoint 1 – 60

CHKLOGR

If BOTH how many log records before a checkpoint 1,000 – 99,999,999

CHKMINS

If BOTH how many minutes before doing a Checkpoint. 1 – 1439

IBM, and I, recommend a checkpoint every five minutes or so to get optimum rollback and log usage. Naturally your mileage may vary. Also remember that Db2 writes a checkpoint at active log change as well, so try and guarantee that at least ten checkpoints’ “worth” of data fits onto one log dataset.

One extra note arrived in Db2 11:

If the CHECKPOINT TYPE field is set to BOTH, DB2 prevents log checkpoints from being taken too frequently and degrading performance. Log checkpoints are scheduled, but not necessarily immediately taken, when the number of minutes that passes reaches the value of CHKMINS.

Following all this advice should mean that your logs settle down and your system runs a little bit smoother!

As usual, if you have any comments or queries please feel free to drop me a line!

TTFN

Roy Boxwell

Now that the Db2 Optimizer has gone all Artificial Intelligent (AI from now on) on us, I thought it would be interesting to review where we are in this brave new world!

Keeps you out of a BIND

Apples and Oranges

ML and AI – The new buzzwords

Do you “know” your data?

Adversarial Images and the fun you can have with them…

Fooling Deep Learning “

Disappearing Humans

Would you sit in a driverless car?

Room full of CTO’s

Flying is strangely different

They will come…

AI everywhere

I had to do a real double take and think “How on earth could AI help here?”

Don’t get me wrong

The money being spent is also amazing, in 2019 it is estimated, by Syncsort,

that the AI budget will be $35.8 Billion 44% more than in 2018.

What do you think?

Terrible quote usage I know… But now, with the all new and Agile Db2 12 being picked up and used globally, I think it is time to review a few things that companies and individuals must do, or at least plan, in this “Brave New World” of ours.

To wrap up at the end, I will delve into the idea of using ZOWE to help you test!

The Db2 12 Testing Checklist

1. Do not be afraid of the clones!

A real clone or a partial clone?

2. Collect as much workload as you can

3. Dynamic & Static SQL: Store it cleverly away

4. Gather your friends close but your enemies closer – IFCID376

But I don’t get the Dynamic SQL!

5. Access Paths a go go

6. Execute the SQLs

Well done, you’ve made it!

7 – Db2 12 test-Review:

What does this mean for me?

” Continuous testing in a continuous development world “…

Reducing the time to test Enabling you to actually go forward with a good, clear conscience!

8 – ZOWE – Open Source zUI – to the rescue?

Zowe at SEGUS and

SOFTWARE ENGINEERING?

We will be using ZOWE for two things:

1 – For the Installation and Maintenance of our software at customer sites

2 – For using ZOWE as the front end for the users of our products

A quick history of EXPLAIN and a new standard Db2 EXPLAIN Query to bring out the best and most underused columns.

PLAN_TABLE

DSN_DETCOST_TABLE

DSN_FILTER_TABLE

DSN_PREDICAT_TABLE

New Standard Explain SQL

Global Temporary Table usage in Db2 z/OS

Learn, through examples, why CGTT (CREATEd Global Temporary Tables) are possibly better than DGTT (DECLAREd Global Temporary Tables) and how to use them to speed up performance.

Global – but not as you know it!

Global Temporary Table in Db2 12 – All new and improved

CGTT why?

Belief is ok, test is trust!

No EDMPOOL here!

Impressive!

ROT still correct

Db2 10 improvement

Death by DGTT

Final tip!

Even more work…

Part six of my – everlasting – walk through of new Aggregate and Scalar functions. Following with : HASH & WRAP

Previous “SOUNDEX” Newsletters

SCALARS

HASH it

Then I tried MD5

Oops! What is that?

Let’s WRAP it up

The output looks like this:

Part five of my – everlasting – walk through of new Aggregate and Scalar functions. This time I will start with the new AGGREGATE functions introduced in Db2 12.

LISTAGG

Db2 Catalog Docu

PERCENTILES

PERCENTILE_CONT

PERCENTILE_DISC

„Breaking the rules is often fun, although we cannot condone it. But breaking the rules of Real Time Statistics (RTS) in Db2 12 can really land you in hot water.

We provide two queries that may give you a Get Out of Jail Free Card – at least as far as RTS is concerned…”

The Arrival of RTS

Real Time Statistics (RTS) rules:

Rules are made to be broken

RTS rule one broken

Db2 12 breaks rule one

Reducing the time to test

Enabling you to actually go forward with a good, clear conscience!

Part six of my – everlasting – walk through of new Aggregate and Scalar functions.
Following with : HASH & WRAP