2018-07 – SOUNDEX and other “cool” features – Part five: All new for Db2 12

Part five of my – everlasting – walk-through of new Aggregate and Scalar functions. This time I will start with the new AGGREGATE functions introduced in Db2 12.

LISTAGG

This is mainly of interest because it was the very first Agile Function Level feature in Db2. You must be at FL501 to use it in SQL, otherwise you get a

DSNT408I SQLCODE = -4700 ATTEMPT TO USE NEW FUNCTION BEFORE FUNCTION LEVEL IS ACTIVATED

Or a

DSNT408I SQLCODE = -4743, ERROR:  ATTEMPT TO USE A FUNCTION WHEN THE APPLICATION COMPATIBILITY SETTING IS SET FOR A PREVIOUS LEVEL
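By the way, if the subsystem has already been activated to FL501 but your dynamic SQL (SPUFI, DSNTEP2 etc.) still throws the -4743, you can raise the special register yourself for the current connection. A minimal sketch, assuming FL501 really is active, and remembering that the register cannot be set higher than the APPLCOMPAT that the package was bound with:

SET CURRENT APPLICATION COMPATIBILITY = 'V12R1M501' ;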

What this function does, basically, is a horizontal aggregation: it concatenates the values of a group into a single string, and the list can be ordered in a different sequence from the grouping. Here’s the SQL example:

SELECT WORKDEPT,                                       
LISTAGG(LASTNAME, ', ') WITHIN GROUP(ORDER BY LASTNAME)
AS EMPLOYEES                                           
FROM SAMPLE.EMP                                        
GROUP BY WORKDEPT;

Which returns:

---------+---------+---------+---------+---------+---------+---------+-----+
WORKDEPT  EMPLOYEES                                                        
---------+---------+---------+---------+---------+---------+---------+-----+
A00       HAAS, LUCCHESI, O'CONNELL 
B01       THOMPSON 
C01       KWAN, NICHOLLS, QUINTANA    
D11       ADAMSON, BROWN, JONES, LUTZ, PIANKA, SCOUTTEN, STERN, WALKER,
          YOSHIMURA
D21       JEFFERSON, JOHNSON, MARINO, PEREZ, PULASKI, SMITH
E01       GEYER  
E11       HENDERSON, PARKER, SCHNEIDER, SETRIGHT, SMITH  
E21       GOUNOT, LEE, MEHTA, SPENSER 
DSNE610I NUMBER OF ROWS DISPLAYED IS 8

So you can see it strings together the LASTNAME values, separated by a comma, for each WORKDEPT value.

Db2 Catalog Docu

This function is actually very handy for one of the things I create for our programmers here in the labs. It is always a good idea to “know the catalog”, so years ago I created a little Word document listing all the tablespaces, tables, indexes and index columns, with sort order, so that they did not have to look in three separate books to find the data. Here is the SQL for listing out the tables with indexes:

SELECT SUBSTR(A.DBNAME, 1, 8) AS DB      
      ,SUBSTR(A.NAME, 1, 8 ) AS TS       
      ,SUBSTR(B.NAME, 1, 18) AS TABLE    
      ,SUBSTR(C.NAME, 1, 8 ) AS INDEX    
      ,B.TYPE                            
      ,CASE C.UNIQUERULE                 
        WHEN 'D' THEN '-'                
        ELSE 'Y'                         
       END AS U                          
      ,D.COLSEQ                          
      ,SUBSTR(D.COLNAME, 1, 18) AS COL   
      ,CASE D.ORDERING                   
        WHEN ' ' THEN 'I'                
        WHEN 'A' THEN '-'                
        WHEN 'D' THEN 'D'                
        WHEN 'R' THEN 'R'                
       END AS ORDER                      
FROM SYSIBM.SYSTABLESPACE A              
    ,SYSIBM.SYSTABLES     B              
    ,SYSIBM.SYSINDEXES    C              
    ,SYSIBM.SYSKEYS       D              
WHERE A.DBNAME IN ('DSNDB01','DSNDB06')  
  AND A.DBNAME  = B.DBNAME               
  AND A.NAME    = B.TSNAME               
  AND B.CREATOR = 'SYSIBM'               
  AND B.NAME    = C.TBNAME               
  AND B.CREATOR = C.TBCREATOR            
  AND C.NAME    = D.IXNAME                
  AND C.CREATOR = D.IXCREATOR            
ORDER BY DB, TS , TABLE , INDEX, D.COLSEQ ;

It returns 855 rows of data on my test Db2 12 FL501 system:

---------+-------+-------+---+-------+------+--------+------------------+--
DB        TS      TABLE         INDEX     TYPE  U  COLSEQ  COL       ORDER
---------+-------+-------+---+-------+------+--------+------------------+--
DSNDB01   DBD01   DBDR          DSNDB01X   T   Y      1   DBID         -
DSNDB01   DBD01   DBDR          DSNDB01X   T   Y      2   SECTION      -
DSNDB01   SCT02   SCTR          DSNSCT02   T   Y      1   SCTNAME      -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      1   SPTLOCID     -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      2   SPTCOLID     -
DSNDB01   SPT01   SPTR          DSNSPT01   T   Y      3   SPTNAME      -

And ending with

DSNDB06  SYSXML   SYSXMLRELS    DSNXRX02   T   -      1  XMLTBOWNER    -
DSNDB06  SYSXML   SYSXMLRELS    DSNXRX02   T   -      2  XMLTBNAME     -
DSNDB06  SYSXML   SYSXMLSTRINGS DSNXSX01   T   Y      1  STRINGID      -
DSNDB06  SYSXML   SYSXMLSTRINGS DSNXSX02   T   Y      1  STRING        -
DSNE610I NUMBER OF ROWS DISPLAYED IS 855

Then I used my amazing Word skills to move all the COL columns one after another to get a DOC file that looks like this:

DSNDB0n.     SYSIBM.       SYSIBM.      U  INDEX FIELD
DBD01        DBDR          DSNDB01X     Y  DBID.SECTION
SCT02        SCTR          DSNSCT02     Y  SCTNAME
SPT01        SPTR          DSNSPT01     Y  SPTLOCID.SPTCOLID.SPTNAME.etc.

Which, of course, the programmers loved, as it enabled them to quickly find which columns, in which sort order, are available on the catalog and directory tables (I show the COL value in bold wherever the ORDER is D, by the way). I started this with DB2 V5…

Now with LISTAGG the query looks like:

SELECT SUBSTR(A.DBNAME, 1, 8)                                 
      ,SUBSTR(A.NAME, 1, 8 )                                  
      ,SUBSTR(B.NAME, 1, 18)                                  
      ,SUBSTR(C.NAME, 1, 8 )                                  
      ,B.TYPE AS T                                            
      ,CASE C.UNIQUERULE                                      
        WHEN 'D' THEN '-'                                     
        ELSE 'Y'                                              
       END AS U                                               
      ,LISTAGG( CHAR(SUBSTR(D.COLNAME, 1, 18) CONCAT           
                     CASE D.ORDERING                          
                     WHEN ' ' THEN 'I'                        
                     WHEN 'A' THEN '-'                        
                     WHEN 'D' THEN 'D'                        
                     WHEN 'R' THEN 'R'                        
                     END                                      
                    )                                         
               , ' ') WITHIN GROUP (ORDER BY D.COLSEQ ASC)    
FROM SYSIBM.SYSTABLESPACE A                                   
    ,SYSIBM.SYSTABLES     B                                   
    ,SYSIBM.SYSINDEXES    C                                   
    ,SYSIBM.SYSKEYS       D                                   
WHERE A.DBNAME IN ('DSNDB01','DSNDB06')                       
  AND A.DBNAME  = B.DBNAME                                    
  AND A.NAME    = B.TSNAME                                    
  AND B.CREATOR = 'SYSIBM'                                    
  AND B.NAME    = C.TBNAME                                    
  AND B.CREATOR = C.TBCREATOR                                 
  AND C.NAME    = D.IXNAME
  AND C.CREATOR = D.IXCREATOR                                 
GROUP BY A.DBNAME, A.NAME, B.NAME, C.NAME, B.TYPE, C.UNIQUERULE
;

And the output:

---------+---------+-------+-----------+---------------------------+---
                                            T  U 
---------+---------+-------+-----------+---------------------------+---
DSNDB01   DBD01   DBDR            DSNDB01X  T  Y  DBID     - SECTION  -     
DSNDB01   SCT02   SCTR            DSNSCT02  T  Y  SCTNAME  -  
DSNDB01   SPT01   SPTR            DSNSPT01  T  Y  SPTLOCID - SPTCOLID - SPTNAME

And ends

DSNDB06   SYSXML  SYSXMLRELS      DSNXRX02  T  -  XMLTBOWNER - XMLTBNAME
DSNDB06   SYSXML  SYSXMLSTRINGS   DSNXSX01  T  Y  STRINGID   -         
DSNDB06   SYSXML  SYSXMLSTRINGS   DSNXSX02  T  Y  STRING     -         
DSNE610I NUMBER OF ROWS DISPLAYED IS 287

Which is a lot less data for my poor little fingers to work with!

Of course it is not perfect… the really big GOTCHA is that ORDER BY is *not* allowed!

PERCENTILES

For the next couple of examples, I will be using the table SAMPLE.EMP that contains these rows and columns of interest:

---------+---------+---------+---------+---------+-----
EMPNO   WORKDEPT       SALARY        BONUS         COMM
---------+---------+---------+---------+---------+-----
000210  D11          18270.00       400.00      1462.00
000190  D11          20450.00       400.00      1636.00
000180  D11          21340.00       500.00      1707.00
000160  D11          22250.00       400.00      1780.00
000170  D11          24680.00       500.00      1974.00
000150  D11          25280.00       500.00      2022.00
000200  D11          27740.00       600.00      2217.00
000220  D11          29840.00       600.00      2387.00
000060  D11          32250.00       600.00      2580.00
DSNE610I NUMBER OF ROWS DISPLAYED IS 9

PERCENTILE_CONT

Calculates the requested percentile as a continuous value. Use this if you want a calculated value, which may be interpolated between your input rows. E.g.:

SELECT PERCENTILE_CONT(0,50)
WITHIN GROUP (ORDER BY SALARY ASC )
FROM SAMPLE.EMP
WHERE WORKDEPT = 'D11'
;
---------+---------+---------+---------+---------+---------

---------+---------+---------+---------+---------+---------
+0.2468000000000000E+05                    (EMPLOYEE 150 row 5)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Here you can see that I require the 50th percentile, and, as the data happens to have nine rows, it would be the 5th row. If you look at the data you will see that that is indeed the case.

Now, reversing the direction of the percentile, I want the 90th percentile of the descending values. This row does not exist in the input, so the function interpolates the value that best fits between the neighbouring rows:

SELECT PERCENTILE_CONT(0,90)                              
       WITHIN GROUP (ORDER BY SALARY DESC)                
FROM SAMPLE.EMP                                           
WHERE WORKDEPT = 'D11'                                    
;                                                         
---------+---------+---------+---------+---------+---------
                                                           
---------+---------+---------+---------+---------+---------
+0.2001400000000000E+05     (value between first and second rows)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
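If you want to check that interpolated number by hand, the arithmetic is simple enough. The target row number is 1 + 0.9 * (9 - 1) = 8.2, so the result lies 20% of the way from the 8th to the 9th salary in descending order (which happen to be the first two rows of the data listed above). A quick sketch in plain SQL – SYSDUMMY1 and the hard-coded salaries are just for illustration, and a period decimal point is assumed:

SELECT 20450.00 + 0.2 * (18270.00 - 20450.00) AS PCTL_90_DESC
FROM SYSIBM.SYSDUMMY1;
-- gives 20014, matching the +0.2001400000000000E+05 above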

PERCENTILE_DISC

Calculates the requested percentile as a discrete value. Use this if you always want an actual value from your input rather than a calculated one. E.g.:

SELECT PERCENTILE_DISC(0,50)             
       WITHIN GROUP (ORDER BY SALARY ASC )
FROM SAMPLE.EMP                          
WHERE WORKDEPT = 'D11'                   
;                                        
---------+---------+---------+---------+--
                                        
---------+---------+---------+---------+--
   24680.00                 (EMPLOYEE 150 row 5)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Here you can see that I still require the 50th percentile, and, as the data happens to still have nine rows, it would be the 5th row again.

Now, reversing the direction of the percentile, I want the 90th percentile of the descending values. This row does not exist in the input, so the function returns the nearest qualifying input value.

SELECT PERCENTILE_DISC(0,90)             
       WITHIN GROUP (ORDER BY SALARY DESC)
FROM SAMPLE.EMP                          
WHERE WORKDEPT = 'D11'                   
;                                        
---------+---------+---------+---------+--
                                       
---------+---------+---------+---------+--
   18270.00        (90th is not in the input data but this is the nearest)
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
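To see why 18270.00 is “the nearest”: PERCENTILE_DISC returns the first input value whose cumulative distribution, counted in the WITHIN GROUP order, reaches the requested percentile. Here is a little sketch that recomputes this by hand (ROW_NUMBER has been around since DB2 9, so this should run anywhere):

SELECT SALARY
      ,ROW_NUMBER() OVER(ORDER BY SALARY DESC) AS RN
FROM SAMPLE.EMP
WHERE WORKDEPT = 'D11'
ORDER BY SALARY DESC;
-- The 8th row only reaches a cumulative distribution of 8/9 = 0.889,
-- still below 0.90, so the 9th row (SALARY 18270.00) is the first one
-- to reach it and that is the value returned.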

 

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2015-06 – SOUNDEX and other “cool” features – Part four: DB2 10 – DB2 11

 

Update for DB2 10 and all new for DB2 11

 

This newsletter completes my walk-through of new Scalar functions that I began a few years ago. I will start today with two new ones that were introduced to DB2 10 by APAR PM56631 in May 2012: PACK (a Scalar Function) and its opposite number UNPACK (a Row Function).

 

New Scalar functions in DB2 10:

Some PACK examples

PACK basically “packs” together a list of columns, or literals, into one variable-length binary string in UNICODE format. Here are some real PACK examples—note that SYSDUMMYU is used!

SELECT PACK(CCSID 1208, 'ALICE', DATE('2014-07-22') , DOUBLE(8.6))      
FROM SYSIBM.SYSDUMMYU;                                                  
---------+---------+---------+---------+---------+---------+---------+--
00000301C0018001E004B80005414C494345201407224021333333333333            
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
SELECT PACK(CCSID 1208, 'ROY', BIGINT(123456789) ,                      
TIMESTAMP('2014-07-22-07.43.23.123456'))                                
FROM SYSIBM.SYSDUMMYU;                                                  
---------+---------+---------+---------+---------+---------+---------+--
00000301C001EC018804B80003524F5900000000075BCD15000620140722074323123456
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
SELECT PACK(CCSID 1208,                                                 
TIMESTAMP('2014-07-22-07.55.23.123456'))                                
FROM SYSIBM.SYSDUMMYU;                                                  
---------+---------+---------+---------+---------+---------+---------+--
0000010188000620140722075523123456                                      
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

You can see the format that it creates is actually pretty standard and can have uses in normal processing!

Using the second example above, we can see what is in the output:

The VARBIN string starts with a control flag byte which we can happily ignore

00

Then comes how many elements are in the data

0003

Then comes one SQLTYPE for each element. If any of these is odd-numbered, then that data value is *not* in the string, as it is in fact NULL

01C0   Varying Character data 448 in decimal
01EC   BIGINT data 492 in decimal
0188   TIMESTAMP data 392 in decimal

 

The first data element, the VARCHAR containing “ROY”, looks like

04B8   CCSID of String 1208 in decimal
0003   Length of string 
524F59 Data in UNICODE

 

The second data element BIGINT containing 123456789 looks like

00000000075BCD15      BIGINT data big endian always!

The third data element TIMESTAMP containing ‘2014-07-22-07.43.23.123456’ looks like

0006                  TIMESTAMP precision
20140722074323123456  TIMESTAMP data
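If you want to see the NULL handling mentioned above for yourself, you can simply pack a NULL element. A little untested sketch – I have not shown the output here, but the BIGINT SQLTYPE should then come back as the odd value 01ED, with no data bytes following for it:

SELECT PACK(CCSID 1208, 'ROY', CAST(NULL AS BIGINT))
FROM SYSIBM.SYSDUMMYU;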

UNPACK statements

Now, these VARBIN strings can get very long indeed, but what can you do with all of the data? Well, obviously UNPACK springs to mind… Here’s the above data again, but to feed it into an UNPACK statement you have to create a FUNCTION first… something like this:

--#SET TERMINATOR $                                        
CREATE FUNCTION PACKUDF ()                                 
        RETURNS VARBINARY(128)                             
        VERSION V1                                         
        LANGUAGE SQL                                       
        DETERMINISTIC                                      
        NO EXTERNAL ACTION                                 
        CONTAINS SQL                                       
        CALLED ON NULL INPUT                               
        STATIC DISPATCH                                    
        NOT SECURED                                        
        DISABLE DEBUG MODE                                 
        BEGIN                                              
        DECLARE PACKSTR VARBINARY(128);                    
        SET PACKSTR =                                      
               PACK(CCSID 1208                             
                   ,VARCHAR('ROY' , 40)                    
                   ,BIGINT(123456789)                      
                   ,TIMESTAMP('2014-07-22-07.43.23.123456')
                   );                                      
        RETURN PACKSTR;                                    
        END                                                
$   

This just does what the second example did, but buried in a Scalar SQL Function. This can then easily be called by just doing this:

SELECT UNPACK(PACKUDF()).*                                                           
          AS (USER_NAME    VARCHAR(40) CCSID UNICODE                    
             ,LARGE_NUMBER BIGINT                                       
             ,WHEN_USED    TIMESTAMP)                                   
FROM SYSIBM.SYSDUMMY1 ;                                                 
---------+---------+---------+-..--------+---------+---------+---------+---------+---
USER_NAME                      ..            LARGE_NUMBER  WHEN_USED                 
---------+---------+---------+-..--------+---------+---------+---------+---------+---
ROY                                             123456789  2014-07-22-07.43.23.123456
DSNE610I NUMBER OF ROWS DISPLAYED IS 1   
            

I deleted some white space (..) above just to get it all to fit on the page, by the way! Now quite *what* you would do with PACK and UNPACK I’ll leave up to you!!!

 

On now to DB2 11: new Aggregate Functions, 11 new Scalar Functions, and the new ARRAY data type

Here we got three new Aggregate Functions and 11 new Scalar Functions. Most of these are for the new ARRAY data type that was introduced in DB2 11. First, here’s how you can create ARRAYs, so that you can understand a bit more about the Functions that follow:

CREATE TYPE BOXWELL.ROYARRAY  AS INTEGER  ARRAY ??(??) ;     
CREATE TYPE BOXWELL.ROYARRAY4 AS CHAR(16) ARRAY ??( 32 ??) ; 
CREATE TYPE BOXWELL.ROYARRAY5 AS CHAR(16)                    
   ARRAY ??( VARCHAR(8) CCSID UNICODE FOR MIXED DATA ??) ;

Note that I use trigraphs ( ??( = [ and ??) = ] ), as I *hate* the square brackets! (I would *love* to meet that developer in a dark alley one night…I know, I know, it is like that in the “other” SQL world but it doesn’t mean I have to use them!).

 

Now arrays are pretty cool for passing data back from SQL Scalar Functions, or as Parameters of SQL Scalar Functions and Native SQL Procedures, as they simplify the handling of lots of same-type parameters. This is really good for all the JAVA programmers out there who use array types all the time!

In fact, all of the ARRAY stuff is *only* allowed within the confines of Stored Procedures and SQL PL.
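Just to make that point concrete, here is a minimal – untested – sketch of a native SQL procedure that uses the ROYARRAY type created above (the procedure name and the logic are purely invented for illustration):

--#SET TERMINATOR $
CREATE PROCEDURE BOXWELL.ARRAY_TEST (OUT NUM_ELEMENTS BIGINT)
        VERSION V1
        LANGUAGE SQL
BEGIN
  DECLARE DEPT_IDS BOXWELL.ROYARRAY;          -- ordinary INTEGER array
  SET DEPT_IDS??(1??) = 11;                   -- trigraphs again, of course!
  SET DEPT_IDS??(2??) = 21;
  SET NUM_ELEMENTS = CARDINALITY(DEPT_IDS);   -- two assigned elements
END
$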

What also arrived in DB2 11 is the ability to “unravel the array” using the new “collection-derived-table” syntax—otherwise known as UNNEST—and here are the two examples from the doc:

 

Example 1: Suppose that PHONENUMBERS is a user-defined array type that is defined as an ordinary array.

RECENT_CALLS is an array variable of the PHONENUMBERS type. RECENT_CALLS contains the following phone numbers:

9055553907
4165554213
4085553678
The following SELECT statement uses UNNEST to retrieve the list of phone numbers from the Array:

SELECT T.ID, T.NUM FROM UNNEST(RECENT_CALLS) WITH ORDINALITY AS T(NUM, ID);

ID NUM
1 9055553907
2 4165554213
3 4085553678

Example 2: Now suppose that PHONELIST is an array variable of an array type that is defined as an associative array, indexed by VARCHAR. The following statements assign values to the array:

SET PHONELIST['Home'] = '4443051234';
SET PHONELIST['Work'] = '4443052345';
SET PHONELIST['Cell'] = '4447893456';

The following SELECT statement is executed:

SELECT T.ID, T.PHONE FROM UNNEST(PHONELIST) AS T(ID, PHONE);

The result table looks like this, although the order of rows might differ:

ID PHONE
Cell 4447893456
Home 4443051234
Work 4443052345

 

The Array Aggregate Function, ARRAY_AGG, and all of the Scalar Functions:

ARRAY_DELETE,
ARRAY_FIRST,
ARRAY_LAST,
ARRAY_NEXT,
ARRAY_PRIOR,
CARDINALITY,
MAX_CARDINALITY,
and TRIM_ARRAY,
are all available to use but only within the confines of SQL PL.
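As a taster for the aggregate one, here is an untested sketch of filling the ROYARRAY4 type from above inside a native SQL procedure or compiled SQL function – ARRAY_AGG builds the array from a fullselect and CARDINALITY then tells you how many elements landed in it (the variable name is invented):

DECLARE D11_NAMES BOXWELL.ROYARRAY4;
SET D11_NAMES = (SELECT ARRAY_AGG(CAST(LASTNAME AS CHAR(16))
                                  ORDER BY LASTNAME)
                 FROM SAMPLE.EMP
                 WHERE WORKDEPT = 'D11');
-- CARDINALITY(D11_NAMES) is now 9 and D11_NAMES??(1??) is 'ADAMSON'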

 

CHAR9 and VARCHAR9

Also new in DB2 11 are CHAR9 and VARCHAR9, which behave the way the good old CHAR and VARCHAR functions did back in DB2 9—frozen like that forever. I hope they will not actually be used!
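If you are curious what the difference actually looks like, a quick side-by-side query like this (an untested sketch – I am not going to predict the output for you!) should show the old DB2 9 style result next to the current one:

SELECT CHAR(000.50)  AS NEW_STYLE
      ,CHAR9(000.50) AS OLD_STYLE
FROM SYSIBM.SYSDUMMY1;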

XSLTRANSFORM was also delivered to do XML transformations, but I have never actually tried it. You must note this bit of text:

 

This user-defined function requires IBM SDK for z/OS, Java Technology Edition Version 6.
This user-defined function uses the XSLT support that is provided by the W3C XSL Transformations V1.0 Recommendation.

 

GROUPING & MEDIAN

Then we got two new Aggregate Functions. The first is GROUPING—used in the OLAP grouping extensions for doing automatic sub- and grand totalling—and the second is MEDIAN—which *only* works if you have the IDAA switched on!
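To show what GROUPING is actually for, here is a heavily hedged sketch (untested – and, depending on your level and setup, this style of query may only be accepted when it can be routed to the accelerator): ROLLUP produces an extra grand-total row, and GROUPING flags that generated row so you can tell it apart from a real WORKDEPT group:

SELECT WORKDEPT
      ,SUM(SALARY)        AS SALARIES
      ,GROUPING(WORKDEPT) AS GENERATED_ROW   -- 1 = ROLLUP total row, 0 = real group
FROM SAMPLE.EMP
GROUP BY ROLLUP(WORKDEPT);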

So, lots of interesting stuff here. You may want to go back to the three newsletters from a few years ago (2012) to recap all of these “cool” features. That way you’ll have plenty to read if you get bored on holiday!

 

Feel free to send me your comments and ask questions!
