2019-01: Regular Expressions in Db2 SQL – Regulate this!

Impress your JAVA colleagues with Regular Expressions in Db2 SQL – Become a regex specialist with these samples (regex usage in the TSO Editor & regex as a predicate).

The references to blogs focusing on this topic will help round out your knowledge all about “Regular Expressions in Db2 SQL”
(advanced pattern matching, regex testing and REGEXP_LIKE usage).

I have never really been a fan of regular expressions (regex) as, by default, I work on Mainframes and regex was always “just for the small boxes that you can lift”, however, since Db2 12 I have started to play around with them in various abstruse places.

TSO? You gotta be kidding?

Yep, even in the TSO editor you can use regex. This example will find all occurrences of data that starts with an M, contains DB and ends with at least two numerics. After the pipe (an OR in regex) it does the same but for strings that start with an O followed by 2DB and ends with two numerics:

VIEW       SE.MDB2VNEX.TCOBOL(#GOALL) - 01.00              Columns 00001 00072
Command ===> F R'^M[DB].*[0-9]{2}|^O[2DB].*[0-9]{2}' all  Scroll===> 
****** ***************************** Top of Data ********************
001600 MDB2DBRI  EXPD=N                                             
001700 MDB2DBRT  EXPD=N                                              
001800 MDB2DBSC  EXPD=N                                              
001900 MDB2DBSF  EXPD=N                                              
002000 MDB2DBSG  EXPD=N                                              
002100 MDB2DBTA  EXPD=N                                              
002200 MDB2DBTP  EXPD=N                                              
002300 MDB2DBTS  EXPD=N                                              
002400 MDB2DB01  EXPD=N                                              
002500 MDB2DB02  EXPD=N                                              
002600 MDB2DB03  EXPD=N                                              
002700 MDB2DB04  EXPD=N

Note that changing the R to be RC then makes the regex case sensitive.

Using “x all” first, so you can see what it finds…

VIEW SE.MDB2VNEX.TCOBOL(#GOALL) - 01.00 28 CHARS '^M[DB].*[0-9]{
Command ===>                                     Scroll ===> CSR 
002400 MDB2DB01 EXPD=N 
002500 MDB2DB02 EXPD=N 
002600 MDB2DB03 EXPD=N 
002700 MDB2DB04 EXPD=N 
002800 MDB2DB05 EXPD=N 
002900 MDB2DB06 EXPD=N 
003000 MDB2DB07 EXPD=N 
003100 MDB2DB08 EXPD=N 
003200 MDB2DB09 EXPD=N 
003300 MDB2DB10 EXPD=N 
003400 MDB2DB11 EXPD=N 
003500 MDB2DB12 EXPD=N 
003600 MDB2DB13 EXPD=N 
- - - - - - - - - - - - - - - - - - - 7 Line(s) not Displayed
009900 MDB2U001 EXPD=N 
010000 MDB2U003 EXPD=N 
010100 MDB2U005 EXPD=N 
010200 MDB2U006 EXPD=N 
010300 MDB2U007 EXPD=N 
- - - - - - - - - - - - - - - - - - - 4 Line(s) not Displayed
019800 O2DB60 EXPD=N 
019900 O2DB61 EXPD=N 
020000 O2DB62 EXPD=N 
020100 O2DB63 EXPD=N 
020200 O2DB64 EXPD=N 
020300 O2DB65 EXPD=N 
020400 O2DB66 EXPD=N 
020500 O2DB67 EXPD=N 
020600 O2DB68 EXPD=N 
020700 O2DB69 EXPD=N

And the Message looks like:

All pretty cool! Naturally you can use nearly any format of regex wherever you like and so it is very powerful. They are fully supported in Find, Change (But not for the second argument in a Change!) and Exclude.

Further infos about regex


Usage in TSO: For more info about their usage in TSO check out the KC entry

https://www.ibm.com/docs/en/zos/2.1.0?topic=string-regular-expressions-string1#useofr1


REXX Usage: Rexx usage is also available and Martin Packer wrote a great little Blog all about a regex testing routine that you can use to play with as well:
https://mainframeperformancetopics.com/2020/01/05/tso-regular-expression-testing-tool/


So what about Db2?

Well… on the “boxes you can lift” they have had regex for a long time. All built into the Db2 Engine. Fred Sobotka’s article “Advanced Pattern Matching with Regular Expressions in DB2 11.1 for LUW” in the IDUG Blog shows lots of really cool ways of using REGEXP_LIKE and its brethren and is well worth a read.

What about “real” Db2?

Well… we have it… sort of… not really what you would think but it is indeed there! You just have to jump over some fences and avoid some rocks and twist your required data into XML and Bob’s your uncle!

Say what? Regex lists out useful tables like:

Ok, ok, enough delaying… here is a regex that lists out all tables that start with between two and five characters ranging from B to Z and then ends with just two numerics:

SELECT NAME, creator from SYSIBM.SYSTABLES 
WHERE 
XMLEXISTS('$newXDoc[fn:matches(., "^[B-Z]{2,5}[0-9]{2}$")]' PASSING
        XMLQUERY('<doc>{$xmltbname}</doc>' PASSING NAME as "xmltbname")
        as "newXDoc")    
order by 1     
--fetch first 10 rows onl
;
---------+---------+---------+---------+---------+---------+--------
NAME
---------+---------+---------+---------+---------+---------+--------
CERNT01  
CERNT02 
CERNT03
CERNT04  
CERNT05 
CERNT06

It will not win a beauty contest any day soon but, hey, it works!

How does that work?

SELECT NAME, creator from SYSIBM.SYSTABLES
WHERE

Just a run-of-the-mill, bog standard SELECT, FROM and WHERE – Nothing new or changed here!

XMLEXISTS('$newXDoc[fn:matches(., "^[B-Z]{2,5}[0-9]{2}$")]' PASSING 
XMLQUERY('<doc>{$xmltbname}</doc>' PASSING NAME as "xmltbname")
as "newXDoc") 

This must be viewed as two separate statements, first the inner statement:

XMLQUERY('<doc>{$xmltbname}</doc>' PASSING NAME as "xmltbname")

This “translates” the column NAME into an XML construct with the element name xmltbname ready for the regex which we do by “faking it” through an XMLEXISTS:

XMLEXISTS('$newXDoc[fn:matches(., "^[B-Z]{2,5}[0-9]{2}$")]' PASSING 
         …
         as "newXDoc")

This uses the fn:matches function which will actually do the regex.

The data in the regex ^[B-Z]{2,5}[0-9]{2}$ is a normal regex and it uses the xmltbname as input. fn:matches is boolean and so returns True or False. This is then a “normal” predicate as any row that is “true” must have passed the regex.

Clear as mud!

It is a bit weird and I can recommend some more reading,


firstly the excellent regex site where you can learn all about the joys and dangers of these beasts:

http://www.rexegg.com/


And the IBM XQuery docu that describes how the fn:matches works in detail:

https://www.ibm.com/docs/en/db2-for-zos/12?topic=expressions-regular


Both are well worth a read – be especially careful about “explosive quantifiers”!

CPU?

Yep, you guessed it. There is no such thing as a free lunch!

The use of this method is *not* cpu light.

It should only ever be used if normal LIKE, REPLACE or TRANSLATE cannot easily get the job done and if you end up coding a regex like:

^(?=(?!(.)\1)([^\DO:105-93+30])(?-1)(?<!\d(?<=(?![5-90-3])\d))).[^\WHY?]$

Then do not be surprised if your colleagues all start to hate you! 

  😉 😉

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect