2017-07 Let’s make DB2 z/OS IMAGE COPY great again!

War story :

Why is Db2 z/OS IMAGE COPY very useful in the current Db2 z/OS world?

Please excuse the heading, I simply could not resist…

This month is another war story from the trenches of Production DBAs fighting the fight for 24×7 Shops around the world…
This time it was a request to recover some tables to a specific point in time that started the ball rolling…

Staged too soon? Bad updates and deletes

The developers pushed some software to production and then found out, five hours later, that there was a logical error and it was doing really bad updates and deletes in a bunch of tables that it should *not* have done. This is bad news…

Backup system to the Rescue?

This firm uses System Level Backup (SLB), but the DBA group, I kid you not, did not know this little factoid (they had alternate facts to work with), and so when the request to restore tables x, y and z to a specific Point in Time (PiT) arrived, they simply created the required RECOVER control cards and, after checking that the tables, indexes and tablespaces were not being used, submitted the RECOVER jobs. Now, the first job worked fine, but the second and the third “bought the farm” with *very* weird messages that the DBA group did not really understand:

RECOVER Messages you do not want to see – Part one

DSNU1520I   319 08:26:44.64 DSNUCBRT - THE RECOVERY BASE FOR TABLESPACE ROYDB.ROYTS DSNUM 1 IS THE SYSTEM LEVEL BACKUP WITH DATE = 20161212, TIME 041302, AND TOKEN X'FF..FF'

DSNU1522I   319 08:26:46.85 DSNUCBRT - THE DFSMSHSM CALL TO RESTORE TABLESPACE ROYDB.ROYTS DSNUM 1 FAILED WITH RC = X'0000005D' AND REASON CODE = X'00000042' SEE THE JOB LOG FOR DFSMSHSM MESSAGES INDICATING THE CAUSE OF THE ERROR

DSNU832I  )DSJP 319 08:26:44.61 DSNUCARS - INDEX ROYSCHEMA.ROYINDEX PARTITION 1 IS IN REBUILD PENDING STATE

DSNU560I  )DSJP 319 08:26:46.87 DSNUGSRX - TABLESPACE ROYDB.ROYTS PARTITION 1 IS IN RECOVER PENDING STATE

DSNU012I    319 08:26:46.88 DSNUGBAC - UTILITY EXECUTION TERMINATED, HIGHEST RETURN CODE=8

(Some names have been changed to protect the innocent!)

Naturally I have added the red and bold highlights. Now this message *really* upset everyone…Why? The tablespaces were previously all RW and OK, but now they were COPY Pending status!

Go Figure!

So what just happened? Well Db2 detected that there was actually an SLB that could be used for the base of this recovery, and so “asked” HSM to get it “back”. It failed with Return Code (RC) X’5D’ decimal 93 and Reason Code X’42’ decimal 66. I love that 42! And with the great “tip” to “Read the Job Log”. The DBA group were a “tad” unhappy about the fact that a *failed* RECOVER set the tablespace in question to COPY Pending by the way!

Where on Earth?

So the DBAs started trying to read the Job Log and could not see anything. They have a 16-way data-sharing group running on 14 separate LPARs… There is *lots* of “job log”…

Eventually under one of 14 different STCs all called “HSM” I found this info:

RECOVER Messages you do not want to see – Part two

08.26.44 S0998158  ARC1801I FAST REPLICATION DATA SET RECOVERY IS STARTING FOR DATA SET QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001, AT 08:26:44 ON 2016/12/12

08.26.46 S0998158  ARC0624I PHYSICAL DATA SET COPY OF VOLUME QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001 TERMINATED PRIOR TO COMPLETION, DFSMSDSS FAILING RC = 8

08.26.46 S0998158  ARC1860I THE FOLLOWING 0001 DATA SET(S) FAILED DURING FAST REPLICATION DATA SET RECOVERY:QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001, COPYPOOL=DSN$LOCDS0P$DB, DEVTYPE=DASD, VOLUME=WSPS95, ARC1166, RC=0

08.26.46 S0998158  ARC1802I FAST REPLICATION DATA SET RECOVERY HAS COMPLETED FOR DATA SET QA1B.DSNDBC.ROYDB.ROYTS.J0001.A001, AT 08:26:46 ON 2016/12/12, FUNCTION RC=0008, MAXIMUM DATA SET RC=0093

So here we see that it failed with RC=0093 (now in decimal) but with a DFSMSDSS RC = 8.

Confused? You will be…

I have not yet found what an RC = 0093 means. My best guess at the moment is that the data had been “moved” since the SLB, and so the volume swap failed for some reason or other. The tablespace and index were unavailable and the “window of change” was closing fast…

Can I run backwards out the door?

Panic was approaching when I mentioned they could do the “backwards” LOGAPPLY (BACKOUT YES) and so with high hopes we attempted it, only to see that it sadly died a death if you have COPY Pending status – We felt pretty recursive at this point…

So that left us with one last chance and that was to tell RECOVER to ignore SLB and go directly to an earlier (RESTOREBEFORE) IC with a TORBA syntax.
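Before picking a RESTOREBEFORE value it helps to see which full image copies exist from before the SLB. Here is a minimal sketch of a catalog query for that. It assumes the (anonymized) names ROYDB.ROYTS from the messages above and the SLB date and time reported in DSNU1520I; swap in your own values:

```sql
-- Sketch only: database, tablespace and timestamp are taken from the
-- anonymized messages in this story and must be replaced with real values.
SELECT DBNAME
      ,TSNAME
      ,DSNUM
      ,ICTYPE
      ,TIMESTAMP
      ,HEX(START_RBA) AS START_RBA   -- RBA, or LRSN in data sharing
      ,DSNAME
FROM SYSIBM.SYSCOPY
WHERE DBNAME    = 'ROYDB'
  AND TSNAME    = 'ROYTS'
  AND ICTYPE    = 'F'                -- full image copies only
  AND TIMESTAMP < '2016-12-12-04.13.02.000000'
ORDER BY TIMESTAMP DESC
FETCH FIRST 5 ROWS ONLY
WITH UR
;
```

The first row returned is the most recent full copy older than the SLB, which is the candidate recovery base once RECOVER has been told to look further back.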

Hoorah!

We had to wait a while but we got there! The RECOVERs all ran through clean and all was well… <phew>

Where’s the Beef?

So “What’s the point?” I hear you muttering… quite simple really:


1. Never rely on SLB to always work

2. Make sure you test it before switching your objects to COPY Pending by accident

3. Still take good old-fashioned Image Copies

4. Use of BACKOUT YES can save your bacon *if* you know about it


As usual, if you have any comments or queries please feel free to drop me a line!

 

TTFN

Roy Boxwell

 


More about Utility Management and Space Management: See our RTDX suite of tools


 

2016-08 – Is it Safe? How to recover accidentally dropped tables

Do you have the DDL anywhere?
Was there a “recent” image copy or disk back-up? Who can you call for help?

The newsletter title this month is really nothing to do with the film “Marathon Man”, but sometimes backup and recovery can feel just like having your teeth drilled… Anyway, the title is actually meant to get you to think again about your site’s back-up and recovery definitions, specifically that age-old chestnut about “accidentally” dropped objects.

RECOVER from DROP?

I saw a discussion on LISTSERV about recovering from dropped tables and tablespaces. It is a pretty horrible situation when you confirm the drop of a test table and then suddenly see, as your finger is descending towards the ENTER key, that it is, in fact, a different name…

DDL and Back-ups Handy?

Normally, at this point, the air is filled with colorful language and interesting local metaphors. Once it has calmed down a bit the real work starts: Do you have the DDL anywhere? Was there a “recent” image copy, or a disk back-up? Who can you call for help?

Mirror Mirror on the wall

Remember that mirroring etc. will not help you as the DROP has also been successfully mirrored. So within a moment, the data was also dropped at your disaster recovery site. (Argh!)

Newbies then start looking in SYSIBM.SYSCOPY for the last image copies, while we grey-haired oldies (sorry, “more mature experts”) start looking in production control copy libraries and BETA92!

The heat is on!

At this point a couple of things happen: The telephone starts ringing and a manager-type person materializes to ask annoying questions all the time. Typically: “How long will it take to get the data back?”

Now you have two possibilities:

1. You are in luck! Someone somewhere extracted all the DDL for the table(s) with DBID, PSID, and OBID(s) so you could generate a DSN1COPY job from the last found Image Copy dataset.

2. You are *not* in luck! You have no idea how the table looks, and you cannot see if anyone ALTERed it in the last five years or so.

Crashed and Burned

If you are in position 2, it is now a good time to update your CV and make sure your desk is neat and tidy… Now you might have access to some nifty third-party vendor tools, but for that you must at least have the dataset name of the last Image Copy and, of course, the third-party tool itself! Or the ability to read the DB2 log and “resurrect” the table defs from there. Not a pretty place to go, I assure you!

Back up that chain of thought for a Moment

So let’s rewind and imagine that you are doing this all differently… What about beginning today with an extract of all the DDL on your system? Then capture all of the IFCID 62s that show any DDL changes. Going further: what about getting all of the IFCID 220s to get dataset allocations correlated with Utility IDs and DBIDs and PSIDs?

Imagine what you could then do!

Wow! Cavalry over the hill

Yep, you have a “history” of all the DDL that has happened on your machine right up to the point when your object was DROPped *and* you have the dataset name(s) of the last image copies, as well as all the internal IDs to enable a successful DSN1COPY job complete with OBID translation! Cool huh? So suddenly you are now a hero instead of a villain!
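The internal IDs live in the catalog only for as long as the object exists, so they have to be saved *before* anything gets dropped. Here is a minimal sketch of such an extract, assuming you run it regularly and keep the output somewhere safe outside of Db2. The TYPE and database filters are just my choice for illustration:

```sql
-- Sketch only: saves the IDs that a DSN1COPY with OBID translation needs.
-- Store the result outside the catalog, e.g. in a dataset or a history table.
SELECT TB.CREATOR
      ,TB.NAME
      ,TB.DBNAME
      ,TB.TSNAME
      ,TB.DBID                       -- database ID
      ,TS.PSID                       -- pageset ID of the tablespace
      ,TB.OBID                       -- object ID of the table
FROM SYSIBM.SYSTABLES     TB
    ,SYSIBM.SYSTABLESPACE TS
WHERE TS.DBNAME = TB.DBNAME
  AND TS.NAME   = TB.TSNAME
  AND TB.TYPE  IN ('T' , 'M' , 'H' , 'R' )
  AND NOT TB.DBNAME IN ('DSNDB01' , 'DSNDB06' )
ORDER BY TB.DBNAME , TB.TSNAME , TB.NAME
WITH UR
;
```

On its own this only covers the table side; indexes would need a similar extract if you plan to copy index spaces back as well rather than rebuild them.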

DIY or Buy in?

So much for the theory – What about in practice? Well you can write it all yourself, or you could use this as a sort of useful side effect from our WorkloadExpert (WLX) software, which has all this built in! We already get all these IFCIDs, we already have a DDL Generator bundled with the WLX software for the Audit Use Cases, and so it really kills two birds with one stone!

Restricted movement?

Now, of course, you could argue: “Wait! I have RESTRICT ON DROP set for all my production tables!” This works really well against accidental drops, but I have seen lots of places where it should have been used but was in fact forgotten. How can you check? Run this little SQL to validate that what you think is true really is true:

-- CLUSTERTYPE = 'Y' means RESTRICT ON DROP is set, blank means it is not
SELECT COUNT(*) AS TABLES
      ,COUNT(CASE WHEN CLUSTERTYPE = 'Y' THEN 1 ELSE NULL END)
                 AS DROP_RESTRICT
      ,COUNT(CASE WHEN CLUSTERTYPE = ' ' THEN 1 ELSE NULL END)
                 AS NOT_DROP_RESTRICT
FROM SYSIBM.SYSTABLES
-- T = table, M = MQT, H = history table, R = archive table
WHERE     TYPE   IN ('T' , 'M' , 'H' , 'R' )
  AND NOT DBNAME IN ('DSNDB01' , 'DSNDB06' )
WITH UR
;
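If NOT_DROP_RESTRICT comes back bigger than you hoped, the next question is: which tables? Here is a minimal sketch that lists them in the form of ready-made ALTER statements. I have limited it to plain base tables (TYPE = 'T') to keep things simple, and the generated statements should of course be reviewed before anything is run:

```sql
-- Sketch only: generates one ALTER per base table without RESTRICT ON DROP.
SELECT 'ALTER TABLE "' CONCAT STRIP(CREATOR) CONCAT '"."'
       CONCAT STRIP(NAME) CONCAT '" ADD RESTRICT ON DROP ;'
                 AS ALTER_STMT
FROM SYSIBM.SYSTABLES
WHERE     TYPE        = 'T'
  AND     CLUSTERTYPE = ' '
  AND NOT DBNAME IN ('DSNDB01' , 'DSNDB06' )
ORDER BY CREATOR , NAME
WITH UR
;
```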

 

I hope the results don’t have you feeling like Dustin Hoffman in the movie.

As usual any questions or comments are welcome,

TTFN Roy Boxwell