How to validate that your Db2 logs are OK?
Sizing of LOGs changed quite a bit in Db2 11 and Db2 12, so I thought a little recap about the correct sizing and ZPARMS was in order this month.
– Db2 Active logs – how many?
The perennial favourite “just enough” is of course the correct answer! Naturally you want to make sure that about 24 hours’ worth of data is available on your active logs, as reading the archives, especially if they have been migrated to tape, is a real performance killer during restart and rollback processing!
The range of active logs you can have is from 2 to 93 with 3 being the default.
– Db2 Archive logs – how many?
Another perennial favourite “just enough” is of course the correct answer! (See more details later in this newsletter.)
The range of archive logs you can have is from 10 to 10,000, with 10,000 being the default.
Where am I?
How can you find out what your system is doing? Simply run the DSNJU004 BSDS stand-alone print routine to look at the current data. Here is an example JCL to do it all for you:
//BSDS     EXEC PGM=DSNJU004
//STEPLIB  DD DSN=<your.exit.lib>,DISP=SHR
//         DD DSN=<your.load.lib>,DISP=SHR
//SYSUT1   DD DSN=<your.BSDS 01>,DISP=SHR
//SYSUT2   DD DSN=<your.BSDS 02>,DISP=SHR
//SYSPRINT DD SYSOUT=*
And here is the output from my little Db2 12 subsystem, cut down to show just the bits I want to use!
ACTIVE LOG COPY 1 DATA SETS
START RBA/TIME        END RBA/TIME          DATE/LTIME  DATA SET INFORMATION
--------------------  --------------------  ----------  --------------------
000000000013516C0000  0000000000135387FFFF  2016.068    DSN=DC10.LOGCOPY1.DS01
2017.307 12:32:08.6   2017.308 20:54:21.1   17:27       STATUS=REUSABLE
00000000001353880000  00000000001355A3FFFF  2016.068    DSN=DC10.LOGCOPY1.DS02
2017.308 20:54:21.1   2017.310 07:31:00.0   17:27       STATUS=REUSABLE
00000000001355A40000  00000000001357BFFFFF  2016.068    DSN=DC10.LOGCOPY1.DS03
2017.310 07:31:00.0   ........ ..........   17:27       STATUS=NOTREUSABLE

ARCHIVE LOG COPY 1 DATA SETS
START RBA/TIME        END RBA/TIME          DATE/LTIME  DATA SET INFORMATION
--------------------  --------------------  ----------  --------------------
00000000000000000000  000000000000021BFFFF  2016.068    DSN=DC10.ARCHLOG1.D16068.T1743011.A0000001
2016.068 18:37:21.8   2016.068 18:43:00.9   17:43       VOL=SE121D UNIT=SYSALLDA CATALOGUED
000000000000021C0000  0000000000000437FFFF  2016.068    DSN=DC10.ARCHLOG1.D16068.T1743391.A0000002
2016.068 18:43:00.9   2016.068 18:43:39.1   17:43       VOL=SE121D UNIT=SYSALLD CATALOGUED
00000000000004380000  0000000000000653FFFF  2016.069    DSN=DC10.ARCHLOG1.D16069.T0756574.A0000003
2016.068 18:43:39.1   2016.069 08:56:57.3   7:56        VOL=SE121D UNIT=SYSALLD CATALOGUED
From this you can see that I have three active logs that last about one to two days, and the complete archives from the first install are still available – that will probably not be true in your case!
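If you want to work out how quickly your active logs fill, you can do it straight from the start/end timestamps in the DSNJU004 output. Here is a little Python sketch, purely as an illustration, using the two complete log datasets from my output above (the timestamp format and values are copied from that listing; this is not an official parser):

```python
from datetime import datetime

# Start/end timestamps of each completed active log data set, copied
# from the DSNJU004 ACTIVE LOG output above (format: yyyy.ddd hh:mm:ss.t).
switches = [
    ("2017.307 12:32:08.6", "2017.308 20:54:21.1"),  # LOGCOPY1.DS01
    ("2017.308 20:54:21.1", "2017.310 07:31:00.0"),  # LOGCOPY1.DS02
]

def parse(ts):
    # Julian date (year.day-of-year) plus time; drop the tenths of a second.
    return datetime.strptime(ts[:ts.rindex(".")], "%Y.%j %H:%M:%S")

for start, end in switches:
    hours = (parse(end) - parse(start)).total_seconds() / 3600
    print(f"log filled in {hours:.1f} hours")
```

On my quiet little system each log lasts well over a day; on a busy production system the interval will be minutes, not hours, and that is the number you want to keep an eye on.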
Now, much later in the output, you can see where I am today:
000000000013516C0000  0000000000135387FFFF  2017.308    DSN=DC10.ARCHLOG1.D17308.T1954212.A0002638
2017.307 12:32:08.6   2017.308 20:54:21.1   19:54       VOL=SE123  UNIT=SYSALLD CATALOGUED
00000000001353880000  00000000001355A3FFFF  2017.310    DSN=DC10.ARCHLOG1.D17310.T0631002.A0002639
2017.308 20:54:21.1   2017.310 07:31:00.0   6:31        VOL=SE123F UNIT=SYSALLD CATALOGUED
Then comes the COPY 2 data, where you can see that I do not even archive the COPY 2 logs. Again, this should not be the case in your shop.
ACTIVE LOG COPY 2 DATA SETS
START RBA/TIME        END RBA/TIME          DATE/LTIME  DATA SET INFORMATION
--------------------  --------------------  ----------  --------------------
000000000013516C0000  0000000000135387FFFF  2016.068    DSN=DC10.LOGCOPY2.DS01
2017.307 12:32:08.6   2017.308 20:54:21.1   17:27       STATUS=REUSABLE
00000000001353880000  00000000001355A3FFFF  2016.068    DSN=DC10.LOGCOPY2.DS02
2017.308 20:54:21.1   2017.310 07:31:00.0   17:27       STATUS=REUSABLE
00000000001355A40000  00000000001357BFFFFF  2016.068    DSN=DC10.LOGCOPY2.DS03
2017.310 07:31:00.0   ........ ........     17:27       STATUS=NOTREUSABLE

ARCHIVE LOG COPY 2 DATA SETS
NO ARCHIVE DATA SETS DEFINED FOR THIS COPY
From this data you can work out a few simple things:
1) How often do you fill an active log?
2) How many archive logs do you have in the BSDS, and do they actually still exist?
- You can now also check that all your active logs have the same dataset characteristics (Primary Quantity, Secondary, etc.).
- If you are filling up more than one log per 10 minutes, you may want to check the size of your logs. The size of the active log can be set to an absolute physical maximum of 5,825 cylinders (87,375 TRKs) for non-EAV and 5,817 cylinders (87,255 TRKs) for EAV. Why the strange difference? Because EAV can only allocate in blocks of 21 cylinders due to the way the EAV volumes work. (See later for more information about using more than 65,535 TRKs for an archive log though!)
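The maximums quoted above are easy to verify with a bit of arithmetic (a 3390 has 15 tracks per cylinder). Here is a quick Python sanity check, including the 21-cylinder EAV rounding:

```python
# Sanity-check the active log maximums quoted above.
# 3390 geometry: 15 tracks per cylinder.
TRACKS_PER_CYL = 15

non_eav_cyls = 5825
eav_cyls = 5817

print(non_eav_cyls * TRACKS_PER_CYL)   # 87375 tracks (non-EAV maximum)
print(eav_cyls * TRACKS_PER_CYL)       # 87255 tracks (EAV maximum)

# EAV allocates only in 21-cylinder multiples, so the EAV maximum is
# simply the largest multiple of 21 that fits under 5,825 cylinders:
print((non_eav_cyls // 21) * 21)       # 5817
```

So the "strange difference" is nothing more than rounding 5,825 down to the nearest multiple of 21.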
- Space allocation:
The initial allocation of space is done through the Install CLIST and the tailored sample JCL in the DSNTIJIN member. That is where you can override whatever values are generated by Db2 into “better” ones for your system. In the bad old days the allocation was in RECORDS and today I see KILOBYTES most of the time – which is still a bit of a headache when you want to allocate in CYLINDERS but it is all doable, as the DEFINE CLUSTER syntax allows these size definitions:
CYLINDERS(primary)
KILOBYTES(primary)
MEGABYTES(primary)
RECORDS(primary)
TRACKS(primary)
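If you want to turn a KILOBYTES definition into whole cylinders, the arithmetic is simple. A little Python sketch as an illustration, assuming 3390 geometry with 4 KB CIs (12 CIs of 4 KB per track, so 48 KB per track and 720 KB per cylinder):

```python
import math

# Assumed 3390 geometry for 4 KB CIs: 12 x 4 KB = 48 KB per track,
# 15 tracks per cylinder = 720 KB per cylinder.
KB_PER_TRACK = 48
KB_PER_CYL = KB_PER_TRACK * 15  # 720

def kilobytes_to_cylinders(kb):
    """Smallest whole-cylinder allocation that covers a KILOBYTES spec."""
    return math.ceil(kb / KB_PER_CYL)

# A roughly 4 GB active log expressed in cylinders:
print(kilobytes_to_cylinders(4_194_000))  # 5825 - the non-EAV maximum above
```

Notice that 5,825 cylinders times 720 KB is 4,194,000 KB, which lands just under 4 GB, so the physical maximum is no accident.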
Log ZPARM of Interest
For the log, there is really only one ZPARM of any real importance when it comes to performance, and that is OUTBUFF. Set it to the biggest value you can “afford” in terms of real memory, as the buffer is permanently page-fixed! 400,000 KB is the current maximum, and it is also the IBM-recommended value. Reading from memory for rollback is a lot quicker than reading the active log, which in turn is a lot faster than reading the archive log.
From the documentation:
|The larger the output buffer, the more likely that a log read request can return the required log records without needing I/O operations. This is a particularly important consideration if the DB2 subsystem is to be used with a data replication product that uses IFCID 306. In DB2 12, log buffers are permanently page-fixed. When you estimate real storage usage, you must use the entire size that you specify for the OUTBUFF parameter. To avoid page-fixing more storage than necessary, carefully choose the setting for OUTBUFF.
Choose the largest size that your system can tolerate for the log output buffer. Because the pages for the log output buffer are permanently fixed in real storage, choose the largest size that you can dedicate in real storage. A larger size for the log output buffer might decrease the number of forced I/O operations that occur because additional buffers are unavailable, and can also reduce the number of wait conditions.
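Since OUTBUFF is specified in KB and is page-fixed in its entirety, the real-storage bill for a given setting is trivial to work out up front. A one-liner in Python, just as an illustration:

```python
# OUTBUFF is specified in KB and permanently page-fixed in real storage,
# so the real-storage cost of a setting is easy to compute up front.
def outbuff_real_storage_mb(outbuff_kb):
    return outbuff_kb / 1024

print(outbuff_real_storage_mb(400_000))  # current maximum: ~390.6 MB fixed
print(outbuff_real_storage_mb(4_000))    # a small setting: ~3.9 MB fixed
```

About 391 MB of fixed real storage for the maximum: make sure your LPAR can actually spare it before you set it!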
Archive logs are different
The size of the archive logs and how many are recorded in the BSDS are input parameters on installation panel DSNTIPA:
DSNTIPA               INSTALL DB2 - ARCHIVE LOG DATA SET PARAMETERS
===> _
Enter data below:
 1 PRIMARY QUANTITY ===> 125     Primary space allocation in cylinders
 2 SECONDARY QTY    ===> 15      Secondary space allocation in cylinders
 3 DEVICE TYPE 1    ===> TAPE    Unit name for COPY1 archive logs
 4 DEVICE TYPE 2    ===>         Unit name for COPY2 archive logs
 5 BLOCK SIZE       ===> 24576   Rounded up to 4096 multiple
 6 READ TAPE UNITS  ===> 2       Number of allocated read tape units
 7 DEALLOC PERIOD   ===> 0       Time interval to deallocate tape units
 8 RECORDING MAX    ===> 10000   Number of data sets recorded in BSDS
In Db2 10 and Db2 11 you also had an allocation unit (ALCUNIT ZPARM) input field with valid values of BLK, TRK or CYL with a default of BLK.
The PRIQTY now has a range from 1 to 4,369 and a default of 125. (Remember the absolute physical maximum earlier of 5,825/5,817 CYLs? That is why you can/should edit the DSNTIJIN member!)
If you migrate from Db2 11, the ALCUNIT value is used to convert a PRIQTY specified in BLK or TRK into CYLs for Db2 12. If you are not yet on Db2 12, ALCUNIT tells you whether the allocation unit for the PRIQTY is BLK, TRK or CYL, with a range of 1 to 999,999.
Some further info about the size of the archive logs here. Remember that they are flat files *not* VSAM Linear Datasets:
|By default, DFSMS Direct Access Device Space Management (DADSM) limits the space allocation on a single volume to less than 65535 tracks. Therefore, if the archive log data set size can be greater than or equal to 65535 tracks, you need to specify a primary space quantity of less than 65535 tracks. This action forces the archive log data set to extend to a second volume.
Alternatively, the archive log data sets can be allocated by a DFSMS data class that has a DSNMTYP (data set name type) setting of LARGE or EXT. In this case, you can specify a primary space quantity larger than 65535 tracks without extending to a second volume. LARGE indicates that data sets in the data class are to be allocated in large physical sequential format. EXT indicates that data sets in the data class are to be allocated in extended physical sequential format. A setting of EXT is recommended, and it is required for striping of data sets. If you specify EXT, also set the IFEXT (if extended) parameter to R (required) rather than P (preferred).
The SECQTY has a range from 1 to 4,369 and a default of 15. Same rules for ALCUNIT and switching to CYL in Db2 12 apply here as per the PRIQTY.
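The DADSM limit quoted above also explains the PRIQTY maximum of 4,369 cylinders: that is exactly 65,535 tracks, one step over the “less than 65535 tracks” single-volume ceiling. A quick Python check, purely as an illustration:

```python
TRACKS_PER_CYL = 15
DADSM_TRACK_LIMIT = 65535  # DADSM single-volume ceiling ("less than" this)

def fits_on_one_volume(priqty_cyls):
    """True if an archive PRIQTY in cylinders stays under 65,535 tracks."""
    return priqty_cyls * TRACKS_PER_CYL < DADSM_TRACK_LIMIT

print(4369 * TRACKS_PER_CYL)      # 65535 - the DSNTIPA maximum in tracks
print(fits_on_one_volume(4368))   # True  - stays on one volume
print(fits_on_one_volume(4369))   # False - forces a second volume
```

So a PRIQTY of 4,369 without a LARGE/EXT data class will spill onto a second volume, exactly as the documentation describes.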
The recording max just tells you how many archive log dataset names the BSDS can remember. The documentation mentions two very important facts about this number:
When this number is exceeded, recording resumes at the beginning of the BSDS.
You must create image copies of all DB2 objects, probably several times, before the archive log data sets are discarded. If you fail to retain an adequate number of archive log data sets for all the image copies, you might need to cold start or reinstall DB2. In both cases, data is lost.
So there are two top tips for you! Check whether your BSDS has wrapped around, and check whether you have archive datasets still hanging around on disk that Db2 can no longer allocate and use.
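Checking the first half of that is just a set comparison once you have two lists of dataset names. A hypothetical Python sketch (the dataset names are taken from my output above; how you gather the “still cataloged” list, e.g. from a LISTCAT, is up to you):

```python
# Hypothetical sketch: compare the archive data set names recorded in
# the BSDS (from DSNJU004 output) with those actually still cataloged.
recorded_in_bsds = {
    "DC10.ARCHLOG1.D16068.T1743011.A0000001",
    "DC10.ARCHLOG1.D16068.T1743391.A0000002",
    "DC10.ARCHLOG1.D17310.T0631002.A0002639",
}

still_cataloged = {  # e.g. gathered from a catalog listing
    "DC10.ARCHLOG1.D17310.T0631002.A0002639",
}

missing = sorted(recorded_in_bsds - still_cataloged)
for dsn in missing:
    print("recorded in BSDS but gone:", dsn)
```

Any name that shows up as “gone” is an archive log the BSDS still believes in but that can no longer be used for recovery.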
Striped or not?
Finally, for logs, the striping of the active logs is nowadays not recommended, as devices are so much faster, but striping and/or compression of the archive logs is still recommended if space is an issue.
From the Db2 Documentation:
|In most cases, do not stripe active log data sets. You can use DFSMS to stripe the logs, but striping is generally unnecessary with the latest devices. Striping increases the number of I/Os, which can increase CPU time and lead to potentially greater DB2 commit times. Striping might improve the performance of batch insert jobs, but it might also harm the performance of online transaction processing. Striping is especially risky for performance if you replicate the logs over long distances.
Consider striping and compressing archive log data sets by using DFSMS. Doing so might speed up the time to offload the logs and the time to recover by using archive logs. However, the performance of DFSMS striping and compression depends on the z/OS release and the types of hardware that you use.
Check your CHECKPOINTS
Getting back to the output of the DSNJU004 job, a few more things like conditional restart recs etc. are output and then the Checkpoint Queue:
CHECKPOINT QUEUE
                        08:00:31 NOVEMBER 06, 2017
TIME OF CHECKPOINT      07:31:03 NOVEMBER 06, 2017
BEGIN CHECKPOINT RBA    00000000001355B750DC
END CHECKPOINT RBA      00000000001355B81B18
END CHECKPOINT STCK     00D365BB89179A529200
TIME OF CHECKPOINT      20:54:23 NOVEMBER 04, 2017
BEGIN CHECKPOINT RBA    000000000013538813C5
END CHECKPOINT RBA      00000000001353889D76
END CHECKPOINT STCK     00D363EB5D1777CFB000
TIME OF CHECKPOINT      12:32:09 NOVEMBER 03, 2017
BEGIN CHECKPOINT RBA    00000000001351723326
END CHECKPOINT RBA      0000000000135172D899
END CHECKPOINT STCK     00D362393D4E6F748C00
TIME OF CHECKPOINT      07:30:11 NOVEMBER 03, 2017
BEGIN CHECKPOINT RBA    0000000000134F5915DD
END CHECKPOINT RBA      0000000000134F59C445
END CHECKPOINT STCK     00D361F5BEB5A941CA00
TIME OF CHECKPOINT      07:42:59 NOVEMBER 01, 2017
BEGIN CHECKPOINT RBA    0000000000134D34EB53
END CHECKPOINT RBA      0000000000134D357022
END CHECKPOINT STCK     00D35F74E054AE870800
You can see how busy my little Db2 system is; your data will be a lot different, I am sure!
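The checkpoint queue gives you both the time between checkpoints and, via the RBAs, how much log was written in between. Here is an illustrative Python sketch using the first three entries from my output above (timestamps and RBAs copied verbatim; this is not an official parser):

```python
from datetime import datetime

# TIME OF CHECKPOINT / BEGIN CHECKPOINT RBA pairs, newest first,
# copied from the DSNJU004 checkpoint queue above.
entries = [
    ("07:31:03 NOVEMBER 06, 2017", "00000000001355B750DC"),
    ("20:54:23 NOVEMBER 04, 2017", "000000000013538813C5"),
    ("12:32:09 NOVEMBER 03, 2017", "00000000001351723326"),
]

def parse(ts):
    return datetime.strptime(ts, "%H:%M:%S %B %d, %Y")

for (t_new, rba_new), (t_old, rba_old) in zip(entries, entries[1:]):
    hours = (parse(t_new) - parse(t_old)).total_seconds() / 3600
    log_mb = (int(rba_new, 16) - int(rba_old, 16)) / (1024 * 1024)
    print(f"{hours:6.1f} hours apart, {log_mb:8.2f} MB of log written")
```

Hours between checkpoints on a production system should come out as minutes, of course; if yours look like mine you have a very quiet subsystem indeed.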
Checkpoint ZPARMS of interest
The CHECKPOINT TYPE can be LOGRECS, MINUTES or BOTH; MINUTES is the default and is what I recommend. The parameter can still have the old value “SINGLE”, which then interprets the CHKFREQ value as either minutes or log records depending on the size of the number. Change this to MINUTES!
If LOGRECS: how many log records before a checkpoint (1,000 – 16,000,000)
If MINUTES: how many minutes before a checkpoint (1 – 60)
If BOTH: how many log records before a checkpoint (1,000 – 99,999,999)
If BOTH: how many minutes before a checkpoint (1 – 1,439)
IBM, and I, recommend a checkpoint every five minutes or so to get optimum rollback and log usage. Naturally, your mileage may vary. Also remember that Db2 writes a checkpoint at active log change as well, so try to ensure that at least ten checkpoints’ “worth” of data fits onto one log dataset.
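That “ten checkpoints per log dataset” rule of thumb is easy to check against your own numbers. A back-of-the-envelope Python sketch with purely illustrative figures (a 4,000 MB log, 50 MB of log data per minute, checkpoints every five minutes; plug in your own measurements):

```python
# Rough sizing check: with MINUTES-based checkpoints, how many
# checkpoints fit into one active log before it fills and switches?
def checkpoints_per_log(log_size_mb, log_rate_mb_per_min, chkfreq_minutes):
    minutes_to_fill = log_size_mb / log_rate_mb_per_min
    return minutes_to_fill / chkfreq_minutes

# Illustrative numbers only: a 4,000 MB log, 50 MB/min of log data,
# and a checkpoint every 5 minutes.
print(checkpoints_per_log(4000, 50, 5))  # 16.0 - comfortably above ten
```

If the result drops below ten on your system, either your logs are too small or your checkpoint interval is too long for your log write rate.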
One extra note arrived in Db2 11:
|If the CHECKPOINT TYPE field is set to BOTH, DB2 prevents log checkpoints from being taken too frequently and degrading performance. Log checkpoints are scheduled, but not necessarily immediately taken, when the number of minutes that passes reaches the value of CHKMINS.|
Following all this advice should mean that your logs settle down and your system runs a little bit smoother!
As usual, if you have any comments or queries please feel free to drop me a line!