Compressed CKD Dasd Emulation




Description

A compressed CKD Dasd file is an alternative file format for emulated CKD Dasd devices. A compressed file can require significantly less space than a regular file: for example, an os/390 sysres volume that was using 90% of a 3390-3 occupied 450M (as opposed to 2.4G). Each track image is compressed using zlib or bzip2, and each compressed track image occupies only the space it needs in the file. Additionally, unused or null tracks (tracks that contain only an end-of-file marker) do not occupy any space at all in the file.

Method

The logical byte offset of a track image in a regular file is directly calculated from the track number:
offset = 512 + trk_nbr * max_trk_sz;
This is a logical offset because more than one real file may emulate the CKD Dasd device. The offset of a compressed track image in a compressed file is instead determined by a two-table lookup on the track number; it is an actual file offset because only a single compressed file is supported. The quotient of the track number divided by 256 indexes into the primary lookup table, whose entry contains the offset of a secondary lookup table. The remainder of the track number divided by 256 indexes into that secondary lookup table, whose entry contains the compressed track's offset and length. The primary lookup table resides in memory while the program runs; file i/o is performed to read entries in the secondary lookup tables. For example:
lseek (fd, prime[trk_nbr/256] + (trk_nbr % 256) * sizeof(second), SEEK_SET);
read (fd, &second, sizeof(second));
offset = second.offset;
Notice that a compressed file contains a single primary lookup table but contains a number of secondary lookup tables. This number depends on the number of tracks for the emulated CKD device. Since each secondary lookup table references 256 tracks, the maximum number of tables is the total number of tracks divided by 256, rounded up. For example, a 3390-3 contains 50085 tracks and would require up to 196 secondary lookup tables. However, if all 256 tracks for any secondary lookup table are null tracks, then that secondary lookup table doesn't exist and the corresponding entry in the primary lookup table is zero.
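To make the two-table lookup concrete, here is a minimal C sketch that performs it against an in-memory copy of the file instead of calling lseek()/read(). It assumes the 8-byte secondary entry layout described under Organization (4-byte offset, 2-byte length, 2-byte size, in host byte order); the type and function names are hypothetical, not those used in cckddasd.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One secondary lookup table (L2TAB) entry, in host byte order. */
typedef struct {
    uint32_t offset;   /* file offset of the compressed image; 0 = null track */
    uint16_t length;   /* bytes actually occupied by the image */
    uint16_t size;     /* bytes reserved, including imbedded free space */
} l2_entry;

/* Two-table lookup of trk_nbr.  l1 is the in-memory primary table with
 * l1_count entries; file/file_len stand in for the emulation file.
 * Returns 1 and fills *out for a real track image, 0 for a null track
 * (zero primary or secondary offset), -1 if the lookup is out of range. */
static int l2_lookup(const unsigned char *file, size_t file_len,
                     const uint32_t *l1, uint32_t l1_count,
                     uint32_t trk_nbr, l2_entry *out)
{
    if (trk_nbr / 256 >= l1_count)
        return -1;

    uint32_t l2_off = l1[trk_nbr / 256];          /* primary: quotient */
    if (l2_off == 0)
        return 0;                                 /* whole L2TAB absent */

    /* secondary: remainder selects one 8-byte entry */
    size_t pos = (size_t)l2_off + (size_t)(trk_nbr % 256) * 8;
    if (pos + 8 > file_len)
        return -1;

    memcpy(&out->offset, file + pos, 4);
    memcpy(&out->length, file + pos + 4, 2);
    memcpy(&out->size,   file + pos + 6, 2);
    return out->offset != 0;
}
```

Track 300, for example, uses primary entry 1 (300 / 256) and secondary entry 44 (300 % 256).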

Implementation

Compression support is implemented in ckddasd.c by replacing the read()/write()/lseek() library calls with calls to the intermediate functions ckd_read()/ckd_write()/ckd_lseek(). If the emulation file is a regular (uncompressed) file, these routines simply call the library routines; otherwise they call cckd_read()/cckd_write()/cckd_lseek(), located in cckddasd.c.

Note that cckd_read() and cckd_write() functions do not perform file i/o; they merely cause data to be copied from/to the current uncompressed track image buffer. A call to cckd_lseek(), however, if it causes a track switch, will cause the current track image buffer to be scheduled for compression and writing (if it has been written to) and will cause the new compressed track image to be read and uncompressed.
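The following C sketch models that behaviour for a single device: reads and writes only copy to or from the current track buffer, and only a seek that lands on a different track causes the old image to be queued (if it was written to) and the new one to be loaded. It is a simplified illustration, not cckddasd.c: all names are hypothetical, loading is simulated by zero-filling, the deferred write queue is just a counter, and it assumes the offsets passed in follow the regular-file mapping 512 + trk_nbr * max_trk_sz with a tiny demo track size.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_TRK_SZ 64   /* hypothetical, tiny max_trk_sz for the demo */
#define DEVHDR_SZ   512  /* device header that precedes track 0 */

typedef struct {
    long cur_trk;                    /* track held in buf, or -1 if none */
    size_t pos;                      /* current position within buf */
    int dirty;                       /* buf written since it was loaded */
    unsigned char buf[DEMO_TRK_SZ];  /* current uncompressed track image */
    int reads;                       /* track images "read and uncompressed" */
    int writes_queued;               /* track images queued for writing */
} trk_cache;

static void cache_init(trk_cache *c)
{
    memset(c, 0, sizeof *c);
    c->cur_trk = -1;
}

/* Only a track switch causes work: queue the old image if it was
 * written to, then load the new one (simulated by zero-filling).
 * Assumes offset >= DEVHDR_SZ. */
static long model_lseek(trk_cache *c, long offset)
{
    long rel = offset - DEVHDR_SZ;
    long trk = rel / DEMO_TRK_SZ;

    if (trk != c->cur_trk) {
        if (c->cur_trk >= 0 && c->dirty)
            c->writes_queued++;      /* stands in for adding to the dfwq */
        memset(c->buf, 0, sizeof c->buf);
        c->reads++;
        c->cur_trk = trk;
        c->dirty = 0;
    }
    c->pos = (size_t)(rel % DEMO_TRK_SZ);
    return offset;
}

/* Writes and reads never touch the file; they copy within buf,
 * stopping at the end of the track. */
static size_t model_write(trk_cache *c, const void *data, size_t n)
{
    if (n > DEMO_TRK_SZ - c->pos)
        n = DEMO_TRK_SZ - c->pos;
    memcpy(c->buf + c->pos, data, n);
    c->pos += n;
    c->dirty = 1;
    return n;
}

static size_t model_read(trk_cache *c, void *data, size_t n)
{
    if (n > DEMO_TRK_SZ - c->pos)
        n = DEMO_TRK_SZ - c->pos;
    memcpy(data, c->buf + c->pos, n);
    c->pos += n;
    return n;
}
```

Seeking within the same track costs nothing, which is why sequential i/o within a track is cheap even though each track image is stored compressed.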

A regular or uncompressed emulation file is distinguished from a compressed emulation file by the eye-catcher in the device header at the beginning of the file. A regular file contains "CKD_P370" and a compressed file contains "CKD_C370".

The data areas required for a compressed CKD Dasd device are kept in an extension pointed to by cckd_ext, a field in the CKD DASD section of the DEVBLK. If this field is NULL, the emulation file is a regular file.
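As a sketch of how the eye-catcher and the extension pointer fit together, the C code below classifies an 8-byte devid and dispatches a seek either to the lseek() library call or to a compressed-file routine, depending on whether the extension pointer is NULL. It assumes the eye-catcher occupies the first 8 bytes of the device header; the structures are hypothetical, heavily reduced stand-ins for the DEVBLK and its extension, and cckd_lseek_model() only records the position instead of switching tracks.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, much-reduced stand-ins for the real structures. */
typedef struct { off_t pos; } cckd_ext_t;

typedef struct {
    int fd;
    cckd_ext_t *cckd_ext;   /* NULL => regular (uncompressed) file */
} devblk_t;

/* Classify the 8-byte eye-catcher at the start of the device header.
 * Returns 0 for a regular file, 1 for a compressed file, -1 otherwise. */
static int classify_devid(const unsigned char devid[8])
{
    if (memcmp(devid, "CKD_P370", 8) == 0) return 0;
    if (memcmp(devid, "CKD_C370", 8) == 0) return 1;
    return -1;
}

/* Placeholder for the compressed-file path; SEEK_SET only. */
static off_t cckd_lseek_model(devblk_t *dev, off_t off, int whence)
{
    if (whence != SEEK_SET)
        return -1;
    dev->cckd_ext->pos = off;   /* a real version would switch tracks here */
    return off;
}

/* Intermediate routine: regular files go straight to the library. */
static off_t ckd_lseek_sketch(devblk_t *dev, off_t off, int whence)
{
    if (dev->cckd_ext == NULL)
        return lseek(dev->fd, off, whence);
    return cckd_lseek_model(dev, off, whence);
}
```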


Writing

When a track image needs to be written, it is scheduled: it is placed on a queue, called the deferred write queue (dfwq), and the deferred write thread is signalled. The deferred write thread compresses the track image (compression is usually more expensive than uncompression), finds space in the file for the compressed image, writes the compressed image, and updates the lookup tables. A compressed track image in the file may contain imbedded free space; this allows the compressed track image to grow without having to free its current space and obtain new space. Generally, the smaller the track image, the larger its imbedded free space (on the assumption that smaller track images tend to grow). If the new track image is too large to fit in its current space, that space is freed and new space is acquired within or at the end of the file. Conversely, if a compressed track image shrinks enough, some of its imbedded free space is also freed.
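A minimal sketch of the placement decision for a rewritten track image, assuming only what the paragraph above states: an image that still fits in its current space is rewritten in place, a larger one is relocated, and a much smaller one gives back some imbedded free space. The threshold SHRINK_SLACK and all names are hypothetical; the actual sizing rules are not given here.

```c
#include <stdint.h>

typedef enum { REWRITE_IN_PLACE, REWRITE_RELOCATE } rewrite_action;

/* Hypothetical: how much imbedded free space is tolerated before
 * some of it is released. */
#define SHRINK_SLACK 1024u

/* cur_size: bytes currently reserved (length + imbedded free space).
 * new_len:  compressed length of the updated image.
 * On REWRITE_IN_PLACE, *new_size receives the reservation to keep. */
static rewrite_action plan_rewrite(uint32_t cur_size, uint32_t new_len,
                                   uint32_t *new_size)
{
    if (new_len > cur_size)
        return REWRITE_RELOCATE;             /* free old space, get new */

    if (cur_size - new_len > SHRINK_SLACK)
        *new_size = new_len + SHRINK_SLACK;  /* release excess free space */
    else
        *new_size = cur_size;                /* keep the reservation */
    return REWRITE_IN_PLACE;
}
```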

Reading

When a new track image is to be read, the image may still be pending a write, so the deferred write queue is searched first. If the image is found there, a bit is set telling the deferred write thread not to free the buffer. Otherwise, the two-table lookup is performed, and the compressed track image is read and synchronously uncompressed (fortunately, uncompression is as a rule less expensive than compression). If, however, the offset in either the primary or the secondary lookup table is zero, a null track image is constructed instead (a null track image contains only an end-of-file marker).

Free Space

The compressed CKD Dasd emulation file contains two types of free space: imbedded free space, which allows a compressed track image to increase in size, and regular free space, or just free space, which is available for allocation. Total free space is just the sum of all imbedded free space and all (regular) free space. Free space within the file has the following attributes:

Garbage Collection

A garbage collection thread is started when the first track image is written. Simply put, the garbage collector moves free space towards the end of the file until it falls off. The program calls this percolation. Additionally, the garbage collector may trim imbedded free space if the ratio of imbedded free space to total free space reaches a certain threshold. If the ratio of total free space to file size is severe enough, the garbage collector invokes a more intensive algorithm called combination, which attempts to combine nearby free spaces (to satisfy allocation requests) and to push them off the end of the file (to reduce the file size). In this situation the responsiveness of the emulated device is degraded until the ratio becomes more acceptable.

By default, the following garbage collection parameters are set:
ratio         state     algorithm    size        iterations  interval (seconds)
50%-100%      critical  combination  256K        8           2
25%-50%       severe    combination  128K        4           4
12.5%-25%     moderate  percolation  max_trk_sz  4           8
6.25%-12.5%   light     percolation  max_trk_sz  2           10
0%-6.25%      none      percolation  32K         1           20

The ratio (and therefore the state) is hard-coded in the program and is determined by repeatedly dividing the file size by 2 until the result is less than the total free space. size indicates how far a free space is moved towards the end of the file in a single iteration; max_trk_sz is the maximum track size for the emulated device. iterations indicates how many times the algorithm is called within a particular interval. Note: the emulation file lock is released and reacquired between iterations, which allows i/o operations to proceed while the garbage collector is active. interval is the number of seconds the collector sleeps before starting over.
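The state selection and default parameters can be sketched in C as follows. This is an illustration under stated assumptions, not the program's code: the exact boundary comparisons are assumed, the halving is capped at the "none" state (which also keeps the loop finite when there is no free space at all), and a table size of 0 is a hypothetical placeholder meaning max_trk_sz.

```c
#include <stddef.h>

typedef struct {
    const char *state;
    const char *algorithm;
    unsigned long size;   /* bytes moved per iteration (0 = max_trk_sz) */
    int iterations;       /* iterations per interval */
    int interval;         /* seconds to sleep between passes */
} gc_parms;

/* Returns 0 = critical, 1 = severe, 2 = moderate, 3 = light, 4 = none.
 * Halve the file size until it drops below the total free space;
 * the number of extra halvings selects the state. */
static int gc_state(unsigned long file_size, unsigned long free_space)
{
    unsigned long n = file_size / 2;
    int state = 0;

    while (state < 4 && n >= free_space) {
        n /= 2;
        state++;
    }
    return state;
}

/* Defaults from the garbage collection table. */
static const gc_parms gc_default[5] = {
    {"critical", "combination", 256UL * 1024, 8, 2},
    {"severe",   "combination", 128UL * 1024, 4, 4},
    {"moderate", "percolation", 0,            4, 8},
    {"light",    "percolation", 0,            2, 10},
    {"none",     "percolation", 32UL * 1024,  1, 20},
};

static gc_parms gc_select(unsigned long file_size, unsigned long free_space,
                          unsigned long max_trk_sz)
{
    gc_parms p = gc_default[gc_state(file_size, free_space)];
    if (p.size == 0)
        p.size = max_trk_sz;   /* device-dependent size */
    return p;
}
```

With a 1000-byte file, for instance, 600 bytes of free space selects "critical" and 150 bytes selects "moderate".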


Organization

As in a regular CKD emulation file, the first 512 bytes of the compressed file contain a CKDDASD_DEVHDR block. The eye-catcher (devid) is slightly different (CKD_C370 vs CKD_P370) to distinguish it from a regular file. The next 512 bytes contain a compressed device header, or CCKDDASD_DEVHDR block, which holds space statistics, options, and garbage collection parameters. Next is the primary lookup table, or L1TAB. Each 4-byte entry in the L1TAB contains the offset of a secondary lookup table (or L2TAB) and represents 256 tracks. The size of the L1TAB depends on the number of tracks of the emulated device.
offset 0      CKDDASD_DEVHDR    (512 bytes)
offset 512    CCKDDASD_DEVHDR   (512 bytes)
offset 1024   L1TAB             (4 bytes per entry)
.  .  .


Following the L1TAB, in no particular order, are L2TABs, compressed track images, and free spaces.
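Because both headers are 512 bytes and each 4-byte L1TAB entry covers 256 tracks, the start of each area can be computed directly. The helper names below are hypothetical, and the sketch assumes the L1TAB holds exactly one entry per 256 tracks (rounded up) with no padding.

```c
#include <stdint.h>

#define DEVHDR_SZ   512u          /* CKDDASD_DEVHDR */
#define CDEVHDR_SZ  512u          /* CCKDDASD_DEVHDR */
#define L1ENT_SZ    4u            /* one L1TAB entry */
#define L2TAB_SZ    (256u * 8u)   /* one L2TAB: 256 8-byte entries */

/* One L1TAB entry per 256 tracks, rounded up. */
static uint32_t l1_entries(uint32_t tracks)
{
    return (tracks + 255) / 256;
}

/* The L1TAB follows the two headers. */
static uint32_t l1tab_offset(void)
{
    return DEVHDR_SZ + CDEVHDR_SZ;
}

/* First byte after the L1TAB: L2TABs, track images and free
 * space follow from here in no particular order. */
static uint32_t data_area_offset(uint32_t tracks)
{
    return l1tab_offset() + l1_entries(tracks) * L1ENT_SZ;
}
```

For the 3390-3 example (50085 tracks) this gives 196 L1TAB entries, so the L1TAB occupies 784 bytes starting at offset 1024 and data_area_offset(50085) evaluates to 1808.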

L2TABs contain 256 8-byte entries and are, consequently, 2048 bytes in length. Each entry contains the offset, length, and size of a compressed track image. length is the amount of space actually occupied by the compressed track image. size is the total amount of space reserved for the track image, including imbedded free space (sometimes called fudge). size must therefore always be greater than or equal to length.

L2TAB entry
  offset   4 bytes
  length   2 bytes
  size     2 bytes
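The entry layout and the length/size rule can be expressed as a C struct with a compile-time size check. A uint32_t followed by two uint16_t fields has no padding on common ABIs, but the static assertion makes that assumption explicit; l2ent and l2_fudge() are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t offset;   /* file offset of the image; 0 = null track */
    uint16_t length;   /* bytes occupied by the compressed image */
    uint16_t size;     /* bytes reserved, including imbedded free space */
} l2ent;

_Static_assert(sizeof(l2ent) == 8, "L2TAB entry must be 8 bytes");

/* Imbedded free space ("fudge") of an entry: size - length.
 * Returns -1 if the entry violates size >= length; null tracks
 * have no space at all. */
static int l2_fudge(const l2ent *e)
{
    if (e->offset == 0)
        return 0;
    if (e->size < e->length)
        return -1;
    return e->size - e->length;
}
```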

A compressed track image contains 3 fields in the following order:

  HA                                         5 bytes
  track image (compressed or uncompressed)   length bytes
  imbedded free space                        (size - length) bytes

The HA contains 0CCHH, that is, a byte of zeroes, 2 bytes indicating the cylinder of the track, and 2 bytes indicating the head of the track on the cylinder. Both CC and HH are stored in big-endian byte order. The track number is computed by

trk_nbr = (CC * trks_per_cyl) + HH

Since the first byte of the HA is always 0x00 (at least in emulated CKD files), this byte as stored in the compressed CKD Dasd emulation file actually indicates the compression algorithm used for the remainder of the track image (0 = no compression, 1 = zlib compression, 2 = bzip2 compression) and is set back to 0x00 after the track image is uncompressed.
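Decoding a stored HA does not depend on the host byte order, because CC and HH are big-endian. The sketch below (hypothetical names) extracts the compression code, computes trk_nbr with the formula above, and restores byte 0 after the image has been uncompressed.

```c
#include <stdint.h>

/* Compression codes kept in byte 0 of the stored HA. */
enum { COMPRESS_NONE = 0, COMPRESS_ZLIB = 1, COMPRESS_BZIP2 = 2 };

typedef struct {
    int compression;   /* byte 0 as stored in the file */
    uint16_t cc;       /* cylinder */
    uint16_t hh;       /* head */
} ha_info;

/* CC and HH are big-endian, so assemble them byte by byte. */
static ha_info parse_ha(const unsigned char ha[5])
{
    ha_info h;
    h.compression = ha[0];
    h.cc = (uint16_t)((ha[1] << 8) | ha[2]);
    h.hh = (uint16_t)((ha[3] << 8) | ha[4]);
    return h;
}

/* trk_nbr = (CC * trks_per_cyl) + HH */
static uint32_t ha_track(const ha_info *h, uint32_t trks_per_cyl)
{
    return (uint32_t)h->cc * trks_per_cyl + h->hh;
}

/* After uncompression the first HA byte is set back to 0x00. */
static void ha_restore(unsigned char ha[5])
{
    ha[0] = 0x00;
}
```

For example, a stored HA of 01 0102 000E is a zlib-compressed image of cylinder 258, head 14; with 15 tracks per cylinder that is track 3884.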
The HA is uncompressed for the following reasons:

Free space contains a 4-byte offset to the next free space, a 4-byte length of the free space, and zero or more bytes of residual (ie unpredictable) data following.

Free Space entry
  offset     4 bytes
  length     4 bytes
  residual   (length - 8) bytes

The minimum length of a free space, then, is 8 bytes. Since free space is ordered by file offset and no two free spaces are adjacent, offset in the free space entry is always greater than the current free space offset + the current free space length, unless the offset is zero, which indicates the free space list is terminated.
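The ordering rule gives a cheap consistency check. This sketch walks the free space chain in an in-memory copy of the file, assuming host byte order and a caller-supplied offset of the first free space (where that offset is kept is not described here); the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk the free space chain starting at first_off (0 = empty list).
 * Each free space starts with a 4-byte next offset and a 4-byte length.
 * Returns the total bytes of free space, or -1 if an entry lies outside
 * the file, is shorter than 8 bytes, or breaks the ordering rule
 * (next must be beyond the end of the current space, or zero). */
static long walk_free_chain(const unsigned char *file, size_t file_len,
                            uint32_t first_off)
{
    long total = 0;
    uint32_t cur = first_off;

    while (cur != 0) {
        uint32_t next, len;

        if ((size_t)cur + 8 > file_len)
            return -1;
        memcpy(&next, file + cur, 4);
        memcpy(&len,  file + cur + 4, 4);

        if (len < 8 || (size_t)cur + len > file_len)
            return -1;
        if (next != 0 && (size_t)next <= (size_t)cur + len)
            return -1;   /* out of order or adjacent */

        total += len;
        cur = next;      /* strictly increasing, so the loop ends */
    }
    return total;
}
```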


Byte Order

As described above, a number of fields in the various blocks that make up a compressed CKD Dasd emulation file contain offsets and lengths that are more than 1 byte long. Multi-byte values may be stored in either little-endian or big-endian byte order. For example, the Intel architecture stores values in little-endian byte order and the S390 architecture stores values in big-endian byte order. Consider the value 0x00010203: stored in little-endian byte order we would see "03020100"; stored in big-endian byte order we would see "00010203". The values in the compressed CKD Dasd emulation file are stored in the byte order of the host machine, and a bit in the CCKDDASD_DEVHDR indicates in which order they are stored. If a file is opened on a host with the other byte order, the initialization routine automatically reverses all the values before continuing.
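Reversing the values comes down to swapping 16- and 32-bit fields. A minimal sketch with hypothetical names, applied to one L2TAB entry; how the CCKDDASD_DEVHDR bit itself is tested is not shown because its position is not given here.

```c
#include <stdint.h>

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* 1 if this host stores multi-byte values little-endian, else 0. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* One L2TAB entry: 4-byte offset, 2-byte length, 2-byte size. */
typedef struct { uint32_t offset; uint16_t length, size; } l2e;

/* Reverse every multi-byte field of the entry in place. */
static void l2e_swap(l2e *e)
{
    e->offset = swap32(e->offset);
    e->length = swap16(e->length);
    e->size   = swap16(e->size);
}
```

swap32(0x00010203) yields 0x03020100, matching the example above.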

Recovery/Corruption

When a track image is written to a new offset in the file, the secondary lookup table must also be updated with the new offset, length and size. If the secondary lookup table doesn't exist yet, space must be obtained for the table and the primary lookup table must be updated as well. If the program is interrupted before these updates complete, the compressed CKD Dasd emulation file is corrupted. Possible causes of interruption are power failure, operating system failure, or program failure. Once a track has been written, the garbage collection thread is created and runs at unpredictable times. While active, the garbage collector continually moves track images and secondary lookup tables towards the beginning of the file and free space towards the end of the file, which in turn requires synchronization with the lookup tables. The quit command causes an orderly shutdown of the garbage collection thread and the deferred write thread for the device.

The program updates the compressed file in a sequence that allows recovery to occur. For example, when a track image is written to a new offset in the compressed CKD Dasd emulation file, the following sequence occurs:

1. Space is obtained for the new track image.
   state A
2. The new track image is written to the new offset.
   state B
3. The secondary lookup table is updated with the new offset.
   state C
4. Space for the old track image is released.
If a failure occurs at state A or at state B then free space will need to be rebuilt and the new track image will be lost. However, the old track image is still intact. If the failure occurs at state C then free space will need to be rebuilt and the new track image will be used.
The program that performs compressed CKD Dasd emulation file integrity checking and repair is cckdcdsk.c.
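The point of this ordering is that the L2TAB entry never refers to a track image that has not been written, and the old image is not released until nothing refers to it. The toy model below (hypothetical names; it records flags instead of doing any i/o) performs the four steps and can be stopped after any of them to mimic an interruption.

```c
#include <stdint.h>

typedef struct { uint32_t offset, length; } slot;

typedef struct {
    slot l2;            /* the track's L2TAB entry */
    int new_allocated;  /* step 1 done */
    int new_written;    /* step 2 done */
    int old_freed;      /* step 4 done */
} relocation;

/* Run steps 1..4 of the sequence, stopping after `steps` steps
 * (4 = run to completion). */
static void relocate(relocation *r, uint32_t new_off, uint32_t new_len,
                     int steps)
{
    if (steps < 1) return;
    r->new_allocated = 1;        /* 1: obtain space     -> state A */
    if (steps < 2) return;
    r->new_written = 1;          /* 2: write new image  -> state B */
    if (steps < 3) return;
    r->l2.offset = new_off;      /* 3: update L2TAB     -> state C */
    r->l2.length = new_len;
    if (steps < 4) return;
    r->old_freed = 1;            /* 4: release old space */
}
```

Stopped at state A or B, the entry still points at the intact old image; stopped at state C, it points at the new image and only the old space remains to be reclaimed, which is why rebuilding free space is enough to recover.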

Utilities

  • cckd2ckd [options] source-file target-file
  • cckdcdsk [-level] file-name
  • cckdfix file-name
  • cckddump

    Quick Start

    Warning: Compressed CKD Dasd emulation file support should be considered beta status. Ensure that you have backups of any virtual CKD volumes that you wish to compress.

    Compressed emulation is currently disabled for Windows32. I need to do some research to see if there is a Windows32 API for zlib.

    The following steps should get you up and running:

    1. Install Hercules as usual.
    2. Review the makefile. If you wish to add support for bzip2, uncomment the indicated lines in the makefile.
    3. Make hercules:
      make
      make install
    4. Create some compressed CKD Dasd emulation files:
          ./ckd2cckd source-file target-file
      creates a compressed file using zlib. To create a compressed CKD Dasd emulation file using bzip2:
          ./ckd2cckd -c 2 source-file target-file
    5. Update the hercules.cnf file to point to the new compressed CKD Dasd emulation files.
    At this point you are ready to bring Hercules up and IPL.


    FAQ

    Q. What devices are supported ?
    A. 2311, 2314, 3330, 3340, 3350, 3375, 3380 and 3390. However, I have only tested using 3390 devices.

    Q. Is a 3390 model 9 supported ?
    A. The short answer is "no". Long answer, "sort of". A 3390-9 should compress to a file size less than the 2G limit. However, the compressed dasd program "hooks" into ckddasd.c by replacing the lseek, read and write library calls with a call to an intermediate function. The file offset parameter passed to lseek is a 32-bit signed number. For a compressed file, the cckd code treats this number as unsigned (for SEEK_SET) and uses this number to calculate the dasd track and offset. That is, for a compressed file, the file offset maintained by ckddasd.c is just a number that indicates a track and the offset into the track. That means that the largest offset is 4G-1, which is not a problem for a 3390-3 but only references about half of a 3390-9. It would be possible to modify ckddasd.c to use long long when dealing with file offsets, but I wanted to minimize changes to ckddasd.c and this change seemed a little too intrusive.

    Q. When I start hercules, I get these messages showing all this free space in my compressed files. How do I get rid of that free space ?
    A. Once the total amount of free space falls below 6% of the total file size, the garbage collector is not very aggressive about eliminating free space. To remove all free space from the file, copy the compressed file to a regular file using the cckd2ckd utility and then rebuild the compressed file by using the ckd2cckd utility.

    Q. How can I display the space statistics for a compressed file ?
    A. The statistics are displayed when the compressed file is opened. Currently, there is no supplied method to display these statistics at any other time. However, it shouldn't be too hard to write a shell script (similar to dasdlist) to display these statistics. The statistics are contained in the CCKDDASD_DEVHDR which is at offset 512 in the compressed file; the header is mapped in hercules.h.

    Q. What is a "null track" anyway ?
    A. The term "null track" is just something I made up. It is what is returned when a zero offset is found in either the primary or secondary lookup table for the track. It contains the following fields:
    0CCHH               home address
    CCHH0008 00000000   standard R0
    CCHH1000            end-of-file marker
    ffffffff            end-of-track marker
    When a null track is written, space previously occupied by the track is freed and the offset in the secondary lookup table is set to zero. If all offsets in the secondary lookup table are zero, then the secondary lookup table is freed and the primary lookup table entry is zeroed.
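    A null track can be built directly from the field listing above. This sketch assumes standard 8-byte count fields (CC HH R KL DL) and interprets the abbreviated end-of-track marker as eight 0xFF bytes, giving 37 bytes in all; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 5 (HA) + 8 (R0 count) + 8 (R0 data) + 8 (EOF count) + 8 (end of track) */
#define NULLTRK_SZ 37

/* Build a null track for cylinder cc, head hh into out, which must
 * hold at least NULLTRK_SZ bytes. Returns the number of bytes written. */
static size_t build_null_track(unsigned char *out, uint16_t cc, uint16_t hh)
{
    unsigned char *p = out;
    const unsigned char cchh[4] = {
        (unsigned char)(cc >> 8), (unsigned char)(cc & 0xFF),
        (unsigned char)(hh >> 8), (unsigned char)(hh & 0xFF)
    };

    *p++ = 0x00;                  /* home address: 0CCHH */
    memcpy(p, cchh, 4); p += 4;

    memcpy(p, cchh, 4); p += 4;   /* R0 count: R=0, KL=0, DL=8 */
    *p++ = 0; *p++ = 0; *p++ = 0x00; *p++ = 0x08;
    memset(p, 0, 8); p += 8;      /* R0 data: 8 bytes of zeroes */

    memcpy(p, cchh, 4); p += 4;   /* R1 count with DL=0: end-of-file */
    *p++ = 1; *p++ = 0; *p++ = 0; *p++ = 0;

    memset(p, 0xFF, 8); p += 8;   /* end-of-track marker */
    return (size_t)(p - out);
}
```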

    Q. I want to try bzip2 but I'm getting compiler errors. What am I doing wrong ?
    A. bzip2 is probably not installed, or is not installed properly. You can obtain bzip2 from its home page. If bzip2 is installed, find the directory where bzlib.h is installed and the directory where libbz2.a is installed, then add "-I bzlib.h-directory" to the CFLAGS in the makefile and "-L libbz2.a-directory" to the LFLAGS.

    Q. Which is better, zlib or bzip2 ?
    A. This is a religious question. I have no actual preference, I just wanted to make a choice available.

    Q. Can other compression programs be used ?
    A. Yes. The program is architecturally structured so that other compression algorithms can be added rather painlessly. This will require, of course, an update to the source.

    Q. Can this compression scheme be used for FBA devices too ?
    A. I have not worked with FBA devices for over 20 years. However, it seems to me that a similar program for FBA devices should be simpler than this program for CKD devices (none of those count/key/data fields mucking everything up). Since an FBA block is 512 bytes, it might not be efficient to have each block compressed individually; it might be better to compress blocks in 32K or 64K chunks. If someone asks very nicely, I may consider looking into it;-)


    TODO


    BUGS

    This code should be considered beta status. I have tested only using 3390-3 devices and only using hercules-390. I have IPLed os/390 2.8 using an 8 volume system, all compressed CKD Dasd, and I am not currently aware of any outstanding problems. That doesn't mean that there aren't any logic errors out there that could irrevocably corrupt a compressed file.

    Actually, I have found a couple of bugs:


    cckddump os/390 hlasm program

    The cckddump program (supplied in file cckddump.hla) is an os/390 assembler language program that creates a compressed CKD Dasd emulation file from a real DASD volume. This program must be APF-authorized because it modifies the DEB to be able to read all tracks from the real device. The program executes 16 or so instructions in supervisor state/key 0; otherwise it runs entirely in problem state/key 8. It is not the prettiest assembler language program I've ever written, and there are plenty of enhancements that I originally intended to put into the program but haven't yet; once I got the program working well enough, I spent the rest of my time writing the fun stuff, the Hercules part.

    The real CKD Dasd volume that is dumped must be an ECKD device (ie support 'Locate Record' and 'Read Track' CCWs); this shouldn't be a problem because I don't think any os/390 release supports a non-ECKD device. The output file must be a DASD file; its characteristics are LRECL=4096, BLKSIZE=4096, RECFM=F. The program only dumps allocated tracks (plus track 0) and only dumps tracks up to DS1LSTAR for DSORG=PS and DSORG=PO files. The program will call zlib to compress the track images if the zlib routines have been linked with the program; however, I don't think the program will be advantageous if it can't call zlib.

    Preparing zlib

    Assemble and linkedit cckddump

    Executing cckddump

    Make the file available to Hercules


    Feedback

    Questions ?? Problems ?? Comments ?? Suggestions ?? Corrections ?? Bugs ??
    Let me know at gsmith@nc.rr.com

    greg smith

    Last updated 29 October 2000