offset = 512 + trk_nbr * max_trk_sz;
This is a logical offset because there may be more than one
real file emulating the CKD Dasd device. The offset of a
compressed track image in a compressed file is determined
by performing a two table lookup using the track number.
This is an actual offset because only a single compressed
file is supported. The quotient of the track number
divided by 256 indexes into the primary lookup table.
The primary lookup table entry contains the offset of the
secondary lookup table. The remainder of the track
number divided by 256 indexes into the secondary lookup table.
The secondary lookup table entry contains the compressed
track's offset and length. The primary lookup table resides
in memory during execution of the program; file i/o is
performed for entries in the secondary lookup tables.
For example:
lseek (fd, prime[trk_nbr/256] + (trk_nbr % 256) * sizeof(second), SEEK_SET);
read (fd, &second, sizeof(second));
offset = second.offset;
Notice that a compressed file contains a single primary lookup table
but contains a number of secondary lookup tables. This number
depends on the number of tracks for the emulated CKD device.
Since each secondary lookup table references 256 tracks, the
maximum number of tables is the total number of tracks divided by
256, rounded up. For example, a 3390-3 contains 50085 tracks
and would require up to 196 secondary lookup tables. However,
if all 256 tracks for any secondary lookup table are null tracks,
then that secondary lookup table doesn't exist and the corresponding
entry in the primary lookup table is zero.
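The table arithmetic above can be sketched in a few lines of C. The function names here are illustrative only and are not taken from the Hercules source:

```c
#include <stdint.h>

#define TRKS_PER_L2TAB 256u   /* tracks referenced by one secondary table */

/* Maximum number of secondary lookup tables for a device:
   the number of tracks divided by 256, rounded up. */
uint32_t l2tab_count(uint32_t tracks)
{
    return (tracks + TRKS_PER_L2TAB - 1) / TRKS_PER_L2TAB;
}

/* Index of a track's entry in the primary lookup table (the quotient). */
uint32_t l1_index(uint32_t trk_nbr)
{
    return trk_nbr / TRKS_PER_L2TAB;
}

/* Index of a track's entry in its secondary lookup table (the remainder). */
uint32_t l2_index(uint32_t trk_nbr)
{
    return trk_nbr % TRKS_PER_L2TAB;
}
```

For the 3390-3 example, l2tab_count(50085) yields 196, and the last track (50084) maps to primary entry 195 and secondary entry 164.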
The compressed dasd program hooks into ckddasd.c by replacing the read()/write()/lseek() library calls with intermediate function calls to ckd_read()/ckd_write()/ckd_lseek(). If the emulation file is a regular (uncompressed) file, then these routines simply call the library routines; otherwise the routines call the functions cckd_read()/cckd_write()/cckd_lseek() located in cckddasd.c.
Note that the cckd_read() and cckd_write() functions do not perform file i/o; they merely cause data to be copied from/to the current uncompressed track image buffer. A call to cckd_lseek(), however, if it causes a track switch, will cause the current track image buffer to be scheduled for compression and writing (if it has been written to) and will cause the new compressed track image to be read and uncompressed.
A regular or uncompressed emulation file is distinguished from a compressed emulation file by the eye-catcher in the device header at the beginning of the file. A regular file contains "CKD_P370" and a compressed file contains "CKD_C370".
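As a rough illustration, the eye-catcher test might look like the following. This sketch assumes the eye-catcher occupies the first 8 bytes of the device header; the enum and function names are hypothetical:

```c
#include <string.h>

enum ckd_file_kind {
    CKD_FILE_UNKNOWN,
    CKD_FILE_REGULAR,      /* eye-catcher "CKD_P370" */
    CKD_FILE_COMPRESSED    /* eye-catcher "CKD_C370" */
};

/* Classify an emulation file from the start of its device header.
   Assumption: the eye-catcher is the first 8 bytes of the header. */
enum ckd_file_kind ckd_file_kind_of(const unsigned char *devhdr)
{
    if (memcmp(devhdr, "CKD_P370", 8) == 0)
        return CKD_FILE_REGULAR;
    if (memcmp(devhdr, "CKD_C370", 8) == 0)
        return CKD_FILE_COMPRESSED;
    return CKD_FILE_UNKNOWN;
}
```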
The data areas required for a compressed CKD Dasd device are in an extension pointed to by a field in the CKD DASD section of the DEVBLK, cckd_ext. If this field is NULL then the emulation file is a regular file.
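The dispatch on cckd_ext could be pictured roughly as follows. This is only a sketch: the structures are cut-down, hypothetical stand-ins for the real DEVBLK and its extension, and the function signatures are assumed rather than copied from ckddasd.c or cckddasd.c:

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the compressed device extension. */
struct cckd_ext_sketch {
    unsigned char *trk_buf;   /* current uncompressed track image */
    size_t         trk_len;   /* number of bytes in trk_buf */
    size_t         trk_pos;   /* current position within trk_buf */
};

/* Hypothetical stand-in for the DEVBLK fields used here. */
struct devblk_sketch {
    int                     fd;
    struct cckd_ext_sketch *cckd_ext;   /* NULL for a regular file */
};

/* Compressed path: no file i/o, data is copied from the track buffer. */
ssize_t cckd_read_sketch(struct devblk_sketch *dev, void *buf, size_t len)
{
    struct cckd_ext_sketch *x = dev->cckd_ext;
    size_t avail = x->trk_len - x->trk_pos;

    if (len > avail)
        len = avail;
    memcpy(buf, x->trk_buf + x->trk_pos, len);
    x->trk_pos += len;
    return (ssize_t)len;
}

/* Intermediate routine called in place of read(). */
ssize_t ckd_read_sketch(struct devblk_sketch *dev, void *buf, size_t len)
{
    if (dev->cckd_ext == NULL)
        return read(dev->fd, buf, len);       /* regular file */
    return cckd_read_sketch(dev, buf, len);   /* compressed file */
}
```

The write and lseek routines would follow the same pattern, with the lseek variant being the one that may trigger a track switch.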
By default, the following garbage collection parameters are set:
ratio | state | algorithm | size | iterations | interval |
---|---|---|---|---|---|
50%-100% | critical | combination | 256K | 8 | 2 |
25%-50% | severe | combination | 128K | 4 | 4 |
12.5%-25% | moderate | percolation | max_trk_sz | 4 | 8 |
6.25%-12.5% | light | percolation | max_trk_sz | 2 | 10 |
0%-6.25% | none | percolation | 32K | 1 | 20 |
The ratio (and state) is hard-coded in the program and is determined by continually dividing the file size by 2 until this number is less than the total free space. size indicates how far a free space is moved towards the end of the file in a single iteration; max_trk_sz is the maximum track size for the emulated device. iterations indicates how many times the algorithm is called within a particular interval. Note that the emulation file lock is released and reacquired between iterations; this allows i/o operations to proceed while the garbage collector is active. interval is the number of seconds the collector sleeps before starting over.
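One way to read the halving rule is sketched below. The enum and function names are hypothetical, and the loop is an interpretation of the description above rather than the program's actual code:

```c
#include <stdint.h>

enum gc_state_sketch {
    GC_CRITICAL,   /* free space is more than 1/2 of the file size  */
    GC_SEVERE,     /* more than 1/4                                 */
    GC_MODERATE,   /* more than 1/8                                 */
    GC_LIGHT,      /* more than 1/16                                */
    GC_NONE        /* anything smaller                              */
};

/* Halve the file size until it drops below the total free space;
   the number of halvings needed selects the state. */
enum gc_state_sketch gc_state_for(uint64_t file_size, uint64_t free_total)
{
    int i;

    for (i = 0; i < 4; i++) {
        file_size /= 2;
        if (file_size < free_total)
            return (enum gc_state_sketch)i;
    }
    return GC_NONE;
}
```

For a 1000-byte file, 600 free bytes gives critical, 300 gives severe, 130 gives moderate, 70 gives light, and 10 gives none.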
The first 512 bytes of a compressed file contain a device header or CKDDASD_DEVHDR block. The eye-catcher (devid) is slightly different (CKD_C370 vs CKD_P370) to distinguish it from a regular file. The next 512 bytes contain a compressed device header or CCKDDASD_DEVHDR block. This contains space statistics, options, and garbage collection parameters. Next is the primary lookup table or the L1TAB. Each 4 byte entry in the L1TAB contains the offset of a secondary lookup table (or L2TAB) and represents 256 tracks. The size of the L1TAB is dependent on the number of tracks of the emulated device.
CKDDASD_DEVHDR |
CCKDDASD_DEVHDR |
L1TAB |
Following the L1TAB, in no particular order, are L2TABs, compressed track images, and free spaces.
L2TABs contain 256 8-byte entries and each is, consequently, 2048 bytes in length. Each entry contains the offset, length, and size of a compressed track image. length is the amount of space that is actually occupied by the compressed track image. size is the total amount of space occupied by the track image, including imbedded free space (sometimes called fudge). size must, then, always be greater than or equal to length.
L2TAB entry
offset 4 bytes |
length 2 bytes |
size 2 bytes |
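In C, the entry layout above might be declared along these lines. The struct and function names are hypothetical; only the field widths come from the table:

```c
#include <stdint.h>

/* One secondary lookup table entry: 4 + 2 + 2 = 8 bytes. */
struct l2tab_entry_sketch {
    uint32_t offset;   /* file offset of the compressed track image     */
    uint16_t length;   /* bytes actually occupied by the track image    */
    uint16_t size;     /* bytes reserved, including imbedded free space */
};

/* One complete secondary lookup table: 256 entries, 2048 bytes. */
struct l2tab_sketch {
    struct l2tab_entry_sketch entry[256];
};

/* Imbedded free space (fudge) that follows the track image. */
uint16_t l2_fudge(const struct l2tab_entry_sketch *e)
{
    return (uint16_t)(e->size - e->length);
}

/* An entry is only consistent if size >= length. */
int l2_entry_valid(const struct l2tab_entry_sketch *e)
{
    return e->size >= e->length;
}
```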
A compressed track image contains 3 fields in the following order:
HA 5 bytes |
track image (compressed or uncompressed) length bytes |
imbedded free space (size - length) bytes |
The HA contains 0CCHH, that is, a byte of zeroes, 2 bytes indicating the cylinder of the track, and 2 bytes indicating the head of the track on the cylinder. Both CC and HH are stored in big-endian byte order. The track number is computed by
trk_nbr = (CC * trks_per_cyl) + HH
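A minimal sketch of that computation, reading CC and HH from the 5-byte HA in big-endian order (the function name is illustrative):

```c
#include <stdint.h>

/* Track number from a 5-byte home address laid out as 0CCHH.
   ha[0] is the leading byte, ha[1..2] is CC, ha[3..4] is HH. */
uint32_t trk_nbr_from_ha(const uint8_t ha[5], uint32_t trks_per_cyl)
{
    uint32_t cc = ((uint32_t)ha[1] << 8) | ha[2];
    uint32_t hh = ((uint32_t)ha[3] << 8) | ha[4];

    return cc * trks_per_cyl + hh;
}
```

With 15 tracks per cylinder, an HA of 00 0D0A 000E (cylinder 3338, head 14) gives track number 50084.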
Since the first byte of the HA is always 0x00 (at least in emulated CKD files), this byte as stored in the compressed CKD Dasd emulation file actually indicates the compression algorithm used for the remainder of the track image (0 = no compression, 1 = zlib compression, 2 = bzip2 compression) and is set back to 0x00 after the track image is uncompressed.

L2TAB entries. By calculating the track number using the formula above, the collector can read a single L2TAB entry and determine if the space is a track image (that is, the offset in the L2TAB entry for the track, as calculated by the formula above, should match the current file offset for the space).
cckdcdsk.c) can determine using the HA if the current space is a compressed track image.
Free space contains a 4-byte offset to the next free space, a 4-byte length of the free space, and zero or more bytes of residual (i.e. unpredictable) data following.
offset 4 bytes |
length 4 bytes |
residual (length - 8) bytes |
The minimum length of a free space, then, is 8 bytes. Since free space is ordered by file offset and no two free spaces are adjacent, offset in the free space entry is always greater than the current free space offset + the current free space length, unless the offset is zero, which indicates the free space list is terminated.
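The free space rules stated above can be expressed as a small consistency check. The struct and function names are hypothetical, and the check is a sketch of the stated invariants rather than code from the program:

```c
#include <stdint.h>

/* Header found at the start of every free space. */
struct free_space_sketch {
    uint32_t next;     /* offset of the next free space, 0 ends the list */
    uint32_t length;   /* length of this free space, header included     */
};

/* Returns non-zero if the free space at file offset cur_off obeys the
   rules: at least 8 bytes long, and the next free space (if any) lies
   strictly beyond the end of this one, so the two are not adjacent. */
int free_space_ok(uint32_t cur_off, const struct free_space_sketch *fs)
{
    if (fs->length < 8)
        return 0;
    if (fs->next == 0)
        return 1;
    return (uint64_t)fs->next > (uint64_t)cur_off + fs->length;
}
```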
Consider the value 0x00010203; stored in little-endian byte order, we would see "03020100"; stored in big-endian byte order, we would see "00010203". The values in the compressed CKD Dasd emulation file are stored in the byte order of the host machine; a bit in the CCKDDASD_DEVHDR indicates in which order its values are stored. If a file is opened with the wrong byte order, then the initialization routine will automatically reverse all the values before continuing.
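Reversing a value amounts to a plain byte swap. The helpers below are a generic sketch of that operation (hypothetical names, not the program's own routines) for the 4-byte and 2-byte fields used in the file:

```c
#include <stdint.h>

/* Reverse the byte order of a 4-byte value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Reverse the byte order of a 2-byte value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

Applied to the example value, swap32(0x00010203) yields 0x03020100, and swapping twice returns the original value.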
The program updates the compressed file in a sequence that allows recovery to occur. For example, when a track image is written to a new offset in the compressed CKD Dasd emulation file, the following sequence occurs:
1 | Space is obtained for the new track image |
state A | |
2 | New track image is written to the new offset |
state B | |
3 | Secondary lookup table is updated with the new offset |
state C | |
4 | Space for the old track image is released |
cckdcdsk.c.
Compressed emulation is currently disabled for Windows32. I need to do some research to see if there is a Windows32 api for zlib.
The following steps should get you up and running:
makefile. If you wish to add support for bzip2, uncomment the indicated lines in the makefile.
make
make install
./ckd2cckd source-file target-file
./ckd2cckd -c 2 source-file target-file
hercules.cnf file to point to the new compressed CKD Dasd emulation files.
Q. What devices are supported?
A. 2311, 2314, 3330, 3340, 3350, 3375, 3380 and 3390. However, I have only tested using 3390 devices.
Q. Is a 3390 model 9 supported?
A. The short answer is "no". Long answer, "sort of". A 3390-9 should compress to a file size less than the 2G limit. However, the compressed dasd program "hooks" into ckddasd.c by replacing the lseek, read and write library calls with a call to an intermediate function. The file offset parameter passed to lseek is a 32-bit signed number. For a compressed file, the cckd code treats this number as unsigned (for SEEK_SET) and uses this number to calculate the dasd track and offset. That is, for a compressed file, the file offset maintained by ckddasd.c is just a number that indicates a track and the offset into the track. That means that the largest offset is 4G-1, which is not a problem for a 3390-3 but only references about half of a 3390-9. It would be possible to modify ckddasd.c to use long long when dealing with file offsets, but I wanted to minimize changes to ckddasd.c and this change seemed a little too intrusive.
Q. When I start hercules, I get these messages showing all this free space in my compressed files. How do I get rid of that free space?
A. Once the total amount of free space falls below 6% of the total file size, the garbage collector is not very aggressive about eliminating free space. To remove all free space from the file, copy the compressed file to a regular file using the cckd2ckd utility and then rebuild the compressed file by using the ckd2cckd utility.
Q. How can I display the space statistics for a compressed file?
A. The statistics are displayed when the compressed file is opened. Currently, there is no supplied method to display these statistics at any other time. However, it shouldn't be too hard to write a shell script (similar to dasdlist) to display these statistics. The statistics are contained in the CCKDDASD_DEVHDR which is at offset 512 in the compressed file; the header is mapped in hercules.h.
Q. What is a "null track" anyway?
A. The term "null track" is just something I made up. It is what is returned when a zero offset is found in either the primary or secondary lookup table for the track. It contains the following fields:
Q. I want to try bzip2 but I'm getting compiler errors. What am I doing wrong?
A. Probably bzip2 is not installed or is not installed properly. You can obtain bzip2 from here. If bzip2 is installed, then you need to find the directory where bzlib.h is installed and the directory where libbz2.a is installed. You can then add "-I bzlib.h-directory" to the CFLAGS in the make file and add "-L libbz2.a-directory" to the LFLAGS.
Q. Which is better, zlib or bzip2?
A. This is a religious question. I have no actual preference; I just wanted to make a choice available.
Q. Can other compression programs be used?
A. Yes. The program is architecturally structured so that other compression algorithms can be added rather painlessly. This will require, of course, an update to the source.
Q. Can this compression scheme be used for FBA devices too?
A. I have not worked with FBA devices for over 20 years. However, it seems to me that a similar program for FBA devices should be simpler than this program for CKD devices (none of those count/key/data fields mucking everything up). Since an FBA block is 512 bytes, it might not be efficient to have each block compressed individually; it might be better to compress blocks in 32K or 64K chunks. If someone asks very nicely, I may consider looking into it ;-)
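The 4G-1 limit mentioned in the 3390 model 9 answer above can be checked with a little arithmetic on the logical offset formula (512 + trk_nbr * max_trk_sz). The sketch below is illustrative only. The figures used in the note that follows it, a 3390 track size of 56832 bytes and 150255 tracks for a 3390-9, are assumptions that do not appear in this document; only the 50085 tracks of a 3390-3 is stated here.

```c
#include <stdint.h>

#define CKD_DEVHDR_SIZE 512u   /* device header preceding track 0 */

/* Logical offset of the first byte past the last track. */
uint64_t logical_end(uint64_t tracks, uint64_t max_trk_sz)
{
    return CKD_DEVHDR_SIZE + tracks * max_trk_sz;
}

/* Non-zero if every logical offset of the device fits in an
   unsigned 32-bit number (largest offset 4G-1). */
int fits_unsigned_32(uint64_t tracks, uint64_t max_trk_sz)
{
    return logical_end(tracks, max_trk_sz) - 1 <= UINT32_MAX;
}

/* Number of whole tracks that can be addressed below 4G. */
uint64_t tracks_below_4g(uint64_t max_trk_sz)
{
    return (((uint64_t)UINT32_MAX + 1) - CKD_DEVHDR_SIZE) / max_trk_sz;
}
```

Under those assumed figures, a 3390-3 fits comfortably, while only 75573 of the 150255 tracks of a 3390-9 are addressable, which is roughly half.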
cckddump.hla has a number of potential enhancements that could be added.
Actually, I have found a couple of bugs:
The program (cckddump.hla) is an os/390 assembler language program that creates a compressed CKD Dasd emulation file from a real DASD volume. This program must be APF-authorized since it modifies the DEB to be able to read all tracks from the real device. The program executes 16 or so instructions while in supervisor state/key 0; otherwise the program runs entirely in problem state/key 8. It is not the prettiest assembler language program I've ever written, and there are plenty of enhancements that I originally intended to put into the program that I haven't yet; once I got the program working well enough, I spent the rest of my time writing the fun stuff, the Hercules part.
The real CKD Dasd volume that is dumped must be an ECKD device (ie support 'Locate Record' and 'Read Track' CCWs); this shouldn't be a problem because I don't think any os/390 release supports a non-ECKD device. The output file must be a DASD file; its characteristics are LRECL=4096, BLKSIZE=4096, RECFM=F. The program only dumps allocated tracks (plus track 0) and only dumps tracks up to DS1LSTAR for DSORG=PS and DSORG=PO files. The program will call zlib to compress the track images if the zlib routines have been linked with the program; however, I don't think the program will be advantageous if it can't call zlib.
#endif, add the following lines:
    # pragma map(compress,"COMPRESS")
    # pragma map(compress2,"COMPRES2")
    # pragma map(uncompress,"UNCOMPRE")
// JOB //CC JCLLIB ORDER=(CBC.SCBCPRC) //* //ADLER32 EXEC EDCC,INFILE='prefix.ZLIB.C(ADLER32)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(ADLER32),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //COMPRESS EXEC EDCC,INFILE='prefix.ZLIB.C(COMPRESS)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(COMPRESS),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //CRC32 EXEC EDCC,INFILE='prefix.ZLIB.C(CRC32)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(CRC32),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //DEFLATE EXEC EDCC,INFILE='prefix.ZLIB.C(DEFLATE)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(DEFLATE),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //EXAMPLE EXEC EDCC,INFILE='prefix.ZLIB.C(EXAMPLE)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(EXAMPLE),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //GZIO EXEC EDCC,INFILE='prefix.ZLIB.C(GZIO)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(GZIO),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //INFBLOCK EXEC EDCC,INFILE='prefix.ZLIB.C(INFBLOCK)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(INFBLOCK),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //INFCODES EXEC EDCC,INFILE='prefix.ZLIB.C(INFCODES)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(INFCODES),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //INFFAST EXEC EDCC,INFILE='prefix.ZLIB.C(INFFAST)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(INFFAST),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //INFLATE EXEC EDCC,INFILE='prefix.ZLIB.C(INFLATE)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(INFLATE),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //INFTREES EXEC 
EDCC,INFILE='prefix.ZLIB.C(INFTREES)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(INFTREES),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //INFUTIL EXEC EDCC,INFILE='prefix.ZLIB.C(INFUTIL)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(INFUTIL),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //TREES EXEC EDCC,INFILE='prefix.ZLIB.C(TREES)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(TREES),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //UNCOMPR EXEC EDCC,INFILE='prefix.ZLIB.C(UNCOMPR)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(UNCOMPR),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H //* //ZUTIL EXEC EDCC,INFILE='prefix.ZLIB.C(ZUTIL)', // CPARM='RENT,LIST,SOURCE,LONGNAME,AGG,OPT(2)', // OUTFILE='prefix.ZLIB.OBJ(ZUTIL),DISP=SHR' //USERLIB DD DISP=SHR,DSN=prefix.ZLIB.H
// JOB
//PLKED EXEC PGM=EDCPRLK
//SYSMSGS DD DISP=SHR,DSN=CEE.SCEEMSGP(EDCPMSGE)
//SYSLIB DD DISP=SHR,DSN=prefix.ZLIB.OBJ
// DD DISP=SHR,DSN=CEE.SCEEOBJ
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSIN DD DISP=SHR,DSN=prefix.ZLIB.OBJ(ADLER32)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(COMPRESS)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(CRC32)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(DEFLATE)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(GZIO)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(INFBLOCK)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(INFCODES)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(INFFAST)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(INFLATE)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(INFTREES)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(INFUTIL)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(TREES)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(UNCOMPR)
// DD DISP=SHR,DSN=prefix.ZLIB.OBJ(ZUTIL)
//SYSMOD DD DISP=SHR,DSN=prefix.ZLIB.OBJ(ZLIB)
// JOB
//C EXEC PGM=ASMA90
//SYSLIB DD DISP=SHR,DSN=SYS1.MACLIB
// DD DISP=SHR,DSN=SYS1.MODGEN
//SYSPRINT DD SYSOUT=*
//SYSIN DD DISP=SHR,DSN=prefix.cckddump.source(CCKDDUMP)
//SYSUT1 DD UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSLIN DD DISP=(,PASS),DSN=&&OBJ,UNIT=SYSDA,SPACE=(CYL,(1,1)),
// LRECL=80,BLKSIZE=3200,RECFM=FB
//L EXEC PGM=HEWL
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSLIB DD DISP=SHR,DSN=CEE.SCEESPC
// DD DISP=SHR,DSN=CEE.SCEELKED
//ZLIB DD DISP=SHR,DSN=prefix.ZLIB.OBJ
//SYSLMOD DD DISP=SHR,DSN=apfauth.load
//SYSLIN DD DISP=(OLD,DELETE),DSN=&&OBJ
// DD *
 INCLUDE ZLIB(ZLIB)
 INCLUDE SYSLIB(EDCXHOTL)
 INCLUDE SYSLIB(EDCXHOTU)
 INCLUDE SYSLIB(EDCXHOTT)
 ORDER MAIN(P)
 ENTRY MAIN
 SETCODE AC(1)
 NAME CCKDDUMP(R)
// JOB
//S1 EXEC PGM=CCKDDUMP
//STEPLIB DD DISP=SHR,DSN=apfauth.load
//SYSPRINT DD SYSOUT=*,RECFM=VB,LRECL=255,BLKSIZE=4096
//SYSUT1 DD DISP=OLD,UNIT=SYSDA,VOL=SER=volser
//SYSUT2 DD DISP=(,CATLG),DSN=prefix.volser.cckd,
// UNIT=SYSDA,SPACE=(TRK,(7500,1500),RLSE),
// LRECL=4096,BLKSIZE=4096,RECFM=F
greg smith
Last updated 29 October 2000