This is not necessarily the current version of this TIP.
| TIP: | 234 |
| Title: | Add Support For Zlib Compression |
| Version: | $Revision: 1.5 $ |
| Author: | Pascal Scheffers <pascal at scheffers dot net> |
| State: | Draft |
| Type: | Project |
| Tcl-Version: | 8.5 |
| Vote: | Pending |
| Created: | Wednesday, 08 December 2004 |
| Keywords: | Tcl, zip, gzip, deflate |
This TIP proposes a new core package with commands to handle compression and decompression using the Zlib compression library.
The deflate algorithm implemented by the Zlib library is a widely used method for compressing files and streams. It is the algorithm used for .gz and (most) .zip files, and is one of the standard compression algorithms in the HTTP protocol specifications.
Including support for zlib compression in the core would enable the use of compressed VFS files, pure Tcl implementations of gzip and zip utilities and the use of compression in various network protocols.
A compressed VFS would be of great benefit to the new clock implementation of TIP #173, which brings along a large number of small files containing the timezone data, although this would also require support for a VFS file format in the core. One possible candidate would be the Tcl Read Only fs (trofs), or perhaps a zip file VFS (only a tclvfs zip handler exists at the time of writing).
The specification and implementation for the package and command originally came from tclkit. This was wrapped in a TEA compliant package as a stand alone package. The reference implementation is a full rewrite, retaining the public API of the tclkit zlib command.
The gzip support and C language API are not part of the original zlib extension. The streaming decompression is functionally equivalent to tclkit zlib sinflate, but uses different command names. Streaming compression is new.
The package version for this release is 2.0 because the private API from the original command has been removed. Alternatively, the package version can be 1.2 indicating new features were added and no existing public APIs were changed.
The package utilizes zlib/libz from the gzip project [1]. The license of this project/library is compatible with the Tcl license, and the library compiles on most, if not all, platforms where Tcl compiles.
For ease of use, the core distribution should probably include a copy of libz under tcl/compat, with configuration options available so that people building Tcl on platforms that already come with an installed copy of the zlib library can use that instead.
For large files (where large is a relative value, of course), streaming compression and decompression are required. This is implemented by using temporary commands, which can be fed small amounts of data and yield small chunks of (de)compressed data.
There are three compressed formats supported by this command:
raw deflate: the output contains raw deflate data, with no zlib/gzip headers or trailers and no checksum value.
zlib: the output contains data in zlib format, with zlib header and trailer using an Adler-32 checksum.
gzip: the output contains data in gzip format, with empty gzip filename, no extra data, no comment, no modification time (set to zero), no header CRC, and the operating system set to 255 (unknown).
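The difference between the three formats can be made concrete with any binding to libz. The following is a minimal sketch using Python's standard zlib module, not the proposed Tcl command; the helper name pack and the sample data are illustrative assumptions. It shows that the zlib and gzip formats wrap the same deflate stream in different headers and trailers.

```python
import zlib

def pack(data: bytes, wbits: int, level: int = 6) -> bytes:
    """One-shot compression; wbits selects the container format (hypothetical helper)."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

data = b"hello, hello, hello, zlib! " * 8

raw = pack(data, -15)  # raw deflate: no header, no trailer, no checksum
zl = pack(data, 15)    # zlib format: 2-byte header, 4-byte Adler-32 trailer
gz = pack(data, 31)    # gzip format: 10-byte header, CRC-32 and length trailer

# The zlib trailer is the big-endian Adler-32 of the uncompressed data.
assert int.from_bytes(zl[-4:], "big") == zlib.adler32(data)

# The gzip trailer is the CRC-32 followed by the input length, little-endian.
assert gz[:2] == b"\x1f\x8b"
assert int.from_bytes(gz[-8:-4], "little") == zlib.crc32(data)
assert int.from_bytes(gz[-4:], "little") == len(data)

print(len(raw), len(zl), len(gz))
```

Note that libz, when left to its defaults, fills the gzip operating system byte with the build platform's code; setting it to 255 as specified above would be up to the implementation.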
Data is treated as binary, meaning that all input and output is going to be converted and treated as byte arrays in Tcl.
zlib compress data ?level?
Returns raw deflate data, at an optional compression level. The compression level must be between 0 and 9: 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time).
zlib decompress compressedData ?bufferSize?
Decompresses raw deflate as obtained from zlib compress. The optional buffer size can be used to specify the size of the decompressed data in bytes if it is known before decompression. Otherwise, the buffer starts out at 16Kb and is doubled until the decompressed data fits.
zlib deflate data ?level?
Returns zlib-compressed data, at an optional compression level. The compression level must be between 0 and 9: 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time).
zlib inflate deflatedData ?bufferSize?
Decompresses the zlib-compressed data as obtained from zlib deflate. The optional buffer size can be used to specify the size of the decompressed data in bytes if it is known before decompression. Otherwise, the buffer starts out at 16Kb and is doubled until the decompressed data fits.
zlib gzip data ?level?
Returns gzip-compressed data, at an optional compression level. The compression level must be between 0 and 9: 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time).
The gzip header will have no file name, no extra data, no comment, no modification time (set to zero), no header crc, and the operating system will be set to 255 (unknown).
zlib gunzip gzipData ?bufferSize?
Decompresses the gzip data as obtained from zlib gzip or any gzip file. The optional buffer size can be used to specify the size of the decompressed data in bytes if it is known before decompression. Otherwise, the buffer starts out at 16Kb and is doubled until the decompressed data fits.
Only the uncompressed gzip data is available, not the original filename, extra data, modification time, header crc or the operating system from the gzip header.
Note that compress/decompress, deflate/inflate and gzip/gunzip must be used in pairs.
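The pairing rule follows from the formats themselves: a decompressor expecting one header will reject another. The sketch below again uses Python's standard zlib module as a stand-in for the proposed commands (the constant names RAW, ZLIB and GZIP and the helper pack are assumptions made for this example). It also shows a buffer size hint, which in Python is the bufsize argument.

```python
import zlib

# zlib "windowBits" values that select the container format.
RAW, ZLIB, GZIP = -15, 15, 31

def pack(data: bytes, wbits: int, level: int = 6) -> bytes:
    """One-shot compression in the chosen format (hypothetical helper)."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

data = b"pairs must match " * 100
gz = pack(data, GZIP)

# Matching pair: gzip data decompressed as gzip.
assert zlib.decompress(gz, wbits=GZIP) == data

# Mismatched pair: gzip data handed to a zlib-format decompressor is rejected.
try:
    zlib.decompress(gz, wbits=ZLIB)
    mismatch_rejected = False
except zlib.error:
    mismatch_rejected = True

# When the decompressed size is known, it can be passed as the initial
# buffer size so the output buffer does not have to grow.
restored = zlib.decompress(gz, wbits=GZIP, bufsize=len(data))
assert restored == data
print("mismatch rejected:", mismatch_rejected)
```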
Streaming is handled by a worker command which is created by calling the zlib command, stream subcommand:
zlib stream deflate/inflate/compress/decompress/gzip/gunzip ?level?
Returns a command name which will perform the requested operation. Compression level is only used when compressing data.
The stream worker command is used to actually compress and decompress in smaller chunks than the input and/or output.
$stream put ?-flush|-fullflush|-finalize? data
Adds data to be (de)compressed. -flush, -fullflush and -finalize are mutually exclusive flags that indicate the desired flushing of the stream. -finalize is used to indicate the last block of data while compressing. After -finalize, no more data can be added to be compressed. For decompression, more data can still be added after -finalize.
$stream flush
The next get operation will try to get the most data from the stream. While compressing, calling flush often will degrade the compression ratio as it forces all remaining input to be output immediately.
$stream fullflush
Like flush, the next get operation will try to get the most data from the stream. Additionally, the compressor will output extra data to enable recovery from this point in the datastream.
$stream finalize
For compression, this signals the end of the input data; no more data can be added to the stream after 'finalize'. For decompression, this functions the same as flush.
$stream get ?count?
Gets (de)compressed data from the stream. The optional count parameter specifies the maximum number of bytes to read from the stream. Especially for decompression, it is strongly recommended to specify a count.
$stream eof
Returns 0 while the end of the compressed stream has not been reached. Returns 1 when the end of the compressed stream has been reached or, while compressing data, when the last data has been put to the stream with -finalize or $stream finalize has been called.
When [$stream eof] is true, and [$stream get ?count?] returns an empty string, you will have obtained all data from the stream.
$stream adler32
Returns the adler32 checksum of the uncompressed data. For compressing streams, this value is updated on each $stream put. For decompressing streams, the value will only match the adler32 of the decompressed string after the last $stream get returned an empty string.
$stream close
Deletes the $stream worker command and all storage associated with it. Discards any remaining input and output. After this command, the $stream command cannot be used anymore.
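The put/get/finalize/eof cycle described above can be sketched with Python's standard zlib stream objects, which wrap the same library. This is only an analogy to the proposed worker command, not its implementation: the function names deflate_chunks and inflate_chunks are hypothetical, and the mapping of each Python call to a $stream subcommand is an assumption noted in the comments.

```python
import zlib

def deflate_chunks(chunks, level=6):
    """Compress input piece by piece, returning the compressed pieces."""
    c = zlib.compressobj(level)
    pieces = []
    for chunk in chunks:
        out = c.compress(chunk)          # roughly: $stream put data; $stream get
        if out:
            pieces.append(out)
    pieces.append(c.flush(zlib.Z_FINISH))  # roughly: $stream finalize; $stream get
    return pieces

def inflate_chunks(chunks, count=4096):
    """Decompress piece by piece, asking for at most count bytes per call."""
    d = zlib.decompressobj()
    pieces = []
    for chunk in chunks:
        buf = chunk
        while buf and not d.eof:
            # The second argument caps the output size, like: $stream get count
            pieces.append(d.decompress(buf, count))
            buf = d.unconsumed_tail      # input not yet consumed because of the cap
    pieces.append(d.flush())             # collect anything still pending
    return b"".join(pieces), d.eof

data = bytes(range(256)) * 400
inputs = [data[i:i + 1000] for i in range(0, len(data), 1000)]

compressed = deflate_chunks(inputs)
restored, at_eof = inflate_chunks(compressed)

assert restored == data
print("eof reached:", at_eof)
```

Capping each read matters most for decompression, because a small amount of compressed input can expand to a very large output.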
zlib crc32 data ?startValue?
Calculates a standard CRC-32 checksum, with an optional start value for incremental calculations.
zlib adler32 data ?startValue?
Calculates a quick Adler-32 checksum, with an optional start value for incremental calculations.
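Incremental calculation means that feeding the previous result back in as the start value gives the same checksum as a single pass over the concatenated data. A minimal sketch with Python's standard zlib module (the block contents are made up for the example; the start values 0 and 1 are the defaults of Python's binding, and the proposed command may treat a missing start value differently):

```python
import zlib

blocks = [b"first block, ", b"second block, ", b"third block"]
whole = b"".join(blocks)

crc = 0    # starting value for CRC-32 in Python's binding
adler = 1  # starting value for Adler-32 in Python's binding
for block in blocks:
    crc = zlib.crc32(block, crc)        # previous result is the next start value
    adler = zlib.adler32(block, adler)

# Block-by-block results equal the checksums of the whole data.
assert crc == zlib.crc32(whole)
assert adler == zlib.adler32(whole)
print(hex(crc), hex(adler))
```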
Tcl_Obj * Zlib_Deflate(interp, format, data, level)
Tcl_Obj * Zlib_Inflate(interp, format, data, bufferSize)
unsigned int Zlib_CRC32(crc, bytes, length)
unsigned int Zlib_Adler32(adler, bytes, length)
int Zlib_StreamInit(interp, mode, format, level, zshandlePtr)
Tcl_Obj * Zlib_StreamGetCommandName(zshandle)
int Zlib_StreamEof(zshandle)
int Zlib_StreamClose(zshandle)
int Zlib_StreamAdler32(zshandle)
int Zlib_StreamPut(zshandle, data, flush)
int Zlib_StreamGet(zshandle, data, count)
interp: Optional interpreter to use for error reporting.
format: Compressed data format. For compression and decompression either ZLIB_FORMAT_RAW, ZLIB_FORMAT_ZLIB or ZLIB_FORMAT_GZIP. A fourth value, ZLIB_FORMAT_AUTO, is available for decompression, and can be used when decompressing either GZIP or ZLIB formatted data. Decompression of RAW data requires specifying the format as RAW.
mode: Compress or decompress mode. Either ZLIB_INFLATE or ZLIB_DEFLATE.
data: The input data for compression or decompression.
level: The compression level. Must be between 0 and 9; 1 gives best speed, 9 gives best compression, 0 gives no compression at all (the input data is simply copied a block at a time).
data (Zlib_Inflate): The compressed input data for decompression.
bytes: Input bytes for calculation of checksums.
length: Number of bytes to calculate the checksum on.
crc: Start value for the CRC-32 calculation.
adler: Start value for the Adler-32 calculation.
bufferSize: The buffer size can be used to specify the size of the decompressed data in bytes if it is known before decompression. If bufferSize is 0, the buffer starts out at 16Kb and is doubled until the decompressed data fits.
zshandlePtr: Pointer to an integer to receive the handle to the stream. All subsequent Zlib_Stream*() calls require this handle.
zshandle: Handle for the stream.
flush: Flush parameter. ZLIB_NO_FLUSH, ZLIB_FLUSH, ZLIB_FULLFLUSH or ZLIB_FINALIZE.
count: Maximum number of bytes to be written to the data Tcl_Obj.
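The automatic format detection described for the format argument corresponds to a feature of libz itself: the decompressor inspects the header and accepts either zlib or gzip data, while raw deflate has no header to inspect. The sketch below uses Python's standard zlib module to illustrate this; the constant AUTO and the helper pack are assumptions made for the example, and ZLIB_FORMAT_AUTO may be implemented differently.

```python
import zlib

RAW, ZLIB, GZIP = -15, 15, 31
AUTO = 32 + 15  # libz setting: detect a zlib or gzip header automatically

def pack(data: bytes, wbits: int, level: int = 6) -> bytes:
    """One-shot compression in the chosen format (hypothetical helper)."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

data = b"format detection " * 50

# Both headered formats are accepted by the auto-detecting decompressor.
for wbits in (ZLIB, GZIP):
    assert zlib.decompress(pack(data, wbits), wbits=AUTO) == data

# Raw deflate carries no header, so it must be requested explicitly.
raw = pack(data, RAW)
try:
    zlib.decompress(raw, wbits=AUTO)
    raw_accepted_by_auto = True
except zlib.error:
    raw_accepted_by_auto = False

assert zlib.decompress(raw, wbits=RAW) == data
print("raw accepted by auto-detection:", raw_accepted_by_auto)
```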
Zlib_Deflate()
Depending on the format argument, this function returns a Tcl_Obj* at refcount 0 with the compressed data in either raw deflate format, zlib format or gzip format.
Zlib_Inflate()
This function returns a Tcl_Obj* at refcount 0 with the decompressed data. The bufferSize argument may be used as a hint if the decompressed size is known before decompression.
Zlib_CRC32()
This function returns the standard CRC-32 calculation. The crc argument should contain the previously returned value for streaming calculations, or zero for the first block.
Zlib_Adler32()
This function returns a quick Adler-32 calculation. The adler argument should contain the previously returned value for streaming calculations, or zero for the first block.
Zlib_StreamInit()
This function initializes the internal state for compression or decompression and creates the Tcl worker command for use at the script level. Returns TCL_OK when initialization was successful.
Zlib_StreamGetCommandName()
This function returns a Tcl_Obj* which contains the fully qualified stream worker command name associated with this stream.
Zlib_StreamEof()
This function returns 0 or 1 depending on the state of the (de)compressor. For decompression, eof is reached when the entire compressed stream has been decompressed. For compression, eof is reached when the stream has been flushed with ZLIB_FINALIZE.
Zlib_StreamClose()
This function frees up all memory associated with this stream, deletes the Tcl worker command and discards all remaining input and output data.
Zlib_StreamAdler32()
This function returns the Adler-32 checksum of the uncompressed data up to this point. For decompressing streams, the checksum will only match the checksum of uncompressed data when Zlib_StreamGet returns an empty string.
Zlib_StreamPut()
This function is used to add data to the stream. For compression, the final block of data, which may be an empty string, must be indicated with ZLIB_FINALIZE as the flush parameter.
Zlib_StreamGet()
This function is used to get the data from the stream. A count parameter of -1 will return all available data.
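The cost of the flush modes accepted by Zlib_StreamPut() can be estimated with libz directly. The sketch below uses Python's standard zlib module and assumes that ZLIB_FLUSH, ZLIB_FULLFLUSH and ZLIB_FINALIZE correspond to libz's Z_SYNC_FLUSH, Z_FULL_FLUSH and Z_FINISH; that mapping, the helper name deflate_with_flush and the sample input are assumptions made for this example.

```python
import zlib

def deflate_with_flush(chunks, flush_mode=None, level=6):
    """Compress chunks, optionally flushing after every chunk (hypothetical helper)."""
    c = zlib.compressobj(level)
    out = []
    for chunk in chunks:
        out.append(c.compress(chunk))
        if flush_mode is not None:
            out.append(c.flush(flush_mode))  # force pending output out now
    out.append(c.flush(zlib.Z_FINISH))       # final block ends the stream
    return b"".join(out)

chunks = [b"line %d of some repetitive log output\n" % i for i in range(200)]
data = b"".join(chunks)

plain = deflate_with_flush(chunks)                      # no intermediate flush
synced = deflate_with_flush(chunks, zlib.Z_SYNC_FLUSH)  # flush after every chunk
full = deflate_with_flush(chunks, zlib.Z_FULL_FLUSH)    # flush and reset history

# All three streams decode to the same data; only their sizes differ.
for stream in (plain, synced, full):
    assert zlib.decompress(stream) == data

print(len(plain), len(synced), len(full))
```

Flushing after every small write makes each piece decodable as soon as it arrives, at the price of a larger stream; a full flush costs more still because the compressor can no longer refer back to earlier data.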
Because zlib is to be implemented as a core package, applications will need to do a [package require zlib] or the C equivalent.
These commands only work on data already available to a safe interpreter and are therefore safe to make available in the safe interpreter.
The reference implementation is available at the subversion repository http://svn.scheffers.net/zlib. Alternatively, a recent snapshot may be obtained from http://svn.scheffers.net/zlib.tar.gz. This reference implementation includes a copy of zlib-1.2.1 from http://www.gzip.org.
[ Insert here please ]
This document has been placed in the public domain.