OSEC

Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email hr@neohapsis.com
 
From: Todd Vierling (tv_at_pobox.com)
Date: Thu Oct 10 2002 - 15:35:07 CDT

  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

    On Sun, 6 Oct 2002, Hubert Feyrer wrote:

    : Uhu. I understand writing esp. in the middle of files is interesting.
    : But for a start, a read-only filesystem that was "pre-compressed" would be
    : enough - what I have in mind is a CD that has more than 650MB.

    Warning: compression internals discussion ahead. Watch for falling bits.

    The biggest problem with any on-the-fly mechanism, even for reading, is
    seeking *backwards*. Most compression algorithms are highly directional
    (towards the end of the stream). Without some mechanism for handling
    backwards motion, any such seek would require a rewind() and restart of
    decompression from the beginning. That would be really ugly and obnoxiously
    slow for any file > 256KB or so.

    Now, this could be accomplished with gzip/zlib compression, provided you
    have some layer to intercept lseek(), through crafty finagling of zlib
    internals (by stashing the decompressor's state machine and Lempel-Ziv
    window periodically as "smart restore points"). That's memory intensive
    depending on the compression level (-z9 is a 32KB LZ window, I believe), and
    how your backoff algorithm chooses when to throw away saved state.

    The other alternative, which has been used in some compression schemes, is
    to partition the uncompressed blocks so that there's a "clean" state machine
    at known points throughout the stream. Typically this consists of
    independently compressed blocks of a fixed uncompressed size, with some sort
    of periodic index of the blocks (such as a N-block table between each N data
    blocks, or a pointer to the next block at the start of each, or similar).
    Obviously, the compression achieved here is far less than you'd get from a
    complete-stream gzip pass, but it doesn't have nearly the requirements of
    changing horses^W^Wrestarting gzip decompression in the middle of a stream.

    The former case is a candidate for a LD_PRELOAD overlay of libc, and would
    never belong in kernel space at all, though it has the highest degree of
    compatibility (given that it would handle plain gzipped files). The latter
    case lends itself much better to a layered filesystem, and could be
    kernelized depending on the memory requirements of the compression scheme
    used (LZO, for instance, is very low on memory consumption).

    Oh, I haven't even addressed writing here, yet. All this is read-only. 8-)

    -- 
    -- Todd Vierling <tvpobox.com>