LZMA - better than bzip2

Everyone needs to compress something now and then: making backups, sending files over the Internet and so on. Most of us use gzip or bzip2. It's well known that bzip2 has a somewhat better compression ratio but is much slower. I was one of those people who used only these two programs, often in conjunction with tar:

tar -czf
or
tar -cjf
and I didn't think I needed anything better. But one day I wondered whether an alternative existed, and I found LZMA.

Brief description


The description is impressive. In short:
  • Better compression ratio: at the best compression level, where gzip reaches 38% and bzip2 34%, LZMA reaches 25% (compressed size as a percentage of the original, so lower is better).
  • The gain in compression ratio shows up mainly on binary files.
  • Decompression is much faster (3-4 times) than with bzip2.
  • The algorithm can be parallelized (but the tool I describe here is single-threaded).

There are also disadvantages:

  • Compression (except at the lower levels) is much slower than with bzip2.
  • Memory requirements during compression are much higher than with bzip2.
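
To check ratio claims like these against your own data, here is a minimal sketch that uses Python's standard-library gzip, bz2 and lzma modules instead of the command-line tools. The helper name compressed_percent and the sample data are invented for illustration, and the results depend heavily on the input. The LZMA preset defaults to 6 rather than 9 because the highest preset needs far more memory.

```python
import bz2
import gzip
import lzma


def compressed_percent(data: bytes, lzma_preset: int = 6) -> dict:
    """Return compressed size as a percentage of the original, per codec.

    Lower is better. gzip and bzip2 run at their highest level (9);
    the LZMA preset is a parameter because preset 9 is memory hungry.
    """
    sizes = {
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        # FORMAT_ALONE selects the legacy .lzma container.
        "lzma": len(
            lzma.compress(data, format=lzma.FORMAT_ALONE, preset=lzma_preset)
        ),
    }
    return {name: 100.0 * size / len(data) for name, size in sizes.items()}


if __name__ == "__main__":
    # Hypothetical sample input: repetitive, log-like text.
    sample = b"".join(b"line %d of some sample text\n" % i for i in range(50000))
    for name, pct in compressed_percent(sample).items():
        print(f"{name:6s} {pct:5.1f}% of original size")
```

Feeding it a binary file (for example the bytes of an executable) instead of text is the quickest way to see whether the "mainly on binary files" point holds for you.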

The tool


There are a few tools that can compress with LZMA (such as the P7ZIP archiver), but I chose LZMA Utils (http://tukaani.org/lzma/) because its command line is compatible with gzip and bzip2, so replacing them with LZMA is simple. The command is called lzma and produces .lzma files by default.
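
If you want to write or read .lzma files from a script instead of the command line, Python's standard lzma module can handle the legacy .lzma container through lzma.FORMAT_ALONE. This is only a sketch of that route and is not part of LZMA Utils. The helper names and the file name are made up for the example.

```python
import lzma
import os
import tempfile


def write_lzma(path: str, payload: bytes, preset: int = 6) -> None:
    """Write payload to path in the legacy .lzma container format."""
    with lzma.open(path, "wb", format=lzma.FORMAT_ALONE, preset=preset) as fh:
        fh.write(payload)


def read_lzma(path: str) -> bytes:
    """Read a legacy .lzma file back into memory."""
    with lzma.open(path, "rb", format=lzma.FORMAT_ALONE) as fh:
        return fh.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "notes.txt.lzma")  # hypothetical file name
        write_lzma(target, b"some text worth keeping\n" * 100)
        print(len(read_lzma(target)), "bytes recovered")
```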

Comparison


The first thing I used LZMA for was compressing my mail archive. The spam file (mail in mbox format) I chose is 528MB, and I used the maximum compression level. During compression the lzma process grew to 370MB, which is a lot :) bzip2 stayed below 7MB. lzma took almost 15 minutes to compress the file, bzip2 less than 4 minutes. The compression ratio was very similar: the output file is 373MB for bzip2 and 370MB for lzma. Decompression took 1m12s for lzma and 1m48s for bzip2.
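
The figures above come from the command-line tools. To repeat this kind of timing on your own data, a rough Python sketch with the standard bz2 and lzma modules might look like the following. The function names are hypothetical, it measures wall-clock time only (not memory), and it loads the whole input into RAM, so it suits samples much smaller than a 528MB mailbox.

```python
import bz2
import lzma
import time


def time_codec(name, compress, decompress, data):
    """Time one compress/decompress round trip and return the measurements."""
    start = time.perf_counter()
    packed = compress(data)
    middle = time.perf_counter()
    unpacked = decompress(packed)
    end = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name}: round trip changed the data")
    return {
        "codec": name,
        "packed_bytes": len(packed),
        "compress_s": middle - start,
        "decompress_s": end - middle,
    }


def compare(data, lzma_preset=6):
    """Run bzip2 (level 9) and LZMA (given preset) over the same buffer."""
    return [
        time_codec("bzip2", lambda d: bz2.compress(d, 9), bz2.decompress, data),
        time_codec(
            "lzma",
            lambda d: lzma.compress(
                d, format=lzma.FORMAT_ALONE, preset=lzma_preset
            ),
            lambda p: lzma.decompress(p, format=lzma.FORMAT_ALONE),
            data,
        ),
    ]


if __name__ == "__main__":
    sample = b"From: someone@example.com\nSubject: spam\n\nbuy now\n" * 20000
    for row in compare(sample):
        print(row)
```

Raising lzma_preset towards 9 moves the run closer to "maximum compression", at the cost of more time and memory.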

Not very impressive, but compressing text files is easy. Anyone who has tried to implement or invent a simple compression algorithm can get good results with text files, so what about binary data? I created a tar archive of the /usr/bin directory on my laptop. It is 308MB. The bzip2 file is 127MB (59% smaller than the original) and the LZMA file is 83MB (73% smaller). This is a real difference!
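
The percentages can be read two ways: compressed size as a share of the original, or space saved. The small sketch below recomputes both from the sizes quoted in the text. The function names are made up for illustration, and the only inputs are the MB figures already given.

```python
def percent_of_original(original_mb: float, compressed_mb: float) -> float:
    """Compressed size as a percentage of the original (lower is better)."""
    return 100.0 * compressed_mb / original_mb


def percent_saved(original_mb: float, compressed_mb: float) -> float:
    """Space saved as a percentage of the original (higher is better)."""
    return 100.0 - percent_of_original(original_mb, compressed_mb)


if __name__ == "__main__":
    # Sizes in MB, as reported in the text.
    cases = [
        ("mail, bzip2", 528, 373),
        ("mail, lzma", 528, 370),
        ("/usr/bin, bzip2", 308, 127),
        ("/usr/bin, lzma", 308, 83),
    ]
    for label, original, compressed in cases:
        saved = percent_saved(original, compressed)
        print(f"{label:16s} saved {saved:.0f}%")
```

For /usr/bin this gives 59% and 73% saved, matching the figures quoted. For the mail archive both tools save only about 29-30%.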

Integration with software


Since my mail archive is now lzma-compressed (for the faster access time), I needed to teach mutt to open such mailboxes. This was simple: I copied the gzip hooks in ~/.muttrc and adapted them, because the lzma command line is the same:

open-hook       \\.lzma$ "lzma -cd '%f' > '%t'"
close-hook       \\.lzma$ "lzma -c '%t' > '%f'"
append-hook   \\.lzma$ "lzma -c '%t' >> '%f'"

Recent versions of the tar archiver (1.20 and later) also have an --lzma switch.
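
For scripted backups, roughly the same result as tar with the --lzma switch can be had from Python's standard library by writing a plain tar stream through an LZMA file object. This is a sketch of that idea, not what tar itself does internally. The function names are hypothetical, and the preset defaults to 6 to keep memory use moderate.

```python
import lzma
import os
import tarfile
import tempfile


def create_tar_lzma(source_dir: str, archive_path: str, preset: int = 6) -> None:
    """Pack source_dir into a tar stream wrapped in the legacy .lzma format."""
    top_level = os.path.basename(os.path.normpath(source_dir))
    with lzma.open(
        archive_path, "wb", format=lzma.FORMAT_ALONE, preset=preset
    ) as packed:
        # Mode "w" writes an uncompressed tar; the LZMA layer is `packed`.
        with tarfile.open(fileobj=packed, mode="w") as tar:
            tar.add(source_dir, arcname=top_level)


def list_tar_lzma(archive_path: str) -> list:
    """Return the member names stored in a .tar.lzma archive."""
    with lzma.open(archive_path, "rb") as packed:
        # Mode "r:" reads a plain tar; decompression happens in `packed`.
        with tarfile.open(fileobj=packed, mode="r:") as tar:
            return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "bin")  # stand-in for a real directory
        os.mkdir(src)
        with open(os.path.join(src, "tool"), "wb") as fh:
            fh.write(b"\x00\x01" * 512)
        archive = os.path.join(tmp, "bin.tar.lzma")
        create_tar_lzma(src, archive)
        print(list_tar_lzma(archive))
```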

Summary


LZMA is not perfect, but it is definitely a good tool for making backups, as I do, especially when you want to pack as much as you can onto a DVD-R. Its good compression ratio and faster decompression than bzip2 have led package managers such as rpm and deb to start using LZMA. When you need much faster processing, gzip is still the best solution.

Comments

lrzip

Also check out lrzip. It has even better compression for large files:

http://ck.kolivas.org/apps/lrzip/

FreeArc might be an even better choice

Good:
  • Way faster than LZMA
  • Similar in speed to Bzip2
  • Compresses better than LZMA
Bad:
  • No performance gain on multicpu machines
  • Does not work as a filter
  • Not a drop-in replacement for bzip2/lzma/gzip, but similar to zip
Download: http://freearc.org/Download.aspx