Frequently Asked Questions
A note on the file data sets used here:
* Testsets shown in red on this page contain already-compressed files
Main Chart
3D Game a free, open-source 3D racing game for Windows called "TORCS"; downloadable at SourceForge
Remark: None of the archivers recognized the 75 MB .RGB images as compressible bitmap-like raster images.
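The remark above comes down to format detection: an archiver can only apply an image model if it first recognizes the data as an image. SGI .rgb files begin with a 512-byte header whose first two bytes hold the big-endian magic number 474. Here is a minimal Python sketch of such a header check. `sgi_header` is a hypothetical helper, not code taken from any of the tested archivers.

```python
import struct

SGI_MAGIC = 474  # 0x01DA, big-endian, first two bytes of an SGI .rgb file


def sgi_header(data: bytes):
    """Return (width, height, channels, bytes_per_channel, rle) or None if not SGI."""
    if len(data) < 512:  # the SGI header block is always 512 bytes
        return None
    magic, storage, bpc, dimension, xsize, ysize, zsize = struct.unpack(
        ">HBBHHHH", data[:12]
    )
    if magic != SGI_MAGIC or storage not in (0, 1) or bpc not in (1, 2):
        return None
    if dimension == 1:    # a single scanline
        ysize, zsize = 1, 1
    elif dimension == 2:  # a single channel
        zsize = 1
    elif dimension != 3:  # 3 = multiple channels (e.g. RGB); anything else is invalid
        return None
    return xsize, ysize, zsize, bpc, storage == 1  # storage 1 = RLE, 0 = verbatim


# Hypothetical 640 x 480 RGB header, padded with zeros to 512 bytes
header = struct.pack(">HBBHHHH", SGI_MAGIC, 0, 1, 3, 640, 480, 3) + bytes(500)
print(sgi_header(header))  # (640, 480, 3, 1, False)
```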
       
Bitmaps 29 bitmaps from Sachin Garg's Public Testset; 15 greyscale PGM (8 Bit); 14 true color PPM (8 Bit RGB)
  http://www.imagecompression.info/test_images/
[New 2007]
Artificial.pgm 6.291.473 bytes
Big Building.pgm 39.053.009 bytes
Big Tree.pgm 27.700.417 bytes
Bridge.pgm 11.130.718 bytes
Cathedral.pgm 6.016.017 bytes
Deer.pgm 10.677.580 bytes
Fireworks.pgm 7.375.889 bytes
Flower_Foveon.pgm 3.429.233 bytes
HDR.pgm 6.291.473 bytes
Leaves_ISO_200.pgm 6.016.017 bytes
Leaves_ISO_1600.pgm 6.016.017 bytes
Nightshot_ISO_100.pgm 7.375.889 bytes
Nightshot_ISO_1600.pgm 7.375.889 bytes
Spider_Web.pgm 12.121.105 bytes
Zone_Plate.pgm 6.000.017 bytes
Artificial.ppm 18.874.385 bytes
Big Building.ppm 117.158.993 bytes
Big Tree.ppm 83.101.217 bytes
Bridge.ppm 33.392.120 bytes
Cathedral.ppm 18.048.017 bytes
Deer.ppm 32.032.706 bytes
Fireworks.ppm 22.127.633 bytes
Flower_Foveon.ppm 10.287.665 bytes
HDR.ppm 18.874.385 bytes
Leaves_ISO_200.ppm 18.048.017 bytes
Leaves_ISO_1600.ppm 18.048.017 bytes
Nightshot_ISO_100.ppm 22.127.633 bytes
Nightshot_ISO_1600.ppm 22.127.633 bytes
Spider_Web.ppm 36.363.281 bytes
TOTAL 633.482.445 bytes
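PGM and PPM files both start with a short text header (magic, width, height, maximum value) followed by raw pixels, so each file size follows directly from the image dimensions. Here is a minimal Python sketch. It assumes binary P5/P6 files without comment lines, and the helper names are hypothetical. The 3072 x 2048 dimensions in the usage lines are also an assumption, but with a 17-byte header they reproduce the listed sizes of Artificial.pgm and Artificial.ppm.

```python
def parse_pnm_header(data: bytes):
    """Parse a binary PGM (P5) or PPM (P6) header without '#' comments.

    Returns (magic, width, height, maxval, header_length_in_bytes).
    """
    tokens, pos = [], 0
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():  # skip whitespace between fields
            pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # a single whitespace byte separates the header from the pixels
    magic = tokens[0].decode("ascii")
    if magic not in ("P5", "P6"):
        raise ValueError("not a binary PGM/PPM file")
    width, height, maxval = (int(t) for t in tokens[1:])
    return magic, width, height, maxval, pos


def expected_file_size(magic, width, height, maxval, header_len):
    channels = 1 if magic == "P5" else 3         # greyscale vs. RGB
    bytes_per_sample = 1 if maxval < 256 else 2  # 8 bit vs. 16 bit samples
    return header_len + width * height * channels * bytes_per_sample


for head in (b"P5\n3072 2048\n255\n", b"P6\n3072 2048\n255\n"):
    print(expected_file_size(*parse_pnm_header(head)))  # 6291473, then 18874385
```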
PhotoJazz 2.0 263.249.866
JPEG-LS (R-=G B-=G) 272.296.987
PackPNM 0.8a 274.816.911
J2K Lossless 278.812.478
BMF / BMFG 2.0 ** 289.265.284 converted .PPM to .BMP (uncompressed)
ERI ERI32 5.1 fre (2002) 305.453.888
BMF / BMFG 1.10 -Q9 -S ** 309.759.880 converted .PPM to .BMP (uncompressed)
HDP HD Photo 310.841.401
PNG level 9 328.328.256
JPEG Lossless 345.105.024
CoBALP fast * crashes crashes after confirming save as
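The "R-=G B-=G" note on the JPEG-LS line presumably refers to a reversible color transform applied before compression. The green value is subtracted from red and blue, which removes much of the correlation between the three channels. Here is a minimal Python sketch of that transform on interleaved 8-bit RGB bytes. It works mod 256, so it is exactly invertible. The function names are hypothetical, and this is not the code used for the chart.

```python
def subtract_green(rgb: bytes) -> bytes:
    """Forward transform on interleaved RGB bytes: R' = R - G, B' = B - G (mod 256)."""
    out = bytearray(rgb)
    for i in range(0, len(out) - 2, 3):
        g = out[i + 1]
        out[i] = (out[i] - g) % 256
        out[i + 2] = (out[i + 2] - g) % 256
    return bytes(out)


def add_green(transformed: bytes) -> bytes:
    """Inverse transform: R = R' + G, B = B' + G (mod 256); G was left untouched."""
    out = bytearray(transformed)
    for i in range(0, len(out) - 2, 3):
        g = out[i + 1]
        out[i] = (out[i] + g) % 256
        out[i + 2] = (out[i + 2] + g) % 256
    return bytes(out)


pixel = bytes([200, 190, 185])      # a greyish pixel: R, G and B are close together
print(list(subtract_green(pixel)))  # [10, 190, 251]  (185 - 190 = -5 wraps to 251)
assert add_green(subtract_green(pixel)) == pixel
```

In natural photos the transformed R and B values cluster near 0 (or wrap to near 255), and a following entropy coder can exploit that.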
       
CD image a CD image of a PC game from 1993 that is now free (some videos, vector-based graphics) and has no copy protection.
* This image does not infringe copyright: the CD dates from March 1993
and contains only an old, free, classic game. I am not, and never will
be, a person who wants to violate or ignore copyright law. This image
was made solely for compression benchmarking.
   
CrossPlatform a well-chosen freeware collection of compiled binaries (executables) for different platforms:
Windows XP 32 Bit 21%
OpenSUSE 10.2 x86 64 21%
Apple MacOS X 16%
BeOS System 11%
Vista x86 64 Bit 10%
SymbianOS 7.x, 9.x 8%
Solaris 10 6%
PocketPC/Win CE 4%
ZETA OS 3%
PalmOS 1%
       
D.N.A. Human Genome Project data in text format: chromosome 7 of the 24 chromosomes
just a sample of the 400 GB of data mapped by the National Center for Biotechnology Information
       
       
Drivers XP the extracted \Windows\Driver Cache\i386\driver.cab (taken from WinXP Home + SP2), 4653 files; .exe, .dll, .icm, .ppd …
       
Encyclopedia XML dumps of the free online encyclopedia www.wikipedia.org
there are 10 dumps, one in each of ten of the world's most widely spoken languages:
arwiki-20090209-pages-articles.xml 100.000.000 bytes in Arabic
dewiki-20090311-pages-articles.xml 100.000.000 bytes in German
enwiki-20090306-pages-articles.xml 100.000.000 bytes in English
eswiki-20090124-pages-articles.xml 100.000.000 bytes in Spanish
frwiki-20090224-pages-articles.xml 100.000.000 bytes in French
hiwiki-20090201-pages-articles.xml 100.000.000 bytes in Hindi
ptwiki-20090128-pages-articles.xml 100.000.000 bytes in Portuguese
ruwiki-20081228-pages-articles.xml 100.000.000 bytes in Russian
trwiki-20090207-pages-articles.xml 100.000.000 bytes in Turkish
zhwiki-20090116-pages-articles.xml 100.000.000 bytes in Chinese (Mandarin)
       
FreeDB 2002 a CDDB-like, text-oriented database of nearly all released audio CDs (as of 23 October 2002), 2296 files
blues folder: 29.898 entries
classical folder: 65.413 entries
country folder: 19.051 entries
data folder: 4.128 entries
folk folder: 42.301 entries
jazz folder: 49.139 entries
misc folder: 227.094 entries
newage folder: 26.369 entries
reggae folder: 8.995 entries
rock folder: 245.832 entries
soundtrack folder: 23.165 entries
all folders: 741.385 entries
       
Gutenberg a random selection of 409 gutenberg.org e-books; these documents are plain text in various languages and
character set encodings, which makes preprocessing difficult and goes beyond the vocabulary of most compressors' word dictionaries (.dic)
       
Installer Package a selection of about 25% InnoSetup/Nullsoft, 25% InstallShield, 25% Windows Installer/MSI and 25% WISE Installer/GZIP/ZIP SFX setups
this selection simulates a typical collection of software installers as originally downloaded from the authors' websites.
Such data is often backed up in archives, and we want to see which archiver reduces it the most (losslessly)
INNO everestultimate500.exe 9.752.064 bytes
INNO petst_x64.exe 12.627.856 bytes
INNO stellarium-0.10.2.exe 42.911.720 bytes
INNO XnView-win-full-de.exe 10.442.205 bytes
IS 182.08_geforce_winvista_64bit_international_whql.exe 136.040.624 bytes
IS ICQ 6.5.exe 14.208.072 bytes
NSIS gmx_multimessenger.exe 12.590.864 bytes
NSIS kis8.0.0.506de.exe 43.120.232 bytes
NSIS vlc-0.9.8a-win32.exe 16.320.472 bytes
WIN Ad-Aware 2008.exe 19.153.264 bytes
WIN TU2009TrialDE.exe 17.361.664 bytes
MSI POV-Ray for Windows v3.7 beta 31.exe 12.398.080 bytes
MSI QuickTime.msi 27.953.664 bytes
MSI thebat_pro_4-1-11.msi 16.440.832 bytes
MSI UpdateStar_GER.msi 4.683.264 bytes
MSI Virtual_PC_2007_Install.msi 28.158.976 bytes
MSI Windows Live Messenger 2009.msi 24.961.024 bytes
GZIP LINUX ati-driver-installer-9.2-x86.x86_64.run 83.286.827 bytes
WISE copernicagentbasic.exe 3.546.360 bytes
WISE WindowBlinds6_public.exe 21.975.664 bytes
WISE funpix_maker_24mb_d_en.exe 24.985.888 bytes
ZIP Vista Gadgets (8 files) 2.593.515 bytes
ZIP XnView-win-full.zip 16.613.177 bytes
ZIP SFX pdfmachine1218de.exe 7.012.744 bytes
TOTAL 609.139.052 bytes
       
Mobile 877 files; designed to represent common user data on digital cameras, smartphones (running Windows Mobile or Symbian OS),
[New 2008] MP3 players and USB flash drives.
It contains these file formats: AAC, AC3, AMR, GIF, JAR, JPG [EXIF], MP3, MP4, MOV [MotionJPEG], MPG, PDF, PNG, SIS, SWF
All of these files are already compressed and won't shrink much unless special lossless recompression is applied.
So this testset shows how well an archiver serves as a lossless backup solution for data from mobile devices.
Backing up data from mobile devices has become more important than backing up personal computers, because such
devices are less safe: they can be lost, stolen or damaged.
MOV / MPG 21%
MP3 19%
AAC 14%
JPG [EXIF] 13%
3GP & MP4 13%
GIF 6%
JAR & SIS 6%
PDF & SWF 3%
AC3 2%
AMR 2%
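The claim that already-compressed files won't compress well is easy to check: a second pass over deflate output gains almost nothing. Recompression tools take a different route. They undo the existing compression layer, compress the raw data with a stronger method, and keep the result only if the original stream can be rebuilt bit for bit. Here is a minimal Python sketch of that idea using zlib and lzma from the standard library. The sample data and helper names are hypothetical. A real recompressor must also find the original compression parameters, which this sketch simply assumes to be level 9.

```python
import lzma
import random
import zlib

# Hypothetical sample: pseudo-random "text" drawn from a small vocabulary
rng = random.Random(1)
words = ["archive", "backup", "phone", "camera", "music", "video", "photo", "data"]
raw = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")

packed = zlib.compress(raw, 9)       # stands in for an already-compressed file
repacked = zlib.compress(packed, 9)  # naive second pass over the deflate output
print(len(raw), len(packed), len(repacked))  # repacked is about as big as packed


def recompress(stream: bytes, level: int = 9):
    """Return an lzma payload if the deflate stream can be rebuilt exactly, else None."""
    data = zlib.decompress(stream)
    if zlib.compress(data, level) != stream:
        return None  # cannot reproduce the original bytes: keep the stream as it is
    return lzma.compress(data, preset=9)


def restore(payload: bytes, level: int = 9) -> bytes:
    """Rebuild the original deflate stream from the lzma payload."""
    return zlib.compress(lzma.decompress(payload), level)


payload = recompress(packed)
assert payload is not None and restore(payload) == packed
print(len(payload))
```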
       
Modules 94 Amiga sound files in the 80's file types .mod, .s3m and .xm, which were the music standard decades ago;
plus 24 sound modules from present-day trackers (Renoise, MED Soundstudio, Skale Tracker, MO3, Unreal)
       
Nokia 12.632 monochrome operator logos and 2.926 monotone ringtones for Nokia LogoManager; some 2 color bitmaps included
       
Office files a collection of 1.081 files in .doc, .xls, .htm, .pdf, .mp3, .eml/.msg, .log, .mht, .txt and .jpg formats;
since 2007 it also includes precompressed files such as .wmv, .wma (DRM-protected), .ogg, .mp4, .avi (MPEG-4), .j2k, .pspimage,
.chm, .exe (UPX-compressed), .flv, .jar (Java applets), .mpeg, .docx, .xlsx, .swf, .sxf, .pps/.ppt, .psy, .pcx, .tif, .tga, .gif (including
.gif animations), .c4d (3D files from CineBench 9.5) and .dic (PAQ8 dictionaries). Some hints: some raster images included here
contain identical data, because the same bitmap was stored in several compressed formats (.tif, .pcx, .tga..). Microsoft's .mht (Web Archive) files contain
dozens of JPEGs, but they are MIME-encoded inside every .mht file. This testset also contains some visual style files (such as
WindowBlinds format), but those images are inside renamed .zip archives. This testset has the highest difficulty level!
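Both hints above come down to looking inside containers instead of trusting file extensions. Here is a minimal Python sketch that uses only the standard library. `jpegs_in_mht` parses an .mht file as a MIME message and decodes its JPEG parts, which are typically Base64-encoded. `is_renamed_zip` checks whether a file's bytes form a ZIP archive, whatever its extension says. Both helper names are hypothetical. To store those parts losslessly, an archiver would still have to re-encode them bit-exactly.

```python
import email
import io
import zipfile


def jpegs_in_mht(mht_bytes: bytes):
    """Yield the decoded bytes of every image/jpeg part of an .mht web archive."""
    message = email.message_from_bytes(mht_bytes)
    for part in message.walk():
        if part.get_content_type() == "image/jpeg":
            # decode=True undoes the Content-Transfer-Encoding (usually base64)
            yield part.get_payload(decode=True)


def is_renamed_zip(data: bytes) -> bool:
    """True if the bytes form a ZIP archive, whatever the file extension claims."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Usage: a hypothetical visual style file that is really a ZIP archive
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("skin/background.bmp", b"BM" + bytes(64))
print(is_renamed_zip(buffer.getvalue()))  # True
```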
       
Savegames 1.255 savegame files of 90's games (XCOM 1+2+3, Keen, Nightmare 3D, Command & Conquer 2, Crystal Caves, Raptor..)
plus savegames of more recent games (Unreal, AOE2, DN3D, Grim Fandango, Half-Life, Heretic 2, Hexen 2, Splinter Cell..)
       
Sourcecodes a mix of C++, C, Pascal, Java and Basic source code from 36 programs, including open-source (sourceforge.net) projects
such as Lazarus, GIMP, 7-Zip and Stellarium; also contains 9% PNG images. 11.650 files in 681 directories
       
Wavesounds contains 16 files in uncompressed PCM .WAV format (16 Bit @ 44.1 kHz stereo)
[New 2007] all files were created with Poikosoft's Easy CD-DA Extractor 11.5.3.1 and contain no tags
4:52 (292sec) 51.614.684 ABBA • The Winner Takes It All (1980)
2:59 (179sec) 31.620.332 Ben E. King • Stand by Me (1961)
2:40 (160sec) 28.306.988 Desmond Dekker • You Can Get It If You Really Want (1970)
3:55 (235sec) 41.477.564 Enya • Marble Halls (1997)
3:41 (221sec) 39.034.412 Hans Zimmer • Mumm Theme (Commercial) (1989)
4:10 (250sec) 44.165.900 Harajuku • Phantom Of The Opera (Techno Remix) (1994)
3:33 (213sec) 37.686.140 J.S. Bach • Air, Suite No. 3 in D (1974)
3:32 (212sec) 37.486.124 Jan Hammer • Crockett's Theme (1991)
7:52 (472sec) 83.385.500 Kenny G • Auld Lang Syne (Millennium Mix)
4:20 (260sec) 45.899.324 Laut Sprecher • Herzschlag (2000)
5:56 (356sec) 62.944.268 Leonard Bernstein • One Hand, One Heart
4:18 (258sec) 45.640.604 Queen • I Want to Break Free (1984)
3:48 (228sec) 40.071.068 Roy Orbison • I Drove All Night (1992)
3:45 (225sec) 39.819.404 Sash! • Adelante (Original 7'') (1999)
3:36 (216sec) 38.246.444 Shakira • Pure Intuition
3:19 (199sec) 35.154.476 Traveling Wilburys • Handle With Care (1988)
702.553.232 TOTAL
OFR 4.600 -max -exp. 362.098.239
LA 0.4b high 380.885.483
TAK 1.0.3 -p5m 388.164.364
Monkey's Audio 4 (insane) 389.592.620
TAK 1.1.1 -p4m 392.984.270
WavPack 4.5 -hx6 405.005.524
FLAC 1.2.0 option 8 418.609.157
  TTA 3.4.0 419.011.428  
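The byte counts above follow from the format: 16-bit stereo PCM at 44.1 kHz takes 44.100 x 2 channels x 2 bytes = 176.400 bytes per second. Here is a minimal Python sketch that turns a file size back into a playing time and expresses a compressed size as a percentage of the original. It assumes the canonical 44-byte WAV header with no extra chunks, since the files are stated to contain no tags. The helper names are hypothetical.

```python
SAMPLE_RATE = 44_100  # Hz
CHANNELS = 2          # stereo
SAMPLE_BYTES = 2      # 16 bit
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * SAMPLE_BYTES  # 176,400
HEADER_BYTES = 44     # assumption: canonical RIFF/WAVE header, no extra chunks


def wav_seconds(file_size: int) -> float:
    """Playing time of an uncompressed 16-bit stereo 44.1 kHz WAV file."""
    return (file_size - HEADER_BYTES) / BYTES_PER_SECOND


def percent_of_original(compressed: int, original: int) -> float:
    return 100.0 * compressed / original


print(round(wav_seconds(51_614_684), 1))                        # 292.6 (ABBA, listed as 4:52)
print(round(percent_of_original(362_098_239, 702_553_232), 1))  # 51.5 (OFR 4.600)
print(round(percent_of_original(418_609_157, 702_553_232), 1))  # 59.6 (FLAC 1.2.0)
```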
Audio Testsets
WAV [CD Quality] a 16 Bit stereo 44.1 kHz techno song
WAV [Mono] a 16 Bit mono 44.1 kHz record of: Laurel & Hardy - Way Down South
WAV [Surround 5.1] a song with 5 channels in 48 kHz @ 24 Bit
WAV [IEEE Float] a 32 bit stereo 44.1 kHz song
WAV [SACD] a 24 bit stereo 192 kHz song
WAV [AudioCD] an image of the Best Love Classics Vol. 4 from 1995
Executable Testsets
EXE [MS-DOS] a file scanner from 1996
EXE [PE32] a file scanner from 2002
EXE [.NET] a file scanner from 2005
EXE [PE64] a benchmark program from 2005
EXE [LINUX ELF] the Linux version of UPX
EXE [ARM] a pocket PC music player
Image (BMP) Testsets
BMP [OCR] a true color scanned page of a children's newspaper
BMP [Greyscale] a greyscale picture of my ancestors
BMP [BiLevel FAX] a 2 color profit & loss analysis
BMP [DICOM] a greyscale medical image
BMP [Panorama] a true color image of Rome
BMP [Landscape] a true color image of Sydney
Special Treatment Testsets
JPEG 90% Quality, Scanned Newspaper
MPEG video clip of a German female musician
MP3 Techno styled song
PDF a medical encyclopedia
CHM Paintshop Pro 8 Trial Manual
ZIP sound & video codec pack
GIF image of a tiger-colored cat
PNG image of a tiger-colored cat
TIFF image of my parents' house
SIS a game for a SymbianOS cellphone
SWF a collection of 8 Flash 7.x games
InnoSetup a software installation package
InstallShield CAB a software installation package
InstallShield MSI a software installation package
NSIS a software installation package
MS Compress a software installation package
WISE a software installation package
AVI an uncompressed movie
WMV a private webcam video
RM a private webcam video
DIVX a private webcam video
MOV a private webcam video
MP4 a private webcam video
Language Testsets
AFRIKAANS www.unboundbible.org New Testament from 1953
ALBANIAN www.unboundbible.org New Testament
ARABIC www.unboundbible.org New Testament by Smith & Van Dyke
CHINESE www.unboundbible.org New Testament NCV (Traditional)
CROATIAN www.unboundbible.org New Testament
CZECH www.unboundbible.org New Testament BKR
DANISH www.unboundbible.org New Testament
DUTCH www.unboundbible.org New Testament Staten Vertaling
ENGLISH www.unboundbible.org New Testament World English Bible
ESPERANTO www.unboundbible.org New Testament
FINNISH www.unboundbible.org New Testament from 1776
FRENCH www.unboundbible.org New Testament Darby 1991
GERMAN www.unboundbible.org New Testament Luther 1912
GREEK www.unboundbible.org New Testament Byzantine/Majority Text 2000 Parsed
HEBREW www.unboundbible.org Old Testament Westminster Leningrad Codex
HUNGARIAN www.unboundbible.org New Testament Karoli
ITALIAN www.unboundbible.org Giovanni Diodati Bible 1649
KOREAN www.unboundbible.org New Testament
LATIN www.unboundbible.org New Testament Vulgata Clementina
LITHUANIAN www.unboundbible.org New Testament
MAORI www.unboundbible.org New Testament
NORWEGIAN www.unboundbible.org Det Norsk Bibelselskap 1930
PORTUGUESE www.unboundbible.org Almeida Atualizada
ROMANIAN www.unboundbible.org Cornilescu
RUSSIAN www.unboundbible.org Synodal Translation 1876
SPANISH www.unboundbible.org Sagradas Escrituras 1569
SWEDISH www.unboundbible.org New Testament 1917
TAGALOG www.unboundbible.org Ang Dating Biblia 1905
THAI www.unboundbible.org King James Version Translated
TURKISH www.unboundbible.org New Testament
VIETNAMESE www.unboundbible.org New Testament 1934
XHOSA www.unboundbible.org New Testament
Earlier Testsets No Longer Used
CALGARY well-known test set of 18 files; source code, white papers, library inventory… 
CANTERBURY well-known test set of 11 files; source code, epics 
DIRECTX GAME a critical directx game with DelphiX sounds and graphics (.dxg), many .jpg & .lbm files; large exe file, 149 files
MODULES 94 amiga sound files of the 80's filetypes: .mod, .s3m, .xm; those files had been the music standard decades ago
SAVEGAMES 784 savegame files of 90's games (XCOM 1+2+3, Keen, Nightmare 3D, Command & Conquer 2, Crystal Caves, Raptor..)
TEXT DATABASE an .ini file for MediaPlayer that includes a text-based database of 244.569 audio CDs (title, tracklist, play time …)
CoDEC's a collection of dozens of audio & video codecs in .exe, .dll, .ax, .ocx, .acm, .drv, .cpl, and .gif, .htm, .bmp files; a total of 265 files
NOKIA 12.632 monochrome operator logos and 2.926 monotone ringtones for Nokia LogoManager; some 2 color bitmaps included
Bitmap 2007 were replaced because they were created from a lossy source and contained artifacts
Waveforms 2007 were replaced because they were created from a lossy source and contained artifacts
Encyclopedia 2001 an encyclopedia of computer history in German; some .bmp, .wav, .dll and .al7 files (.al7: a dictionary with an exe header). 78 files
the .al7 file (287 MB) contains 1.390 bitmaps (111 MB) and thousands of text files (176 MB) with full headers, TAR-like
Fonts 2001 a total of 114 TrueType fonts and system fonts included in WinXP (.ttf, .fon);
those binary files are needed to display text in Windows based environments
What the hell is the point of compressor benchmarks?
Testing compression programs has a long history, with some very famous people having only one idea in mind: to find the best compression program currently
available. Back in 1985 one of the early PC compressors was born: the SQPC file squeezer. Since then, a whole population of compressors has evolved. Every programmer
claims his or her program is the best around, but how do end users tell the good compressors from the bad ones? The answer should be:
by reading charts like this one. But most of the world's end users don't. They believe in commercials and buy the first one they notice…
Yet when buying a new car, everyone compares the models on offer…
In my opinion it is very interesting that some compressors released this year compress no better than those released decades ago,
and, on the other hand, that a few compressors released some years ago compress nearly as well as those released today…
By running a compression challenge we can perhaps encourage further improvements, or at least let some compression program authors know
that the old compression codecs are outdated and should be retired.
It is rumored that every couple of years, on some beautiful sunny morning, a genius creates a new compression algorithm or improves an existing one…
Why this test?
There are a few benchmarks out there, and their testers have done, and still do, brilliant work testing archivers and publishing results.
But those results only reflect the compression of a few small files. This Squeeze Chart was designed to show how well compressors
handle many large files, thus reflecting their strength in solid archiving.
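Solid archiving means compressing many files as one continuous stream, so that the compressor can exploit redundancy between files. Here is a minimal Python sketch using the standard zlib module on made-up data; the `make_files` helper is hypothetical. It compares compressing files one by one with compressing them as a single solid block.

```python
import random
import zlib


def make_files(n_files: int = 4, size: int = 8_000, seed: int = 0):
    """Hypothetical data: n files that share one incompressible block (e.g. a common DLL)."""
    rng = random.Random(seed)
    shared = bytes(rng.getrandbits(8) for _ in range(size))
    return [shared + f"file {i}".encode() for i in range(n_files)]


files = make_files()

# Non-solid: every file is compressed on its own, so the shared block is stored each time.
separate = sum(len(zlib.compress(f, 9)) for f in files)

# Solid: all files form one stream, so later files can refer back to earlier ones.
# zlib only looks back 32 KiB; real solid archivers use much larger dictionaries.
solid = len(zlib.compress(b"".join(files), 9))

print(separate, solid)  # the solid stream is far smaller than the separate total
```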
Why is one testset not enough?
Since personal computing began in the 80's, mankind has discovered many fields in which computers can help and serve.
Nowadays we use computers for word processing, presentations, profit calculations, messaging, gaming, seeking information, reading books,
listening to music, image editing and archiving …
All these different fields of use have produced different file formats for storing data.
So a file compressor has to understand these different file formats. Most files consist mainly of four elements (in plain or mixed form):
executables, texts, bitmaps and wavetables. Each element needs to be recognized by the archiver and requires a specific compression algorithm.
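Archivers usually recognize which element they are looking at by checking signatures (magic numbers) and byte statistics. Here is a rough, hypothetical Python sketch of such a detector for the four elements. Real archivers use far more detailed models, and files that mix elements need detection per block rather than per file.

```python
def classify(data: bytes) -> str:
    """Guess which of the four elements a file mainly consists of (hypothetical heuristic)."""
    if data[:2] == b"MZ" or data[:4] == b"\x7fELF":    # DOS/Windows PE or Linux ELF
        return "executable"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":  # RIFF/WAVE audio
        return "wavetable"
    if data[:2] in (b"BM", b"P5", b"P6"):              # Windows BMP, binary PGM/PPM
        return "bitmap"
    sample = data[:4096]
    # share of printable ASCII plus tab/LF/CR; non-ASCII text (e.g. UTF-8) is not handled
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in sample)
    if sample and printable / len(sample) > 0.95:
        return "text"
    return "unknown"


for blob in (b"MZ\x90\x00", b"RIFF\x24\x00\x00\x00WAVEfmt ", b"P5\n2 2\n255\n",
             b"Hello, world!\n", bytes(range(256))):
    print(classify(blob))  # executable, wavetable, bitmap, text, unknown
```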
Can compression still be improved?
Improvement means developing different algorithms for the four elements, or modifying (preprocessing) those elements before compression takes place.
The LZ-based algorithms are fast, the PPM-based algorithms are slow but do well on text, and the arithmetic algorithms are the slowest but superior on binary data.
The best approach would be a hybrid algorithm with the speed of LZ, the text-sorting power of PPM and the context-comparing precision of arithmetic algorithms.
In fact, the first step was taken with the LZMA algorithm of Igor Pavlov's 7-Zip, which combines LZ-based speed with the strength of arithmetic (range) coding.
And only a few compressors can compete with this compression: WinRK, SLIM and the PAQ6 family.
Maybe one day we will have algorithms in use that combine PPM's text power with arithmetic precision (PPMAri). And for complex bitmap
and wave compression, there may be pixel / sample weavers or replacers that conquer the market
and convince users that lossy algorithms such as MP3 or JPG are bad: human memories fade, so our digital memories should not lose information, too.
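Preprocessing can be as simple as a delta filter. It turns smooth data such as audio samples or greyscale scanlines into small differences, which the back-end coder handles far better. Here is a minimal Python sketch on made-up data. It compares zlib (LZ77 + Huffman coding) and lzma (LZ77 + range coding, the LZMA scheme mentioned above), each with and without the filter. The helper names are hypothetical. Python's lzma module also provides a built-in FILTER_DELTA for the same purpose.

```python
import lzma
import random
import zlib


def delta_encode(data: bytes) -> bytes:
    """Replace each byte by its difference to the previous byte (mod 256)."""
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Undo delta_encode by running a cumulative sum (mod 256)."""
    prev, out = 0, bytearray()
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


# Hypothetical smooth signal, e.g. an 8-bit greyscale scanline or audio-like data
rng = random.Random(7)
level, samples = 128, bytearray()
for _ in range(100_000):
    level = (level + rng.randint(-2, 2)) % 256
    samples.append(level)
raw = bytes(samples)

for name, pack in (("zlib (LZ77 + Huffman)", lambda d: zlib.compress(d, 9)),
                   ("lzma (LZ77 + range coder)", lambda d: lzma.compress(d, preset=9))):
    print(name, len(pack(raw)), len(pack(delta_encode(raw))))  # plain vs. delta-filtered
```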