“The HTML file that contains all the text for this article is about 25000 bytes. That’s less than one of the image files that was also downloaded when you selected this page. Since image files typically are larger than text files and since web pages often contain many images that are transmitted across connections that can be slow, it’s helpful to have a way to represent images in a compact format. In this article, we’ll see how a JPEG file represents an image using a fraction of the computer storage that might be expected. We’ll also look at some of the mathematics behind the newer JPEG 2000 standard.”
Quite a nice read for the evening.
I feel that, while this article goes into great depth for an online article, it fails to tell in a simple way what each type of compression does. Both JPEG and JPEG2000 are based on quite a complex set of mathematics and are not easily explained in only a few lines, especially if you lack the strong mathematical foundation that is required to understand this stuff. Then again, it is posted in a math column, so what can you expect?
So, for the sake of simplicity, I’ll try to describe what each format does in more accessible terms. JPEG looks for blocks of very similar colours and collapses each into a single block. The compression comes from the fact that, instead of describing an 8×8 block pixel by pixel (stating 64 times which colour each tiny 1×1 block is), you can simply say that the entire block is one colour. JPEG is so successful because the human eye can only distinguish a limited number of shades of colour, whereas a computer can store far more shades than we are ever able to tell apart.
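For the curious, here is a rough Python sketch (assuming NumPy and SciPy are available) of what sits under that simplification: JPEG actually transforms each 8×8 block with a DCT and quantizes the resulting weights, and for a block that is almost one flat colour nearly every weight rounds to zero, so only the block average survives.

# Rough sketch of the 8x8 block idea (assumes numpy and scipy are installed).
import numpy as np
from scipy.fft import dctn, idctn

block = 128 + np.random.randint(-2, 3, size=(8, 8)).astype(float)  # nearly flat block

coeffs = dctn(block, norm="ortho")          # 64 pixels -> 64 frequency weights
quantized = np.round(coeffs / 16)           # crude uniform quantizer (step 16)
print("non-zero weights:", np.count_nonzero(quantized), "of 64")

restored = idctn(quantized * 16, norm="ortho")
print("max error:", np.abs(restored - block).max())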
JPEG 2000 takes a completely different approach. This algorithm hunts for excessive detail. It is very similar to what we did above, but the math behind it is much more refined, and JPEG 2000 can easily obtain ten times better compression with the very same visual result.
What JPEG 2000 tries to do is separate the basic image (a quarter of the original picture) from the details. It is as if we scale the image down to half its width and height. To get back to the original image, we describe the differences between this smaller picture and the original one. This will be something like: the difference between the first pixel and the second pixel in the original picture is 0.2 shades of red. We store the first pixel and remember that the second pixel is the first pixel + 0.2 shades of red. Other pixels may have far larger differences, for example 9.2 shades of red. The rough, scaled-down image is what comes out of the low-pass filter. The differences between the pixels come out of the high-pass filter (because they allow us to return to the higher resolution). This process is then repeated: the scaled-down image is used as the new original, we scale it down again and once again record the differences. This continues until we end up with an image that is only a single pixel in size.
What JPEG 2000 now does is throw away all the small differences. This is quite cool, because removing 90 to 99% of the smallest differences has hardly any impact on the end result. We then simply backtrack our steps and add the detail back in (after replacing all those tiny numbers with 0, which gives tremendous compression, because storing 0.213978 costs a lot more than simply storing 0). So yes, we can obtain very high compression ratios thanks to JPEG 2000. But, as always, this comes at a cost: time.
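As a toy Python sketch of those averages and differences (a plain Haar step; the real JPEG 2000 filters, the CDF 5/3 and 9/7 wavelets, are more refined but follow the same pattern):

# Toy sketch of the averages-and-differences idea (plain Haar step only).
import numpy as np

def haar_forward(x):
    """Split a signal into a half-size 'rough' version and the details."""
    x = np.asarray(x, dtype=float)
    rough = (x[0::2] + x[1::2]) / 2          # low-pass: pairwise averages
    detail = (x[0::2] - x[1::2]) / 2         # high-pass: pairwise differences
    return rough, detail

def haar_inverse(rough, detail):
    """Undo the split exactly: first pixel = rough + detail, second = rough - detail."""
    x = np.empty(2 * len(rough))
    x[0::2] = rough + detail
    x[1::2] = rough - detail
    return x

row = np.array([100, 100.2, 101, 110.2, 50, 50.1, 49.9, 50])
rough, detail = haar_forward(row)

detail[np.abs(detail) < 0.5] = 0             # throw away the tiny differences
approx = haar_inverse(rough, detail)
print(np.round(approx, 2))                   # visually almost the original row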
This is where the easy part stops. To encode this, you need a decent understanding of orthonormal bases and related machinery. Suffice it to say that the complicated mathematics is there because we want the decoding to be no harder than the encoding. If we did not take care of quite a few things, we would be left having to actually invert matrices, which is a tremendous pain in the butt. Believe me. For those in the know, the mathematical machinery ensures that we can invert the transform in O(1) instead of O(n^3). Nifty, no?
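To make that O(1) remark concrete: for an orthonormal transform the inverse is simply the transpose, so no actual matrix inversion is ever performed. A small NumPy sketch, using the hand-built 8-point DCT basis as the example:

# For an orthonormal basis, the matrix inverse is just its transpose.
import numpy as np

N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
W = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
W[0, :] /= np.sqrt(2)                         # normalize the DC row

print(np.allclose(W @ W.T, np.eye(N)))        # True: rows form an orthonormal basis

x = np.random.rand(N)
y = W @ x                                     # "encode"
print(np.allclose(W.T @ y, x))                # decode with the transpose, no inversion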
For those who want to know more, be sure to consider this reference:
Fractal and Wavelet Image Compression Techniques – Stephen Welstead (ISBN 0819435031). Not an easy read, and be sure to brush up on your math, but the book is amazing.
Anyone with a traditional engineering bachelor of science has had much hairier mathematics.
However, if you’re going to reference fractal and wavelet compression techniques, it makes more sense to point to a master book on image technologies:
Digital Image Processing (3rd Edition), hardcover. ISBN-10: 013168728X, ISBN-13: 978-0131687288.
I do agree that the article didn’t really have a goal. It showed a lot of meat, but more practical application would have made it more compelling.
Something that was pointless is the following:
“Notice that higher values of q give lower values of α. We then round the weights as…”
Really? I sure hope so, seeing that the two scaling functions are, respectively, inversely proportional over the lower domain 1 <= q <= 50 percent and linearly decreasing over 50 <= q <= 100 percent.
alpha = 50 / q (inversely proportional).
alpha = 2 - q/50 (linearly decreasing).
Of course, the article doesn’t even discuss the unbounded range of alpha values for 0 < q <= 1 percent under alpha = 50 / q, or at least explain why its relevance to the article is insignificant.
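For reference, a quick Python sketch of that scaling; the clamping of q to [1, 100] and the minimum weight of 1 are my own assumptions, not from the article.

# Quality-to-scale mapping discussed above (clamping/rounding are assumptions).
def alpha(q):
    q = max(1, min(100, q))
    return 50.0 / q if q < 50 else 2.0 - q / 50.0   # inverse branch / linear branch

def scale_table(base_table, q):
    a = alpha(q)
    # scaled weights are rounded and kept at least 1 so division stays defined
    return [max(1, round(a * w)) for w in base_table]

for q in (1, 10, 50, 75, 100):
    print(q, round(alpha(q), 3))
# 1 -> 50.0, 10 -> 5.0, 50 -> 1.0, 75 -> 0.5, 100 -> 0.0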
I liked the fact that the article went into some real work, though the author could have really sold it by showing the various industry applications of such 3×3 matrix transforms.
Jpeg2000 sadly is the result of “design by committee”. Overly complex, overly slow and it ends up not really doing anything well, especially for extremely large multi channel images.
I’d thought that Microsoft HD Photo (or whatever its name is now) was the best variant available, but then I found a paper about “Backward Coding” of wavelets, which is extremely fast, can be extremely memory efficient, and still gives all the benefits of the benchmark SPIHT algorithm.
With jpeg2k not having support for embedded indexing (insufficient container) and the standard not dealing with tile edge artifacts (insufficient engineering), it’s not all it’s cracked up to be.
Very strange. To me jpeg2K is the grail for huge images, and it does not need indexing because of the fractal/tile format design.
To give you an idea: you can extract a small portion of a 200,000×200,000-pixel compressed file, with in-place resizing, quicker than from a flat uncompressed file when you resize a lot, on cheap hardware (CPU and memory). This is simply wonderful, but it’s mainly used by cartographers.
A file format like jpeg2K allows a completely new approach to some problems. For example, we are beginning to see heterogeneous screen DPI, from 75 to 130, and I won’t be surprised to see 200+ DPI screens in the coming years. Think about images on web pages: on high-DPI screens images will become too small or over-zoomed, and on low-DPI screens too big or heavily resized. With some engineering, jpeg2k lets you stop the download once you have enough data: the format allows building an image overview from just the beginning of the file. So for a low-DPI screen you download only 12 kB to display a well-fitted image, and 50 kB for the high-DPI one.
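A sketch of that “stop when you have enough” idea in Python, using a plain HTTP range request; the URL is hypothetical, and you still need a JPEG 2000 decoder that will render a coarse resolution level from a truncated codestream.

# Download only as much of the file as the target screen needs.
import urllib.request

def fetch_prefix(url, nbytes):
    req = urllib.request.Request(url, headers={"Range": "bytes=0-%d" % (nbytes - 1)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

low_dpi_bytes  = fetch_prefix("https://example.org/photo.jp2", 12 * 1024)  # ~12 kB preview
high_dpi_bytes = fetch_prefix("https://example.org/photo.jp2", 50 * 1024)  # ~50 kB version
# feed whichever prefix you fetched to a progression-aware JP2 decoder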
If this format weren’t crippled by patents, it would already be the de facto high-definition image format standard.
I work with photogrammetry and GIS.
Jpeg2k doesn’t allow for embedded indexing of the tiles. It requires a separate “index file” to be generated.
Additionally with any small amount of compression (~5:1 or better) the tile boundaries become clearly visible unless custom smoothing is done at the tile boundaries after decompression and image formation.
The SPIHT family of wavelet compression schemes is superior to what jpeg2k uses. The problem was that SPIHT is a patent minefield and in its original form is prohibitively expensive both in cpu and memory.
Jpeg2k is an improvement time-wise but not complexity-wise. The bit encoding scheme is very complex, and it is very difficult to implement a clean-room version.
BCWT is pretty revolutionary and very clean. Implementation is very easy, the algorithm is small, and BCWT outperforms jpeg2k at the same compression rates WITHOUT doing any encoding of the output stream.
I don’t know what the patent problems are with BCWT but for today it would be a great place to start, and to move ahead to fix the problems with jpeg2k (container issues).
That was detailed and damning. From Taubman & Marcellin, I thought the image edge (e.g. for stitching cases) was nicely defined, and the whole thing a nice lead-in to toys like that Microsoft-does-fractal-images one from the beginning of ’07.
So you are saying JPEG2000 ‘extended profiles’ are not going to be as successful as H.264/MPEG-4 in its typical Matroska wrapping?
With Hi-Speed Internet connections, all the multi-megapixel digital cameras and common 500 GB desktop storage… why do we need to lose all those details using compression?
I prefer to wait 2-3 seconds more and have a better-quality photo than a photo wrecked by crap compression.
Even if JPEG2000 does a great job, it’s not perfect.
You need compression because an uncompressed 1600×1200 true-color image takes around 6-8 MB (1600 × 1200 pixels at 3 or 4 bytes each).
Now, to store an uncompressed Full HD 1h30 movie at, say, 30 images/s (around 162,000 such frames), you basically need roughly a terabyte of hard drive (rough arithmetic sketched below).
I think you get the picture:
Compression is here to stay, and long live maths…
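The rough arithmetic in Python, for anyone who wants to check it (24-bit colour assumed):

# Back-of-the-envelope arithmetic for the comment above.
width, height, bytes_per_pixel = 1600, 1200, 3

frame_bytes = width * height * bytes_per_pixel
print(frame_bytes / 1e6, "MB per uncompressed frame")      # ~5.8 MB

frames = 90 * 60 * 30                                       # 1h30 at 30 frames/s = 162,000
movie_bytes = frames * frame_bytes
print(movie_bytes / 1e9, "GB for the uncompressed movie")   # ~930 GB at 24 bpp, ~1.2 TB at 32 bpp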
With all the multi-megapixel digital cameras and common 500 GB desktop storage…
As an example, let’s assume you have a 3-megapixel camera. The camera supports both 8-bit and 12-bit color depths, so a single pixel takes either 3×8 bits (3 bytes) or 3×12 bits (4.5 bytes). Storing 3 million pixels then requires at least 3,000,000 × 3 bytes of space (about 9 MB) per image. That means, even if your camera had one gigabyte of space, you could only take a little over a hundred pictures before running out of space…
Another example from a completely different angle is pictures on web pages: if a picture on a web page were, say, 5 megabytes in size and there were 20 users simultaneously viewing the page, the server would need to push out 100 megabytes worth of data… with 100 users that would be 500 megabytes. And at that point their net bandwidth would most likely not be enough.
That’s why compression is needed.
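Spelled out in a few lines of Python (3-megapixel camera at 24-bit colour, plus the 5 MB web image from above):

# Camera storage and server bandwidth, the same arithmetic as the comment.
pixels = 3_000_000
image_bytes = pixels * 3                                   # 9,000,000 bytes ~ 9 MB
card_bytes = 1 * 1000**3                                   # a 1 GB card
print(card_bytes // image_bytes, "uncompressed shots fit") # ~111

page_image = 5 * 1000**2                                   # a 5 MB picture on a web page
for users in (20, 100):
    print(users, "viewers ->", users * page_image / 1e6, "MB pushed out")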
I shoot (with a Nikon D2X) in RAW+JPEG (fine). Every picture is approx 20 MB.
On a three week trip to Madagascar I shot plenty of pictures. 5609 to be exact. I didn’t fill up either of my Vosonic 100Mb backup devices.(2 copies of everything and didn’t take my laptop)
I shoot in both formats because if I want to post to the web, the 2 MB JPEG copy is fine. If I want to print anything larger than A3, then I use the RAW copy.
I also underexpose everything by 2/3 of a stop to reduce white burnout.
A true photographer will want to have the most data possible to work on when it matters. In the digital world, data discarded (with a lossy compression system like JPEG) is data lost forever.
Storage is very cheap these days so for me compression is a real No-no.
Question. How do you fit 5609 pictures each 20MB in size on two 100Mb devices?
Am I misunderstanding something? 5609 pictures times 20 megabytes comes to around 112 gigabytes… how can those fit on 100Mb backup devices?
Though, I can understand the wish to retain all data possible. I do just save all my pictures as JPEG but I use very high quality, and I always work on copies, thus not degrading the originals any more than the initial compression.
He probably meant GB, and 20 is approximated. It may (and will most of the time) be less.
RAW is pretty neat if you don’t own one of the rare cameras that can produce perfect JPEGs, like all the Canon DSLRs.
Out of the box, with no fancy settings, when I convert RAW to JPEGs on my computer, the pictures taken in RAW are sharper, deal better with noise and have a lot more micro-contrast than the ones I shoot in JPEG with my Olympus E-300. But yeah, this Olympus is one of the worst at producing JPEGs.
Raw is better too when you want to change things like white balance or pump up the exposure.
Because even though web servers already have 500 GB HDDs, web hosting companies offer hundreds of gigs of space, and people like to store many photos and videos (a video is, roughly speaking, a long series of JPEG-like frames plus a sound track). And also because there are people who still have 56 Kbps connections as their only option to access the Internet. If you need 40 minutes to download a page, that doesn’t cut it.
jpeg2k also offers guaranteed lossless compression, which results in compressed files averaging around 50% of the original file size. That is the guaranteed-lossless mode, with a safety margin. At 40% of the original size you may have information loss that is totally invisible. At 10% of the original size, the naked eye may notice some encoding artifacts without comparing to the original. 5% of the original size is a good ratio for interchange (not storage). Why pay for a 500 GB hard drive if you can get the same storage out of 250 GB, thanks to software magic?
jpeg2k won’t emerge soon because it’s heavily patented, but it’s a great technology.
Probably where the future lies is with a compression scheme and codec that doesn’t “get in the way”.
Basically a scheme that is good enough to take away the gross disk space usage while not sucking up tons of resources.
Additionally, the codec/scheme should actually “enhance” the image itself. Wavelets are the only scheme I know of that inherently adds this value, through the built-in pyramids.
…even if I don’t understand them.
I second that. I much prefer these nuts and bolts articles to gossip about copyright infringement posted the same day the relevant parties are notified… followed by articles about the payback gambit when the accused side finds a way to counterattack.
/me nosebleeds
A long time ago, in a German computer magazine far far away… there was a nice article about how JPEG compression (especially the DCT) works. As an example of how quality changes with the compression level and with repeated applications of compression, a picture of a seagull was shown. And beneath the last picture stood the sentence “Now the seagull is out of air” (“Da geht der Möwe die Luft aus”) – the content of the picture could hardly be recognized.
On the other hand, modern home-PC telemarketing (seen on a German TV channel nowadays) warns you: “Don’t open your pictures too often – they’ll lose quality!” For real!
This warning is halfway okay when you open and re-save your pictures (or apply transformations such as rotating, mirroring or flipping): they really will lose quality due to the new encoding, and file sizes will shrink (while the image dimensions stay the same). But merely opening files for reading and displaying won’t do them any harm. I feel for the poor people who immediately stopped viewing their digital photos in order to preserve the quality.
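If you want to see it for yourself, a small Python sketch using PIL (the file names are just placeholders):

# Re-saving costs quality, viewing doesn't.
from PIL import Image

img = Image.open("photo.jpg").convert("RGB")
for generation in range(10):
    img.save("recompressed.jpg", "JPEG", quality=75)   # each save re-encodes (lossy)
    img = Image.open("recompressed.jpg").convert("RGB")
# After ten generations the artefacts are usually visible; merely calling
# Image.open() and looking at the picture never touches the file on disk.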
Another nice example of what not to do was a friend of mine who insisted on saving his PCB layouts as JPEG images. When transferred to a foil, the resulting boards were messy and unusable. Someone should have told him about the artefacts generated around black lines on a white background (or vice versa). I’m sure GIF or PNG would have been a better solution.
But to come back on topic: the article was very informative, although I had to revisit some math from my time at university (where we had to describe DCT, JPEG et al. in detail). Ah, memories…
And thanks to kvaruni for the good summary, ++ for this.
I’m sure GIF or PNG would have been a better solution.
Yeah, both GIF and PNG are lossless image formats, so you don’t lose any information, nor do they generate any artefacts on your images. Though GIF is only for 256-color images and doesn’t use very smart compression algorithms. PNG is better, and I always save my files as PNG when I don’t want the quality to degrade (for example when I am editing something and save a temporary image/backup).
Actually GIF is a lossy format.
PNG and TIFF, however, are lossless formats. GIF, JPEG and JPEG2K are lossy formats.
Say what? If GIF is lossy then I’m Santa Claus. Where did you get that misinformation from?
From Wikipedia: “GIF images are compressed using the Lempel-Ziv-Welch (LZW) lossless data compression technique to reduce the file size without degrading the visual quality.”
And TIFF is not lossless. TIFF is a graphics file format, but is not tied to a single specific compression algorithm. TIFF files can be compressed with LZW (lossless) or JPEG (lossy) or even other algorithms.
It can be argued that saving a 24bpp image as a gif is lossy since it has to be converted to 256 colors. That’s probably what he meant.
That could be it. Though, technically the compression format is not lossy then, the conversion from 24bpp to 8bpp is the lossy step.
“Though, technically the compression format is not lossy then”
And that is what he said, that GIF is a lossy format. He didn’t say “GIF uses lossy compression”.
No, GIF is not a lossy format. GIF is meant for 8bit images so it’s the user’s own fault if he/she tries to save a high-color image as GIF. But try saving an 8bit image as GIF and you’ll see you don’t lose any detail whatsoever. You can open and save the image 1000 times and it’s still the same as in the beginning.
It’s made more confusing by image editors like Photoshop having lossy controls for image formats which don’t support them (i.e., PS performs the lossy step on the image before applying the file-specific compression).
GIF uses a lossless compression algorithm, but it usually can’t handle 24-bit color, hence the loss.
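To make the distinction concrete, a small PIL sketch in Python (file names are placeholders): the lossy step is the colour reduction, not GIF’s LZW compression.

# The colour reduction is lossy; GIF's own LZW step is not.
from PIL import Image

truecolor = Image.open("screenshot.png").convert("RGB")    # up to 16.7M colours
paletted = truecolor.quantize(colors=256)                  # forced down to 256 colours
paletted.save("screenshot.gif")                            # GIF's LZW step is lossless

# Round-tripping the *paletted* image through GIF changes nothing:
again = Image.open("screenshot.gif")
print(list(again.convert("RGB").getdata()) == list(paletted.convert("RGB").getdata()))
# should print True: GIF's own compression didn't lose anything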
http://en.wikipedia.org/wiki/GIF
“True color
Although the standard GIF format is limited to 256 colors, there is a hack[6] that can overcome this limitation under certain circumstances.
GIF89a was designed based on the principle of rendering images (known as frames when used for animation) to a logical screen. Each image could optionally have its own palette, and the format provides flags to specify delay and waiting for user input between them (the latter is not widely supported by viewers). This is the feature that is used to create animated GIFs, but it can also be used to store a 24-bit RGB (truecolor) image by splitting it up into pieces small enough to be encoded into a 256 color palette and setting up the GIF to render these with no delay on the logical screen.[7][8] However, most web browsers seem to assume that this multi-image feature will only be used for animation and insert a minimum delay between images. There will also be some file size bloat from doing this. There are few tools around that can easily produce 24-bit GIFs (e.g. ANGIF or SView5) – however it is rarely an appropriate format unless there is absolutely no other option.”
I really don’t understand why today, in 2007, people are still confused about the difference between PNG and JPG, and even between lossy and lossless. All you have to do is save a screenshot of a computer program (or some other image in which changes in detail are very noticeable) to notice that the .JPG loses quality while the .PNG does not.
I think he means that he has more than one of those 100MB devices.
Yeah, I made the mistake of saving circuits in jpeg once. I learned after that gif was my friend.
That is an interesting article. Not sure how relevant it is or applicable to many BUT it is very COOL!
An excellent article. While I’ve worked with the Independent JPEG Group code for some time, as well as my own non-standard DCT codec, the newer 2000 standard is less familiar to me. I only hope the web shifts over to it for less blocky images.
The math and graphics were very good as far as they go, which is about as far as you would want unless you actually had to work with this stuff.
In conventional JPEG codecs the default is to code luminance & chrominance but it is also possible to encode in the RGB space.
One of the things about regular JPEG is that the DCT code itself is remarkably small, elegant, usually very fast and very friendly to processor caches, since a lot of work is done on only 8 integer values at a time, repeating the same step on all 8 vertical and 8 horizontal columns and rows of every tile. The DCT step is essentially a set of butterflies, with most of the complex math reduced to adds and very few multiplies. If I recall correctly, about 11 multiplies and some 80 moves, adds and subtracts do the job for 8 values. Repeat 16 times for an 8×8 tile and you get the idea: the math is manageable for any modern processor. The multiplies can even be replaced by hand-crafted addition, subtraction and shift macros for similar results. That alone does not give any reduction in data size, but it leads on to frequency-dependent quantization and the final step of lossless entropy encoding. The DCT step can usually quantize in its final stage, and the entropy encoding is usually a separate step that takes about the same amount of time; that’s where some of the complexity lives.
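For orientation, here is the plain, unoptimized form of that separable 8×8 DCT in Python; production code replaces the inner loop with the butterfly factorization described above (around 11 multiplies per 8-point transform) but computes the same numbers.

# Naive O(n^2) reference DCT, for reading only; real codecs use butterflies.
import math

def dct_1d(v):
    n = len(v)  # 8 for JPEG
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(c * s)
    return out

def dct_8x8(tile):
    rows = [dct_1d(r) for r in tile]                      # 8 horizontal passes
    cols = [dct_1d(c) for c in zip(*rows)]                # 8 vertical passes
    return [list(c) for c in zip(*cols)]                  # transpose back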
What is much more complicated and far beyond the article scope is the general management of the whole process, the entropy encoder, the management of file options and the JFIF informal standard. I think the DCT step is possibly only 1% of the entire code base but uses about 50% of the processor time. That DCT kernel is well worth studying on its own just because it is so small.
For my own work I tried encoding the R, G, B quantized data directly, with no subsampling, using a fairly good entropy encoder similar to the one in Nelson’s classic text on data compression. Mixing the 3 coded streams into a simple file format with a corresponding decoder gave me JPEG-like quality but files about twice as big for the same quality setting q. That reinforces why the industry uses the YCbCr scheme: to throw away as much information as possible from the Cb and Cr fields. It’s similar to the way analog TV has worked for 50 years in the NTSC, PAL and SECAM standards, transmitting luminance and color on different RF bands, although those standards do these steps in different ways.
What I then found was that for tiles where the color is mostly the same hue, with some luminance information, the 3 quantized RGB vectors are mostly the same in the high-frequency end of the zigzag vector, and these can be simplified by differencing. The values closest to the DC value carry the luminance information, so there is some R, G, B variation wherever there is picture detail; differencing there changes the values and can also reduce their magnitude. Any reversible reduction in the magnitudes of these quantized values helps the entropy encoder emit smaller streams, and wiping out long runs of values really helps.
So I entropy-encode the quantized G, G-R, G-B instead, and the resulting file is almost the same size as the comparison JPEG at the same q. While the YCbCr transform of RGB space is direct, the G, G-R, G-B trick is really doing much the same thing and is much simpler to do, but it does require 3 full DCTs instead of about 1.5. Not even sure if it’s common knowledge.
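A sketch of that differencing trick in Python; the point is that it is trivially reversible, so nothing is lost before entropy coding (this just restates the parent comment’s idea, not any standard).

# G, G-R, G-B decorrelation, applied per pixel or per quantized vector.
def forward(r, g, b):
    return g, g - r, g - b          # three channels to code: G, G-R, G-B

def inverse(g, gr, gb):
    return g - gr, g, g - gb        # exact reconstruction, no matrix inversion

r, g, b = 200, 180, 175
assert inverse(*forward(r, g, b)) == (r, g, b)
# For a mostly single-hue tile the differences are small (often zero), which
# is exactly what the entropy coder rewards.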
Anyhow, despite having built my own codec of comparable performance, I reverted to using the IJG code since it works so well and seems to be everywhere. It also handles progressive mode and other common options and can do quite a few other things for me.
Mark Nelson’s The Data Compression Book is a good but pretty dated read for the JPEG section, and it gets very light after the DCT area. It also presents a simple entropy encoder, and that’s why I did the same thing; it encourages the reader to build a working codec (just don’t ship it).
For production code, the Independent JPEG Group C library is easy to find and include in your own apps if you need this sort of thing.
That notion that compression is fundamentally wrong is mistaken; there is noise in your image, and with this math applied you can get a separation of patina, structure, detail and noise components that approaches perfection and improves on overprotective exposure. Until, of course, you can check it right on the camera back and get the analysis there (…a year off in IP blocks, maybe more in UI work?), you might, as an analytical person, be rotating the camera body and taking shots from multiple angles. It beats zipping right up to the tiger and snapping shots like mad even though you ‘lose data.’
So, the method does not look new in mathematics, but the big deal here is the code; all I found as such was a 9-page paper from March by authors with one name each (but from Texas Tech -and- Beijing Inst. Tech Zhuhai) http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4148760 http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/pr…
The killer thing with JPEG2000 for me was the paranoid-exactness about having a dimensionless image, the ability to extract image features straight from the already-done calculations, the JBIG sidebar option (lossless!) and (separately of course) the ability to specify how many bytes -you- want the image to take.
Face it though; the wrapper is supposed to be compressed XML.
Thanks for the cues so far!
The backward image coding is just the raw image compression algorithm. The part that is missing is the “container” part.
For transmission purposes the jpeg2000 container is fine. I’d just propose something more “tiff-like” for static images. The Jpeg2k specification blew off the whole idea of fast tile lookup. I figure something like an mmappable version of Steven Dekorte’s SkipDB could serve as a container format. Nothing wrong with storing image tiles, components and metadata in a fast, simple, recompactable key/value database. You just have to keep the natives from abusing the specification (à la TIFF and EXIF).
What the article fails to mention is that JPEG2000 is heavily encumbered with patents. If you want to implement the full standard, you’re going to have to pay, pay, pay.
One alternative that (surprisingly) doesn’t seem as encumbered is Microsoft’s Windows Media Photo (see http://en.wikipedia.org/wiki/Windows_Media_Photo). Here’s a blurb from the Wiki:
“The HD Photo bitstream specification claims that “HD Photo offers image quality comparable to JPEG-2000 with computational and memory performance more closely comparable to JPEG”, that it “delivers a lossy compressed image of better perceptive quality than JPEG at less than half the file size”, and that “lossless compressed images … are typically 2.5 times smaller than the original uncompressed data”.”
If true, that’s impressive.
Well, the *problem* with Microsoft’s format is that they *promised* some time ago to totally open it up, so that nobody would have to worry about Microsoft later asserting its rights over the format. We’ve been waiting 9+ months for that document.
Also, it’s important to note that there are a few strings attached: the algorithm must be implemented with their container, and their container does NOT support file sizes larger than 4 GB, which has become utterly crippling, especially in GIS and astronomy.
Microsoft really, really, really wants digital camera companies to adopt this format. That’s part of the reason for the promise. The digital camera companies just don’t trust them.
There’s some debate over this format otherwise. The fact that wavelets have built-in pyramids as part of their format still makes me lean towards that format. It’s a pain to keep separate pyramid files or separate “subfiles” which contain the pyramids.