Future Work
Thanks to the unbelievable simplicity of the VCF algorithm, countless
improvements, added features and alternative implementations can be
proposed.
Here is a non-exhaustive list of the most notable proposals we have.
Improving the test implementation.
Cell interpolation.
The test implementation has exposed a weakness in the algorithm: the strong
cell boundary artefacts are far too visible and waste compression resources.
Fortunately, an easy solution has been proposed to address this problem:
instead of building "flat" cells in which all pixels have the same value,
we should build a gradient in the cell by interpolating the cell value with
those of its 8 neighbours.
For this improvement, we do not need to add parameters to the cell: changing
the way a cell is rebuilt with its single value will be enough to make the
cell boundaries invisible. By doing that, we will at the same time increase
the compression rate, because the VCF algorithm will no longer need to waste
transitions to compensate for the artefacts.
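As an illustration only, the difference between flat and interpolated
reconstruction can be sketched as follows. This is a minimal sketch, not the
test implementation: it assumes a regular grid of equally sized cells, places
each cell value at the centre of its cell and uses plain bilinear
interpolation, clamping at the image borders. The function names and the
list-of-lists image layout are hypothetical.

```python
import math


def rebuild_flat(cells, size):
    """Rebuild an image in which every pixel of a cell takes the cell value."""
    rows, cols = len(cells), len(cells[0])
    return [[cells[y // size][x // size] for x in range(cols * size)]
            for y in range(rows * size)]


def rebuild_interpolated(cells, size):
    """Rebuild an image by bilinear interpolation between cell centres.

    Assumption: each cell value sits at the centre of its cell. A pixel is a
    weighted mean of the four nearest centres, so over a whole cell all 8
    neighbours contribute. Cells on the image border are clamped.
    """
    rows, cols = len(cells), len(cells[0])

    def clamp(i, n):
        return max(0, min(n - 1, i))

    image = []
    for y in range(rows * size):
        # Pixel centre expressed in cell-centre coordinates.
        fy = (y + 0.5) / size - 0.5
        y0 = math.floor(fy)
        wy = fy - y0
        row = []
        for x in range(cols * size):
            fx = (x + 0.5) / size - 0.5
            x0 = math.floor(fx)
            wx = fx - x0
            a = cells[clamp(y0, rows)][clamp(x0, cols)]
            b = cells[clamp(y0, rows)][clamp(x0 + 1, cols)]
            c = cells[clamp(y0 + 1, rows)][clamp(x0, cols)]
            d = cells[clamp(y0 + 1, rows)][clamp(x0 + 1, cols)]
            top = a * (1 - wx) + b * wx
            bottom = c * (1 - wx) + d * wx
            row.append(top * (1 - wy) + bottom * wy)
        image.append(row)
    return image
```

With two neighbouring cells of values 0 and 100, the flat version jumps from
0 to 100 at the boundary, while the interpolated version ramps across it. No
extra parameter is stored in either case.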
However, we feel that flat cells might also be needed. One can imagine that
flat areas of an image might be coded more efficiently by flat cells, and we
don't want the VCF algorithm to waste transition resources by fighting the
generated gradients.
More experiments have to be done on the subject, to make sure this is really
needed and to decide how to encode the information. The proposed way to deal
with that is to add, in a dedicated table, one bit for each top-level cell to
decide whether this cell tree will be interpolated or flat. We might also
want to add a flag in the header to indicate whether this table is present in
the image or not.
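A dedicated table of one bit per top-level cell could be packed as shown
below. This is only a sketch under our own assumptions: the bit order (most
significant bit first), the padding of the last byte with zeros and the
function names are hypothetical, since no file layout has been decided yet.

```python
def pack_flags(flags):
    """Pack booleans (True = interpolated, False = flat) into bytes.

    Assumed layout: one bit per top-level cell, most significant bit first,
    last byte padded with zero bits.
    """
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def unpack_flags(data, count):
    """Read back `count` flags from a table written by pack_flags."""
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(count)]
```

For an image made of 9 top-level cells, the table takes 2 bytes; the reader
has to know the number of top-level cells to ignore the padding bits.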
Multiple depth support.
Another thing the tests have shown is the need to handle pixel depths
other than 8 bits. The test has been done for 1 bit, but a large set of
depths is actually useful. We propose to handle depths of 1, 2, 4, 6, 8, 12,
14, 16, 24 and 32 bits per pixel, either in grayscale (at least for 1 to
16) or in color (RGB or RGBA, at least for 8 to 32). For instance, grayscale
images with 1 to 4 bits per pixel can be used for facsimile storage and
transmission, while those with 8 to 16 bits per pixel might be used for
medical images.
One idea to test is how worthwhile it would be to store the pixel
values at a lower depth (say 4 bits for an 8-bit image, possibly encoded with
a mu-law for more dynamic range), but still build the gradients at the
screen's depth.
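To make the idea concrete, here is a hedged sketch of a mu-law companding of
8-bit values into 4-bit codes. The standard mu-law curve
ln(1 + mu*x) / ln(1 + mu) is used; the value of MU, the rounding and the
function names are arbitrary choices made for this illustration, not results
of any test.

```python
import math

MU = 15.0  # illustrative value only; the best value would have to be tested


def mulaw_encode(value, in_bits=8, out_bits=4):
    """Compress a pixel value to a lower depth with a mu-law curve."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    x = value / in_max
    y = math.log1p(MU * x) / math.log1p(MU)
    return round(y * out_max)


def mulaw_decode(code, in_bits=8, out_bits=4):
    """Expand a low-depth code back to the original depth."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    y = code / out_max
    x = math.expm1(y * math.log1p(MU)) / MU
    return round(x * in_max)
```

The curve gives more codes to the dark values than to the bright ones. The
gradients would then be built from the decoded values at the screen's depth.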
Adding features.
Color images.
To become a real image format, VCF should of course handle color images. We
propose 3 implementations for that, and some further tests will be made to
make sure all of them are needed.
The naive way to handle color images is to deal with 16, 24 or 32 bits per
pixel as one pixel value, with just the mean calculation and the
interpolation functions being aware of the presence of 3 (RGB, CMY or YUV)
or 4 (RGBA or CMYK) fields in the pixel value. This is easy to implement and
is likely to give very good results.
The second proposition we make is to treat each color and alpha channel as
a separate image. To take advantage of the difference in sensitivity of the
human eye to different colors (as 4:2:2 and 4:1:1 encodings do), we might
want to use a different depth for each channel. For instance, we might want
to encode R and B with 5 bits and G with 6 bits. This way, the R and B
channels will have their pertinences naturally scaled down by a factor of 4
relative to channel G, while keeping the high-pertinence details.
This proposition also allows encoding channels that do not map to the usual
chromaticity. For instance, we could compress so-called "sepia" images with
only two channels: one for luminance and one for the sepia chroma.
The drawback is the need for one transition table for each channel. But the
image quality might be improved by separating channels, and some tests are
needed to decide whether or not to support this feature.
The last proposition we have for supporting color images has, to our
knowledge, never been proposed for lossy compression of images, but, thanks
to the simplicity of the VCF algorithm, we can easily test unusual features.
We can try to compress images with indexed colors (look-up table). For that,
flat cells are probably the only way to go, and the cell value calculation
has to be changed from the mean of the pixel values to the majority pixel
value. The reconstruction uses the same algorithm. As usual, more tests have
to be made for that feature.
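The change from mean to majority can be sketched as follows. The function
names are hypothetical, and the way ties are broken (first value encountered)
is simply what Python's Counter does, not a design decision.

```python
from collections import Counter


def cell_value_mean(pixels):
    """Cell value for regular images: mean of the pixel values."""
    return sum(pixels) / len(pixels)


def cell_value_majority(pixels):
    """Cell value for indexed colors: the most frequent palette index."""
    return Counter(pixels).most_common(1)[0][0]
```

For a cell whose pixels hold the palette indexes 3, 3, 7 and 200, the mean is
53.25, which does not refer to any color present in the cell, while the
majority is index 3.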
More control over the compression.
Since the compression and decompression processes are comparable to a
memory-to-memory copy, an image viewer or an image retouching application
could use the VCF encoding as a memory cache for large images, painting
the needed image areas by decompressing the relevant top-level cells
on-the-fly. An advanced compression utility could even allow a user to
increase the pertinence of some areas (such as text in the image) to
increase the image quality, or decrease it (such as the borders of the image
or the background of the subject) to increase the compression rate.
This feature could be paired with the ability for the user to ask for a
given weight for the image: if a user wants the image to be, say, exactly
16KB, the VCF algorithm can add more and more details up to the desired
weight, by progressively lowering the pertinence selection threshold.
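Lowering the pertinence threshold until a size budget is reached amounts to
keeping the most pertinent transitions first. The sketch below is a
deliberate simplification: it assumes each candidate transition is described
by a hypothetical (pertinence, cost in bytes) pair, and it ignores the fact
that a transition can only be kept if its parent transition is kept too.

```python
def select_transitions(candidates, budget_bytes, base_bytes=0):
    """Keep the most pertinent transitions that fit within the budget.

    candidates: list of (pertinence, cost_bytes) pairs (assumed layout).
    base_bytes: size already used by the header and top-level cells.
    Returns the kept candidates and the total size used.
    """
    used = base_bytes
    kept = []
    ordered = sorted(candidates, key=lambda c: c[0], reverse=True)
    for pertinence, cost in ordered:
        if used + cost > budget_bytes:
            break  # the threshold cannot be lowered any further
        kept.append((pertinence, cost))
        used += cost
    return kept, used
```

A user-defined weighting of some image areas would simply scale the
pertinence of the candidates located in those areas before the selection.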
Movies.
Movies are just a sequence of images following one another. We believe that
the VCF algorithm can be adapted to perform very well in this area too.
Because fast movements make blurry images, the human eye can only detect a
limited amount of new detail in each image. By spreading the addition of
details over several frames, we can achieve high compression rates while
keeping sharp images: the image will eventually be sharp if it stays still
long enough for the eye to catch it.
For that, we cannot just compress all images independently: we have to
take advantage of the fact that the next image is likely to look almost the
same as the current one. We propose to use a flag for each top-level cell
(all flags put in yet another table) to decide whether this tree will use
plain pixel values as for regular images, or a reference to the same tree in
the preceding image.
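One possible way to fill such a flag table is sketched below. The decision
rule (mean absolute difference against a fixed threshold), the flat list of
pixels per top-level cell and the function name are all assumptions made for
this illustration; the real criterion would more likely be based on the
pertinence of the transitions.

```python
def choose_references(prev_cells, curr_cells, threshold):
    """Return one flag per top-level cell.

    True  = reference the same tree in the preceding image.
    False = encode the tree with plain pixel values.
    prev_cells, curr_cells: lists of pixel lists, one per top-level cell.
    """
    flags = []
    for prev, curr in zip(prev_cells, curr_cells):
        diff = sum(abs(p - c) for p, c in zip(prev, curr)) / len(curr)
        flags.append(diff <= threshold)
    return flags
```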
Preliminary tests we have made show that it would be better to use two
versions of the current image: the regular one, and a slightly blurrier one.
Each tree will decide to use one or the other. This is needed in order to
avoid wasting transitions to "erase" details.
Another improvement would probably be to use a motion-compensation reference,
by allowing a tree to reference a slightly shifted area, the size of a
top-level cell, in one of the reference images. The shift of this reference
only needs to be smaller than a top-level cell size, horizontally and
vertically.
Farther future.
Layers.
In the same way that different channels could be encoded as different images,
we can embed several layers in the same compressed file. Layers are not only
useful for image retouching: viewer applications could also take advantage of
such a feature, just like subtitles for a movie. For instance, schematics
could be displayed in a very handy fashion with a viewer that allows hiding
and showing different layers. One can also think of a medical viewer,
where anatomical images could be explored while showing and hiding sets of
organs.
Some of these layers could contain non-graphical chunks of data. We could use
these chunks to store meta-data related to the image, such as copyright
information, EXIF data, etc.
Other dimensions.
The VCF principle of transition can be applied not only to images (2D), but
also to data with any number of dimensions.
For instance, one could wish to compress 1D data, such as sound, by using a
transition of 1 cell to 2 (half) cells. Of course, the cell value would
probably not be calculated as the mean of the samples, and the cell
reconstruction would be neither flat nor interpolated. This sound
compression will likely use a time-frequency representation of some kind to
be efficient.
On the other hand, compressing 3D data sets (such as medical ultrasound
images or mechanical models) would use most of the concepts present in
this paper. The transitions would be 1 cell to 8 smaller cells, and the flat
and interpolated values still apply. A viewer application would build a 2D
section and projection of the data set. The user would be allowed to rotate and
move the object in the viewer, move the section, and it would be possible to
show and hide layers to make some parts visible or not.
There is no reason to limit the applications to the third dimension. A set
of n-dimensional data could be displayed with exactly the same viewer: after
all, an nD-to-2D projection is not much more complicated than a 3D-to-2D
projection, and the same goes for interpolation. Transitions will be from
1 cell to 2^n cells, and flat or interpolated cells will be built.
Statistical visualisation could benefit from such a viewer using flat cells,
possibly with the indexed color encoding, and layers.
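The 1-to-2^n transition can be written once for any number of dimensions.
The sketch below assumes hypercubic cells described by an origin and an edge
length that is a power of two; the function name is hypothetical.

```python
from itertools import product


def split_cell(origin, size):
    """Split an n-dimensional cell into its 2^n half-size children.

    origin: tuple of n coordinates of the cell corner.
    size:   edge length of the cell (assumed to be a power of two).
    Returns the origins of the children; each child has edge size // 2.
    """
    half = size // 2
    return [tuple(o + d * half for o, d in zip(origin, offsets))
            for offsets in product((0, 1), repeat=len(origin))]
```

The same function gives 2 children for sound (1D), 4 for images (2D), 8 for
3D data sets, and so on.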
If we use another projection formula, we can also display 2D data in a
panoramic fashion, either seen from outside (for an object) or seen from
inside (for a room or a landscape).
Multi-resolution images.
One great promise of fractal compression was to store multi-resolution
images. Well, VCF can do that too! In the test implementation, we chose
to keep the transitions unpainted. If we keep the pixel value of the cell
even for transitions, we can choose down to what level we want to uncompress
the image: no need to uncompress a huge 3000x2000 image and scale it down to
display it on a 640x480 screen. The same applies if we want to display an
80x60 thumbnail version of the image.
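The level at which decompression can stop may be estimated as shown below.
This is a sketch under assumptions of our own: cells are halved at each
transition, the top-level cell size (64 pixels in the usage note) is a
hypothetical value, and we stop as soon as a cell is no larger than one pixel
of the target display.

```python
def levels_needed(top_cell_size, image_width, target_width):
    """Return (levels, cell_size): how many transition levels to follow.

    Cells are halved until a cell covers at most one target pixel, or until
    the cell is a single source pixel.
    """
    scale = image_width / target_width  # source pixels per target pixel
    size, levels = top_cell_size, 0
    while size > 1 and size > scale:
        size //= 2
        levels += 1
    return levels, size
```

With 64-pixel top-level cells, a 3000-pixel-wide image shown on a
640-pixel-wide screen needs 4 levels (cells of 4 pixels), an 80-pixel-wide
thumbnail needs only 1 level, while a full-size display needs all 6 levels.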
Of course, that would bloat the compressed file with pixel values that will
seldom be displayed. But not that much, since upper level cells are far
less numerous than the pixels. Moreover, we can compensate this somewhat by
allowing "not-painted non-transitions": if the transition does not occur,
the cell will be painted with the value of the upper level, and there will
be no value for this particular cell.
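The size of this overhead can be bounded with a small calculation. The sketch
below counts the upper level cells in the worst case, where every cell is
split down to single pixels; it assumes each transition halves the cell in
both directions, and the function name is hypothetical.

```python
def upper_level_cells(width, height):
    """Count the cells above pixel level in a fully split image."""
    total = 0
    while width > 1 or height > 1:
        # One level up: each cell covers 2x2 cells of the level below.
        width = (width + 1) // 2
        height = (height + 1) // 2
        total += width * height
    return total
```

For a 1024x1024 image this gives 349525 upper level cells for 1048576 pixels,
that is, about one third more values in the worst case. Since transitions
only occur where they are pertinent, the real overhead would be lower.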
Hidden details and Infinite loops.
Since we can have multi-resolution images, we can display any image at the
resolution we want without needing to uncompress the details smaller than a
screen's pixel.
We can use that to put, in an image, details that need to be zoomed to
become visible. For instance, one could zoom into an image of a person's face
and display a very detailed representation of the reflections in one eye of
this person. Many areas of such an image could hide "easter eggs" to be
explored.
Even better: zooming into the eyes of the person would reveal a complete
landscape where the same person is standing. One could continuously zoom
from inside the eyes to the landscape, to the person's face, to the eyes ...
in an infinite loop. Several completely independent infinite zooming loops
of this kind could exist in the same image.
last modification: March 2002
(C) Eric GAUDET, 2002