MULTIMEDIA COMPUTING: Data Compression-I
Multimedia data compression I: Lossless compression algorithms: Run-Length Coding, Variable Length Coding, Dictionary-Based Coding, Arithmetic Coding, Lossless Image Compression

 


Unit III: Data Compression-I

Question 1: What is Run-Length Coding?

Run-Length Coding (RLC) is a simple form of data compression that works particularly well with data that contains long sequences of repeated values or characters. The basic idea behind RLC is to replace consecutive identical elements (runs) with a code indicating the length of the run and the value that is repeated.

For example, consider the sequence: "AAAABBBCCDAA". In RLC, this sequence can be encoded as "4A3B2C1D2A". Here, "4A" indicates four consecutive 'A's, "3B" indicates three consecutive 'B's, and so on.
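A minimal Python sketch of this count-then-symbol scheme (the exact token format varies between real RLC implementations, which typically use binary counts rather than decimal digits):

    import re
    from itertools import groupby

    def rle_encode(data: str) -> str:
        # Collapse each run of identical characters into "<count><char>".
        return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(data))

    def rle_decode(encoded: str) -> str:
        # Each token is a decimal count followed by one non-digit character,
        # so this toy decoder would be confused by digits in the data.
        return "".join(ch * int(n) for n, ch in re.findall(r"(\d+)(\D)", encoded))

    print(rle_encode("AAAABBBCCDAA"))   # -> 4A3B2C1D2A
    print(rle_decode("4A3B2C1D2A"))    # -> AAAABBBCCDAA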

RLC is most effective on data with many long runs, such as simple graphics, icons, and fax images. On data with few runs it can actually expand the input, since each isolated symbol costs two output symbols (a count and a value), so it performs poorly on random or high-entropy data.

Question 2: How does Variable Length Coding (VLC) work?

Variable Length Coding (VLC) is a data compression method where each symbol or character in the input data is represented by a variable-length code, as opposed to a fixed-length code. In VLC, frequently occurring symbols are assigned shorter codes, while less frequent symbols are assigned longer codes. This is based on the principle of exploiting statistical redundancy in the data.

For example, in a text file, common letters like 'e' or 't' might be assigned shorter codes than less common letters like 'q' or 'z'. This allows VLC to use fewer bits on average than a fixed-length encoding such as ASCII, which spends the same number of bits on every character.

A popular example of VLC is Huffman coding, where a binary tree is constructed based on the frequency of symbols in the data. Huffman coding assigns shorter codes to more frequent symbols, and the resulting codes are prefix-free (no code is a prefix of another), so the variable-length bit stream can be decoded unambiguously.
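A compact Python sketch of Huffman code construction using a min-heap; it builds only the symbol-to-code table and omits the bit packing and tree serialization a real codec would need:

    import heapq
    from collections import Counter

    def huffman_codes(text: str) -> dict:
        # Each heap entry: [total_frequency, [symbol, code], [symbol, code], ...]
        heap = [[freq, [sym, ""]] for sym, freq in Counter(text).items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo = heapq.heappop(heap)     # two least-frequent subtrees
            hi = heapq.heappop(heap)
            for pair in lo[1:]:          # left branch gains a leading 0
                pair[1] = "0" + pair[1]
            for pair in hi[1:]:          # right branch gains a leading 1
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
        return dict(heapq.heappop(heap)[1:])

    codes = huffman_codes("this is an example of huffman coding")
    # Frequent symbols (such as the space) receive the shortest codes.
    print(sorted(codes.items(), key=lambda kv: len(kv[1])))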

Question 3: Explain Dictionary Based Coding.

Dictionary-Based Coding is a data compression technique that replaces repetitive sequences of data with references to a dictionary or a previously seen part of the data. This method works well for data that contains repeated patterns, such as text files, certain types of images, and some forms of audio.

One of the most famous dictionary-based compression algorithms is Lempel-Ziv-Welch (LZW). In LZW, a dictionary is built dynamically during the compression process. Initially, the dictionary contains entries for all possible single characters. As the compressor reads the input, it finds the longest sequence of characters that already has a dictionary entry, outputs the code for that entry, and then adds a new entry for that sequence extended by the next input character.

During decompression, the decompressor builds the same dictionary dynamically as it reads the compressed data, using the same algorithm. This allows the original data to be reconstructed accurately.
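A minimal Python sketch of the LZW encoder just described (the decoder mirrors it, rebuilding the same dictionary from the code stream):

    def lzw_encode(data: str) -> list:
        dictionary = {chr(i): i for i in range(256)}   # all single characters
        next_code = 256
        current, output = "", []
        for ch in data:
            candidate = current + ch
            if candidate in dictionary:
                current = candidate                    # keep extending the match
            else:
                output.append(dictionary[current])     # emit the longest match
                dictionary[candidate] = next_code      # learn the new sequence
                next_code += 1
                current = ch
        if current:
            output.append(dictionary[current])
        return output

    print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))
    # -> [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]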

Question 4: What is Arithmetic Coding?

Arithmetic coding is a sophisticated form of entropy encoding that achieves compression ratios close to the theoretical limit defined by the Shannon entropy of the data. It works by mapping an entire string of data to a single number in the half-open interval [0, 1), whose binary expansion becomes the compressed output.

The basic idea behind arithmetic coding is to allocate portions of the coding interval to symbols in proportion to their probabilities of occurrence. Unlike Huffman coding, which assigns a whole number of bits to each individual symbol or fixed-size block, arithmetic coding encodes the entire message as a single code and can therefore spend a fractional number of bits per symbol. This is why it outperforms Huffman coding when symbol probabilities are far from powers of 1/2.

During encoding, arithmetic coding maintains a range that represents the interval of possible values for the message. As each symbol is processed, the range is divided into subranges proportional to the probabilities of the symbols. The subrange corresponding to the next symbol is then selected, effectively narrowing down the range.

During decoding, the decoder repeats the same subdivision process: at each step it determines which symbol's subrange contains the encoded number, outputs that symbol, and narrows the range accordingly, until the whole message has been reconstructed.
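A toy Python encoder that shows the interval narrowing; real arithmetic coders work with integer ranges and renormalization to avoid floating-point precision loss, and the symbol probabilities below are assumed purely for illustration:

    def arithmetic_encode(message: str, probs: dict) -> float:
        low, high = 0.0, 1.0
        for sym in message:
            span = high - low
            cum = 0.0
            for s, p in probs.items():
                if s == sym:
                    # Select the subrange proportional to this symbol's probability.
                    low, high = low + span * cum, low + span * (cum + p)
                    break
                cum += p
        # Any number in [low, high), together with the message length,
        # identifies the message; return a representative value.
        return (low + high) / 2

    probs = {"A": 0.6, "B": 0.3, "C": 0.1}   # assumed probabilities
    print(arithmetic_encode("AABAC", probs))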

Arithmetic coding is particularly effective for data with highly skewed probability distributions, where Huffman coding is most wasteful: a symbol with probability 0.99 still costs a full bit in Huffman coding, but only a fraction of a bit in arithmetic coding.

Question 5: Explain Lossless Image Compression.

Lossless image compression is a method of reducing the size of an image file without losing any information from the original image. This is achieved by identifying and eliminating redundancy and unnecessary information in the image data.

One common technique for lossless image compression is predictive coding, where each pixel value is predicted based on the values of neighboring pixels, and the difference between the predicted value and the actual value is encoded. Predictive coding exploits spatial redundancy in the image data, as neighboring pixels in an image often have similar values.
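A minimal Python sketch of one-dimensional predictive coding, similar in spirit to PNG's "Sub" filter: each pixel is predicted from its left neighbor and only the residual is stored. Modulo-256 arithmetic keeps every residual in one byte and is exactly reversible:

    def predict_row(row: list) -> list:
        # Residual = pixel minus left neighbor (the first pixel is predicted as 0).
        return [(row[i] - (row[i - 1] if i else 0)) % 256 for i in range(len(row))]

    def unpredict_row(res: list) -> list:
        row = []
        for i, r in enumerate(res):
            row.append((r + (row[i - 1] if i else 0)) % 256)
        return row

    row = [100, 102, 103, 103, 101]
    print(predict_row(row))                        # -> [100, 2, 1, 0, 254]
    print(unpredict_row(predict_row(row)) == row)  # -> True

Because neighboring pixels are usually similar, the residuals cluster near zero (or wrap to just below 256), which a downstream entropy coder compresses far better than the raw pixel values.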

Transform-based methods can also be used, but only if the transform is exactly reversible. Lossless codecs use integer-to-integer transforms, such as the reversible integer wavelet in JPEG 2000's lossless mode, and entropy-code the coefficients with Huffman or arithmetic coding without any quantization, since quantization discards information (the familiar DCT-plus-quantization pipeline of baseline JPEG is lossy for exactly that reason).
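A minimal sketch of such a reversible transform: one Haar-style averaging and differencing step (the integer S-transform, a building block of lossless wavelet coding), assuming an even-length row of pixel values. Because all arithmetic stays in integers, the inverse reconstructs the input exactly:

    def s_transform_forward(x: list):
        # Averages act as a lowpass band, differences as a highpass band.
        s = [(x[2*i] + x[2*i + 1]) // 2 for i in range(len(x) // 2)]
        d = [x[2*i] - x[2*i + 1] for i in range(len(x) // 2)]
        return s, d

    def s_transform_inverse(s: list, d: list) -> list:
        x = []
        for si, di in zip(s, d):
            a = si + (di + 1) // 2   # undo the integer rounding exactly
            x += [a, a - di]
        return x

    row = [100, 102, 103, 101, 98, 97]
    s, d = s_transform_forward(row)
    print(s, d)                               # coarse averages plus small details
    print(s_transform_inverse(s, d) == row)   # -> True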

One widely used lossless image compression format is PNG (Portable Network Graphics), which applies per-row predictive filters and then compresses the filtered bytes with DEFLATE, a combination of LZ77 dictionary coding and Huffman coding, so no information is lost.

Overall, lossless image compression is essential for applications where the decoded image must be bit-for-bit identical to the original, such as medical imaging, archival storage, and certain types of graphic design.