Huffman Coding Compressing data with binary trees and collections. Huffman coding is an algorithm devised by David Huffman in 1952 for compressing data, reducing the file size of an image (or any...

attached specification on Huffman coding.I need a total cost on developing the code.


Huffman Coding Compressing data with binary trees and collections. Huffman coding is an algorithm devised by David Huffman in 1952 for compressing data, reducing the file size of an image (or any file) without affecting its quality. Unbelievably, this algorithm is still used today in a variety of very important areas. For example, MP3 files and JPEG images both use Huffman coding. We’ll use Huffman coding to compress text files composed of English characters. Character representation Huffman coding relies on the key observation that some characters appear more often than others. The characters that appear most often are, in terms of frequency, more important than the characters that appear less often. Earlier, we saw that every char has an equivalent int value between 0 and 255. Computers work with integers in binary format (1’s and 0’s). Binary is similar to normal decimal numbers except it is base2 instead of base10. For example, the decimal number 120410 is read as 1 · 103 + 2 · 102 + 0 · 101 + 4 · 100. Similarly, we can represent the decimal number 9710 in binary as 011000012 since 9710 is 0 · 27 + 1 · 26 + 1 · 25 + 0 · 24 + 0 · 23 + 0 · 22 + 0 · 21 + 1 · 20. All Java char values can be written using exactly 8 binary digits (bits). This encoding is called ASCII. The following string data contains 240 characters: 229 a’s, 4 b’s, 3 c’s, and 2 d’s. If each character is represented in Java as an ASCII char, then the total file size is 2560 bits. bbbbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa cccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ddaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa However, since a is by far the most frequent character, then it’s better to represent a with the fewest possible bits to save space (followed by b, c, and then d). The following Huffman coding is a more efficient representation of the same string data. · a can be represented with a one-bit code, 12. · b can be represented with a two-bit code, 002. · c can be represented with a three-bit code, 0112. · d can be represented with a four-bit code, 0102. The resulting file size is only 255 bits, compressing the data by a factor of 10! Why not want represent b with a one-bit code, 02? Algorithm Consider the following example found in the simple-spec-example.txt file. aba ab cabbb Counting characters The example contains the characters a, b, c, and the space character. Character ’ ‘ ‘a’ ‘b’ ‘c’ Count 2 4 5 1 Initializing a priority queue A priority queue will be used to keep track of character counts. Priority queues are a special type of queue ordered by the priority value (e.g. character count) instead of FIFO order, so the remove method removes the highest priority element from the queue rather than the oldest element. In order to represent the (char, int) pair representing the character together with its frequency, introduce a new data type called HuffmanNode. For Huffman coding, rather than remove the highest-frequency character, remove the character with the lowest frequency first. Combining nodes Given the priority queue of the nodes, our goal is to form a Huffman tree representing the relative importance of each character. Repeatedly remove the smallest two nodes and create a new node with them as children. 1. First, remove the smallest two nodes from the priority queue. 2. Then, create a new node. The frequency is the sum of the two nodes. 3. Add the new node back to the priority queue. Repeat this process until there is only one node in the priority queue representing the complete Huffman tree. Generating Huffman codes At this point, the frequencies of the letters have already been taken into account, so character counts are no longer necessary. The resulting Huffman tree represents the optimal encodings for the data file. Notice that the only actual information is in the leaves of the tree. To determine the Huffman code for a character, traverse the tree from the root to the node with the letter in it. For left branches, print 0. For right branches, print 1. For example, the character c would be represented with the three-bit code 100, because it is located in the node right, left, left of the overall root. Output the tree in the following standard code file format consisting of pairs of lines, where the first line in the pair is the ASCII value of the character and the second line is the Huffman code for that character. Each pair should appear in traversal order. For example, the simple-spec-example.code file contains the following contents. 98 0 99 100 32 101 97 11 HuffmanCode Implement the HuffmanCode class representing a Huffman code for a particular message as a Huffman tree generated according to the described algorithm. The HuffmanCode class includes a private static inner class called HuffmanNode, representing individual nodes in the Huffman tree. There are two separate parts to the HuffmanCode class. 1. Creating a Huffman code with the frequencies constructor and save method. 2. Decompressing a message with the input constructor and translate method. HuffmanNode The contents of the HuffmanNode class are up to you, but it should have at least one constructor used by the HuffmanCode class and all node fields must be declared public. HuffmanNode should not contain any of the actual algorithm. Since the priority queue will ultimately be declared as PriorityQueue, the HuffmanNode class must implement Comparable. Use the frequency of the subtree to determine its ordering relative to other subtrees, with lower frequencies considered “less than” higher frequencies. If two frequencies are equal, the nodes are considered equal as well. Method summary public HuffmanCode(int[] frequencies) Initializes a new HuffmanCode object using the described algorithm given a 256-length array of frequencies where frequences[i] represents the count of the character with ASCII value i. Use a PriorityQueue to build the Huffman code. public HuffmanCode(Scanner input) Initializes a new HuffmanCode object by reading in a previously saved code file. Assume the Scanner is not null and always contains data encoded in the standard format. Since frequencies are irrelevant for this part, all node frequencies should be set to some standard value such as 0 or -1. public void save(PrintStream output) Stores the current Huffman codes to the given output stream in the standard format. public void translate(BitInputStream input, PrintStream output) Reads individual bits from the input stream and writes the corresponding characters to the output. Stops reading when the input is empty. Assume the input stream is compatible with this Huffman tree. Line-based processing When implementing the Scanner constructor, do not call nextInt() to read the integer and nextLine(). Mixing token-based reading and line-based reading can cause problems. Instead, use only line-based reading and call Integer.parseInt when int values are needed. int asciiValue = Integer.parseInt(input.nextLine()); String code = input.nextLine(); Translate example Algorithm for translate 1. Begin at the top of the tree. 2. For each bit in the input, go left for 0 and go right for 1. 3. At a leaf node, write the integer code to the output. 4. Repeat this process so long as there are still more bits to read. Suppose we have the following simple-spec-example.short message encoded using the simple-spec-example.code Huffman coding. 1101110111010110011000 The trace below diagrams the steps determining each character in the short message, resulting in the decompressed message, “ab ab c”. 1. Read 1, go right. Read 1, go right. ‘a’ is a leaf. Output ‘a’. Remaining input is 0101110101100. 2. Read 0, go left. ‘b’ is a leaf. Output ‘b’. Remaining input is 101110101100. 3. Read 1, go right. Read 0, go left. Read 1, go right. ‘ ‘ is a leaf. Output ‘ ‘. Remaining input is 110101100. 4. Read 1, go right. Read 1, go right. ‘a’ is a leaf. Output ‘a’. Remaining input is 0101100. 5. Read 0, go left. ‘b’ is a leaf. Output ‘b’. Remaining input is 101100. 6. Read 1, go right. Read 0, go left. Read 1, go right. ‘ ‘ is a leaf. Output ‘ ‘. Remaining input is 100. 7. Read 1, go right. Read 0, go left. Read 0, go right. ‘c’ is a leaf. Output ‘c’. Remaining input is empty. When implementing translate, do not call System.out.print and instead write to the given output. The PrintStream class contains a write method that accepts an int as a parameter. Be careful with using recursion since Java may throw a StackOverflowError when it exceeds a certain number of recursive calls. For a large message, there may be hundreds of thousands of characters to decode. BitInputStream The provided BitInputStream class reads data bit-by-bit for the translate method in HuffmanCode. The interface for BitInputStream looks very much like a Scanner, so it should be used in a similiar fashion. public int nextBit() Returns the next bit in the input stream. Throws a NoSuchElementException if there are no more bits. public boolean hasNextBit() This method returns true if the input stream has at least one more bit and false otherwise. Avoid unnecessary logic such as extra base cases or recursive cases. Use the x = change(x) pattern to simplify code. Remember to declare variables with their interface types when possible, so PriorityQueue variables should be delcared using the Queue interface type. The Huffman coding for a given message is not necessarily unique. If implemented exactly as specified, then you should get exactly the same tree so long as, when combining nodes, the first node removed becomes the left subtree and the second node removed becomes the right subtree. Submission In addition to the completed program, submit files named secretmessage.short and secretmessage.code that represent a secret compressed message from you to your Instructor. The message can be anything you want as long as it is not offensive. Submit HuffmanCode.java, secretmessage.short and secretmessage.code
May 27, 2021
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here