2 Attempts AllowedAvailable: Aug 23, 2021 10:00amAvailable: Aug 23, 2021 10:00amuntil Sep 30, 2021 11:59pmuntil Sep 30, 2021 11:59pmDetails
CSCI 6620 Data Structures
Program 2: Run Length Compression
Use this file as your input: uncompressed.txt
Download Use this file as your input: uncompressed.txt
Word Version
Download Word Version
1 Goals of this Assignment
- To implement the compression algorithm that matches the decompression algorithm from Program from Program 1
- To use file I/O in C++ (open, read a char, test for eof).
- To use an acceptable coding style: no goto, no global variables, proper indentation, and reasonable names.
- To understand and use a type cast to create a 1-byte integer
1.1 Review: Run Length Compression
This algorithm is effective on text or binary files that have the same byte repeated over and over. Think of a file containing a table of numbers: it has lots of consecutive space characters, and may have a repeated filler character, such as a ‘.’ .
In this scheme, any “run” of the same character (4 or more identical consecutive bytes) is replaced by a triplet of bytes, consisting of
- An escape character. We will use 0x7f, which is sometimes called “esc”. It is a non-printing ASCII character
- The letter that has been repeated
- A 1-byte count = the number of repetitions
To make this work, any esc character, or run of them, that occurs in the input mustalsobe replaced by a triplet: esc esc count .
Does it Compress?There is a general theoretical result that no algorithm can compress all files while maintaining all the information in the file. Consider all sequences of N bits: there are 2n
different sequences, and you cannot represent all of them in fewer than N bits. The best we can do is compress MOST files (at the cost of lengthening some) or maintain all SIGNIFICANT information. Formats such as mp3 and jpg do throw away information; they are called “lossy encodings”.
So it makes sense to ask what kind of files can/cannot be compressed using run-length encoding. This representation does not throw away any bits – therefore it cannot compress all files. What files CAN be compressed?
- If the input file contains no “esc” characters:
- If one or more runs of length 4 or more exist: the output file will be shorter
- If the file contains no runs of length 4 or more: the output will be identical to the input file
- If the input file contains runs of 1 or 2 “esc” characters, the output file will be longer unless those short runs are balanced by longer runs
1.2 The Compression Algorithm.
Your job is to implement the compression algorithm:
- At all times, keep a “prior char” variable to compare to the “current char” read from the Also maintain a short-integer “runCount”.
- To begin, set the runCount to 1 and read the first input char into priorChar using a method that does NOT skip whitespace
- Now enter a processing loop
- Read a character into currentChar (do not skip whitespace). If it is the same as priorChar:
- If the run counter has reached 255, write a triplet and reinitialize the counter to 1
- Else increment the run counter
- If currentChar is different from priorChar:
- If the runCount is 1, 2, or 3, write out priorChar 1, 2, or 3 times
- Else (the count is 4 or more) write out a triplet and reinitialize runCount to 1
- In both cases, save currentChar in priorChar
- When end of file occurs, write out the last triplet or write the last char (priorChar) an appropriate number of times
- Please note: if you handle the end of file wrong, the last character of your output will be wrong
2 Due September 15
If you have questions, email them to me, before Friday if possible.
All programming in this course will be done in C++. This assignment involves file I/O which must be done using the C++ I/O libraries. Standards for style, organization, and OO usage will increase gradually throughout the term. For this program, do it simply. You do not need any classes or structs. You can do the whole thing in main().
2.1 Submitting Your Work
Zip up your complete project (with all source files) including your test output into a directory named P2-YourName. Hand in an electronic copy of your zipped directory to the appropriate place in Canvas. I will unzip and run your project on my desktop computer