As mentioned in Sect. 1.4, DFAs are often implemented by tables where the current state is cross-indexed by the next symbol to find the next state. If the alphabet is large, such a table can take up...



As mentioned in Sect. 1.4, DFAs are often implemented by tables


where the current state is cross-indexed by the next symbol to find the next state. If


the alphabet is large, such a table can take up quite a lot of room. If, for example,


16-bit Unicode is used as the alphabet, there are 216 = 65536 entries in each row


of the table. Even if each entry in the table is only one byte, each row will take up


64 KB of memory, which may be a problem.


A possible solution is to split each 16-bit Unicode character c into two 8-bit characters c1 and c2. In the regular expressions, each occurrence of a character c is hence


replaced by the regular expression c1c2. This regular expression is then converted to


an NFA and then to a DFA in the usual way. The DFA may (and probably will) have


more states than the DFA using 16-bit characters, but each state in the new DFA use


only 1/256th of the space used by the original DFA.


a) How much larger is the new NFA compared to the old?


b) Estimate what the expected size (measured as number of states) of the new DFA


is compared to the old. Hint: Some states in the NFA can be reached only after


an even number of 8-bit characters are read and the rest only after an odd number


of 8-bit characters are read. What does this imply for the sets constructed during


the subset construction?


c) Roughly, how much time does the new DFA require to analyse a string compared


to the old?


d) If space is a problem for a DFA over an 8-bit alphabet, do you expect that a


similar trick (splitting each 8-bit character into two 4-bit characters) will help


reduce the space requirements? Justify your answer.

May 26, 2022
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here