Assignment #3: Using the Lex/Flex Scanner Generators


Due: Monday, Feb. 10, by 11:59pm


Overview

Topic(s): Lexical Analysis, Lexical Analysis Generators, Lex/Flex

Related Reading: Sections 2.1-2.2. I'd also recommend the following websites (as needed):


Problems to be Submitted (30 points possible)

Please submit your solution via email to the instructor by 11:59pm on the due date.

  1. (10 points)

    The Swedish Chef is a character from Jim Henson's "The Muppet Show" who is well known for his comically extreme Swedish accent. In a similar vein, in 1993 John Hagerman wrote a Lex program that transforms regular English text into the same text as it would be pronounced by the Swedish Chef. The Lex code for this program is chef.l.

    The Swedish Chef text converter (along with a few similar "linguistic" converters) has also been rendered into an applet by John Chambers, which can be found via the following link.

    Your task in this problem is to figure out how this Lex program works and to comment each rule in the translation rules section, describing what English text the rule matches and what it converts that text to. Be sure to pay attention to the INW and NIW start conditions, as these affect when and how the subsequent text is converted.

    When you are uncertain how a particular rule works, you can experiment with the Swedish Chef converter program to gain better understanding as to how/when particular English strings are converted to Swedish Chef-isms. You can do this via the applet, or by compiling the chef.l program into an executable via the following Linux commands:

    flex chef.l
    gcc -o chef lex.yy.c -lfl
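
    As a generic illustration of how start conditions restrict when a rule can fire, here is a minimal Flex sketch. It is not taken from chef.l: the MIDWORD condition and the single "e" rule are hypothetical, chosen only to show a rule that applies at the start of a word and nowhere else.

```lex
%{
/* Hypothetical illustration of start conditions; NOT the rules from chef.l. */
#include <stdio.h>
%}
%option noyywrap
%s MIDWORD
%%
<INITIAL>"e"   { /* "e" as the first letter of a word becomes "E" */
                 printf("E"); BEGIN(MIDWORD); }
[A-Za-z]       { /* any other letter: we are now inside a word */
                 ECHO; BEGIN(MIDWORD); }
[^A-Za-z]      { /* a non-letter ends the word; return to INITIAL */
                 ECHO; BEGIN(INITIAL); }
%%
int main(void) { yylex(); return 0; }
```

    Given the input "eel bee", this sketch prints "Eel bee": only the word-initial "e" is changed, because the first rule is inactive while the scanner is in MIDWORD. Since the sketch sets noyywrap and defines its own main, it should build with flex followed by gcc on lex.yy.c, without the -lfl flag.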

  2. (10 points)

    Leet (or "1337"), also known as eleet or leetspeak, is a system of modified spellings used primarily on the Internet. It often uses character replacements in ways that play on the similarity of their glyphs via reflection or other resemblance. Additionally, it modifies certain words based on a system of suffixes and alternate meanings. There are many dialects or linguistic varieties in different online communities.

    Leet originated within bulletin board systems (BBS) in the 1980s, where having "elite" status on a BBS allowed a user access to file folders, games, and special chat rooms. The Cult of the Dead Cow hacker collective has been credited with the original coining of the term, in their text-files of that era. One theory is that it was developed to defeat text filters created by BBS or Internet Relay Chat system operators for message boards to discourage the discussion of forbidden topics, like cracking and hacking. Creative misspellings and ASCII-art-derived words were also a way to attempt to indicate one was knowledgeable about the culture of computer users.

    Once the reserve of hackers, crackers, and script kiddies, leet has since entered the mainstream. It is now also used to mock newbies, also known colloquially as noobs, or newcomers, on web sites, or in gaming communities. Some consider emoticons and ASCII art, like smiley faces, to be leet, while others maintain that leet consists of only symbolic word encryption. More obscure forms of leet, involving the use of symbol combinations and almost no letters or numbers, continue to be used for its original purpose of encrypted communication. It is also sometimes used as a script language. Variants of leet have been also used for censorship purposes for many years.

    (Source: Wikipedia)

    Your task for this problem is to create a simple Lex program that translates some common 1337 shorthands (either intentional misspellings or unique vocabulary words) into their full English equivalents. Your program should convert at least 20 shorthands. Note: Please avoid building in translations of offensive or inappropriate terms (e.g. any profanity).
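
    One possible starting point is sketched below. The three shorthands and their English equivalents are illustrative assumptions, not a required list, and the sketch matches lowercase spellings only.

```lex
%{
/* Sketch only: the shorthands below are illustrative, not a required list. */
#include <stdio.h>
%}
%option noyywrap
%%
"n00b"         { printf("newbie"); }
"teh"          { printf("the"); }
"w00t"         { printf("hooray"); }
[A-Za-z0-9]+   { /* any other word passes through unchanged */
                 ECHO; }
.|\n           { /* punctuation, spaces, and newlines pass through */
                 ECHO; }
%%
int main(void) { yylex(); return 0; }
```

    Given the input "teh n00b says w00t", this sketch prints "the newbie says hooray". The catch-all word rule is a deliberate design choice: because Lex prefers the longest match, a shorthand embedded in a longer word (such as "teh" inside "tehran") is consumed by the word rule and left alone, while an exact match ties in length and goes to the shorthand rule listed first.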

  3. (10 points)

    Your task for this problem is to extend the 1337 converter that you created in Problem #2 to properly perform capitalization at the beginning of a sentence. For example, if the shorthand "leet" appears at the beginning of a sentence, it should be converted to "Elite", whereas if it appears in the middle of a sentence, it should be converted to "elite".

    Note: Be careful here! I want the start of EVERY sentence capitalized - not just at the beginning of a line. Yes, the first character counts, and there are many forms of punctuation! My best hint is that you need to have a full understanding of how the Swedish Chef converter in Problem #1 works in order to accomplish this task (since states are very helpful for this part).
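
    The role that states can play here is sketched below for a single shorthand. This is only an outline under stated assumptions: the MIDSENT condition name is hypothetical, only ".", "!" and "?" are treated as sentence enders, and ordinary words are passed through unchanged. It does not address the harder cases hinted at above.

```lex
%{
/* Sketch only: one shorthand, and a deliberately simple notion of "sentence". */
#include <stdio.h>
%}
%option noyywrap
%s MIDSENT
%%
<INITIAL>"leet"   { /* INITIAL means no word has been seen yet in this sentence */
                    printf("Elite"); BEGIN(MIDSENT); }
<MIDSENT>"leet"   { printf("elite"); }
[A-Za-z0-9]+      { /* any other word: we are now mid-sentence */
                    ECHO; BEGIN(MIDSENT); }
[.!?]             { /* assumed sentence enders: the next word starts a sentence */
                    ECHO; BEGIN(INITIAL); }
.|\n              { ECHO; }
%%
int main(void) { yylex(); return 0; }
```

    Given the input "leet is leet. leet!", this sketch prints "Elite is elite. Elite!". Whitespace does not change the state, so the scanner is still in INITIAL when it reaches the first word after a sentence ender or at the very start of the input.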