Assignment #3: Using the Lex/Flex Scanner Generators


Due: Wednesday, Feb. 8, by 11:59pm

Overview

Topic(s): Lexical Analysis, Lexical Analysis Generators, Lex/Flex

Related Reading: Sections 2.1-2.2, and the following websites:


Problems to be Submitted (25 points)

Please submit your solution via email to the instructor by 11:59pm on the due date.

  1. (10 points)

    The Swedish Chef is a character from Jim Henson's "The Muppet Show" who is well known for his comically extreme Swedish accent. In a similar vein, in 1993 John Hagerman wrote a Lex program that transforms regular English text into the version the Swedish Chef would pronounce. The Lex code for this program is chef.l.

    The Swedish Chef text converter (along with a few similar "linguistic" converters) has also been rendered into an applet by John Chambers, which can be found via the following link.

    Your task in this problem is to figure out how this Lex program works and to comment each rule in the translation rules section, describing what English text the rule matches and what it converts that text to. Pay close attention to the INW and NIW start conditions, as these determine when and how the subsequent text is converted.
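
    As background on start conditions in general, here is a minimal sketch that is not taken from chef.l. It capitalizes the first letter of every word by tracking whether the scanner is currently inside a word. The start condition name INSIDE is hypothetical and chosen only for this illustration; chef.l declares its own conditions and uses them for its own purposes.

```lex
%{
#include <stdio.h>
#include <ctype.h>
%}
%option noyywrap
/* INSIDE is a hypothetical start condition meaning "currently inside a word". */
%s INSIDE

%%
<INSIDE>[A-Za-z]  { /* already inside a word: copy the letter unchanged */
                    ECHO; }
[A-Za-z]          { /* first letter of a word: capitalize it and enter INSIDE */
                    BEGIN(INSIDE);
                    putchar(toupper((unsigned char)yytext[0])); }
[^A-Za-z]         { /* any non-letter (including newline) ends the word */
                    BEGIN(INITIAL);
                    ECHO; }
%%
int main(void)
{
    yylex();
    return 0;
}
```

    Because INSIDE is declared with %s (inclusive), the unprefixed rules stay active inside it. When two rules match the same text, the one listed first wins, which is why the <INSIDE> rule must come before the general letter rule. Given the input "hello world", this sketch prints "Hello World".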

    When you are uncertain how a particular rule works, you can experiment with the Swedish Chef converter program to gain a better understanding of how and when particular English strings are converted to Swedish Chef-isms. You can do this via the applet, or by compiling the chef.l program into an executable with the following Linux commands:

    flex chef.l
    gcc -o chef lex.yy.c -lfl

  2. (10 points)

    Text messaging has become a very popular way to communicate over cell phones and the Internet in recent years. Over time, a shorthand language (aka "texting"; "Internet slang" is closely related) has developed that minimizes the amount of typing needed to communicate a message.

    Some common texting shorthands include:
    lol    -    "laugh out loud" (or less commonly, "lots of luck")
    sry    -    "sorry"
    plz    -    "please"
    b4    -    "before"
    etc.

    Lists of common text abbreviations are available on Wikipedia under the two links above ("texting" and "Internet slang").

    Your task for this problem is to create a Lex program that translates some of these texting shorthands into their full English equivalents. Your program should convert at least 20 texting shorthands. Please avoid offensive or inappropriate terms (e.g., "wtf", "omfg", or similar shorthands).
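
    As a starting point, one possible skeleton is sketched below. It covers only the four shorthands listed above. The catch-all word rule is a design assumption of this sketch, not a requirement of the assignment.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

%%
"lol"         { printf("laugh out loud"); }
"sry"         { printf("sorry"); }
"plz"         { printf("please"); }
"b4"          { printf("before"); }
[A-Za-z0-9]+  { /* any other word: copy it through unchanged */
                ECHO; }
.|\n          { /* punctuation, whitespace, and newlines pass through */
                ECHO; }
%%
int main(void)
{
    yylex();
    return 0;
}
```

    The catch-all word rule keeps a shorthand from being expanded inside a longer word: for "lollipop", the word rule matches all eight characters and Lex prefers the longest match. For the exact text "lol", both rules match three characters and the rule listed first wins. Given the input "sry, plz call b4 noon", this sketch prints "sorry, please call before noon".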

    Note: Those of you who are experienced with texting surely know that some texting shorthands, such as "lol", are usually not used to convey their literal translation, but are used more like emoticons, which are shorthands that convey emotions. "Lol" is so commonly used that it is more correctly interpreted as a chuckle (/chuckle) than a loud laugh. If you wish, you may provide your own interpretations for some of the texting shorthands that fall into this category (but please don't do this for all shorthands; also be sure to translate some shorthands literally).

  3. (5 points)

    Your task for this problem is to extend the texting converter that you created in Problem #2 so that it capitalizes correctly at the beginning of a sentence. For example, if the texting shorthand "sry" appears at the beginning of a sentence, it should be converted to "Sorry", whereas if it appears in the middle of a sentence, it should be converted to "sorry".

    Note: You need to have a full understanding of how the Swedish Chef converter in Problem #1 works in order to accomplish this task.