Home
Netbeans Eclipse Qt Java
Games
College of Engineering Aeronautics and Astronautics Agricultural and Biological Engineering Biomedical Engineering Chemical Engineering Civil Engineering Construction Engineering and Management Electrical and Computer Engineering Engineering Education Engineering Professional Education Environmental and Ecological Engineering Industrial Engineering Materials Engineering Mechanical Engineering Nuclear Engineering
EPICS (Engineering Projects In Community Service) First-Year Engineering Program First-Year Engineering Honors Program Global Engineering Program Minority Engineering Program Professional Practice (Co-Op) Program Women in Engineering Program
College Administration Schools Programs All Groups All People ECN Webmail
Purdue Home

ECE 264 Spring 2009

Programming Assignment 1

String Sorting

Please read the guideline first.

You should never add space or any special symbol (such as %$#) in a file's or a directory's name. The international standard for file names has no space.  If your submission (throughout the whole semester) contains a file whose name includes space, the submission will not be graded.

Purpose

Learn string processing and sorting. This program is a simplified version of the sort program. The sort program in UNIX allows a user to use "-k n" to specify using the n-th field as the key for sorting. 

Example

Consider this example and the ouputs:

The second output is sorted by the second word (after blank) in each line.  The third output is sorted by the third word in each line.  All lines in this example have four words. However, your program must be able to handle lines that have more or fewer words.

String Comparison

C provides a library function strcmp(string1, string2) to compare two strings.  To see the definition, you should check the manual of the library. In most UNIX computers, you can simply type "man strcmp" to see the explanation. You can also search the Internet for "man strcmp."  This comparison is defined by lexicographical order (also known as dictionary order or alphabetic order). For example a < b, aa < ab, cxyz < dabc.  If string1 is less than string2, strcmp returns a negative value. If the two strings are the same, strcmp returns zero. If string1 is greater than string2, strcmp returns a positive value. 

If you want to be a good programmer, you need to know as many library functions as possible. You need to be familiar with using the manual pages.

This library function is convenient; however, it does not easily handle comparing two strings using the second, the third, or the fourth word. The problem is that the words may have different lengths and words are separated by spaces.

Separate Words

For this program, a word is defined as the [a-z|A-Z]+. This is a regular expression

  • a-z means a character between a and z.
  • A-Z means a character between A and Z.
  • a-z|A-Z means either (1) a character between a and z or (2) a character between A and Z.
  • [a-z|A-Z]+ means a word that is followed by a character between a and z or between A and Z is also a word.

Based on this definition, abc, Avbu, yuIPoVZ are words. 78uy, a%$, b9i# are not words. When we define words, we choose the longest possible definitions for each word. For example, "This is a dictionary." can have different ways to defined the first word, such as "T", "Th", "Thi", or "This".  However, the longest possible definition is "This".  Therefore, the first word in this sentence is "This".  The second word is "is" and the third word is "a".  "This is a dictionary." has four words: "This", "is", "a", and "dictionary".

Assignment Requirements

Write a C program that takes three (additional, other than the program's name) input arguments:

  1. the name of the input file
  2. an integer to specify which word to use for comparison. The value 1 means the first word (after no space). The value 2 means the second word (after one and only one space).
  3. the number of lines to process. If a file has more lines, ignore the remaining lines. If a file has fewer lines, your program processes all lines in the file.

The output of your program is the sorted lines based on the word given as the second argument. If two lines have the same word, use the words after this to determine the order.  For example "David Johnson\n" and "David Anderson\n" have the same first word "David". The second word "Johnson" is larger than "Anderson". Hence, "David Anderson\n" should appear before "David Johnson\n". Here, '\n' is the newline character.

Assumptions

The following assumptions are established to simplify your program.

  • There will be only English alphabets [a-z|A-Z] or spaces ' ', no number [0-9]+ nor symbols.
  • Each line ends with the newline character '\n'.
  • Each line in the input has at least one word. There is no empty line.
  • Each line has no more than 200 characters (including spaces).  If a line contains more than 60 characters (including newline and spaces), ignore all characters after 60 characters and use '\n' for the character before '\0'.
  • The first character in each line must be an alphabet and cannot be a space.
  • There is one and only one space between two adjacent words. There must be at least one word after a space, i.e. a line cannot end with space and newline.
  • Suppose the second argument is p (sort by p-th word), then after trimming to 60 characters, it is guaranteed that if one line has less than p words, all lines will have less than p words; if one line has at least p words, all lines will have at least p words.  In other words, you do not need to consider the situation when one line has more than p words and another has less than p words.
  • If two lines have the same suffixes for comparison, do not change their order. For example, "Susan Smith\n" appears before "Mary Smith\n" in the input file and both have the same suffixes "Smith".  If we use the second word to compare them, their order should be the same as the input file: "Susan Smith\n" appears before "Mary Smith\n" in the output.  (Hint: Bubble sort in the lecture cannot meet this requirement.  Please modify the algorithm a little bit, or seek another sorting method, which should be easy and intuitive.  Click here for reference.)

String Order

In this assignment, you can assume that there are only [a-z|A-Z]+, space (' '), and newlines ('\n').    Coniser two lines of n words each.  Suppose the second argument is p (choose the p-th word, p is a positive integer).

  • If n >= p the order of the two lines is defined by their dictionary order of the p-th word.
  • If n < p, the order of the two lines is defined by their (whole lines) dictionary order.

Based on this definition, any two lines have a unique order for any positive integer p.  Here "dictionary order" is determined by ASCII and therefore is case-sensitive.  For example, word "Zigzag" should appear before "apple" since the ASCII code for 'Z' is 90, less than 97, which is the ASCII code for 'a'.

Handle Error

  • If a user does not specify the three arguments, the program terminates with an error message "Need three arguments.\n"
  • If one of the numbers is (1) not an integer or it is (2) zero or (3) negative, the program terminates with an error message "Invalid number.\n"
  • If the value of the second argument (p) is equal to or larger than 60 (the maximum admissible number of characters per line), the program terminates with an error message, "Cannot have so many words per line.\n"
  • If the input file cannot be read, the program terminates with an error message "Cannot open file.\n"
  • If the program terminates abnormally, the main function must return -1. If the program terminates normally, the main function returns 0.

It is suggested that your program should detect the above errors in the order as they are listed.  The error messages will also be graded by an automatic grading script, so the message text should be identical to what is specified above.  You may lose credits if you use a different text that is not recognized by the script.

You should never use assertions in any program. Please read exercise4 for the reasons.

What to submit?

  • The source code named "pa1.c".
  • README to explain what you have done and how you do it. You need to explain how you compare two lines based on the n-th words. The file name must be README, not readme, not Readme, not README.txt, not README.pdf, nor README.doc. The file must be readable using "more" in Linux. This is a widely accepted convention (both file name and format). Submitting a wrong file name or a wrong format is a common mistake.
  • Do not include object files, executable files, sample input, or sample output.

Grading (total 8 points)

This programming assignment will be graded by another program. Make sure your program's output is exactly the same as specified. Any deviation from the specification will likely lose points.  You can use "diff" to compare the output of your program with the output of the sample program.

If you do not finish the complete program, explain clearly in README what your program can do. You may receive some points. If your program does not meet the specification and there is no README, you will receive zero.

  • 1: report errors for missing arguments, wrong arguments, file cannot open, and so on (as specified above)
  • 1: can read the input file as many lines as specified by the third argument. This argument is smaller than the total number of lines in the file. The program can print the entire file line by line, if your program cannot sort the lines. If your program can sort the lines, do not print the file.
  • 1: can read the input file to the end or as many lines as specified by third argument.  The program can print the file line by line, if your program cannot sort the lines. If your program can sort the lines, do not print the file.
  • 1.5: can sort the lines based on the first words of each line and print the sorted lines.
  • 2: can sort the lines based on the n-th word, specificed by the second argument, and print the sorted lines.
  • 0.5: README explaining how to sort the lines based on the n-th word.
  • 1: programming style and clarity
    • 0.2 meaningful variable names (not just i, j, k, x, y...).
    • 0.2 proper indentation.
    • 0.2 proper names (noun for variables, verb for functions).
    • 0.2 one statement per line, no overly complex statements, no line exceeds 80 characters (at least one newline every 80 characters).
    • 0.2 sufficient comments in the code.
  • -1: for each warning message reported by -Wall.
  • -0: memory leak reported by valgrind. The program has to release allocated memory before the program terminates. Since memory leak is Exercise 5, you receive no penalty in this assignment.
  • You will receive zero if your program contains one or multiple syntax errors or crashes during tests.
  • You will receive zero if your program cannot run on msee190pcxx.ecn.purdue.edu.
  • The lowest score is zero. You will not receive a negative score.

If you submit 72 hours before the deadline, your submission will be graded and you may correct mistakes and submit again before the deadline. Your final score for this assignment will be no lower than the score received in this "pre submission."

This class has zero tolerance of plagiarism. Please remember that copying code from your classmate is strictly forbidden. Both students (copying and being copied) will receive F, even if this is your first violation.  You should know that technology can easily detect code similarities even if after all variable names have been changed. 

Performance

The score of your program is not determined by its performance. However, your program has to be "reasonably fast" so that it can be graded. If your program takes longer than 15 times of the sample solution, your program will not be graded and you will receive zero.  The sample solution takes 6.5 seconds for an input file of 10,000 lines, 8 words each line. Your program cannot take more than 2 minutes for a similar input.

Sample Inputs, Outputs, and Solution

  • input100 generated by "inputgen 100 6"
    • output10 generated by "pa1 input100 1 10"
    • output50 generated by "pa1 input100 6 50" (NEW, Please test your program using this sample)
    • output100 generated by "pa1 input100 1 100"
  • input200 generated by "inputgen 200 7"
    • output150 generated by "pa1 input200 4 150"

C program to generate inputs: inputgen.c.

Executable sample solution. This program can run on Linux-Intel computers only, such as msee190pcxx, algolxx, or qstructxx. If you discover that the sample solution does not meet the requirements, please post the problem in Blackboard - Discussion.

If you have difficulty getting started, this file may help.