Advanced C Programming

Spring 2019 :: ECE 264 :: Purdue University

⚠ This is for Spring 2019, not the current semester.
Due 3/8

JSON #1

Learning goals

You will learn or practice how to:

  1. Practice writing programs with linked lists.
  2. Write programs using more complex data structures (beyond the cannonical linked list and BST).
  3. Apply test-driven development (TDD).
  4. Use union and enum types.

Overview

You will create a decoder for the JSON data format.

JSON is a file format for exchanging hierarchical data between programs, often written in different programming languages and/or on different computers. In web programming, this allows a server application written in any language (e.g., Python or even C) to pass complex structures of information to the user's browser, which then renders it.

A JSON string may represent a number, a string, or a list. (In this context, when we say “list”, we mean a linked list.) Here are some examples:

type json data
number 10 10
string "ten" "ten"
list [2, 6, 4] linked list with values 2, 6, 4

As in C, whitespace is ignored (except within a string). Thus, the following are all equivalent.

type json data
list [2, 6, 4] linked list with values 2, 6, 4
list [2, 6, "four"] linked list with values 2, 6, "four"
list [2, [3, "+", 3], "four"] linked list with values 2, (3, "+", 3), "four"

Getting Started

  1. Use 264get HW08 to fetch the starter code.
  2. Implement parse_int(…).
    1. Create a trivial test (e.g., 0).
    2. Implement just enough of parse_int(…) so that it passes your trivial test. At this point, it should fail most other tests.
    3. Add another simple trivial test (e.g., 9).
    4. Implement just enough to pass your two tests so far.
    5. Add another test (e.g., 15).
    6. Implement just enough to pass your three tests so far.
    7. Add another test (e.g., -15).
    8. … and so on, until parse_int(…) is finished.
  3. Test your parse_int(…) completely, using miniunit.h or any other method you wish. Get parse_int(…) completely finished and tested to perfection—including error handling—before you do anything else. Seriously… do not do anything else until this is done.
  4. Submit.
  5. Implement print_element(…) just enough to support integers.
  6. Test print_element(…). (This should be easy to test.)
  7. Submit.
  8. Implement parse_element(…) just enough to support integers.
  9. Test parse_element(…). (This should be trivial.)
  10. Submit.
  11. Do the same for parse_string(…).
    • First, get the functionality right. Initially, you will have memory leaks.
    • Implement just enough of free_element(…) so that no memory leaks result from testing parse_string(…)
  12. Test your parse_string(…) thoroughly and make sure it is 100% perfect, and fully tested, before you do anything else.
  13. Submit.
  14. Extend print_element(…) so that it supports strings, too.
  15. Test print_element(…). (Again, this should be easy.)
  16. Submit.
  17. Extend parse_element(…) to support strings.
  18. Test parse_element(…). (Again, easy, but make sure it's perfect.)
  19. Submit.
  20. Implement parse_list(…).
    1. Create a trivial test (e.g., []).
    2. Implement just enough of parse_list(…) so that it passes your trivial test (functionality only). It should fail all other tests with linked lists.
    3. Extend free_element(…) so that your parse_list(…) tests work without any memory leaks (e.g., passes Valgrind).
    4. Add another simple test (e.g., [0]).
    5. Implement just enough to pass your two list tests so far.
    6. Add another simple test (e.g., ["a"]).
    7. Implement just enough to pass your three list tests so far.
    8. Add a test with multiple integers as list items (e.g., [1, 2]).
    9. Implement just enough to pass your tests so far.
    10. Add a test with multiple strings as list items (e.g., ["A", "B"]).
    11. You shouldn't need to add any code for this, but make sure it works 100% before you proceed.
    12. Add a test with an empty list as a list item (e.g., [[]]).
    13. Implement just enough to pass your tests so far.
    14. Add a test with a non-empty list as a list item (e.g., [[1]]).
    15. Implement just enough to pass your tests so far.
    16. Add a few more tests with gradually increasing complexity (e.g., [[1, 2]], [1, [2, 3], 4], [[1, 2], [3, 4], []]).
    17. Test as needed. Get things perfect—including memory (Valgrind) and test coverage (gcov)—and submit at every stage.
  21. Test error cases. Make sure parse_element(…), parse_int(…), parse_string(…), and parse_list(…) return false for incorrect input.
  22. Test that there are no memory leaks, even for incorrect input.

How much work is this?

This assignment will only feasible if you are disciplined in the way you build it. If you try to build it all at once, success is extremely unlikely.

Instructor's solution json.c is 69 sloc for HW08 and 127 sloc for the two halves together. That code is tight, but defensible (meets code quality standards, and uses no methods or language features we haven't covered).

Bonus opportunities

1 bonus point for finishing all of the required portions (including lists) by Fri 3/8.

2 bonus point for handling the following escape sequences: \", \\, \/, \b, \f, \n, \r, \t, \u00▒▒. Interpret these the same as in C. (You may need to search for more information.) These correspond to the nine escape sequences listsed at json.org. For \u00▒▒, you only need to handle \u0020 to \u007e. You can ignore any unicode-specific details, and just treat these as ASCII values (e.g., \u0041 for 'A', etc.). Add #define BONUS_JSON_ESCAPE_SEQUENCES to your json.h if you believe you have finished this.

2 bonus point for handling the constants null, true, and false. Add ELEMENT_NULL and ELEMENT_BOOL to the enum in json.h, and add void* as_null and bool as_bool to the union. Add #define BONUS_JSON_CONSTANTS to your json.h if you believe you have finished this.

4 pointsIf you finish all of the required parts of this homework before the break and are interested in a more substantial bonus opportunity, see the instructor at office hours on Thu or Fri. Another option would be to support JSON “objects” using a BST. This would be more involved.  Update: One student has already done this. More details will follow. In short, this bonus will entail supporting JSON “objects” with BSTs. For the BST node, you will have typedef struct _BSTNode { struct _BSTNode* left; struct _BSTNode* right; Element element; char* key; } BSTNode;. For the parse function: bool parse_object(BSTNode** a_root, char** a_pos) { … }. This note may be amended and/or replaced later, but the above details should be sufficient. Refer to the JSON standard

Bonus options must be submitted with HW09. There will be no partial credit for the bonus options.

Requirements

  1. Your submission must contain each of the following files, as specified:
    file contents
    json.c functions
    parse int(int✶ a value, char✶✶ a pos)
    return type: bool
    Set *a_value to whatever integer value is found at *a_pos.
    1. *a_pos is initially the address of the first character of the integer literal in the input string.
    2. *a_value is the (already allocated) location where the parsed int should be stored.
    3. Return true if a properly formed integer literal was found at *a_pos*a_pos should refer to the next character in the input string, i.e., after the last digit of the integer literal.
      1. Ex: parse_int(…) should return true for 9, -99, and 123AAA.
    4. Return false if an integer literal was not found.  *a_pos should refer to the unexpected character that indicated a problem.
      1. Ex: parse_int(…) should return false for A, AAA123, -A, -A9, and -.
    5. Calling parse_int(…) should not result in any calls to malloc(…).
    6. You do not need to parse hexadecimal, octal, scientific notation, floating point values, or anything other than integers in decimal notation (positive or negative).
    parse string(char✶✶ a string, char✶✶ a pos)
    return type: bool
    Set *a_string to a copy of the string literal at *a_pos.
    1. Caller is responsible for freeing the memory.
    2. A string literal must be surrounded by double quotation marks, and may not contain a newline. In addition, we make two simplifications:
      • Strings may not contain double quotation marks.
      • Backslash is not special. Do not parse escape codes (i.e., "\▒") in the input.
    3. Calling parse_string(…) should result in exactly one call to malloc(…).
    4. Return true if a properly formed string literal was found.  *a_pos should be set to the next character in the input string, i.e., after the ending double quotation mark.
      1. Ex: parse_string(…) should return true for "abc", "abc\", and "abc\z".
    5. Return false if a string literal was not found.  *a_pos should refer to the unexpected character that indicated a problem (e.g., newline or null terminator in the input).
      1. Ex: parse_string(…) should return false for "abc and "abc
        def"
        .
    Correction (3/3/2019): First parameter is char** a_string not char* a_value.
    parse list(Node✶✶ a head, char✶✶ a pos)
    return type: bool
    Set *a_head to the head of a linked list of Element objects.
    1. Caller is responsible for freeing the memory.
    2. Linked list consists of a '[', followed by 0 or more JSON-encoded elements (integers, strings, or lists) separated by commas, and finally a ']'. See the examples above. (There will be no HBB on this definition, unless there is something truly wrong and/or grossly unclear.) There may be any number/amount of whitespace characters (' ', '\n', or '\t'), before/after any of the elements.
    3. Return true if a properly formed list was found.  *a_pos should be set to the next character in the input string, after the list that was just parsed.
      1. Ex: parse_list(…) should return true for [], [1,2], [1,"A"], [1,[2]], and [1]A.
    4. Return false if a list was not found (i.e., syntax error in the input string).  *a_pos should refer to the unexpected character that indicated a problem.
      1. Ex: parse_list(…) should return false for A[], [1,,2], [[, 1,2, and ,2].
    parse element(Element✶ a element, char✶✶ a pos)
    return type: bool
    1. First, eat any whitespace at *a_pos.
      1. “Eat whitespace” just means to skip over any whitespace characters (i.e., increment *a_pos until isspace(**a_pos)==false).
    2. Next, decide what kind of element this is.
      1. If it's a digit (isdigit(**a_pos)) or hyphen ('-'), set the element's type to ELEMENT_INT and call parse int(&a element->value.as int, a pos).
      2. If it's a string (**a_pos=='"'), then set the element's type to ELEMENT_STRING and call parse string(&a element->value.as string, a pos).
      3. If it's a list (**a_pos == '['), then set the element's type to ELEMENT_LIST and call parse list(&a element->value.as list, a pos).
    3. Return whatever was returned by parse_int(…), parse_string(…), or parse_list(…).
      • If none of those functions was called—i.e., if the next character was neither digit, '-', '"', nor '['—then return false.
    4. Do not modify *a_pos directly in parse_element(…), except for eating whitespace.
      • *a_pos can—and should—be modified in parse_int(…), parse_string(…), and parse_list(…).
    print element(Element element)
    return type: void
    Given an Element object, print it in JSON notation.
    1. Spacing is up to you, as long as it is valid JSON.
    2. If element is a string or integer, then print it (with double-quotes) using printf(…).
    3. If element is an integer, print it (with double-quotes) using printf(…).
    4. If element is a list, print a '['. Then print each element in the list using print_element(…) (recursively), separated by commas. Finally, print ']'.
    free element(Element element)
    return type: void
    Free the contents of the Element, as needed.
    1. If it contains a linked list, free the list, including all elements.
    2. If it contains a string, free the string.
    3. Do not attempt to free the Element object itself. free_element(element) only frees dynamic memory that element refers to.
    test_json.c functions
    main(int argc, char✶ argv[])
    return type: int
    1. This is the same as in previous assignments.
    2. Tests must result in 100% code coverage.
    expected.txt test output
    1. This is the same as in previous assignments.
    2. If you are using miniunit.h, you may simply redirect the output of your working tests to expected.txt to create this.
  2. You may ignore any trailing characters in the input string, as long as it starts with a well-formed JSON element.
    • Acceptable: 123AAA, [1,2]AAA, [1,2][, "12"AAA, "12",[,
  3. You only need to support the specific features of JSON that are explicitly required in this assignment description. You do not need to support unicode (e.g., "萬國碼", "يونيكود", "യൂണികോഡ്"), objects/dictionaries (e.g., {"a":1, "b":2}), backslash escapes (e.g., "\n"), embedded quotes (e.g., "He said, \"Roar!\""), floating point numbers (e.g., 3.1415), non-decimal notations (e.g., 0xdeadbeef, 0600), null, false)
  4. Do not modify json.h except as explicitly directed.
  5. There may be no memory faults (e.g., leaks, invalid read/write, etc.), even when parse_▒▒▒(…) return false.
  6. The following external header files, functions, and symbols are allowed.
    header functions/symbols allowed in…
    stdbool.h bool, true, false json.c, test_json.c
    stdio.h printf, fprintf, fputs, stdout, fflush json.c, test_json.c
    assert.h assert json.c, test_json.c
    ctype.h isdigit, isspace json.c, test_json.c
    stdlib.h EXIT_SUCCESS, abs, malloc, free, size_t json.c, test_json.c
    string.h strncpy, strchr, strlen, strcmp json.c, test_json.c
    miniunit.h anything test_json.c
    clog.h anything json.c, test_json.c
    For miniunit.h and clog.h, you can use anything from HW05. You are welcome to change them to your liking, and/or add more in the same spirit. All others are prohibited unless approved by the instructor. Feel free to ask if there is something you would like to use.
  7. Submissions must meet the code quality standards and the course policies on homework and academic integrity.

Submit

To submit HW08, type 264submit HW08 json.c json.h test_json.c expected.txt miniunit.h clog.h from inside your hw08 directory. If your code does not depend on miniunit.h or clog.h, those may be omitted. Your json.h will most likely be identical to the starter.

Pre-tester

The pre-tester for HW08 has not yet been released. As soon as it is ready, this note will be changed.

Q&A

  1. How do the parameters work together at the high level?
    Here's a light test. This would not be adequate on its own.
    Example of how to test
  2. How can I structure my tests?
    An example was posted in the snippets from 3/5/2019.
  3. What should be the value of *a_pos after parse_▒▒▒(…) returns?
    _____________________________________________
    # EXAMPLE #1
    
    INPUT:
    
        123
    
    BEFORE we call parse_int(…):
    
        123
        ↑
        *a_pos
    
    RETURN value from parse_int(…):
    
        true
    
    After parse_int(…) returns:
    
        123
           ↑
           *a_pos refers to null terminator just
            after the integer literal.
    
        element.type  == ELEMENT_INT
        element.value == 123
    
    _____________________________________________
    # EXAMPLE #2
    
    INPUT:
    
        123ABC
    
    BEFORE we call parse_int(…):
    
        123ABC
        ↑
        *a_pos
    
    RETURN value from parse_int(…):
    
        true
    
    After parse_int(…) returns:
    
        123ABC
           ↑
           *a_pos refers to the non-digit character
            after the integer literal.
    
        element.type  == ELEMENT_INT
        element.value == 123
    
    _____________________________________________
    # EXAMPLE #3
    
    INPUT:
    
        -A1
    
    BEFORE we call parse_int(…):
    
        -A1
        ↑
        *a_pos
    
    RETURN value from parse_int(…):
    
        false
    
    After parse_int(…) returns:
    
        -A1
         ↑
         *a_pos refers first character that informed
          us this cannot be an integer literal.
    
        element.type  == (don't care)
        element.value == (don't care)
  4. May I use anonymous structs/unions?
    Yes. You will need to make a few changes. (This is optional.)
    json_c11.h.  Run 264get again to get the updated starter files. You will find two header files: json.h and json_c11.h. Delete the old one json.h to avoid confusion. Fom now on, you will use only json_c11.h.
    json.c and test_json.c.  Change #include "json.h" to #include "json_c11.h". Do not include both. In your code, change element.value.▒▒▒ to element.▒▒▒.
    .bashrc.  Update your .bashrc (cp -i ~ece264/19sp/.bashrc ~).
    Makefile.  Change -std=c99 to -std=c11.
    When you submit, you will submit json_c11.h in lieu of json.h. Submitting both won't hurt anything on our end, but keeping them both around may cause you confusion.
    Note: json_c11.h defines a symbol (#define JSON_C11), which we will use when scoring your homework to ensure that our tests use the appropriate syntax.
  5. Do I have to switch to C11?
    No.
  6. How will you compile my code, if I'm not using C11?
    From this point on, we will compile with C11. This should work for everyone.
  7. What does the output of print_element(…) look like?
    It's just the inverse operation to parse_element(…).
    parse_element(…) takes JSON as input.
    print_element(…) prints JSON as output.
    If you were to parse the output of print_element(…) with parse_element(…) you should get an equivalent object.
    If you parse a JSON string and then print it again, you should get an equivalent string.
    If you're looking for concrete examples, just look at any example of input to parse_element(…) (except for the trailing characters). There are several at the top of this assignment description page.
  8. The specification says we don't have to handle escape sequences, but then it mentions escape sequences. Do we parse escape sequences or don't we? How do we handle backslash?
    We use C escape codes to make C string literals containing certain characters (e.g., double-quote, newline, etc.) in our C code.
    You don't have to parse JSON escape codes. Unless you are doing the escape sequence bonus, just treat a backslash like any other character.
    Here's an example to illustrate the distinction:
    #include <stdlib.h>
    #include <assert.h>
    #include <string.h>
    #include <stdio.h>
    #include "json_c11.h"
    
    int main(int argc, char* argv[]) {
        Element element;   // 'element' will be initialized inside parse_element(…)
    
        // C escape codes used to create a C string literal containing double quotes
        char* json_input = "\"A\""; // same as: {'\"', 'A', '\"'}
        char* pos = json_input;
        assert(strlen(json_input) == 3);
        parse_element(&element, &pos);
        printf(">>>|%s|<<<\n", element.as_string);
        // Output:
        // >>>|A|<<<
    
        // JSON escape codes
        json_input = "\"A\\n\"";  // same as: {'\"', 'A', '\\', 'n', '\"'}
        assert(strlen(json_input) == 5);
        pos  = json_input;
        parse_element(&element, &pos);
        printf(">>>|%s|<<<\n", element.as_string);
        // Output:
        // >>>|A\n|<<<
    
        // Output: (with escape code bonus)
        // >>>|A
        // B|<<<
    
        // Note: strlen(…) does not count the null terminator
    
        return EXIT_SUCCESS;
    }
  9. Should the double quotes be stored in memory?
    No. The double quotes are part of the JSON syntax.
    This is just like how in C, when you define a string like this:
    char s[] = "abc";
    … the double quotes are not stored in memory.
  10. How much code should I end up with?
    Here's a screenshot of the instructor's solution. The parts that had to be added to support lists (i.e., HW09) are highlighted in yellow.
    implementation code screenshot redacted test code screenshot redacted

Updates

3/2/2019 Corrections: print_element(…)
struct _Node* as_list; } value; … in starter code (json.h), line 9 (definition of Element)
3/3/2019 Corrections: free_element(…) returns void
parse_string(char** a_s, char** a_pos)
submission files += expected.txt, miniunit.h and clog.h
Clarifications: *a_pos, return codes
3/4/2019 Split assignment into HW08 + HW09. OK to use size_t, strlen(…).
OK to ignore trailing characters in input.
parse_element(…) may modify *a_pos in order to eat whitespace.
Reminder: No memory leaks, even when parse_▒▒▒(…) return false.
Bonus: '\t' (not '\h')
3/6/2019 Added *a_pos examples (Q&A #2), option for C11 (anonymous unions/structs); ok to use strcmp(…); isspace(**a_pos)