JSON parse_int(…)
Learning goals
You will learn or practice how to:
- Parse strings
- Apply test-driven development (TDD).
- Use
unionandenumtypes. - Write programs that use dynamic memory (
malloc(…)) - Debug code that uses dynamic memory (
malloc(…)) - Practice writing programs with linked lists.
- Write programs using more complex data structures (beyond the cannonical linked list and BST).
- Learn to refactor existing programs with new requirements.
- Read and write text files using
FILE*and the related functions. - Work with binary search trees as part of a larger program.
- Combine all major learning objectives into a single program that demonstrates almost everything you learned in this class.
Overview
This is part 1 of a 4-part sequence in which you will create a decoder for the JSON data format.
JSON is a file format for exchanging hierarchical data between programs, often written in different programming languages and/or on different computers. In web programming, this allows a server application written in any language (e.g., Python or even C) to pass complex structures of information to the user's browser, which then renders it.
A JSON string may represent a number, a string, a boolean, null, an object, or a list. In this context, when we say “list”, we mean a linked list. (Linked lists are a data structure which we will cover in class soon.) Here are some examples:
| type | json | data |
|---|---|---|
| number | 10 | 10 |
| string | "ten" | "ten" |
| list | [2, 6, 4] |
|
| boolean | true | true |
| null | null | |
| object | {"key1": "value", "key2": true} | Imagine a lovely picture of a binary search tree here. I plan to draw one during lecture. |
Parts of this assignment
Originally, this was one assignment. Over time, we have split it into pieces to help you manage your time and efforts. (The assignment has also evolved, as all assignments do.)
| HW08 |
|
| HW11 |
|
| HW13 |
|
| HW15 |
|
| EC03 |
|
Linked lists
As in C, whitespace is ignored (except within a string). Thus, the following are all equivalent.
| type | json | data |
|---|---|---|
| list | [2, 6, 4] |
|
| list | [2, 6, "four"] |
|
| list | [2, [3, "+", 3], "four"] |
|
| list | [{"key": "value"}, {"key": "another value"}] | Also plan to draw this during lecture |
| object | {"array": [1, 2, 3]} | And this one too, its a lot of drawing to do just to post this! |
| object | {"nested object": {"key": 5}} | Yeah, pictures |
Getting Started on HW08
- Use
264get HW08to fetch the starter code.you@eceprog ~/ $cd 264you@eceprog ~/264/ $264get hw08 - Copy your miniunit.h, log_macros.h, and Makefile from any previous assignment.
you@eceprog ~/264/ $mv HW██/miniunit.h HW██/log_macros.h HW██/Makefile -v -t hw08/you@eceprog ~/264/ $cd hw08you@eceprog ~/264/hw08 $ - Implement
parse_int(…).- Create a trivial test (e.g., 0).
- Implement just enough of
parse_int(…)so that it passes your trivial test. At this point, it should fail most other tests. - Submit.
- Add another simple trivial test (e.g., 9).
- Implement just enough to pass your two tests so far.
- Submit.
- Add another test (e.g., 15).
- Implement just enough to pass your three tests so far.
- Submit.
- Add another test (e.g., -15).
- … and so on, until
parse_int(…)is finished.
- Test your
parse_int(…)completely, using miniunit.h. Getparse_int(…)completely finished and tested to perfection—including error handling—before you do anything else. Seriously… do not do anything else until this is done. - Submit.
- Implement
parse_element(…)just enough to support integers.- Add a test for a plain integer.
Be sure to call it from
main(…).static int _test_parse_element_int_plain() { mu_start(); //──────────────────── Element element; // will be initialized by parse_element(…) char const* input = "1"; char const* pos = input; bool is_success = parse_element(&element, &pos); mu_check(is_success); mu_check(element.as_int == 1); mu_check(element.type == ELEMENT_INT); mu_check(pos == input + 1); // pos should now refer to the byte just after the '1' at input. mu_check(*pos == '\0'); // That byte should be the null terminator. //──────────────────── mu_end(); }
- Add just enough code to
parse_element(…)to make this test pass. Expect to add about 2−4 lines of code insideparse_element(…). - Test.
- Submit.
- Extend the test function above to test for negative numbers.
// negative number input = "-2"; pos = input; is_success = parse_element(&element, &pos); mu_check(is_success); mu_check(element.as_int == -2); mu_check(element.type == ELEMENT_INT); mu_check(pos == input + 2); // pos should now refer to the byte just after the '2'. mu_check(*pos == '\0'); // That byte should be the null terminator.
- Test.
- Submit.
- Add a test for a integer with leading whitespace.
Be sure to call it from
main(…).static int _test_parse_element_with_leading_whitespace() { mu_start(); //──────────────────── Element element; // will be initialized by parse_element(…) char const* input = " 1"; char const* pos = input; bool is_success = parse_element(&element, &pos); mu_check(is_success); mu_check(element.as_int == 1); mu_check(element.type == ELEMENT_INT); mu_check(pos == input + 3); // pos should now refer to the byte just after the '1' at input. mu_check(*pos == '\0'); // That byte should be the null terminator. //──────────────────── mu_end(); }
- Add just enough code to
parse_element(…)to make this test pass. Expect to add about 1−3 lines of code insideparse_element(…). - Test.
- Submit.
- Add a test function for some more cases involving integers. Here is a start. This is not adequate. You need to extend this with your own test cases.
static int _test_parse_element_int_oddballs() { mu_start(); //──────────────────── Element element; // will be initialized by parse_element(…) char const* input = " 4 A"; char const* pos = input; bool is_success = parse_element(&element, &pos); mu_check(is_success); mu_check(element.as_int == 4); mu_check(element.type == ELEMENT_INT); mu_check(pos == input + 2); // pos should now refer to the byte just after the '1' at input. mu_check(*pos == ' '); // That byte contains a space. // TODO: Add more tests here based on your ideas for tests. //──────────────────── mu_end(); }
- You may or may not not need to add any code to make these tests pass.
- Test.
- Submit.
- Add a test function for strings that contain integers but are not valid JSON because they are
preceded by junk characters (not whitespace).
static int _test_parse_element_invalid() { mu_start(); //──────────────────── Element element; // will be initialized by parse_element(…) char const* input = "--4"; char const* pos = input; bool is_success = parse_element(&element, &pos); mu_check(! is_success); mu_check(pos == input + 1); // pos should now refer to second '-' since that is the // character where we learn that this is not a valid JSON // expression for an int. mu_check(*pos == '-'); // That byte contains a '-'. // TODO: Add more tests here based on your ideas for tests. //──────────────────── mu_end(); }
- You may or may not not need to add any code to make these tests pass.
- Test.
- Submit.
- Add a test for a plain integer.
Be sure to call it from
- Implement
parse_string(…).- Create a test function for a valid JSON expression representing an empty string.
static int _test_parse_string_valid_empty() { mu_start(); //────────────────────────────────────────────────────── char* result; char const* input = "\"\""; char const* pos = input; bool is_success = parse_string(&result, &pos); mu_check(is_success); // because the input is valid mu_check_strings_equal("", result); mu_check(pos == input + 2); mu_check(*pos == '\0'); free(result); //────────────────────────────────────────────────────── mu_end(); }
- Implement just enough of
parse_string(…)to make this test pass. - Test.
- Submit.
- Create a test function for a valid JSON expression representing a string of size one.
static int _test_parse_string_valid_one_char() { mu_start(); //────────────────────────────────────────────────────── char* result; char const* input = "\"A\""; char const* pos = input; bool is_success = parse_string(&result, &pos); mu_check(is_success); // because the input is valid mu_check_strings_equal("A", result); mu_check(pos == input + 3); mu_check(*pos == '\0'); free(result); //────────────────────────────────────────────────────── mu_end(); }
- Implement just enough of
parse_string(…)to make this test pass. - Test.
- Submit.
- Create a test function for a valid JSON expression representing a string containing multiple characters.
static int _test_parse_string_valid_multiple_chars() { mu_start(); //────────────────────────────────────────────────────── char* result; char const* input = "\"ABC\""; char const* pos = input; bool is_success = parse_string(&result, &pos); mu_check(is_success); // because the input is valid mu_check_strings_equal("ABC", result); mu_check(pos == input + 5); mu_check(*pos == '\0'); free(result); //────────────────────────────────────────────────────── mu_end(); }
- Implement just enough of
parse_string(…)to make this test pass. - Test.
- Submit.
- Add test function(s) to check strings that contain double quotation marks but are not valid JSON. Here is a start. You will need to add more of your own.
static int _test_parse_string_invalid() { mu_start(); //────────────────────────────────────────────────────── char* result; char const* input = "\"A"; char const* pos = input; bool is_success = parse_string(&result, &pos); mu_check(! is_success); // because the input is valid mu_check(pos == input + 2); mu_check(*pos == '\0'); // We do not call free(…) if the string was invalid. input = "\"A\nB\""; pos = input; is_success = parse_string(&result, &pos); mu_check(! is_success); // because the input is valid mu_check(pos == input + 2); mu_check(*pos == '\n'); // TODO: Add more tests of your own. //────────────────────────────────────────────────────── mu_end(); }
- Create a test function for a valid JSON expression representing an empty string.
-
Extend
parse_element(…)to support strings. You will need to implementfree_element(…)in the same step.- Add a test for a plain string containing multiple characters.
Be sure to call it from
main(…).static int _test_parse_element_string() { mu_start(); //──────────────────── Element element; // will be initialized by parse_element(…) char const* input = "\"ABC\""; char const* pos = input; bool is_success = parse_element(&element, &pos); mu_check(is_success); mu_check_strings_equal("ABC", element.as_string); mu_check(element.type == ELEMENT_STRING); mu_check(pos == input + 5); // pos should now refer to the byte just after the second double quote. mu_check(*pos == '\0'); // That byte should be the null terminator. free_element(element); //──────────────────── mu_end(); }
- Add just enough code to
parse_element(…)to make this test pass. Expect to add about 4 lines of code insideparse_element(…)and 1−3 lines insidefree_element(…). - Test.
- Submit.
- Add additional tests as needed to ensure that
parse_element(…)works for JSON expressions representing strings and integers. - Test.
- Submit.
- Add a test for a plain string containing multiple characters.
Be sure to call it from
- Implement
print_element(…)so that it supports integers and strings.- Add a test for
Elementobjects that contain an integer. This will not use Miniunit. Since this is simple, you an use visual inspection (just look).static void _test_print_element() { Element element; // will be initialized by parse_element(…) char const* input = "123"; bool is_success = parse_element(&element, &input); printf("Testing parse_element(&element, \"123\")\n"); printf(" - Expected: 123\n"); printf(" - Actual: "); print_element(element); fputc('\n', stdout); free_element(element); }
- Add just enough code to
print_element(…)to make this test pass. Expect to add about 1―3 lines of code insideprint_element(…). - Test.
- Submit.
- Add a test for
Elementobjects that contain an string. Again, this will not use Miniunit.static int _test_print_element() { mu_start(); //────────────────────────────────────────────────────── // This uses diff testing to check print_element(…). Element element; // will be initialized by parse_element(…) char const* input = "\"ABC\""; bool is_success = parse_element(&element, &input); printf("Testing parse_element(&element, \"\\\"ABC\\\"\")\n"); printf(" - Expected: \"ABC\"\n"); printf(" - Actual: "); print_element(element); fputc('\n', stdout); free_element(element); //────────────────────────────────────────────────────── mu_end(); }
- Add just enough code to
print_element(…)to make this test pass. Expect to add about 3―5 lines of code insideprint_element(…). - Test.
- Submit.
- Add a test for
- Implement
parse_list(…).- Create a trivial test (e.g., []).
- Implement just enough of
parse_list(…)so that it passes your trivial test (functionality only). It should fail all other tests with linked lists. - Extend
free_element(…)so that yourparse_list(…)tests work without any memory leaks (e.g., passes Valgrind). - Add another simple test (e.g., [0]).
- Implement just enough to pass your two list tests so far.
- Add another simple test (e.g., ["a"]).
- Implement just enough to pass your three list tests so far.
- Add a test with multiple integers as list items (e.g., [1, 2]).
- Implement just enough to pass your tests so far.
- Add a test with multiple strings as list items (e.g., ["A", "B"]).
- You shouldn't need to add any code for this, but make sure it works 100% before you proceed.
- Add a test with an empty list as a list item (e.g., [[]]).
- Implement just enough to pass your tests so far.
- Add a test with a non-empty list as a list item (e.g., [[1]]).
- Implement just enough to pass your tests so far.
- Add a few more tests with gradually increasing complexity (e.g., [[1, 2]], [1, [2, 3], 4], [[1, 2], [3, 4], []]).
- Test as needed. Get things perfect—including memory (Valgrind) and test coverage (gcov)—and submit at every stage.
- Test error cases. Make sure
parse_int(…)returns false for incorrect input. - Create
print_element_to_file(…).-
Rename
print_element(…)toprint_element_to_file(…)and add the extraFILE*parameter. Make no other changes yet. -
Recreate
print_element(…)by having it callprint_element_to_file(…). You should only need a single line of code. - Ensure all tests for
print_element(…)still pass. - Submit.
-
Add tests for
print_element_to_file(…). You will need to usefopen(…)andfclose(…). - Ensure your new tests are failing.
- Make use of the
FILE*parameter fromprint_element_to_file(…) - Ensure both your new test and your previous tests for
print_element(…)work. - Submit.
-
Rename
- Implement
write_json(…).- Implement tests for the function. They can similar to your tests for
print_element_to_file(…). - Implement
write_json(…). It should callprint_element_to_file(…). - Submit.
- Implement tests for the function. They can similar to your tests for
- Implement
read_json(…).- We recommend creating a helper that reads all charaters in a file and returns a malloced string of the contents. It might be worth testing your helper on its own before moving on.
- Add tests for
read_json(…). - Implement
read_json(…). Hint: if you useparse_json(…)alongside the suggested helper, the implementation should be trival. - Ensure your tests are passing.
- Submit.
- Implement
parse_null(…).- Add support for null to the enum and union for
Element. - Ensure no previous tests broke.
- Add some tests for valid null.
- Implement enough code to pass the tests.
- Submit.
- Add some tests for invalid null.
- Implement enough code to pass the tests, this may require no additional code.
- Submit.
- Add tests for other JSON functions (e.g.
parse_json(…)) using null. - Implement code in other functions for null.
- Ensure all tests are passing, both new and old.
- Submit.
- Add support for null to the enum and union for
- Implement
parse_boolean(…).- Add support for booleans to the enum and union for
Element. - Ensure no previous tests broke.
- Add some tests for valid booleans.
- We recommend adding a helper to allow reusing some logic from
parse_null(…). - Implement enough code to pass the tests.
- Submit.
- Add some tests for invalid booleans.
- Implement enough code to pass the tests, this may require no additional code.
- Submit.
- Add tests for other JSON functions using booleans.
- Implement code in other functions for booleans.
- Ensure all tests are passing, both new and old.
- Submit.
- Add support for booleans to the enum and union for
- Create or update functions adjacent to
parse_object(…)- Update json.h to support objects, and add all object related function headers.
-
Create a test for
insert_element(…)andget_element(…). Feel free to do this in a single stage or split based on the size of the tree. - Implement enough code to pass the tests.
- Submit.
- Next, write tests for
print_element(…)orwrite_json(…)to support objects. This will help you debug. - Implement enough code to pass the tests. You should only need to modify
print_element_to_file(…). - Submit.
- Update your tests to use
free_element(…)for JSON objects - Update
free_element(…)to ensure no memory leaks in previous tests. You will need a recursive helper function to free the tree, recommended signature:void _free_tree(BSTNode** a_root). - Submit.
- Create
parse_object(…)- Start by updating
parse_element(…)to support JSON objects. This will allow you to write tests usingparse_json(…)orread_json(…)if you wish. - Next, create tests to parse an empty object.
- Implement and submit.
- Next, create tests to parse an element with a single key and value.
- Implement and submit.
- Next, update the tests to parse two key and value pairs.
- Implement and submit.
- Update the tests to parse any number of key and value pairs.
- Implement and submit.
- Finally, add tests to handle any invalid cases you did not handle before.
- Implement and submit.
- Start by updating
- Test that there are no memory leaks, even for incorrect input.
How much work is this?
You must be disciplined in your development appraoch. If you try to build this all at once, success is extremely unlikely.
HW08:
Instructor's solution for
parse_int(…) only is 18 sloc
(14 sloc in the body of parse_int(…)).
HW11:
Instructor's solution for
parse_int(…),
parse_string(…), and
parse_element(…) only
is 75 sloc.
HW13:
Instructor's solution for
parse_int(…),
parse_string(…),
parse_list(…), and
parse_element(…)
is 127 sloc.
HW15:
Instructor's solution for
parse_null(…) and parse_boolean(…) is 15 sloc (5 sloc in the body of a helper called by both).
HW15:
Instructor's solution for
write_json(…) and read_json(…) is 27 sloc
(in addition to the sloc from the body of print_element_to_file(…)).
SLOC means “source lines of code” and does not lines that are blank or contain only comments. The figures above do not count test_json.c. Also, those figures are based on a tight—but defensible—solution that meets code quality standards and uses no methods or language features we have not yet covered.
Requirements
- Your submission must contain each of the following files, as specified:
file contents json.c functions parse int(int✶ a value, char const✶✶ a pos)→ return type: boolSet*a_valueto whatever integer value is found at*a_pos.*a_posis initially the address of the first character of the integer literal in the input string.*a_valueis the (already allocated) location where the parsed int should be stored.- Return
trueif a properly formed integer literal was found at*a_pos.*a_posshould refer to the next character in the input string, i.e., after the last digit of the integer literal.- Ex:
parse_int(…)should returntruefor 9, -99, and 123AAA.
- Ex:
- Return
falseif an integer literal was not found.*a_posshould refer to the unexpected character that indicated a problem.- Ex:
parse_int(…)should returnfalsefor A, AAA123, -A, -A9, and -.
- Ex:
- ⚠ Calling
parse_int(…)should not result in any calls tomalloc(…). - You do not need to parse hexadecimal, octal, scientific
notation, floating point values, or anything other than
integers in decimal notation (positive or negative).
You may assume any
integer is within the range of an int on our platform
(i.e., ≥
INT_MINand ≤INT_MAX). - Whenever
parse_int(…)returnsfalse,*a_valueshould not be modified. parse_int(…)should ignore any trailing characters that come after the last digit in the number.
Examples
parse_int(…)should returntruefor any of these:
9 -99 768336YAK 768336 EMUparse_int(…)should returnfalsefor any of these:
A AAA123 -A -A9 -parse string(char✶✶ a string, char const✶✶ a pos)→ return type: boolSet*a_stringto a copy of the string literal at*a_pos.- Caller is responsible for freeing the memory.
- A string literal must be surrounded by double quotation marks, and
may not contain a newline. In addition, we make two simplifications:
- Strings may not contain double quotation marks.
- Backslash is not special. Do not parse escape codes (i.e., "\▒") in the input.
- Calling
parse_string(…)should result in exactly one call tomalloc(…). - Return
trueif a properly formed string literal was found. *a_pos should be set to the next character in the input string, i.e., after the ending double quotation mark. - Return
falseif a string literal was not found. *a_pos should refer to the unexpected character that indicated a problem (e.g., newline or null terminator in the input). - Whenever
parse_string(…)returnsfalse, do not modify*a_string, and no heap memory should be allocated. parse_string(…)should ignore any trailing characters that come after the closing double quotation mark.
Examples
parse_string(…)should returntruefor any of these:
"antfox" "emubee\" "tompup\z" "ratkoi"YAK "doecod" YAKparse_string(…)should returnfalsefor any of these:
" henjay" "ram
dog" 768336"eelfly" 768336 "catbug"parse list(Node✶✶ a head, char const✶✶ a pos)→ return type: boolSet*a_headto the head of a linked list ofElementobjects.- Caller is responsible for freeing the memory if
parse_list(…)returnstrue. - Linked list consists of a
'[', followed by 0 or more JSON-encoded elements (integers, strings, or lists) separated by commas, and finally a']'. See the examples above. (There will be no HBB on this definition, unless there is something truly wrong and/or grossly unclear.) There may be any number/amount of whitespace characters (' ','\n', or'\t'), before/after any of the elements. - Return
trueif a properly formed list was found. *a_pos should be set to the next character in the input string, after the list that was just parsed.- Ex:
parse_list(…)should returntruefor [], [1,2], [1,"A"], [1,[2]], and [1]A.
- Ex:
- Return
falseif a list was not found (i.e., syntax error in the input string). *a_pos should refer to the unexpected character that indicated a problem.- Ex:
parse_list(…)should returnfalsefor A[], [1,,2], [[, 1,2, and ,2]. - Whenever parse_list returns
False, you must ensure that all heap memory allocated as a result of that call toparse_list(…)is freed. In other words, testingparse_list(…)with an invalid JSON list representation (e.g.,["one",[1,2],3,,]) should not result in a memory leak.
- Ex:
parse_list(…)should ignore any trailing characters that come after the closing square bracket.
parse boolean(bool✶ a value, char const✶✶ a pos)→ return type: boolSet*a_valueto whatever boolean value is found at*a_pos.*a_posis initially the address of the first character of the boolean literal in the input string.*a_valueis the (already allocated) location where the parsed boolean should be stored.- Return
trueif a properly formed boolean literal was found at*a_pos.*a_posshould refer to the next character in the input string, i.e., after the last character of the boolean literal. Note that trailing characters are acceptable, just like previous functions. -
- Ex:
parse_boolean(…)should returntruefor true, false, and falsehood
false. - Ex:
- Return
falseif a boolean literal was not found.*a_posshould refer to the unexpected character that indicated a problem.- Ex:
parse_boolean(…)should returnfalsefor A, 123, "true", and truly
*a_poswill be'l'as that is the first invalid character. - Ex:
- ⚠ Calling
parse_boolean(…)should not result in any calls tomalloc(…). - ⚠ Do not confuse the boolean return value from the result. The return value determines whether a boolean was found, while
*a_valueis the boolean that was found. - Whenever
parse_boolean(…)returnsfalse,*a_valueshould not be modified.
parse null(char const✶✶ a pos)→ return type: boolDetermine whethernullis found at*a_pos.*a_posis initially the address of the first character of the null literal in the input string.- Return
trueif a properly formed null literal was found at*a_pos.*a_posshould refer to the next character in the input string, i.e., after the last character of the null literal. Note that trailing characters are acceptable, just like previous functions. -
- Ex:
parse_null(…)should returntruefor null, null literal, and nullify
null. - Ex:
- Return
falseif the null literal was not found.*a_posshould refer to the unexpected character that indicated a problem.- Ex:
parse_null(…)should returnfalsefor A, 123, "null", and nil
*a_poswill be'i'as that is the first invalid character. - Ex:
- ⚠ Calling
parse_null(…)should not result in any calls tomalloc(…). - This function has no result, as the only possible result is
null.
parse object(BSTNode✶✶ a root, char const✶✶ a pos)→ return type: boolSet*a_rootto the root of a tree ofElementobjects.- Caller is responsible for freeing the memory if
parse_object(…)returnstrue. - Objects consist of a
'{', followed by 0 or more key-value pairs separated by commas, and finally a'}'. Each key-value pair consists of a string key, a character, then a JSON-encoded element (integer, string, list, object, boolean, or null). See the examples above. (There will be no HBB on this definition, unless there is something truly wrong and/or grossly unclear.) There may be any number/amount of whitespace characters (' ','\n', or'\t'), before/after any of the element and before/after the key. - Return
trueif a properly formed object was found. *a_pos should be set to the next character in the input string, after the object that was just parsed.- Ex:
parse_object(…)should returntruefor {}, {"number": 123}, {"string": "Hello", "list": [ 1, 2, 3 ]}, { "nested_object" : { "integer": 5 } }, and { "string_contains": "trailing_characters" }ABC.
- Ex:
- Return
falseif an object was not found (i.e., syntax error in the input string). *a_pos should refer to the unexpected character that indicated a problem.- Ex:
parse_object(…)should returnfalsefor A{}, {"key": 1,,"value": 2}, { {, "key": 1 { "key": 1 "key2": 2}. and { "key" 1 }.
- Ex:
- The parsed elements are stored in a binary search tree sorted by the key values.
-
If duplicate keys are found, replace the value in the previous node with the new element parsed.
For example, {"duplicate": 123, "duplicate": 456}
should produce the same result as {"duplicate": 456},
Hint: this should be handled byinsert_element(…). - Whenever
parse_object(…)returnsfalse, do not modify*a_root, and free any heap memory that was allocated prior to discovery of the error. - Hint: you already have logic written to parse a string, reuse that logic for the key instead of writing it again.
parse element(Element✶ a element, char const✶✶ a pos)→ return type: bool- First, eat any whitespace at
*a_pos.- “Eat whitespace” just means to skip over any whitespace characters (i.e., increment
*a_posuntilisspace(**a_pos)==false).
- “Eat whitespace” just means to skip over any whitespace characters (i.e., increment
- Next, decide what kind of element this is.
- If it's a digit (
isdigit(**a_pos)) or hyphen ('-'), set the element'stypetoELEMENT_INTand callparse int(&(a element -> as int), a pos). - If it's a string (
**a_pos=='"'), then set the element'stypetoELEMENT_STRINGand callparse string(&(a element -> as string), a pos). - If it's a list (
**a_pos == '['), then set the element'stypetoELEMENT_LISTand call:parse list(&(a element -> as list), a pos). - If it's a boolean (
**a_pos == 't'or**a_pos == 'f'), then set the element'stypetoELEMENT_BOOLEANand call:parse boolean(&(a element -> as boolean), a pos). - If it's null (
**a_pos == 'n'), then set the element'stypetoELEMENT_NULL, set the element'sas_nulltoNULL, and call:parse null(a pos). - If it's an object (
**a_pos == '{'), then set the element'stypetoELEMENT_OBJECTand call:parse object(&(a element -> as object), a pos).
- If it's a digit (
- Return whatever was returned by
parse_int(…),parse_string(…),parse_boolean(…),parse_null(…),parse_object(…),orparse_list(…).- If none of those functions was called—i.e., if the next character was neither digit,
'-','"', nor'['—then returnfalse.
- If none of those functions was called—i.e., if the next character was neither digit,
- Do not modify
*a_posdirectly inparse_element(…), except for eating whitespace.*a_poscan—and should—be modified inparse_int(…),parse_string(…),parse_boolean(…),parse_null(…),parse_object(…),andparse_list(…)
- Caller is responsible for freeing memory by calling
free_element(…)wheneverparse_element(…)returnstrue. - Whenever
parse_element(…)returnsfalse, do not modify*a_element. Any heap memory allocated prior to discovery of the error should be freed.
parse json(char const✶ json)→ return type: ParseResultGiven a JSON string, parse it and return aParseResultobject.- If parsing succeeds—i.e., the string is a valid JSON representation of
any JSON object, such as an integer—then the
.is_successfield of theParseResultobject should betrueand the.elementfield (of the anonymous union) should contain theElementobject, as returned byparse_element(…).-
Calling
parse_json("768336");should return aParseResultobject equivalent to this compound literal:
(ParseResult) { .is_success = true, .element = (Element) { .type = ELEMENT_INT, .as_int = 768336 } }
-
Calling
- If parsing fails, then the
.is_successfield of theParseResultobject should befalse, the.file_errorfield of theParseResultobject should be0, and the.error_idxfield (of the anonymous union) should contain the index into json of the first character that indicated a parse failure.-
Example: Calling
parse_json("768336X");should return aParseResultobject equivalent to this compound literal:
(ParseResult) { .is_success = false, .file_error = 0, .error_idx = 6 } -
Example: Calling
parse_json("X768336");should return aParseResultobject equivalent to this compound literal:
(ParseResult) { .is_success = false, .file_error = 0, .error_idx = 0 } - ⚠ Don't forget to set
.file_errorand test for it! The struct should always be fully initialized even though this function does not use that field.
-
Example: Calling
- Ignore leading or trailing whitespace.
-
Example: Parsing
768336,
768336 , or
768336 should return an equivalent result
with
.is_success == trueand.element.as_int == 768336.
-
Example: Parsing
768336,
768336 , or
768336 should return an equivalent result
with
-
Hint:
parse_json(…)should not need to do anything special for each type (i.e., int, string, list); if it works for one, it should work with the others with no modification.parse_json(…)is just a wrapper forparse_element(…)that gives you a more usable, understandable result. parse_json(…)should NOT ignore any non-whitespace trailing characters that come after the JSON object.-
Example: Parsing
X768336,
768336X,
768336 X, or
X 768336 X should result in a
ParseResultobject with.is_success == falseand.error_idxequal to the index of the firstXcharacter (i.e., 0, 6, 7, or 1, respectively). - ⚠ This is different from
parse_int(…),parse_string(…),parse_list(…), andparse_element(…).
read json(char const✶ filename)→ return type: ParseResultReads a JSON element from the file defined byfilename.- Opens the file named
filenameand reads in the contents as a JSON element. - If parsing succeeds—i.e., the file contains a valid JSON representation of
any JSON object, such as an integer or string, string, or list—then the
.is_successfield of theParseResultobject should betrueand the.elementfield (of the anonymous union) should contain theElementobject, as returned byparse_element(…).-
Calling
read_json("integer.json");whereinteger.jsoncontains768336should return aParseResultobject equivalent to this compound literal:
(ParseResult) { .is_success = true, .element = (Element) { .type = ELEMENT_INT, .as_int = 768336 } }
-
Calling
- If the file was unable to be read, then the
.is_successfield of theParseResultobject should befalse, the.error_idxfield should be0and the.file_errorfield should be set to the file error (fromerrno) - If the file could be read but JSON parsing fails, then the
.is_successfield of theParseResultobject should befalse, the.file_errorfield should be0and the.error_idxfield (of the anonymous union) should contain the index into json of the first character that indicated a parse failure.
-
Example: Calling
read_json("invalid_trailing.json");whereinvalid_trailing.jsoncontains768336Xshould return aParseResultobject equivalent to this compound literal:
(ParseResult) { .is_success = false, .file_error = 0, .error_idx = 6 } -
Example: Calling
read_json("invalid_leading.json");whereinvalid_leading.jsoncontains768336Xshould return aParseResultobject equivalent to this compound literal:
(ParseResult) { .is_success = false, .file_error = 0, .error_idx = 0 }
-
Example: Calling
- Ignore leading or trailing whitespace.
-
Example: Parsing
768336,
768336 , or
768336 should return an equivalent result
with
.is_success == trueand.element.as_int == 768336.
-
Example: Parsing
768336,
768336 , or
768336 should return an equivalent result
with
read_json(…)should NOT ignore any non-whitespace trailing characters that come after the JSON object.-
Example: Parsing
X768336,
768336X,
768336 X, or
X 768336 X should result in a
ParseResultobject with.is_success == falseand.error_idxequal to the index of the firstXcharacter (i.e., 0, 6, 7, or 1, respectively). - ⚠ This is different from
parse_int(…),parse_string(…),parse_list(…), andparse_element(…). However, it is the same asparse_json(…). -
Hint:
read_json(…)should not need to do anything special to parse elements or handle trailing characters, it is just a wrapper forparse_json(…)that reads from a file instead of a string. We talked in lecture about how to read all text from a file into a string. - ⚠ Do not add the
.jsonextension automatically. The caller will pass in the full filename they wish to create. If the caller passes in"example", read a file namedexample, not.example.json
print element(Element element)→ return type: voidGiven anElementobject, print it in JSON notation to the passed file.- Spacing is up to you, as long as it is valid JSON.
- If element is an integer, print it using
printf(…).
print element(Element element)→ return type: voidThis function should be exactly one line. It should simply callprint_element_to_file(…)with the proper argument for printing to the console.write json(char const✶ filename, Element element)→ return type: intWrites a JSON element to the file defined byfilename.- Opens the file named
filename. - Create the file if it does not exist, replace contents if it does exist.
- Write the contents of
elementto the file as valid JSON. -
Returns 0 if the file is successfully written.
If the file was unable to be written (i.e. the file could not be opened), then return the file error (from
errno). - Hint: use
print_element_to_file(…). - ⚠ Do not add the
.jsonextension automatically. The caller will pass in the full filename they wish to create. If the caller passes in"example", create a file namedexample, not.example.json
insert element(BSTNode✶✶ a root, char✶ key, Element element)→ return type: voidInserts the key/value pair into the binary search tree.- Sort the tree using
key, no need to considerelementwhen sorting. - This works the same way as the binary search tree insert on HW14, except we handle equal elements differently.
-
If the key already exists, replace the value in the old node with the new element.
⚠ do not forget to free the old element. keywill be a heap allocated string.elementwill be any valid element.- If the key is not found, add a new node to the tree.
- Hint: you can write this either using recursion or a loop.
- Hint: this function will be helpful in creating
parse_object(…).
get element(const BSTNode✶ root, char const✶ key)→ return type: const Element✶Finds the node in the binary search tree which matches the key.- If found, return the address of the element in the node.
- If not found, return
NULL. - Hint: you can write this either using recursion or a loop.
- Hint: this function will be helpful in testing
parse_object(…).
free element(Element element)→ return type: voidFree the contents of theElement, as needed.For HW08, this function does nothing. The body of the function should be empty.
- If it contains a string, free the string.
- If it contains a linked list, free the list, including all elements.
- If it contains a binary search tree, free each node in the tree, including all elements and keys.
Hint: You will want a recursive helper for this. - ⚠ Do not attempt to free the
Elementobject itself.free_element(element)only frees dynamic memory that element refers to.
json.h types ElementProvided definition for a struct type calledElementthat stores a union between different JSON types.- Modify the type enum to add new types:
ELEMENT_NULLELEMENT_BOOLEANELEMENT_OBJECT - Modify the union to include
void* .as_null - Modify the union to include
bool .as_boolean - Modify the union to include
BSTNode* .as_object
ParseResultAdd a definition for a struct type calledParseResultas needed to work withparse_json(…)andread_json(…).BSTNodestruct type with 4 fields:key(char*),element(Element),left, andright(BSTNode*).Leave other definitions in the header file as is. functions Function headers for all required functions. test_json.c functions main(int argc, char✶ argv[])→ return type: intTest your all of the above functions using yourminiunit.h..- This should consist primarily of calls to
mu_run(_test_▒▒▒). - 100% code coverage is required.
- Your main(…) must return EXIT_SUCCESS.
-
parse_int(…),parse_string(…),parse_list(…),parse_boolean(…),parse_null(…),parse_object(…), andparse_element(…)should ignore any trailing characters in the input string, as long as it starts with a well-formed JSON element.- Acceptable: 123AAA, "12"AAA, "12",[,
- You only need to support the specific features of JSON that are explicitly required in this assignment description. You do not need to support unicode (e.g., "萬國碼", "يونيكود", "യൂണികോഡ്"), objects/dictionaries (e.g., {"a":1, "b":2}), backslash escapes (e.g., "\n"), embedded quotes (e.g., "He said, \"Roar!\""), floating point numbers (e.g., 3.1415), non-decimal notations (e.g., 0xdeadbeef, 0600), null, false.
- Do not modify json.h except as explicitly directed.
- Ensure there may be no memory faults (e.g., leaks, invalid read/write, etc.), even when
parse_▒▒▒(…)return false. -
The following external header files, functions, and symbols are
allowed.
For miniunit.h and log_macros.h, you can use anything from HW05. You are welcome to change them to your liking, and/or add more in the same spirit. All others are prohibited unless approved by the instructor. Feel free to ask if there is something you would like to use.header functions/symbols allowed in… stdbool.h bool,true,falsejson.c,test_json.cstdio.h printf,fprintf,fputs,stdout,fflushjson.c,test_json.cassert.h assertjson.c,test_json.cctype.h isdigit,isspacejson.c,test_json.cstdlib.h EXIT_SUCCESS,abs,malloc,free,size_tjson.c,test_json.cstring.h strncpy,strchr,strlen,strcmpjson.c,test_json.climits.h INT_MIN,INT_MAXjson.c,test_json.cminiunit.h anythingtest_json.clog_macros.h anythingjson.c,test_json.c - Submissions must meet the code quality standards and the course policies on homework and academic integrity.
Submit
To submit HW08 from within your hw08 directory,
type
264submit HW08 json.c json.h test_json.c miniunit.h log_macros.h Makefile
Pre-tester ●
The pre-tester for HW08 has been released and is ready to use.
Q&A
How can I structure my tests?
Here is a start. This assumes you have written your
parse_json(…)function andParseResulttype.// Okay to copy/adapt, but ONLY IF YOU UNDERSTAND THIS CODE COMPLETELY. // ⚠ Do not copy blindly. #include <stdio.h> #include <stdlib.h> #include <string.h> #include "json.h" #include "miniunit.h" int _test_int_valid_0() { mu_start(); //──────────────────── ParseResult result = parse_json("0"); mu_check(result.is_success); if(result.is_success) { mu_check(result.element.type == ELEMENT_INT); mu_check(result.element.as_int == 0); free_element(result.element); // should do nothing } //──────────────────── mu_end(); } int _test_int_valid_0() { mu_start(); //──────────────────── ParseResult result = parse_json("0"); mu_check(result.is_success); if(result.is_success) { mu_check(result.element.type == ELEMENT_INT); mu_check(result.element.as_int == 0); free_element(result.element); // should do nothing } //──────────────────── mu_end(); } int _test_string_valid_abc() { mu_start(); //──────────────────── Parse Result result = parse_json("\"abc\""); mu_check(result.is_success); if(result.is_success) { mu_check(result.element.type == ELEMENT_STRING); mu_check(strcmp(result.element.as_string, "abc") == 0); mu_check(strlen(result.element.as_string) == 3); free_element(result.element); } //──────────────────── mu_end(); } static int _test_list_of_ints_valid_1_2() { mu_start(); //──────────────────── ParseResult result = parse_json("[1, 2]"); mu_check(result.is_success); if(result.is_success) { mu_check(result.element.type == ELEMENT_LIST); mu_check(result.element.as_list != NULL); mu_check(result.element.as_list -> element.as_int == 1); mu_check(result.element.as_list -> element.type == ELEMENT_INT); mu_check(result.element.as_list -> next != NULL); mu_check(result.element.as_list -> next -> element.type == ELEMENT_INT); mu_check(result.element.as_list -> next -> element.as_int == 2); free_element(result.element); } //──────────────────── mu_end(); } int main(int argc, char* argv[]) { mu_run(_test_int_valid_0); mu_run(_test_string_valid_abc); mu_run(_test_list_of_ints_valid_1_2); return EXIT_SUCCESS; }
⚠ If you do not understand this code, do not use it.The code above covers all parts of this assignment. If you choose to use it, you will likely need to select only the parts relevant to the part you are doing right now. In any case, that is not an adequate test on its own. You will need more. This is just to get you started.
What if I don't have parse_json(…) done yet?
Here is a simpler version without
parse_json(…)// OK TO COPY / ADAPT this snippet---but ONLY if you understand it completely. // ⚠ Do not copy blindly. // // This test is nowhere near adequate on its own. It is provided to illustrate how to // use helper functions to streamline your test code. #include <stdio.h> #include <stdlib.h> #include "json.h" #include "miniunit.h" static int _test_parse_int_valid() { mu_start(); //────────────────────────────────────────────────────── int result; // will be initialized in parse_int(…) char* input = "0"; char const* pos = input; bool is_success = parse_int(&result, &pos); mu_check(is_success); // because the input is valid mu_check(pos == input + 1); mu_check(result == 0); //────────────────────────────────────────────────────── mu_end(); } static int _test_parse_int_invalid() { mu_start(); //────────────────────────────────────────────────────── int result; // will be initialized in parse_int(…) char* input = "A"; char const* pos = input; bool is_success = parse_int(&result, &pos); mu_check(!is_success); // because the input is valid mu_check(pos == input); // failure should be at the first character in the input //────────────────────────────────────────────────────── mu_end(); } int main(int argc, char* argv[]) { mu_run(_test_parse_int_valid); mu_run(_test_parse_int_invalid); return EXIT_SUCCESS; }
What should be the value of
*a_posafterparse_▒▒▒(…)returns?_____________________________________________ # EXAMPLE #1 INPUT: 123 BEFORE we call parse_int(…): 123 ↑ *a_pos RETURN value from parse_int(…): true After parse_int(…) returns: 123 ↑ *a_pos refers to null terminator just after the integer literal. element.type == ELEMENT_INT element.as_int == 123 _____________________________________________ # EXAMPLE #2 INPUT: 123ABC BEFORE we call parse_int(…): 123ABC ↑ *a_pos RETURN value from parse_int(…): true After parse_int(…) returns: 123ABC ↑ *a_pos refers to the non-digit character after the integer literal. element.type == ELEMENT_INT element.as_int == 123 _____________________________________________ # EXAMPLE #3 INPUT: -A1 BEFORE we call parse_int(…): -A1 ↑ *a_pos RETURN value from parse_int(…): false After parse_int(…) returns: -A1 ↑ *a_pos refers first character that informed us this cannot be an integer literal. element.type == (don't care) element.as_int == (don't care)What does the output of
print_element(…)look like?It's just the inverse operation toparse_element(…).parse_element(…)takes JSON as input.print_element(…)prints JSON as output.If you were to parse the output ofprint_element(…)withparse_element(…)you should get an equivalent object.If you parse a JSON string and then print it again, you should get an equivalent string.If you're looking for concrete examples, just look at any example of input toparse_element(…)(except for the trailing characters). There are several at the top of this assignment description page.The specification says we don't have to handle escape sequences, but then it mentions escape sequences. Do we parse escape sequences or don't we? How do we handle backslash?
We use C escape codes to make C string literals containing certain characters (e.g., double-quote, newline, etc.) in our C code.You don't have to parse JSON escape codes. Unless you are doing the escape sequence bonus, just treat a backslash like any other character.Here's an example to illustrate the distinction:#include <stdlib.h> #include <assert.h> #include <string.h> #include <stdio.h> #include "json.h" int main(int argc, char* argv[]) { Element element; // 'element' will be initialized inside parse_element(…) // C escape codes used to create a C string literal containing double quotes char* json_input = "\"A\""; // same as: {'\"', 'A', '\"'} char const* pos = json_input; assert(strlen(json_input) == 3); parse_element(&element, &pos); printf(">>>|%s|<<<\n", element.as_string); // Output: // >>>|A|<<< // JSON escape codes json_input = "\"A\\nB\""; // same as: {'\"', 'A', '\\', 'n', 'B', '\"'} assert(strlen(json_input) == 5); pos = json_input; parse_element(&element, &pos); printf(">>>|%s|<<<\n", element.as_string); // Output: // >>>|A\nB|<<< // Output: (with escape code bonus) // >>>|A // B|<<< // Note: strlen(…) does not count the null terminator return EXIT_SUCCESS; }
Should the double quotes be stored in memory?
No. The double quotes are part of the JSON syntax.This is just like how in C, when you define a string like this:… the double quotes are not stored in memory.char s[] = "abc";
How many file errors should I test? Do I have to test all error numbers in errno.h?
You want to test at minimum 1 type of error, probably no need for more than 2. You do not need to worry about testing all errors as we are not testingfopen, we are testing our JSON functions.Some suggestions of reasonable errors to test are a file not existing (
ENOENT), or attempting to read from a directory (EISDIR, though sometimes this may show up asEINVAL). You may also consider a too long filename (ENAMETOOLONG). A permissions error (EACCES) is also not hard to test for, but it may be unreliable (you have different permissions than the grader).Note many errors defined in
errno.hare hard to trigger (e.g.ENOMEMorEDQUOT) or are not relevant to files (e.g.EHOSTDOWN), no need to worry about them; your goal is just finding a couple reasonable errors to test to make sure you properly return the number. I should really emphasize for everyone's sake, do not try to test errors from memory limits or file size limits such as out of memory errors or disk quota exceeded, you will probably just end up getting disconnected and possibly require us putting in a ticket with IT to get you connected; or worst case you could affect the ability for others to connect to the server or access the testers. File name limits are one of the few reasonable limits to test (that is,ENAMETOOLONG).My code freezes when trying to test file errors, what happened?
Some times of invalid files can be opened but give unexpected behavior with
feof(…). For example, when attempting to read from a directory (testingEISDIR),feof(…)never set to end of file. We can work around this by taking advantage offerror(…), or by careful usage ofEOF.The best approach is not worrying about
EOFat all, usefseek(…)andftell(…)to find the file position.fseek(…)will return non-zero if there was an error making it easy to detect.If you wanted to do this by checking for end of file, replace:
withfor (char ch = fgetc(file); !feof(file); ch = fgetc(file)) { // work with ch here }
We could also solve this problem using EOF, but there are a lot more pitfalls:for (char ch = fgetc(file); !feof(file) && !ferror(file); ch = fgetc(file)) { // work with ch here } // handle errors during reading if (ferror(file)) { // do something with ferror/errno }
Note the type of the loop variable must befor (int ch = fgetc(file); ch != EOF; ch = fgetc(file)) { // work with ch here } // if not at the end of the file, then something went wrong // could equivelently check ferror(file) here if (!feof(file)) { // do something with ferror/errno }
intfor this to work reliably. However, this is much more prone to bugs than just usingfseek(…)andftell(…), so you should use those.How do I ensure I am not printing too many commas when printing objects?
The easiest method to do this is to just pass in a booleandid_print_first_elementby address to your recursive printing logic. If the boolean is true, print the comma before your key-value pair; else set the boolean to true and do not print a comma.How do I ensure the keys are correct when printing?
JSON objects do not preserve the order of keys when parsing, so printing in the original order is not required.
We recommend just doing an in-order traversal for printing, as that will make all the keys in alphabetical order. However, any order is valid as long as all keys are eventually printed.
How can I structure my object tests?
Here's a start. (We may add to this at some point.)// OK TO COPY / ADAPT this snippet---but ONLY if you understand it completely. // ⚠ Do not copy blindly. // // This test is nowhere near adequate on its own. It is provided to illustrate how to // use helper functions to streamline your test code. #include <stdio.h> #include <stdlib.h> #include "json.h" #include "miniunit.h" // all tests from JSON 4, do not delete old tests!!! static int _test_parse_object_two_values() { mu_start(); //────────────────────────────────────────────────────── char const* input = "{\"key1\": 123, \"key2\": \"value\"}ABC"; char const* pos = input; BSTNode* root; bool is_success = parse_object(&root, &pos); mu_check(is_success); // because the input is valid // I was too lazy to count the characters, so made C count it for me // the 3 is the trailing characters mu_check(pos == input + strlen(input) - 3); mu_check(root != NULL); // we could check specific locations in the tree, // or we could just use the helper which does not care const Element* a_element = get_element(root, "key1"); mu_check(a_element != NULL); mu_check(a_element->type == ELEMENT_INT); mu_check(a_element->as_int == 123); a_element = get_element(root, "key2"); mu_check(a_element != NULL); mu_check(a_element->type == ELEMENT_STRING); mu_check_strings_equal(a_element->as_string, "value"); free_element((Element){.as_object = root, .type = ELEMENT_OBJECT}); //────────────────────────────────────────────────────── mu_end(); } int main(int argc, char* argv[]) { mu_run(_test_parse_object_two_values); mu_run(_test_write_int_zero); return EXIT_SUCCESS; }
That string looks super messy, is there a nicer way we can write it?
You may find it easier to create JSON objects as files. This will prevent the need to escape all the double quotes. You could create helper functions to make your tests easier to write. Some examples are below, Example JSON object file contents:{ "key1": "This is a string", "key2": 12345, "nested_object": { "with a nested list": [ 1, 2, true, null ], "okay, this object is probably a bit complex": true, "you probably want to test simplier ones :)": false } }
Updates
| 10/30/2022 |
|
| 7/2/2024 |
|
| 7/29/2024 |
|
| 7/30/2024 |
|