0

example of log file:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}

It will generate 5 files:

1.timestamp.column

2.Field1.column

3.Field_Doc.f1.column

4.Field_Doc.f2.column

5.Field_Doc.f3.column

Example content of timestamp.column:

2022-01-14T00:12:21.000
2022-01-18T00:15:51.000

I'm facing a problem while the values of keys are null, undefined as when the value us is null for example:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}

can someone help me out here?

6
  • send your code please Commented Feb 21, 2022 at 15:35
  • You can check the code on this link: stackoverflow.com/questions/71194832/… Commented Feb 21, 2022 at 16:04
  • so that link shows 3 different ways of doing it. Which one are you using? Also, questions should be self-contained. Thirdly, please provide a good example of what you actually want to achieve. From the question as is, I cannot determine what it is you want to happen when a null or undefined is encountered. Also, note that null is a valid JSON value, but undefined is not. Lastly, the input file is of type ndjson (just as an FYI) Commented Feb 21, 2022 at 17:27
  • the file is not a JSON file, it's mentioned in the heading itself, JSON-BASED LOGFILE. logfile are always text files, and I'm using the code mentioned in the last, and I have to convert that log file into a column file for each key. Commented Feb 21, 2022 at 17:32
  • I have a edge case that i'm stuck on which is: The column file format is as follows: - string fields are separated by a new line '\n' character. Assume that no string value has new line characters, so no need to worry about escaping them - double, integer & boolean fields are represented as a single value per line - null, undefined & empty strings are represented as an empty line Commented Feb 21, 2022 at 17:34

1 Answer 1

1

Note, the input file is actually an NDJSON. See the docs.

That being said, since furas already gave an excellent answer on how to process the NDJSON logfile I'm going to skip that part. Do note that there's a library to deal with NDJSON files. See PyPI.

His code needs minimal adjustment to deal with the undefined edge case. The null value is a valid JSON value, so his code doesn't break on that.

You can fix this easily by a string.replace() while doing the json.loads() so it becomes valid JSON, and then you can check while writing if value == None to replace the value with an empty string. Note that None is the python equivalent of JSON's null.

Please note the inclusion of : in the replace function, it's to prevent false negatives...

main loop logic

for line in file_obj:
    # the replace function makes it valid JSON
    data = json.loads(line.replace(': undefined', ': null'))
    print(data)
    process_dict(data, write_func)

write_func() function adjustment

def write_func(key, value):
    with open(key + '.column', "a") as f:
        # if the value == None, make it an empty string.
        if value == None:
            value = ''
        f.write(str(value) + "\n")

I used the following as the input string:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.