I have a list of dictionaries, one per JSON line, as below.

my_dict = [{'processId': 'p1', 'userId': 'user1', 'reportName': 'report1', 'threadId': '12234', 'some_other_keys': 'respective values.12234'}, {'userId': 'user1', 'processId': 'p1', 'reportName': 'report1', 'threadId': '12335', 'some_other_keys': 'respective values.12335', 'another_key': 'another_value.12335','key1': 'key1_value.12335'}, {'processId': 'p1', 'userId': 'user1', 'reportName': 'report1', 'threadId': '12834', 'some_other_keys': 'respective values.12834','key2': 'key2_value.12834'}]

Note: different JSON lines have different sets of keys.

In these lines, 'processId': 'p1', 'userId': 'user1' and 'reportName': 'report1' are the same for every line, and this is known to the programmer.

Objective:

  1. write a function to create a single JSON line out of the above.
  2. function arguments are
    1. list of matching keys i.e. ["processId","userId","reportName"]
    2. the list of dictionaries shown above.

Output:

The expected output for the above input is a single JSON record, as below.

{"processId": "p1", "userId": "user1", "reportName": "report1", "threadId_0": "12234", "some_other_keys_0": "respective values.12234", "threadId_1": "12335", "some_other_keys_1": "respective values.12335", "another_key_1": "another_value.12335","key1_1": "key1_value.12335", "threadId_2": "12834", "some_other_keys_2": "respective values.12834","key2_2": "key2_value.12834"}

My current code looks like below:

def multijson_to_singlejson_matchingkey(list_json, list_keys):
    rec0 = {}
    for l in range(len(list_keys)):
        key0 = list_keys[l]
        value0 = list_json[0][key0]
        rec0[f'{key0}'] = value0
    rec = {}
    for i in range(len(list_json)):
        line = list_json[i]
        for j in range(len(list_keys)):
            del line[list_keys[j]]
        line_keys = list(line)
        for k in range(len(line_keys)):
            key_a = line_keys[k] + "_" + f"{i}"
            line[f'{key_a}'] = line[f'{line_keys[k]}']
            del line[f'{line_keys[k]}']
        rec = {**rec, **line}
    res = {}
    res = {**rec0, **rec}
    print(res)
    return res

But this function takes about 20 lines of code. I'm trying to reduce the number of lines and make it more efficient. What options are available for doing that?

1 Answer

You can reduce the construction of rec0 to a one-liner that is hopefully still reasonably readable, and then loop over the list of input dictionaries to populate the rest, skipping any keys that are in list_keys. The membership test below is made against rec0 rather than list_keys; this is equivalent and marginally faster.

def multi_to_single(list_json, list_keys):
    rec0 = dict((key0, list_json[0][key0]) for key0 in list_keys)
    res = rec0.copy()
    for i, dct in enumerate(list_json):
        for k, v in dct.items():
            if k not in rec0:
                res[f'{k}_{i}'] = v
    print(res)
    return res

This gives (with pprint.pprint here instead of print for ease of reading):

{'another_key_1': 'another_value.12335',
 'key1_1': 'key1_value.12335',
 'key2_2': 'key2_value.12834',
 'processId': 'p1',
 'reportName': 'report1',
 'some_other_keys_0': 'respective values.12234',
 'some_other_keys_1': 'respective values.12335',
 'some_other_keys_2': 'respective values.12834',
 'threadId_0': '12234',
 'threadId_1': '12335',
 'threadId_2': '12834',
 'userId': 'user1'}
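
The question asks for a single JSON record, whereas the function above returns (and prints) a Python dict. A minimal sketch of the remaining step, assuming the standard-library json module is acceptable for serialization; the names records, merged and json_line are illustrative only, and the sample input is a shortened version of the one in the question:

```python
import json


def multi_to_single(list_json, list_keys):
    # Same logic as the function above, without the print call.
    rec0 = {key: list_json[0][key] for key in list_keys}
    res = dict(rec0)
    for i, dct in enumerate(list_json):
        for k, v in dct.items():
            if k not in rec0:
                # Suffix every non-shared key with the index of its source line.
                res[f"{k}_{i}"] = v
    return res


# Shortened sample input (hypothetical variable name).
records = [
    {"processId": "p1", "userId": "user1", "reportName": "report1",
     "threadId": "12234", "some_other_keys": "respective values.12234"},
    {"userId": "user1", "processId": "p1", "reportName": "report1",
     "threadId": "12335", "key1": "key1_value.12335"},
]

merged = multi_to_single(records, ["processId", "userId", "reportName"])
json_line = json.dumps(merged)  # serialize the merged dict as one JSON line
print(json_line)
```

Because dicts preserve insertion order (Python 3.7+), the shared keys come first in the serialized record, followed by the suffixed keys in input order. Unlike the function in the question, this version only reads from the input dictionaries and does not delete keys from them.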