I'm looking for a way to merge objects where one or more keys have the same value. Specifically, in my example I have a list where the category and code must match.

Input

[{
    "category": "Nace2008",
    "code": "01110",
    "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"
},
{
    "category": "Nace2008",
    "code": "01110",
    "FR": "Culture de c\u00e9r\u00e9ales (\u00e0 l'exception du riz), de l\u00e9gumineuses et de graines ol\u00e9agineuses"
},
{
    "category": "Nace2008",
    "code": "01120",
    "FR": "Culture du riz"
},
{
    "category": "Nace2008",
    "code": "01120",
    "NL": "Teelt van rijst"
}]

Expected output

[{
    "category": "Nace2008",
    "code": "01110",
    "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden",
    "FR": "Culture de c\u00e9r\u00e9ales (\u00e0 l'exception du riz), de l\u00e9gumineuses et de graines ol\u00e9agineuses"
},
{
    "category": "Nace2008",
    "code": "01120",
    "NL": "Teelt van rijst",
    "FR": "Culture du riz"
}]

Looping through the list and running a nested loop to check for the same category and code results in duplicate data.

  • what should happen if other keys match, or will they not? Commented Apr 9, 2022 at 10:01
  • In this specific case it won't happen. I'm currently working with hardcoded keys, but if it can be dynamic with one or more keys, that would be even better. Commented Apr 9, 2022 at 10:04
  • did you mean "code": "01110"?? Because your output implies you want to match "code":"0111" with "code":"01110" Commented Apr 9, 2022 at 10:04
  • The check here should be if other objects in the list have the same value for the keys category and code Commented Apr 9, 2022 at 10:08
  • I would use a dictionary whose keys are built from the values of the matching fields, e.g. {"Nace2008_01110": {object}}. Then you could look it up with a .get call and add the new language key and value for each subsequent match. You could write it as a function so you can pass in the keys you want to check for uniqueness… Commented Apr 9, 2022 at 10:09

1 Answer

So, you just want the standard dictionary grouping idiom based on the key you described:

>>> data = [{
...     "category": "Nace2008",
...     "code": "01110",
...     "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"
... },
... {
...     "category": "Nace2008",
...     "code": "01110",
...     "FR": "Culture de c\u00e9r\u00e9ales (\u00e0 l'exception du riz), de l\u00e9gumineuses et de graines ol\u00e9agineuses"
... },
... {
...     "category": "Nace2008",
...     "code": "01120",
...     "FR": "Culture du riz"
... },
... {
...     "category": "Nace2008",
...     "code": "01120",
...     "NL": "Teelt van rijst"
... }]

So create an empty dictionary, group by the key:

>>> result = {}
>>> for d in data:
...     key = d['category'], d['code']
...     result.setdefault(key, {}).update(d)
...

Note that .update merges whatever is there naively: if the same key appears in several records of one group, the value from the last record wins. If the keys are all unique, this isn't a problem. And the results:

>>> from pprint import pprint
>>> pprint(result)
{('Nace2008', '01110'): {'FR': "Culture de céréales (à l'exception du riz), de "
                               'légumineuses et de graines oléagineuses',
                         'NL': 'Teelt van granen (m.u.v. rijst), peulgewassen '
                               'en oliehoudende zaden',
                         'category': 'Nace2008',
                         'code': '01110'},
 ('Nace2008', '01120'): {'FR': 'Culture du riz',
                         'NL': 'Teelt van rijst',
                         'category': 'Nace2008',
                         'code': '01120'}}

Then you can extract the values of that dictionary if you want just that:

>>> pprint(list(result.values()))
[{'FR': "Culture de céréales (à l'exception du riz), de légumineuses et de "
        'graines oléagineuses',
  'NL': 'Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden',
  'category': 'Nace2008',
  'code': '01110'},
 {'FR': 'Culture du riz',
  'NL': 'Teelt van rijst',
  'category': 'Nace2008',
  'code': '01120'}]
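
As a small illustration of the earlier caveat about .update, here is a sketch with two hypothetical records (not from the question) that share the same category and code and both define "NL". The second "NL" value is made up purely for the demonstration:

```python
# Two hypothetical records with the same grouping key AND the same "NL" field.
records = [
    {"category": "Nace2008", "code": "01120", "NL": "Teelt van rijst"},
    {"category": "Nace2008", "code": "01120", "NL": "Rijstteelt"},  # invented value
]

merged = {}
for d in records:
    key = d["category"], d["code"]
    merged.setdefault(key, {}).update(d)

# The later record's "NL" silently replaces the earlier one.
print(merged[("Nace2008", "01120")]["NL"])  # Rijstteelt
```

If silently losing the first value is not acceptable, the .update call is the place to add a check or a custom merge.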

Note, the grouping idiom can be cleaned up a bit using defaultdict (some people find .setdefault confusing):

from collections import defaultdict
result = defaultdict(dict)
for d in data:
    key = d['category'], d['code']
    result[key].update(d)

Both are the same as:

result = {}
for d in data:
    key = d['category'], d['code']
    if key not in result:
        result[key] = {}
    result[key].update(d)
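
Since the comments ask about making the matching keys dynamic, the same idiom could be wrapped in a function. This is only a sketch: the name merge_by_keys and its parameters are my own, and it assumes every record contains all of the requested keys (otherwise it raises KeyError):

```python
def merge_by_keys(records, keys):
    """Merge dicts in `records` that share the same values for all of `keys`."""
    grouped = {}
    for record in records:
        # Build the grouping key from the requested fields, in order.
        group_key = tuple(record[k] for k in keys)
        grouped.setdefault(group_key, {}).update(record)
    return list(grouped.values())


# A subset of the question's data.
data = [
    {"category": "Nace2008", "code": "01120", "FR": "Culture du riz"},
    {"category": "Nace2008", "code": "01120", "NL": "Teelt van rijst"},
    {"category": "Nace2008", "code": "01110",
     "NL": "Teelt van granen (m.u.v. rijst), peulgewassen en oliehoudende zaden"},
]

merged = merge_by_keys(data, ["category", "code"])
```

Because plain dictionaries preserve insertion order (Python 3.7+), the merged records come back in the order their keys were first seen.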

2 Comments

Suppose the objects also have another key, say address (an arbitrary example), and I want to keep both values by merging the two strings or the two objects (if address is an object with street, number, etc.). Should I replace the update call with a custom merge function to handle this?
@Thore yeah, you'll want to write a custom update function to handle things.
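
One possible shape for such a custom update function, as a rough sketch only. The names merge_value and merge_records, the "; " separator for strings, and the address data are all assumptions of mine, not something specified in the question:

```python
def merge_value(old, new):
    """Combine two values stored under the same field (hypothetical policy)."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Merge nested objects field by field, recursing on shared fields.
        combined = dict(old)
        for k, v in new.items():
            combined[k] = merge_value(combined[k], v) if k in combined else v
        return combined
    if isinstance(old, str) and isinstance(new, str):
        # Keep identical strings once; join differing ones.
        return old if old == new else f"{old}; {new}"
    # Fallback for other types: last value wins, like dict.update.
    return new


def merge_records(records, keys):
    grouped = {}
    for record in records:
        group_key = tuple(record[k] for k in keys)
        target = grouped.setdefault(group_key, {})
        for field, value in record.items():
            target[field] = merge_value(target[field], value) if field in target else value
    return list(grouped.values())


# Invented example data with a conflicting "address" field.
records = [
    {"category": "Nace2008", "code": "01110", "address": {"street": "Main"}},
    {"category": "Nace2008", "code": "01110", "address": {"number": "5"}},
]
merged = merge_records(records, ["category", "code"])
```

Here the grouping fields stay unchanged because identical strings are kept once, while the two address objects are combined into a single dictionary.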
