I found several posts about flattening/collapsing lists in Python, but none that cover this case:

Input:

[a_key_1, a_key_2, a_value_1, a_value_2]
[b_key_1, b_key_2, b_value_1, b_value_2]
[a_key_1, a_key_2, a_value_3, a_value_4]
[a_key_1, a_key_3, a_value_5, a_value_6]

Output:

[a_key_1, a_key_2, [a_value_1, a_value_3], [a_value_2, a_value_4]]
[b_key_1, b_key_2, [b_value_1], [b_value_2]]
[a_key_1, a_key_3, [a_value_5], [a_value_6]]

I want to flatten the lists so there is only one entry per unique set of keys and the remaining values are combined into nested lists next to those unique keys.

EDIT: The first two elements in the input will always be the keys; the last two elements will always be the values.

Is this possible?

  • What differentiates a key from a value? Commented May 22, 2015 at 1:46
  • It's simply based on position in the original input. So, position 0 and 1 are always keys, 2 and 3 are always values. Commented May 22, 2015 at 1:47

2 Answers

Yes, it's possible. Here's a function (with doctest from your input/output) that performs the task:

#!/usr/bin/env python
"""Flatten lists as per http://stackoverflow.com/q/30387083/253599."""

from collections import OrderedDict


def flatten(key_length, *args):
    """
    Take lists having key elements and collect remainder into result.

    >>> flatten(1,
    ...         ['A', 'a1', 'a2'],
    ...         ['B', 'b1', 'b2'],
    ...         ['A', 'a3', 'a4'])
    [['A', ['a1', 'a2'], ['a3', 'a4']], ['B', ['b1', 'b2']]]

    >>> flatten(2,
    ...         ['A1', 'A2', 'a1', 'a2'],
    ...         ['B1', 'B2', 'b1', 'b2'],
    ...         ['A1', 'A2', 'a3', 'a4'],
    ...         ['A1', 'A3', 'a5', 'a6'])
    [['A1', 'A2', ['a1', 'a2'], ['a3', 'a4']], ['B1', 'B2', ['b1', 'b2']], ['A1', 'A3', ['a5', 'a6']]]
    """
    result = OrderedDict()
    for vals in args:
        result.setdefault(
            tuple(vals[:key_length]), [],
        ).append(vals[key_length:])
    return [
        list(key) + list(vals)
        for key, vals
        in result.items()
    ]


if __name__ == '__main__':
    import doctest
    doctest.testmod()

(Edited to work with both your original question and the edited question)
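
Note that the function above keeps each input row's values together (`['a1', 'a2']`, then `['a3', 'a4']`), whereas the expected output in the question collects values by position. Below is a minimal sketch of a variant that groups by position while preserving first-seen key order. The name `flatten_columns` is hypothetical (not part of the answer above), and the sketch assumes every row with the same key has the same number of values:

```python
from collections import OrderedDict


def flatten_columns(key_length, *rows):
    """Group rows by their first key_length items; collect values by position."""
    grouped = OrderedDict()
    for row in rows:
        key = tuple(row[:key_length])
        values = row[key_length:]
        # One list per value position, created the first time a key is seen.
        columns = grouped.setdefault(key, [[] for _ in values])
        for column, value in zip(columns, values):
            column.append(value)
    return [list(key) + columns for key, columns in grouped.items()]


result = flatten_columns(
    2,
    ['a_key_1', 'a_key_2', 'a_value_1', 'a_value_2'],
    ['b_key_1', 'b_key_2', 'b_value_1', 'b_value_2'],
    ['a_key_1', 'a_key_2', 'a_value_3', 'a_value_4'],
    ['a_key_1', 'a_key_3', 'a_value_5', 'a_value_6'],
)
print(result)
```

With the question's input this should give one entry per key pair, in the order the keys first appear, with one nested list per value position.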

data = [
    ["a_key_1", "a_key_2", "a_value_1", "a_value_2"],
    ["b_key_1", "b_key_2", "b_value_1", "b_value_2"],
    ["a_key_1", "a_key_2", "a_value_3", "a_value_4"],
    ["a_key_1", "a_key_3", "a_value_5", "a_value_6"],
]

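from itertools import groupby

# Group on the first two positions (the keys).
keyfunc = lambda row: (row[0], row[1])
# groupby only merges adjacent rows, so sort by the same key first.
print([
    # Transpose each group, then keep the value columns (index 2 onward).
    # list(zip(...)) makes this work on Python 3 as well as Python 2.
    list(key) + [list(zipped) for zipped in list(zip(*group))[2:]]
    for key, group
    in groupby(sorted(data, key=keyfunc), keyfunc)
])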


# => [['a_key_1', 'a_key_2', ['a_value_1', 'a_value_3'], ['a_value_2', 'a_value_4']],
#     ['a_key_1', 'a_key_3', ['a_value_5'], ['a_value_6']],
#     ['b_key_1', 'b_key_2', ['b_value_1'], ['b_value_2']]]

For more information check the Python Docs
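
The `sorted` call in the snippet above matters because `itertools.groupby` only merges adjacent rows with equal keys. It also means the result comes back in key order rather than input order, which is why the output differs in order from the question's. A small sketch with made-up rows (not taken from the question) to illustrate:

```python
from itertools import groupby

rows = [["a", 1], ["b", 2], ["a", 3]]  # illustrative data only
first = lambda row: row[0]

# Without sorting, the two "a" rows are not adjacent and form separate groups.
unsorted_keys = [key for key, _ in groupby(rows, first)]
# After sorting by the same key, each key appears exactly once.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=first), first)]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```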

2 Comments

Is it not possible to group where position 0 is the only key and the collapsed values are positions 1, 2, and 3? It seems you can't just change the lambda row: (row[0], row[1]) to lambda row: (row[0]), or else it breaks.
(row[0]) is equal to row[0]. (row[0],) (with comma) is a tuple just like (row[0], row[1]); this way keeps the multikey and singlekey cases similar. Or, you could change to lambda row: row[0], and change list(key) into [key]; this is simpler, but is structurally different than the multikey case. In either case you will have to change [2:] into [1:].
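
A sketch of the single-key variant described in the reply above, using the one-element tuple `(row[0],)` and slicing from `[1:]`. The sample rows are hypothetical, adapted from the answer's data by dropping the second key, and `list(zip(...))` is used so the snippet also runs on Python 3:

```python
from itertools import groupby

# Hypothetical rows: position 0 is the key, positions 1-3 are values.
data = [
    ["a_key_1", "a_value_1", "a_value_2", "a_value_3"],
    ["b_key_1", "b_value_1", "b_value_2", "b_value_3"],
    ["a_key_1", "a_value_4", "a_value_5", "a_value_6"],
]

keyfunc = lambda row: (row[0],)  # one-element tuple: note the trailing comma

collapsed = [
    # Transpose each group and keep everything after the single key column.
    list(key) + [list(column) for column in list(zip(*group))[1:]]
    for key, group in groupby(sorted(data, key=keyfunc), keyfunc)
]
print(collapsed)
```

Keeping the key as a tuple means `list(key)` works unchanged, so the single-key and multi-key versions differ only in the key function and the slice index.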
