I have my data in a txt file.

1   B   F   2019-03-10 
1   C   G   2019-03-11 
1   B   H   2019-03-10 
1   C   I   2019-03-10 
1   B   J   2019-03-10 
2   A   K   2019-03-10 
1   D   L   2019-03-10 
2   D   M   2019-03-10 
2   E   N   2019-03-11 
1   E   O   2019-03-10 

What I need to do is to split the data according to the first column.

So all rows with the number 1 in the first column go to one list (or dictionary, or whatever), and all rows with the number 2 in the first column go to another list. This is sample data; in the original data we do not know how many different numbers appear in the first column.

What I have to do next is sort the data for each key (in my case the numbers 1 and 2) by date and time. I can do that for the whole of data.txt, but not for the dictionary.

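import csv
from operator import itemgetter

with open("data.txt") as file:
    reader = csv.reader(file, delimiter="\t")
    data = sorted(reader, key=itemgetter(0))  # sort by the first column
    lines = sorted(data, key=itemgetter(3))   # re-sort everything by date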

lines

OUTPUT:
[['1', 'B', 'F', '2019-03-10'],
 ['2', 'D', 'M', '2019-03-10'],
 ['1', 'B', 'H', '2019-03-10'],
 ['1', 'C', 'I', '2019-03-10'],
 ['1', 'B', 'J', '2019-03-10'],
 ['1', 'D', 'L', '2019-03-10'],
 ['2', 'A', 'K', '2019-03-10'],
 ['1', 'E', 'O', '2019-03-10'],
 ['1', 'C', 'G', '2019-03-11'],
 ['2', 'E', 'N', '2019-03-11']]

So what I need is to group the data by the number in the first column and also sort each group by date and time. Could anyone please help me combine these two steps? I am not sure whether I have to use a dictionary; maybe there is another way to do that.

2 Answers

You can sort the corresponding list for each key after splitting the data by the first column:

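from operator import itemgetter

def sort_by_time(key_items):
    # Sort one group's rows by the date column (index 3)
    return sorted(key_items, key=itemgetter(3))

# d maps each first-column value to its list of rows
d = {k: sort_by_time(v) for k, v in d.items()}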
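
The snippet above assumes that d already maps each first-column value to its list of rows. A minimal sketch of one way to build such a dictionary, using collections.defaultdict; the sample rows and the group_rows helper are hypothetical names chosen for illustration, not part of the question's code:

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical sample rows, already split into columns
# (the shape csv.reader or line.split() would produce).
rows = [
    ["1", "B", "F", "2019-03-10"],
    ["1", "C", "G", "2019-03-11"],
    ["2", "E", "N", "2019-03-11"],
    ["2", "A", "K", "2019-03-10"],
    ["1", "B", "H", "2019-03-10"],
]

def group_rows(rows):
    """Group rows into a dict keyed on the first column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row)
    return dict(grouped)

d = group_rows(rows)

# Sort each group's rows by the date column (index 3).
d = {k: sorted(v, key=itemgetter(3)) for k, v in d.items()}
```

This works for any number of distinct values in the first column, since a new list is created the first time a key is seen.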

If the rows in d have separate columns for date and time (date at index 3, time at index 4), you can sort by several columns at once:

sorted(key_items, key=itemgetter(3, 4))
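
Sorting date and time as plain strings gives chronological order only when both are zero-padded, as they are in this data. If the format were less regular, one option would be to parse each row into a datetime for the sort key. A sketch under the assumption that the date is at index 3 and the time at index 4; the rows list and the timestamp helper are hypothetical:

```python
from datetime import datetime

# Hypothetical five-column rows: date at index 3, time at index 4.
rows = [
    ["1", "B", "H", "2019-03-10", "16:13:59.045"],
    ["1", "C", "G", "2019-03-11", "16:14:39.999"],
    ["1", "B", "F", "2019-03-10", "16:13:38.935"],
]

def timestamp(row):
    """Combine the date and time columns into a single datetime."""
    return datetime.strptime(f"{row[3]} {row[4]}", "%Y-%m-%d %H:%M:%S.%f")

# Sort chronologically by the parsed timestamp.
sorted_rows = sorted(rows, key=timestamp)
```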

itertools.groupby can help build the lists:

from operator import itemgetter
from itertools import groupby
from pprint import pprint

# Read all the data splitting on whitespace
with open('data.txt') as f:
    data = [line.split() for line in f]

# Sort by indicated columns
data.sort(key=itemgetter(0,3,4))

# Build a dictionary keyed on the first column
# Note: data must be pre-sorted by the groupby key for groupby to work correctly.
d = {group:list(items) for group,items in groupby(data,key=itemgetter(0))}

pprint(d)

Output:

{'1': [['1', 'B', 'F', '2019-03-10', '16:13:38.935'],
       ['1', 'B', 'H', '2019-03-10', '16:13:59.045'],
       ['1', 'C', 'I', '2019-03-10', '16:14:07.561'],
       ['1', 'B', 'J', '2019-03-10', '16:14:35.371'],
       ['1', 'D', 'L', '2019-03-10', '16:14:40.854'],
       ['1', 'E', 'O', '2019-03-10', '16:15:05.878'],
       ['1', 'C', 'G', '2019-03-11', '16:14:39.999']],
 '2': [['2', 'D', 'M', '2019-03-10', '16:13:58.641'],
       ['2', 'A', 'K', '2019-03-10', '16:14:43.224'],
       ['2', 'E', 'N', '2019-03-11', '16:15:01.807']]}

2 Comments

I want to group the data by the first column. How do I get rid of the first column in values?
You can simply drop the first column from each row when building the lists: [row[1:] for row in items]
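
A minimal sketch of the groupby approach with the first column removed from the values, assuming five-column rows like those in the output above. Each row is sliced individually, because the group object that groupby yields is an iterator and cannot be sliced itself:

```python
from itertools import groupby
from operator import itemgetter

# A few rows taken from the output above, in arbitrary order.
data = [
    ["2", "A", "K", "2019-03-10", "16:14:43.224"],
    ["1", "B", "F", "2019-03-10", "16:13:38.935"],
    ["1", "C", "G", "2019-03-11", "16:14:39.999"],
    ["2", "D", "M", "2019-03-10", "16:13:58.641"],
]

# Sort by group, then date, then time; groupby needs the data
# pre-sorted on the grouping key.
data.sort(key=itemgetter(0, 3, 4))

# Key on the first column and keep only the remaining columns.
d = {
    group: [row[1:] for row in items]
    for group, items in groupby(data, key=itemgetter(0))
}
```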
