I have hundreds of text files that I need to parse by username and date. I put the useful data from the text files into lists like this:

    [
      ['[email protected]', '34209809', '1434546354', '2016-07-18 00:20:58'], 
      ['[email protected]', '234534345', '09402380', '2016-07-18 00:20:03'], 
      ['[email protected]', '345315531', '1098098098', '2016-07-18 02:40:00'], 
      ['[email protected]', '345431353', '231200023', '2016-07-18 15:45:49'], 
      ['[email protected]', '23232424', '234809809', '2016-07-18 20:45:40']
    ]

However, I would like to sort them by datetime and group them by username, so that the output looks like this:

    [
     ['[email protected]', '23232424', '234809809', '2016-07-18 20:45:40'],
     ['[email protected]', '34209809', '1434546354', '2016-07-18 00:20:58'],
     ['[email protected]', '345431353', '231200023', '2016-07-18 15:45:49'],
     ['[email protected]', '234534345', '09402380', '2016-07-18 00:20:03'],
     ['[email protected]', '345315531', '1098098098', '2016-07-18 02:40:00']
    ]

Here is my code:

    import glob
    from operator import itemgetter
    from itertools import groupby
    def read_large_file(filename):
        matrix=[]
        global username
        username=[]
        for myfile in glob.glob(filename):
            infile = open(myfile, "r")
            for row in infile:
                row=row.strip()
                array=row.split(';') 
                username.append(array[9])
                matrix.append(cdr(array[9],array[17],array[18],array[8]))

        return matrix


    class cdr(object):               
        def __init__(self, username, total_seconds_since_start, download_bytes, date_time):
            self.username=username
            self.total_seconds_since_start=total_seconds_since_start
            self.download_bytes=download_bytes
            self.date_time=date_time


    def GroupByUsername(matrix):
        new_matrix=[]
        new_matrix=groupby(matrix, itemgetter(0))
        return new_matrix

    matrix=read_large_file(r'C:\Users\ceren\.spyder2/test/*')
    matrix_new=GroupByUsername(matrix)

I tried the solution from this question: Sorting and Grouping Nested Lists in Python. However, I got these errors:

    'cdr' object does not support indexing
    'cdr' object is not iterable

1 Answer

You can probably just use the simple Python built-in sort.

    sorted_list = sorted(data, key=lambda user_info: (user_info[0], user_info[3]))

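The key function tells Python what to sort each entry by (in ascending order). For each entry in data, user_info is the list of four attributes, so user_info[0] is the email and user_info[3] is the datetime. Returning them as a tuple sorts by email first and then by datetime within each email. The datetime strings compare correctly as plain text because they are zero-padded in year-month-day order.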
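
As a minimal sketch of the tuple key on a nested list, the example below uses hypothetical placeholder usernames (the real ones are not visible in the question) with the same row layout:

```python
# Hypothetical sample rows: [username, id_a, id_b, 'YYYY-MM-DD HH:MM:SS'].
# The usernames are made up for illustration.
data = [
    ['bob@example.com', '34209809', '1434546354', '2016-07-18 00:20:58'],
    ['carol@example.com', '234534345', '09402380', '2016-07-18 00:20:03'],
    ['carol@example.com', '345315531', '1098098098', '2016-07-18 02:40:00'],
    ['bob@example.com', '345431353', '231200023', '2016-07-18 15:45:49'],
    ['alice@example.com', '23232424', '234809809', '2016-07-18 20:45:40'],
]

# Sort by username first, then by timestamp within each username.
# Zero-padded 'YYYY-MM-DD HH:MM:SS' strings sort chronologically as text.
sorted_list = sorted(data, key=lambda user_info: (user_info[0], user_info[3]))

for row in sorted_list:
    print(row)
```

Rows for the same username end up next to each other, which is also what itertools.groupby needs if you want to group them afterwards.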

6 Comments

Thank you, I tried it, but I got this error: TypeError: <lambda>() takes exactly 1 argument (2 given)
Ah sorry, I forgot the key=. I've fixed it - give it a try :)
I still get this: TypeError: 'cdr' object does not support indexing. I think Python does not let me index into class objects stored in a list.
If you want to replicate the output exactly, you're going to want something like sorted(sorted(data, key=lambda x: x[-1], reverse=True), key=lambda x: x[0]), since Timsort is stable.
The solution I gave will sort the nested list you provided. To sort a list of cdr objects, you will want: sorted_list = sorted(cdr_list, key=lambda cdr: (cdr.username, cdr.date_time))
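
Putting the last comment together with the grouping step from the question, a sketch for cdr objects might look like the following. The class mirrors the one in the question. The records and usernames are hypothetical stand-ins for the parsed file rows, and operator.attrgetter is used in place of the lambda purely as a matter of taste:

```python
from itertools import groupby
from operator import attrgetter


class cdr(object):
    """Same fields as the class in the question."""
    def __init__(self, username, total_seconds_since_start, download_bytes, date_time):
        self.username = username
        self.total_seconds_since_start = total_seconds_since_start
        self.download_bytes = download_bytes
        self.date_time = date_time


# Hypothetical records standing in for the parsed file rows.
cdr_list = [
    cdr('bob@example.com', '34209809', '1434546354', '2016-07-18 00:20:58'),
    cdr('carol@example.com', '234534345', '09402380', '2016-07-18 00:20:03'),
    cdr('bob@example.com', '345431353', '231200023', '2016-07-18 15:45:49'),
    cdr('alice@example.com', '23232424', '234809809', '2016-07-18 20:45:40'),
]

# attrgetter reads attributes, so it works where itemgetter(0) failed with
# "'cdr' object does not support indexing".
sorted_list = sorted(cdr_list, key=attrgetter('username', 'date_time'))

# groupby only merges adjacent items, so it must run on the sorted list.
grouped = {
    user: [record.date_time for record in records]
    for user, records in groupby(sorted_list, key=attrgetter('username'))
}
print(grouped)
```

The errors in the question come from itemgetter(0), which tries to evaluate obj[0] on each cdr instance. Either read attributes as above or store plain lists or tuples instead of cdr objects.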