How to turn multiple lines into multiple lists in Python?

Question

I have a file with lines that look like this:

"[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."

"[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life"

I have tried to strip these lines and split them, then strip punctuation from each substring in the resulting list.

with open('aabb.txt') as t:
    for Line in t:
        splitline = Line.strip()
        splitline2 = splitline.split()
        for words in splitline2:
            words = words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~")
            words = words.lower()

What should I do to turn these lines into two lists that look like this:

'["36.147315849999998","-86.7978174","6","2011-08-28","19:45:11","maryreynolds85","that","is","my","life","lol"]'

'["37.715399429999998","-89.21166221","6","2011-08-28","19:45:41","ate","more","veggie","and","fruit","than","meat","for","the","time","in","my","life"]'

I don't know enough about Python, but could you use something from "Read a file line-by-line with python" and combine it with list = line.split(" ")?
– pensum, Commented Nov 5, 2019 at 5:11
You're trying to read a TSV (Tab-Separated Value) file, which generically refers to whitespace-separated input (not just tabs). It also contains [...] brackets.
– smci, Commented Nov 5, 2019 at 5:11
Variable names should generally follow the lowercase_with_underscores style.
– AMC, Commented Nov 5, 2019 at 5:13
Where do these strings come from? What's the general format, context, etc.?
– AMC, Commented Nov 5, 2019 at 5:21

Atreyagaurav · Accepted Answer · 2019-11-05 05:18:14Z

2

Is all your data in the same format? If so, use a regex from the re library.

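import re

your_str = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
# Capture the two values inside the brackets, then everything after them.
# The \s* keeps the space after the comma out of the second group.
reg_data = re.compile(r"\[(.*),\s*(.*)\] (.*)")
your_reg_grp = reg_data.match(your_str)
if your_reg_grp:
    print(your_reg_grp.groups())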

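# This puts everything into a tuple: the two values inside the square brackets, then the rest of the line as a single string. You can split that last string with split(" ") and build a new list.

grp1 = your_reg_grp.groups()
grp2 = grp1[-1].split(" ")

Then combine grp1[:-1] and grp2, for example with list(grp1[:-1]) + grp2 (grp1 is a tuple, so it has to be converted to a list first).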
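The steps in this answer can be put together roughly as follows. This is only a sketch: the function name line_to_list and the slightly stricter pattern are assumptions made for illustration, not part of the original answer, and it assumes each line in the file starts with "[lat, lon]" without surrounding quotes.

```python
import re
import string

# Assumed line shape, based on the two samples in the question:
# "[<number>, <number>] <rest of the line>"
LINE_RE = re.compile(r"\[\s*([^,\s]+)\s*,\s*([^\]\s]+)\s*\]\s*(.*)")


def line_to_list(line):
    """Return [lat, lon, cleaned tokens...] for one line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    lat, lon, rest = match.groups()
    # Strip punctuation from both ends of each remaining token and lowercase it.
    tokens = [tok.strip(string.punctuation).lower() for tok in rest.split()]
    # Drop tokens that were nothing but punctuation.
    return [lat, lon] + [tok for tok in tokens if tok]


sample = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
print(line_to_list(sample))
```

Because the two coordinates are captured by the regex and never passed through strip(), the minus sign on the longitude is kept, while the date and time keep their inner "-" and ":" since strip() only touches the ends of a token.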

edited Nov 5, 2019 at 5:18

answered Nov 5, 2019 at 5:11


4 Comments

jayg_code Over a year ago

Adding to @Atreyagaurav, the following RegEx is more explicit: regex101.com/r/QRux5E/1

Atreyagaurav Over a year ago

Nice one, that seems useful. I didn't want to spend too much time figuring out the exact regex, so I made a general one.

jane998 Over a year ago

@Atreyagaurav thank you for your help. I tried your answer, but it seems some punctuation is missed. For example: ['6', '2011-08-28', '19:11:58', 'wahhhhhh', 'i', 'need', 'to', 'figure', 'out', 'what', 'to', 'do', 'wifff', 'my', 'life', '#lost']. The "#" in front of the word "lost" is supposed to be removed. Could you show me how to solve the problem in that case? I'm kind of new to Python. Thank you again for your help.

Atreyagaurav Over a year ago

If such punctuation is at the start or end, your code words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~") should work fine; use it for each item in your group, or write a function for that. If it can also be in the middle, you can write a function to remove those characters; it shouldn't be hard.
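A minimal sketch of the two cases described in this comment. The helper names strip_ends and remove_all are made up for illustration, and string.punctuation is used as a stand-in for the hand-written character set in the question.

```python
import string


def strip_ends(word, chars=string.punctuation):
    """Remove punctuation only from the start and end of a word."""
    return word.strip(chars)


# Translation table that deletes every punctuation character.
_REMOVE_TABLE = str.maketrans("", "", string.punctuation)


def remove_all(word):
    """Remove punctuation anywhere in a word, including the middle."""
    return word.translate(_REMOVE_TABLE)


words = ["wahhhhhh", "life", "#lost", "don't", "19:11:58"]
print([strip_ends(w) for w in words])
print([remove_all(w) for w in words])
```

Note that removing punctuation from the middle also turns '19:11:58' into '191158', so for the date and time fields stripping only the ends is probably the safer choice.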

Anuj Dekavadiya · Accepted Answer · 2019-11-07 04:12:11Z

-1

You are already creating the words you need in the list. You just have to create a list and append each word to it.

with open('aabb.txt') as t:
    for Line in t:
        word_list = []  # renamed from "list" to avoid shadowing the built-in
        splitline = Line.strip()
        splitline2 = splitline.split()
        for words in splitline2:
            words = words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~")
            words = words.lower()
            word_list.append(words)
        print(word_list)

You can also create a list of lists, one inner list per line, and use it for your needs.

with open('aabb.txt') as t:
    root_list = []
    for Line in t:
        temp_list = []
        splitline = Line.strip()
        splitline2 = splitline.split()
        for words in splitline2:
            words = words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~")
            words = words.lower()
            temp_list.append(words)
        root_list.append(temp_list)
    print(root_list)
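One thing to watch with the code above: the strip() character set contains "-", so the leading minus sign of the longitude is stripped and "-86.7978174]" becomes "86.7978174", which differs from the output shown in the question. A possible variant is sketched below; clean_token and lines_to_lists are hypothetical names, and io.StringIO stands in for the real file so the example is self-contained.

```python
import io
import string


def clean_token(token):
    """Lowercase a token and strip punctuation from its ends, keeping signed numbers intact."""
    candidate = token.strip("[],")
    try:
        float(candidate)  # numeric tokens such as "-86.7978174" keep their minus sign
        return candidate.lower()
    except ValueError:
        return token.strip(string.punctuation).lower()


def lines_to_lists(lines):
    """Turn an iterable of text lines into a list of token lists, one per non-empty line."""
    result = []
    for line in lines:
        tokens = [clean_token(tok) for tok in line.split()]
        tokens = [tok for tok in tokens if tok]  # drop punctuation-only tokens
        if tokens:
            result.append(tokens)
    return result


# Stand-in for open('aabb.txt'); a real file object can be passed the same way.
fake_file = io.StringIO(
    "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol.\n"
    "[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life\n"
)
rows = lines_to_lists(fake_file)
for row in rows:
    print(row)
```

With a real file this would be called as lines_to_lists(t) inside the with block, since a file object also yields its lines one at a time.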

edited Nov 7, 2019 at 4:12

answered Nov 5, 2019 at 5:49


3 Comments

Anuj Dekavadiya Over a year ago

@Dulaj Kulathunga I had no idea that you had formatted my code. When I edited it, it was still mashed up.

jane998 Over a year ago

@Anuj Dekavadiya I'm kind of new to Python. Could you show me how to create a list of lists in this case?

Anuj Dekavadiya Over a year ago

@jane998 I have updated the answer with a list of lists :)
