
I have a file with lines that look like this:

"[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."

"[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life"

I have tried stripping these lines and splitting them, then stripping punctuation from each item in the resulting list:

with open('aabb.txt') as t:
    for Line in t:
        splitline = Line.strip()
        splitline2 = splitline.split()
        for words in splitline2:
            words = words.strip("!#$%&'()*+,-./:;?@[\\]^_`{|}~")
            words = words.lower()

What should I do to turn these lines into two lists that look like this:

'["36.147315849999998","-86.7978174","6","2011-08-28","19:45:11","maryreynolds85","that","is","my","life","lol"]'

'["37.715399429999998","-89.21166221","6","2011-08-28","19:45:41","ate","more","veggie","and","fruit","than","meat","for","the","first","time","in","my","life"]'
  • I don't know enough about Python, but could you use something from "Read a file line-by-line with Python" combined with list = line.split(" ")? Commented Nov 5, 2019 at 5:11
  • You're trying to read a TSV (Tab-Separated Value) file, which generically refers to whitespace-separated input (not just tabs). It also contains [...] brackets. Commented Nov 5, 2019 at 5:11
  • Variable names should generally follow the lowercase_with_underscores style. Commented Nov 5, 2019 at 5:13
  • Related: parsing a tab-separated file in Python Commented Nov 5, 2019 at 5:13
  • Where do these strings come from? What’s the general format, context, etc? Commented Nov 5, 2019 at 5:21
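On the line.split(" ") suggestion and the TSV remark: calling str.split() with no argument splits on any run of whitespace (spaces, tabs, newlines) and drops empty strings, while split(" ") splits on each single space only. A minimal comparison, assuming the fields might be separated by tabs or repeated spaces (the sample string is made up for illustration):

```python
# str.split() with no argument treats any run of whitespace as one separator
# and ignores leading/trailing whitespace; split(" ") does neither.
sample = "[36.147315849999998, -86.7978174]\t6  2011-08-28 19:45:11\n"

print(sample.split(" "))
# ['[36.147315849999998,', '-86.7978174]\t6', '', '2011-08-28', '19:45:11\n']
print(sample.split())
# ['[36.147315849999998,', '-86.7978174]', '6', '2011-08-28', '19:45:11']
```

So for whitespace-separated lines like these, plain split() is usually enough and the csv module is not strictly needed.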

2 Answers


Is all your data in the same format? If so, use a regular expression from the re library.

import re

your_str = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
# groups 1 and 2: the two numbers inside the brackets (\s* drops the space after
# the comma); group 3: everything after the closing bracket
reg_data = re.compile(r"\[(.*?),\s*(.*?)\] (.*)")
your_reg_grp = reg_data.match(your_str)
if your_reg_grp:
    print(your_reg_grp.groups())
    # the last group is still one string, so split it on spaces to get the words
    grp1 = your_reg_grp.groups()
    grp2 = grp1[-1].split(" ")

Then combine grp1[:-1] and grp2 into one list, e.g. list(grp1[:-1]) + grp2.
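Putting the pieces together, here is a sketch of the whole pipeline as one hypothetical parse_line function: a regex with non-greedy groups pulls out the two coordinates, and the question's strip-and-lowercase step is applied to the remaining tokens. It assumes each line starts with "[" (no surrounding quotes in the file) and returns None for lines that don't match.

```python
import re
import string

# Hypothetical pattern: two comma-separated values in brackets, then the rest.
LINE_RE = re.compile(r"\[(.*?),\s*(.*?)\] (.*)")

def parse_line(line):
    """Return [lat, lon, *cleaned_tokens], or None if the line doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    lat, lon, rest = match.groups()
    words = []
    for token in rest.split():
        # strip punctuation from both ends only, then lowercase
        token = token.strip(string.punctuation).lower()
        if token:  # drop tokens that were nothing but punctuation
            words.append(token)
    return [lat, lon] + words

print(parse_line("[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."))
# ['36.147315849999998', '-86.7978174', '6', '2011-08-28', '19:45:11', 'maryreynolds85', 'that', 'is', 'my', 'life', 'lol']
```

Because the coordinates come from the regex groups, the minus sign of "-86.7978174" survives even though "-" is in string.punctuation.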


4 Comments

Adding to @Atreyagaurav, the following RegEx is more explicit: regex101.com/r/QRux5E/1
Nice one, that seems useful. I didn't want to spend too much time figuring out the exact regex, so I made a general one.
@Atreyagaurav thank you for your help. I tried your answer, but it seems some punctuation is missed, e.g. ['6', '2011-08-28', '19:11:58', 'wahhhhhh', 'i', 'need', 'to', 'figure', 'out', 'what', 'to', 'do', 'wifff', 'my', 'life', '#lost']. The "#" in front of the word "lost" is supposed to be removed. Could you show me how to solve the problem in that case? I'm kind of new to Python. Thank you again.
If such punctuation only appears at the start or end of a word, your code words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~") should work fine; use it on each item in your group, or write a function for that. If it can also appear in the middle, you can write a function to remove those characters; that shouldn't be hard.
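For punctuation that can also appear in the middle of a word, one option (sketched with a hypothetical remove_punctuation helper) is str.maketrans with a third argument listing characters to delete, followed by str.translate. It is meant for the tweet words only: applied to the date or time fields it would also delete the "-" and ":" they need.

```python
import string

# str.maketrans(x, y, z): characters in the third argument are mapped to None,
# i.e. deleted wherever they occur, not just at the ends of the word.
DELETE_PUNCTUATION = str.maketrans("", "", string.punctuation)

def remove_punctuation(word):
    """Hypothetical helper: delete every punctuation character in word."""
    return word.translate(DELETE_PUNCTUATION)

words = ["wahhhhhh", "wifff", "#lost", "don't", "life,"]
print([remove_punctuation(w).lower() for w in words])
# ['wahhhhhh', 'wifff', 'lost', 'dont', 'life']
```

The trade-off is visible in "don't" becoming "dont"; strip() leaves inner apostrophes alone, so pick whichever behavior the downstream code expects.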

You are already producing the words you need; you just have to create a list and append each word to it.

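import string

# every punctuation character except '-', so "-86.7978174" keeps its minus sign
PUNCTUATION = string.punctuation.replace("-", "")

with open('aabb.txt') as t:
    for line in t:
        words_list = []  # avoid naming it `list`, which shadows the built-in
        for word in line.split():
            word = word.strip(PUNCTUATION).lower()
            if word:  # skip tokens that were only punctuation
                words_list.append(word)
        print(words_list)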

You can also build a list of lists, with one inner list per line, and use it however you need.

import string

# every punctuation character except '-', so negative coordinates keep their sign
PUNCTUATION = string.punctuation.replace("-", "")

root_list = []
with open('aabb.txt') as t:
    for line in t:
        temp_list = []
        for word in line.split():
            word = word.strip(PUNCTUATION).lower()
            if word:  # skip tokens that were only punctuation
                temp_list.append(word)
        root_list.append(temp_list)
print(root_list)
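The sample in the question shows blank lines between records; with the loop above, each blank line adds an empty list to root_list. A minimal variant that skips them, written as a hypothetical parse_lines function that takes any iterable of lines (an open file works), so it can be tried without aabb.txt:

```python
import string

# Strip everything in string.punctuation except '-', so "-86.7978174" keeps its sign.
PUNCTUATION = string.punctuation.replace("-", "")

def parse_lines(lines):
    """Hypothetical helper: one list of cleaned, lowercased words per non-blank line."""
    root_list = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of appending []
        words = [w.strip(PUNCTUATION).lower() for w in line.split()]
        root_list.append([w for w in words if w])  # drop punctuation-only tokens
    return root_list

sample = [
    "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol.\n",
    "\n",
    "[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life\n",
]
for row in parse_lines(sample):
    print(row)
```

With the real file this would be: with open('aabb.txt') as t: root_list = parse_lines(t).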

3 Comments

@Dulaj Kulathunga I didn't realize you had formatted my code; when I edited it, it was still mashed up.
@Anuj Dekavadiya I'm kind of new to Python; could you show me how to create a list of lists in this case?
@jane998 I have updated the answer with a list of lists :)
