Looking at your data, someone has dumped the str version of a list into a file as-is, using python2.
One thing's for sure - you can't use a CSV reader for this data. You can't even use a JSON parser (which would've been the next best thing).
What you can do is use ast.literal_eval. With python2, this works out of the box.
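To see the difference on a single line, here is a small sketch (Python 3, standard library only). The sample line is modelled on your data, with the L suffix left off so that it also parses on Python 3:

```python
import ast
import csv
import io
import json

# A line shaped like the ones in the file (L suffix dropped for Python 3).
line = "(1153634043, '<a href=\"http://example.com\">Click</a>', 'fox, dog, cat are examples http://example.com')"

# csv only treats double quotes as field quoting, so the commas inside the
# single-quoted text become extra field separators.
fields = next(csv.reader(io.StringIO(line)))
print(len(fields))  # prints 5, not the 3 values the tuple really holds

# json has no tuples and no single-quoted strings, so it rejects the line.
try:
    json.loads(line)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval reads the line as a Python literal and returns a real tuple.
row = ast.literal_eval(line)
print(row[0], row[2])
```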
import ast

data = []
with open('file.txt') as f:
    for line in f:
        try:
            data.append(ast.literal_eval(line))
        except (SyntaxError, ValueError):
            pass
data should look something like this -
[(22642441022L,
  '<a href="http://example.com">Click</a>',
  'fox, dog, cat are examples http://example.com'),
 (1153634043,
  '<a href="http://example.com">Click</a>',
  "I learned so much from my mistakes, I think I'm gonna make some more")]
You can then pass data into a DataFrame as-is -
import pandas as pd

df = pd.DataFrame(data, columns=['A', 'B', 'C'])
df

             A                                       B  \
0  22642441022  <a href="http://example.com">Click</a>
1   1153634043  <a href="http://example.com">Click</a>

                                                   C
0      fox, dog, cat are examples http://example.com
1  I learned so much from my mistakes, I think I'...
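If you want this to work with python3, you'll need to get rid of the L suffix on long integers: Python 3 has a single int type, so 22642441022L is a syntax error there. The u prefix on unicode strings only has to go on Python 3.0 to 3.2, since Python 3.3 and later accept it again (PEP 414). You can strip both with re.sub from the re module.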
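A quick check on a Python 3 interpreter (3.3 or later assumed) shows which of the two actually trips up literal_eval:

```python
import ast

# Python 3 has no separate long type, so the L suffix is a syntax error...
try:
    ast.literal_eval("22642441022L")
    long_ok = True
except SyntaxError:
    long_ok = False

# ...but since Python 3.3 (PEP 414) the u prefix is accepted again.
text = ast.literal_eval("u'fox, dog, cat'")

print(long_ok, repr(text))
```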
import ast
import re

data = []
with open('file.txt') as f:
    for line in f:
        try:
            i = re.sub(r'\b(\d+)L\b', r'\1', line)      # remove L suffix
            j = re.sub(r"(?<=,\s)u(?=['\"])", '', i)  # remove u prefix (only needed before Python 3.3)
            data.append(ast.literal_eval(j))
        except (SyntaxError, ValueError):
            pass
Notice the first re.sub call, which removes the L suffix at the end of a run of digits; the second one strips a u that follows a comma and a space and precedes a quote.
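The digits-plus-L substitution is blunt: it also rewrites text inside string values, so 'costs 100L' would silently become 'costs 100'. If your data can contain that, a minimal sketch of a safer variant is below. It assumes every line is the repr of a tuple of ints and plain or u-prefixed strings, and lets a single regex match whole string literals first, so only L suffixes and u prefixes outside quotes are removed. The names PY2_LITERAL, py2_repr_to_py3 and load_py2_tuples are made up for this example.

```python
import ast
import io
import re

# Two alternatives, tried left to right:
#   group 1: a quoted string literal, optionally preceded by a u/U prefix
#   group 2: the digits of a Python 2 long literal such as 22642441022L
PY2_LITERAL = re.compile(
    r"""[uU]?('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")"""
    r"""|\b(\d+)[lL]\b"""
)


def py2_repr_to_py3(text):
    """Drop u prefixes and L suffixes that sit outside string literals."""
    def fix(match):
        if match.group(1) is not None:
            return match.group(1)  # string literal without its u prefix
        return match.group(2)      # long literal without its L suffix
    return PY2_LITERAL.sub(fix, text)


def load_py2_tuples(f):
    """Parse one Python 2 tuple repr per line, skipping lines that won't parse."""
    data = []
    for line in f:
        try:
            data.append(ast.literal_eval(py2_repr_to_py3(line)))
        except (SyntaxError, ValueError):
            pass
    return data


sample = io.StringIO(
    "(22642441022L, u'<a href=\"http://example.com\">Click</a>', u'costs 100L')\n"
    "(1153634043, '<a href=\"http://example.com\">Click</a>', \"I think I'm gonna make some more\")\n"
)
data = load_py2_tuples(sample)
print(data)
```

This works because re.sub scans left to right and never restarts inside a literal it has already consumed, so the 100L inside 'costs 100L' survives while the real long literal loses its L.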