I have searched, but most answers are about reading a complete CSV file, and none of them matches the problem I'm facing.

I'm trying to read a file from the web using urllib2:

request = urllib2.Request('http://.../tv.txt')
response = urllib2.urlopen(request)
lines = response.readlines()
for line in lines:
    ...

Each line looks like one of these:

"ABC", "XYZ,MNO", "KLM"
"ABC", "MN"
"ABC", "123", "10", "OPPA GANGNAM STYLE", "LADY"

As seen above, these lines are not regular CSV: the number of columns varies from line to line.

Is there a way to split each line into a list? The desired result would be:

["ABC", "XYZ,MNO", "KLM"]
["ABC", "MN"]
["ABC", "123", "10", "OPPA GANGNAM STYLE", "LADY"]

I've tried using line.split(","), but it does not split correctly because some of the double-quoted fields contain commas themselves.
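To make the failure concrete, here is a minimal sketch using the first sample line from above. It only illustrates the problem described; the variable names are chosen for this example.

```python
# The first sample line from the question, as a plain string.
line = '"ABC", "XYZ,MNO", "KLM"'

# str.split knows nothing about quoting, so the comma inside
# "XYZ,MNO" is treated as a separator like any other.
pieces = line.split(",")

print(pieces)       # four pieces instead of the three wanted
print(len(pieces))
```

The quoted field "XYZ,MNO" is cut in two, and the quote characters and leading spaces stay attached to the pieces.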

Please help me if you know how to. Thank you very much.

Cheers,

PHP-Python-Java-MySQL-newbie.

2 Answers

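Use the csv module; it does what you need. The important setting is skipinitialspace, which makes the reader ignore the space after each comma so that the quote that follows is recognized as the start of a quoted field.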

import csv
import io

yourstring = '"ABC", "XYZ,MNO", "KLM"\n"ABC", "MN"\n"ABC", "123", "10", "OPPA GANGNAM STYLE", "LADY"'

class MyDialect(csv.Dialect):
    strict = True              # raise csv.Error on malformed input
    skipinitialspace = True    # ignore the space that follows each comma
    quoting = csv.QUOTE_ALL
    delimiter = ','
    quotechar = '"'
    lineterminator = '\n'

# Python 3: io.StringIO wraps the string in a file-like object for csv.reader
b = io.StringIO(yourstring)
r = csv.reader(b, MyDialect())

for i in r:
    # print the number of fields, then the fields joined with ' @ '
    print(len(i), ':', ' @ '.join(i))

3 Comments

As mentioned in the question, I have tried the csv module. If you can point me to the right function in the csv module, that would be great. I tried "for row in csv.reader([line]): print row", following docs.python.org/2/library/csv.html#examples, but with no success.
Thank you. While waiting I've changed source from CSV to JSON format so didn't test your script, but will mark this as answer then.
Can you add a few lines regarding the default dialect?
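On the default dialect: the sketch below compares csv.reader with its default settings (the "excel" dialect) against the same call with skipinitialspace=True, using the first sample line. This is an illustration written for Python 3, not part of the original answer, and it probably explains why csv.reader([line]) on its own gave no usable result.

```python
import csv

line = '"ABC", "XYZ,MNO", "KLM"'

# Default dialect: the space after each comma starts an unquoted field,
# so the quote that follows is kept as a literal character.
default_row = next(csv.reader([line]))

# skipinitialspace=True drops that space, so the quote opens a quoted field.
spaced_row = next(csv.reader([line], skipinitialspace=True))

print(default_row)
print(spaced_row)   # ['ABC', 'XYZ,MNO', 'KLM']
```

With the default dialect the second field is split at its inner comma; with skipinitialspace=True the row has the three fields asked for.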
import csv
import io

# Sample data from the question ("data" avoids shadowing the built-in input())
data = '''"ABC", "XYZ,MNO", "KLM"
"ABC", "MN"
"ABC", "123", "10", "OPPA GANGNAM STYLE", "LADY"'''

reader = csv.reader(
    io.StringIO(data),
    delimiter=',',
    quotechar='"',
    skipinitialspace=True,  # ignore the space that follows each comma
)
for row in reader:
    print(row)

This prints:

['ABC', 'XYZ,MNO', 'KLM']
['ABC', 'MN']
['ABC', '123', '10', 'OPPA GANGNAM STYLE', 'LADY']
