From MySQL I am generating a tab-separated output file using SELECT ... INTO OUTFILE. I then use Python to load the TSV and process it. I feel like I'm missing something, but I cannot figure out how to get csv.reader to accept data where quoted fields can contain \t tabs, \n newlines, \r carriage returns, etc. The csv.reader keeps breaking rows on every line-break character, including ones inside quoted fields, instead of only on the \n newlines outside my quoted fields.
Settings:
import csv
with open('/path/to/file.tsv', 'rbU') as f:
    reader = csv.reader(
        f,
        delimiter='\t',
        lineterminator='\n',  # note: csv.reader ignores lineterminator
        quoting=csv.QUOTE_ALL
    )
    for line in reader:
        pass  # do something
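A minimal sketch of the same setup for Python 3, assuming a UTF-8 export laid out like the example below. The csv module documentation asks for files to be opened with newline='', which disables newline translation so that csv.reader, not the file object, decides where a record ends. The helper read_mysql_tsv, the temporary file and the user@example.com stand-in are hypothetical. On Python 2, which the 'rbU' mode above suggests, the closest analogue is probably plain 'rb' without the U flag.

```python
import csv
import os
import tempfile


def read_mysql_tsv(path, encoding="utf-8"):
    """Hypothetical helper: read a tab-separated MySQL export into a list of rows.

    newline="" turns off newline translation, so carriage returns, newlines and
    tabs inside quoted fields reach csv.reader unchanged and stay in their field.
    """
    with open(path, newline="", encoding=encoding) as f:
        # QUOTE_ALL from the original settings only matters for writers;
        # when reading, the quotechar is what keeps embedded characters in place.
        reader = csv.reader(f, delimiter="\t", quotechar='"')
        return list(reader)


# Usage: write one record containing real tab/CR/LF bytes to a temporary file.
# user@example.com is a stand-in for the obfuscated address in the question.
sample = (
    b'"4256996"\t"user@example.com"\t"Y\t"\t"98230\r"\t"2012-07-10T12:00:00"\t'
    b'"some location"\t\\N\t\\N\t"false"\t"aaa"\t"another-field"\t"true"\t1\n'
)
fd, path = tempfile.mkstemp(suffix=".tsv")
with os.fdopen(fd, "wb") as out:
    out.write(sample)
try:
    rows = read_mysql_tsv(path)
finally:
    os.remove(path)

print(len(rows), rows[0][2:4])  # 1 ['Y\t', '98230\r']
```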
Example:
In the example below, fields are separated by actual tabs, \t is an actual tab, \r is an actual carriage return, \n is an actual newline, and \N is what MySQL outputs for a NULL value.
"4256996" "[email protected]" "Y\t" "98230\r" "2012-07-10T12:00:00" "some location" \N \N "false" "aaa" "another-field" "true" 1
The resulting output:
['4256996', '[email protected]', 'Y\t', '98230'], ['2012-07-10T12:00:00', 'some location', '\\N', '\\N', 'false', 'aaa', 'another-field', 'true', '1']
Is there a way to get the csv.reader to read this input data properly, or is this some sort of limitation with the csv.reader object?
Note: If you try to replicate this, make sure you replace \r with an actual carriage return, \n with an actual newline, etc.
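One way to replicate this without hand-editing a file is to build the sample from Python string escapes, which produce real control characters, and parse it from memory. This is a sketch: SAMPLE_ROW, EXTRA_ROW and parse_rows are hypothetical names, the second record is made up to show an embedded newline, and user@example.com stands in for the obfuscated address. It suggests that csv.reader itself is not the limitation; given the raw text untranslated, it keeps quoted tabs, carriage returns and newlines inside their fields.

```python
import csv
import io

# The example record, written with Python escapes so \t, \r and \n become real
# control characters. "\\N" is MySQL's NULL marker (a backslash followed by N).
SAMPLE_ROW = (
    '"4256996"\t"user@example.com"\t"Y\t"\t"98230\r"\t"2012-07-10T12:00:00"\t'
    '"some location"\t\\N\t\\N\t"false"\t"aaa"\t"another-field"\t"true"\t1\n'
)
# A hypothetical second record whose quoted field contains a real newline.
EXTRA_ROW = '"4256997"\t"first line\nsecond line"\t1\n'


def parse_rows(text):
    """Parse tab-separated text; newline="" keeps embedded CR/LF untranslated."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


rows = parse_rows(SAMPLE_ROW + EXTRA_ROW)
for row in rows:
    print(len(row), row)
```

With this input the first row should come back with 13 fields, with 'Y\t' and '98230\r' intact, and the second row should keep 'first line\nsecond line' as a single field.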
open() call and the way you set up the reader.
'rbU' mode? Binary mode doesn't do universal line endings; universal line endings assume text mode instead.
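The effect of universal line endings can be seen in Python 3 with io.StringIO, whose newline argument works like the one on open(): newline=None enables universal newlines with translation, so an embedded carriage return is rewritten to \n before csv.reader ever sees it, while newline='' leaves it alone. This is a small sketch with a hypothetical three-field record and helper name; it illustrates the translation, not the exact split shown in the question's output, which came from the Python 2 'rbU' setup.

```python
import csv
import io

# Hypothetical sample: one record whose second quoted field ends in a real CR.
TEXT = '"4256996"\t"98230\r"\t1\n'


def read_with(newline):
    """Parse TEXT through a StringIO created with the given newline mode."""
    buf = io.StringIO(TEXT, newline=newline)
    return list(csv.reader(buf, delimiter="\t"))


translated = read_with(None)  # universal newlines: CR is translated to LF
untouched = read_with("")     # no translation: CR stays in the field

print(translated)  # [['4256996', '98230\n', '1']]
print(untouched)   # [['4256996', '98230\r', '1']]
```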