0

Instead of manually convert csv file to text tab delimited file using excel software

enter image description here

I would like to automate this process using Python.

However, using the following code

with open('endnote_csv.csv', 'r') as fin:
       with open('endnote_deliminated.txt', 'w', newline='') as fout:
           reader = csv.DictReader(fin, delimiter=',')
           writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
           writer.writeheader()
           writer.writerows(reader)

Return an error of

ValueError: dict contains fields not in fieldnames: None

May I know where did I do wrong,

The csv file is accessible via the following link

Thanks in advance for any insight.

2
  • your source file is not a proper .csv file. It has tons of commas in the middle of the entries. Commented Jul 6, 2020 at 16:57
  • 1
    this may help out if you want to avoid pandas: stackoverflow.com/questions/21527057/… Commented Jul 6, 2020 at 17:04

2 Answers 2

2

You can use the Python package called pandas to do this:

import pandas as pd
fname = 'endnote_csv'
pd.read_csv(f'{fname}.csv').to_csv(f'{fname}.tsv', sep='\t', index=False)

Here's how it works:

  • pd.read_csv(fname) - reads a CSV file and stores it as a pd.DataFrame object (not important for this example)

  • .to_csv(fname) - writes a pd.DataFrame to a CSV file given by fname

  • sep='\t' - replaces the ',' used in CSVs with a tab character

  • index=False - use this to remove the row numbers

If you want to be a bit more advanced and use the command line only, you can do this:

# csv-to-tsv.py
import sys

import pandas as pd

fnames = sys.argv[1:]

for fname in fnames:
    main_name = '.'.join(fname.split('.')[:-1])
    pd.read_csv(f'{main_name}.csv').to_csv(f'{main_name}.tsv', sep='\t', index=False)

This will allow you to run a command like this from the command line and change all .csv files to .tsv files in one go:

python csv-to-tsv.py *.csv
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the suggestion, I will try this and come back to you.
Hi, apparently, this save the doc as tsv instead of txt format
just change everywhere it says '.tsv' to '.txt' - for example: pd.read_csv(f'{main_name}.csv').to_csv(f'{main_name}.txt', sep='\t', index=False)
0

It is erroring out on comma seperated author names. It appears that columns in the underline rows exceeds number of headers.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.