
Every day we receive CSV files from a vendor, and we need to parse them and insert the rows into a database. We use a single Python 3 program for all of these tasks.

The problem occurs with multiline CSV files, where the content on the second and later lines of a record is skipped.

48.11363;11.53402;81369;München;"";1.0;1962;I would need
help from 
Stackoverflow;"";"";"";289500.0;true;""

Here the field "I would need help from Stackoverflow" is spread across 3 lines.

The problem is that Python 3 treats only "I would need" as part of the record and skips the rest.

At present I am using the options below to read the file:

import csv

with open(file_path, newline='', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        {MY LOGIC}

Is there any way to read a multiline CSV record as a single record?

I understand that PySpark offers option("multiline",True), but we don't want to use PySpark in the first place.

Looking for options.

Thanks in advance.
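
For reference, Python's `csv` module does keep a multiline field together, but only when the field is enclosed in the quote character and the input is opened with `newline=''`. A minimal sketch, assuming a `;` delimiter as in the sample row and a hypothetical quoted variant of the same record (the vendor file shown above does not quote the field, so this alone would not fix it):

```python
import csv
import io

# Hypothetical variant of the sample record in which the multiline
# field is wrapped in double quotes (the real vendor file does not do this).
quoted_sample = (
    '48.11363;11.53402;81369;München;"";1.0;1962;"I would need\n'
    'help from \n'
    'Stackoverflow";"";"";"";289500.0;true;""\n'
)

# io.StringIO(..., newline='') stands in for open(file_path, newline='').
reader = csv.reader(
    io.StringIO(quoted_sample, newline=''),
    delimiter=';',
    quotechar='"',
)
rows = list(reader)

# The three physical lines come back as a single row; the quoted
# field keeps its embedded newlines.
print(len(rows), len(rows[0]))
print(repr(rows[0][7]))
```

If the vendor can be persuaded to quote free-text fields this way, the existing reader loop would probably need no change beyond the delimiter.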

  • Do you know how many columns should be in one row? Commented Oct 7, 2020 at 7:18
  • Can you show us the expected result, please, with rows and columns? Commented Oct 7, 2020 at 7:32
  • Would this post help you? stackoverflow.com/questions/20982437/… Commented Oct 7, 2020 at 7:33
  • Speak with the providers of the source data and have them sort it out, then send you a clean file. You (your company) should not be responsible for cleaning their poor data quality. Commented Oct 7, 2020 at 7:46
  • @Alderven: Yes, there would be 60 columns in 1 row. Commented Oct 8, 2020 at 6:28
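
Since every record is known to have a fixed number of columns (60 in the real file), one possible approach for an unquoted multiline field is to keep joining physical lines until the expected field count is reached. This is a rough sketch rather than a definitive fix. It assumes a `;` delimiter as in the sample, that the multiline field is never the last column, and that no record has more fields than expected. The names `read_logical_rows` and `expected_cols` are made up for this illustration.

```python
import csv
import io


def read_logical_rows(lines, expected_cols, delimiter=';'):
    """Yield rows, merging physical lines until expected_cols fields are seen."""
    pending = []  # fields collected so far for the current record
    for fields in csv.reader(lines, delimiter=delimiter, quotechar='"'):
        if not pending:
            pending = fields
        else:
            # Continuation line: its first field is the rest of the previous
            # line's last field, so rejoin them with the newline that split them.
            head = fields[0] if fields else ''
            pending[-1] += '\n' + head
            pending.extend(fields[1:])
        if len(pending) >= expected_cols:
            yield pending
            pending = []
    if pending:
        # Incomplete trailing record; yielded as-is so the caller can inspect it.
        yield pending


# The sample from the question has 14 fields, so 14 is used here;
# the real file would use expected_cols=60.
sample = (
    '48.11363;11.53402;81369;München;"";1.0;1962;I would need\n'
    'help from \n'
    'Stackoverflow;"";"";"";289500.0;true;""\n'
)
rows = list(read_logical_rows(io.StringIO(sample, newline=''), expected_cols=14))
print(len(rows), len(rows[0]))
print(repr(rows[0][7]))
```

With the real file, the same function could be given the object returned by `open(file_path, newline='', encoding='utf-8')` in place of the `io.StringIO` wrapper. A record whose last column spills onto the next line cannot be detected by counting fields, so that case would still need the vendor to quote the field.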
