
I'm having trouble splitting a CSV because some of the fields have a "\n" inside them.

I'm using:

file_data = csv_file.read().decode("utf-8")
csv_data = file_data.split("\n")

But the fields look something like this:

'string 1','string 2',
'string
 3'
'string 4',

I would like csv_data[0] to be strings 1 and 2, csv_data[1] to be string 3, and csv_data[2] to be string 4.

With my current approach, csv_data[0] comes out correctly, but string 3 is split across two indexes because it has a \n inside its text.
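To make the problem concrete, here is a small sketch using a hypothetical string shaped like the sample above. str.split has no notion of quoting, so the newline inside the third field cuts it into two entries:

```python
# Hypothetical string shaped like the sample above;
# the third quoted field contains a newline.
file_data = "'string 1','string 2',\n'string\n 3'\n'string 4',\n"

# split() treats every "\n" as a row break, even inside quotes.
csv_data = file_data.split("\n")

# The quoted third field ends up in two separate entries.
print(csv_data[1])  # 'string
print(csv_data[2])  #  3'
```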

---------------[edit]---------------

I solved it by not using split and instead iterating through csv_data (answer posted below).

  • Use the csv module. Commented Jan 4, 2022 at 21:28
  • Please don't parse CSV manually. The format is way more complicated than it looks. Commented Jan 4, 2022 at 21:29

3 Answers

You should use the csv library instead of trying to parse the file yourself.

Here is a link that can help you.
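A minimal sketch of that advice applied to the question's data. It assumes the decoded string file_data from the question, and that fields are quoted with single quotes as in the sample (a typical CSV uses double quotes, which is the csv module's default quotechar, so quotechar="'" would then be dropped):

```python
import csv
import io

# Hypothetical data shaped like the question's sample;
# the third field contains a newline inside its quotes.
file_data = "'string 1','string 2',\n'string\n 3'\n'string 4',\n"

# csv.reader expects an iterable of lines, so wrap the decoded string
# in StringIO. quotechar="'" only because this sample uses single quotes.
reader = csv.reader(io.StringIO(file_data), quotechar="'")
csv_data = list(reader)

for row in csv_data:
    print(row)
# ['string 1', 'string 2', '']
# ['string\n 3']
# ['string 4', '']
```

The empty last field in rows 0 and 2 comes from the trailing comma on those lines; the newline in string 3 stays inside its field instead of starting a new row.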

Use a library. Python has the csv module [Python-doc] for parsing CSV files. I strongly advise using a parser, since the CSV format is more complicated than it looks: for example, there is syntax for putting quotes and newlines inside the content of a field.

You can parse the CSV content and, for example, produce a list of tuples (one per row) with:

import csv

# newline='' lets the csv module handle line breaks inside quoted fields itself
with open('mycsv.csv', newline='') as mycsv:
    csvreader = csv.reader(mycsv)
    data = [tuple(row) for row in csvreader]
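The quoting rules mentioned above can be seen in a small sketch with made-up CSV text (the data is hypothetical). With the default dialect, a comma inside a quoted field, a double quote written as "" and a line break inside quotes all stay part of one field:

```python
import csv
import io

# Hypothetical CSV text: the second row has a comma inside quotes,
# a literal double quote escaped as "", and a line break inside quotes.
text = 'name,comment\r\n"Smith, J.","He said ""hi""\r\nthen left"\r\n'

# newline="" keeps the embedded "\r\n" exactly as written.
rows = list(csv.reader(io.StringIO(text, newline="")))
for row in rows:
    print(row)
# ['name', 'comment']
# ['Smith, J.', 'He said "hi"\r\nthen left']
```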

I solved it by not using split and instead iterating through csv_data as follows:

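        # Inside a Django view; assumes messages (django.contrib) and
        # HttpResponseRedirect (django.http) are already imported.
        csv_file = request.FILES["csv_upload"]

        if not csv_file.name.endswith('.csv'):
            messages.warning(request, "The file is not a CSV!")
            return HttpResponseRedirect(request.path_info)

        file_data = csv_file.read().decode("utf-8")
        csv_data = file_data.split("\r\n")

        fields = []
        fieldsTemp = []

        # collecting the CSV fields
        text = ''
        firstQuote = False
        secondQuote = False
        for x in csv_data:
            for char in x:
                # handling strings that contain commas:
                # firstQuote stays True while we are inside a quoted string
                if char == '\"':
                    if firstQuote:
                        secondQuote = True
                    firstQuote = True
                    if secondQuote:
                        firstQuote = False
                        secondQuote = False

                # a comma outside quotes separates fields; every other
                # character (including commas inside quotes) is kept
                if char == ',' and not firstQuote:
                    fieldsTemp.append(text)
                    text = ''
                else:
                    text = text + char

            # adding the last field of the line (it has no comma after it),
            # so it does not leak into the first field of the next line
            if text:
                fieldsTemp.append(text)
                text = ''
            fields.append(fieldsTemp)
            fieldsTemp = []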

As it turned out, splitting by \r\n solved part of the problem for my specific CSV, but I then couldn't split by commas for the same reason: commas appear inside strings. So instead I used that loop to check whether I'm inside quotes and build my fields manually.
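For comparison, a sketch of the same job done with the csv module instead of the manual loop. It assumes the upload is UTF-8 text, as in the code above; parse_csv_bytes is a hypothetical helper name, and the sample bytes stand in for csv_file.read():

```python
import csv
import io


def parse_csv_bytes(raw: bytes) -> list[list[str]]:
    """Hypothetical helper: turn uploaded CSV bytes into a list of rows.

    Does the work of the split("\\r\\n") plus quote-tracking loop: the csv
    module handles commas and line breaks inside quoted fields and strips
    the surrounding quotes.
    """
    text = raw.decode("utf-8")
    # newline="" keeps line endings inside quoted fields untouched.
    return list(csv.reader(io.StringIO(text, newline="")))


# Stand-in for csv_file.read() in a Django view.
raw = b'id,description\r\n1,"red, large"\r\n2,"two\r\nlines"\r\n'
fields = parse_csv_bytes(raw)
print(fields)
```

Because csv.reader sees the whole text rather than pre-split lines, a "\r\n" inside a quoted field stays part of that field instead of breaking the row, which is the case the split-then-loop approach cannot handle.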
