
Attempting to upload a bunch of CSVs to a database. The CSVs are not necessarily always separated by a comma, so I used a regular expression separator to ensure the correct delimiters are matched. I then added

error_bad_lines=False

in order to handle

CParserError: Error tokenizing data. C error: Expected 3 fields in line 127, saw 4

which resulted in me getting this error instead:

ValueError: Falling back to the 'python' engine because the 'c' engine does not support regex separators, but this causes 'error_bad_lines' to be ignored as it is not supported by the 'python' engine. 

for the following code. Is there a workaround?

import psycopg2
import pandas as pd
import sqlalchemy as sa

csvList = []
tableList = []
filenames = find_csv_filenames(directory)  # helper defined elsewhere in my script
for name in filenames:
    lhs, rhs = str(name).split(".", 1)  # table name = filename without extension
    print name
    dataRaw = pd.read_csv(name, sep=";|,", chunksize=5000000, error_bad_lines=False)
    for chunk in dataRaw:
        chunk.to_sql(name=str(lhs), if_exists='append', con=con)
  • What does your data look like? If your fields aren't always separated by commas, it's not really CSV. You may be able to hack something together, but if even using a regex separator doesn't allow you to consistently extract the fields, it sounds like you may be getting beyond what a CSV parser will handle. Commented Dec 18, 2015 at 17:02
  • The fields are separated by commas and semicolons as far as I know. I can manually go into each file and upload one at a time, but then I have defeated the purpose of programming. Commented Dec 18, 2015 at 17:04
  • Could you change these files? If yes, you could preprocess your files and change ; to , with Python re.sub or Linux sed, for example. Commented Dec 18, 2015 at 21:50
  • I suppose I could do that. Some of the files have problems with loading into memory; they have 30 columns and 55 million rows, that kind of thing, and it seems to blow up my 32 GB of RAM pretty quickly. I'll look into re.sub. Commented Dec 18, 2015 at 21:54
  • You could do it line by line and create another, clean file (if you have enough storage to store it); see the sketch below. Commented Dec 18, 2015 at 22:53
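
A minimal sketch of the line-by-line cleanup suggested above (raw.csv and clean.csv are placeholder names): only one line is held in memory at a time, so it works for arbitrarily large files.

with open('raw.csv') as src, open('clean.csv', 'w') as dst:
    for line in src:
        # normalize the delimiter while streaming through the file
        dst.write(line.replace(';', ','))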

2 Answers


As per the pandas read_csv documentation (Pandas-link), if the separator is more than one character you need to set the engine parameter to 'python'.

Try this:

dataRaw = pd.read_csv(name, sep=";|,", engine='python', chunksize=5000000,
                      error_bad_lines=False)
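
For readers on newer pandas: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0 in favor of on_bad_lines, which the python engine does support, so an equivalent call today would be:

# pandas >= 1.3: on_bad_lines replaces error_bad_lines/warn_bad_lines
dataRaw = pd.read_csv(name, sep=";|,", engine='python',
                      chunksize=5000000, on_bad_lines='skip')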

1 Comment

  • I don't understand why this was downvoted. Good answer.

If you can preprocess your files, try changing the ; separator to , to produce a clean CSV file. You can do it in place with fileinput:

import fileinput

# inplace=True redirects stdout into the file, so print() writes the
# replacement back; end='' avoids doubling each line's trailing newline
with fileinput.FileInput('your_file', inplace=True) as f:
    for line in f:
        print(line.replace(';', ','), end='')

Then you could use read_csv with the c engine and the error_bad_lines parameter, or you could also preprocess the bad lines away with that same loop.
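
For example (a sketch, reusing the placeholder file name from above), once the file is comma-only the default c engine applies and error_bad_lines is honored:

# a single-character separator keeps the fast default c engine,
# and error_bad_lines=False skips any remaining malformed rows
dataRaw = pd.read_csv('your_file', sep=',', chunksize=5000000,
                      error_bad_lines=False)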

Note: if you want to keep a backup of your original file, you can use the backup parameter of FileInput.
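
For instance, the same loop with a backup kept (the original is saved as your_file.bak):

import fileinput

# backup='.bak' copies the original aside before the in-place rewrite
with fileinput.FileInput('your_file', inplace=True, backup='.bak') as f:
    for line in f:
        print(line.replace(';', ','), end='')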

