28

It appears that the pandas read_csv function only allows single character delimiters/separators. Is there some way to allow for a string of characters to be used like, "*|*" or "%%" instead?

4
  • Why do you want more than one? Commented Jul 2, 2015 at 21:24
  • 6
    Because I have several columns with unformatted text that can contain characters such as "|", "\t", ",", etc. The likelihood of somebody typing "%%" is much lower... Commented Jul 3, 2015 at 12:31
  • I found this in data files in the wild, because some linter had replaced \t with 4 spaces. Commented Oct 27, 2020 at 17:59
  • "Why do you want more than one?" It makes it easier to avoid delimiter collision, especially when you do not get to control the data (Wikipedia link). Commented Jan 10, 2023 at 1:04

5 Answers

13

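Pandas now supports multi-character delimiters. A separator longer than one character is interpreted as a regular expression, so regex metacharacters such as * and | must be escaped:

import pandas as pd
pd.read_csv(csv_file, sep=r"\*\|\*", engine="python")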
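A minimal, self-contained sketch of the call above, assuming in-memory sample data (the `raw` string is invented for illustration and stands in for `csv_file`). It uses `re.escape` to build the escaped pattern, so the metacharacters in `*|*` do not have to be escaped by hand, and passes `engine="python"` explicitly because the C engine does not handle regex separators.

```python
import io
import re

import pandas as pd

# Hypothetical sample data; not taken from the original post.
raw = "a*|*b*|*c\n1*|*2*|*3\n4*|*5*|*6\n"

# Separators longer than one character are treated as regular expressions,
# so escape the literal delimiter before passing it as sep.
sep = re.escape("*|*")

df = pd.read_csv(io.StringIO(raw), sep=sep, engine="python")
print(df)
```

The same pattern should work with a file path in place of the `io.StringIO` object.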

4 Comments

Note that if you specify a multi-character delimiter, the parsing engine looks for the separator in every field, even fields that are quoted as text. When it finds the delimiter inside a quoted field, it splits there, so that row ends up with more fields than the other rows and the read fails.
Note that while read_csv() supports multi-character delimiters, to_csv() does not, as of pandas 0.23.4. The original post actually asks about to_csv(). (Side note: including "()" in a link is not supported by Markdown, apparently.)
It would be helpful if the poster mentioned in which version this functionality was added.
You should specify engine="python" to avoid the ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators.
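The quoted-field caveat from the comments can be reproduced with a small sketch. The sample data below is invented for illustration; the second data row contains the delimiter `%%` inside a quoted field. The expectation, under the behavior described above, is that the regex split ignores the quotes and pandas rejects the row for having too many fields.

```python
import io
import re

import pandas as pd

# Hypothetical data: the last row has the delimiter inside a quoted field.
raw = 'id%%text\n1%%"plain"\n2%%"has %% inside"\n'

# A plain regex split shows what the parser sees: quotes are not respected.
pieces = re.split("%%", '2%%"has %% inside"')
print(pieces)

# With a multi-character separator, the row yields three fields against a
# two-column header, so parsing is expected to fail.
try:
    pd.read_csv(io.StringIO(raw), sep="%%", engine="python")
    error = None
except pd.errors.ParserError as exc:
    error = exc
print(error)
```

If the data may contain the delimiter inside quoted text, a regex separator is probably not a safe choice, and a quote-aware splitter is needed instead.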
5

One solution is to use read_table instead of read_csv. Suppose the file contains:

1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25

So, we could read this with:

import pandas as pd
pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')
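As a quick check, the three sample rows above can be parsed from memory rather than from `file.csv`; this is only a sketch, with `io.StringIO` standing in for the file. Note that `read_table` and `read_csv` share the same parser and differ mainly in their default separator, so the same `sep` and `engine` arguments should work with either function.

```python
import io

import pandas as pd

# The sample rows from the answer, held in memory instead of file.csv.
raw = (
    "1*|*2*|*3*|*4*|*5\n"
    "12*|*12*|*13*|*14*|*15\n"
    "21*|*22*|*23*|*24*|*25\n"
)

# Raw string so the backslashes reach the regex engine unchanged.
df = pd.read_table(io.StringIO(raw), header=None, sep=r"\*\|\*", engine="python")
print(df)
```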

Comments

1

As Padraic Cunningham writes in the comment above, it's unclear why you want this. The Wikipedia entry on CSV says this about delimiters:

... separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),

It's unsurprising that neither the csv module nor pandas supports what you're asking.

However, if you really want to do so, you're pretty much down to using Python's string manipulation. The following example shows how to turn the dataframe into a "csv" with $$ separating rows and %% separating columns.

'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())

Of course, you don't have to turn it into a string like this prior to writing it into a file.
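The join-based approach can be wrapped in a small helper. This is a sketch only: `to_delimited` is a hypothetical name, not a pandas function, and it does no quoting or escaping, so it assumes the chosen delimiters never occur in the data.

```python
import pandas as pd


def to_delimited(df, col_sep="%%", row_sep="\n", header=True):
    """Serialize df with arbitrary column and row separators (no quoting)."""
    lines = []
    if header:
        lines.append(col_sep.join(str(c) for c in df.columns))
    # index=False leaves the index out; name=None yields plain tuples.
    for row in df.itertuples(index=False, name=None):
        lines.append(col_sep.join(str(v) for v in row))
    return row_sep.join(lines)


# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
text = to_delimited(df)
print(text)
```

The resulting string can be written out with an ordinary `open(path, "w")` and `write`; passing `row_sep="$$"` gives the layout described in the answer.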

Comments

0

This is not a very Pythonic way, but it works: split each row by hand and re-join the pieces that belong to a quoted field. You can use something like this:

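import re

def row_reader(row, fd):
    """Split row on the delimiter fd, re-joining pieces inside a quoted field."""
    arr = []
    # Drop the trailing newline, then split on the multi-character delimiter
    in_arr = row.rstrip('\n').split(fd)
    i = 0
    while i < len(in_arr):
        # A piece that opens a quote without closing it starts a quoted field
        if re.match('^".*', in_arr[i]) and not re.match('.*"$', in_arr[i]):
            flag = True
            buf = ''
            # Keep appending pieces until one ends with the closing quote
            while flag and i < len(in_arr):
                buf += in_arr[i]
                if re.match('.*"$', in_arr[i]):
                    flag = False
                i += 1
                # Restore the delimiter that split() removed inside the field
                buf += fd if flag else ''
            arr.append(buf)
        else:
            arr.append(in_arr[i])
            i += 1
    return arr

file_name = 'file.csv'  # placeholder path; point this at your own file
with open(file_name, 'r') as infile:
    for row in infile:
        for field in row_reader(row, '%%'):
            print(field)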
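A more compact sketch of the same idea scans the line once and tracks whether the current position is inside quotes. `split_quoted` is a hypothetical helper name; it assumes a single quote character and does not handle escaped or doubled quotes.

```python
def split_quoted(line, delim, quote='"'):
    """Split line on delim, ignoring delimiters that fall inside quotes."""
    fields = []
    buf = []
    in_quotes = False
    i = 0
    while i < len(line):
        if line[i] == quote:
            # Toggle quote state and keep the quote character in the field.
            in_quotes = not in_quotes
            buf.append(line[i])
            i += 1
        elif not in_quotes and line.startswith(delim, i):
            # Unquoted delimiter: close the current field and skip past it.
            fields.append("".join(buf))
            buf = []
            i += len(delim)
        else:
            buf.append(line[i])
            i += 1
    fields.append("".join(buf))
    return fields


result = split_quoted('a%%"b%%c"%%d', "%%")
print(result)
```

The rows produced this way could be collected into a list and handed to `pandas.DataFrame` if a dataframe is needed.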

Comments

0

In pandas 1.1.4, when I try to use a multiple char separator, I get the message:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

Hence, to use a multi-character separator, the current solution seems to be to pass engine='python' to read_csv (in my case, I use it with sep='[ ]?;').
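The effect of `engine='python'` on the warning can be checked with a short sketch. The sample data and the helper name `read_and_collect` are invented for illustration; the separator `[ ]?;` is the one from this answer and matches a semicolon optionally preceded by a space.

```python
import io
import warnings

import pandas as pd

# Hypothetical data: some semicolons are preceded by a space.
raw = "a ;b;c\n1 ;2;3\n"


def read_and_collect(**kwargs):
    """Parse raw and return the frame plus any ParserWarning raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_csv(io.StringIO(raw), sep="[ ]?;", **kwargs)
    parser_warnings = [
        w for w in caught if issubclass(w.category, pd.errors.ParserWarning)
    ]
    return df, parser_warnings


df_default, warned_default = read_and_collect()
df_python, warned_python = read_and_collect(engine="python")

print(len(warned_default), len(warned_python))
print(df_python)
```

Both calls should return the same data; only the explicit engine choice is expected to silence the fallback warning.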

Comments
