Remove New Line from CSV file's string column

Question

I have a CSV file with multiple fields. In a few string fields, the data spans multiple lines. I want to collapse those lines into a single line.

Input Data:

1, "asdsdsdsds", "John"
2, "dfdhifdkinf
dfjdfgkdnjgknkdjgndkng
dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"

Expected Output:

1, "asdsdsdsds", "John"
2, "dfdhifdkinf dfjdfgkdnjgknkdjgndkng dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"

The same question was asked on SO earlier, but the solution there uses PowerShell. Is it possible to achieve the same using Python, pandas, or PySpark?

Whenever the data spans multiple lines, it is guaranteed to be enclosed in double quotes.

What I tried

I am able to read the data without any issues using pandas and PySpark, even though some fields span multiple lines.

Pandas:

import pandas as pd

pandas_df = pd.read_csv("file.csv")

PySpark:

df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true') \
        .option("delimiter", ",").option("escape", '\\').option("escape", ':').\
    option("parserLib", "univocity").option("multiLine", "true").load("file.csv")

Edit:

There can be any number of fields in the CSV file, and the multi-line data can appear in any field.

"I can able to read the the data without any issues using pandas and pyspark even though there are fields whose got spanned to multiple lines." Then what exactly is the issue?
– mayank agrawal, Commented Feb 19, 2018 at 7:43
I want the cleansed data in a new CSV file, where the multi-line values are collapsed into one line.
– data_addict, Commented Feb 19, 2018 at 8:01

piRSquared · Accepted Answer · 2018-02-19 07:52:08Z

2

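def weird_gen(s):
    # Assumes exactly three fields per record and no commas inside quoted fields.
    s = [s]
    while s:
        # Split off the first two fields; `a` holds the third field plus the rest of the text.
        *x, a = s[0].split(',', 2)
        # The third field ends at the next newline; the remainder (if any) is processed next.
        y, *s = a.split('\n', 1)
        yield ', '.join(z.strip().replace('\n', ' ') for z in x + [y])

with open('bad.csv') as f:
    # strip() drops a trailing newline, which would otherwise yield an empty record.
    print('\n'.join(weird_gen(f.read().strip())))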

1, "asdsdsdsds", "John"
2, "dfdhifdkinf dfjdfgkdnjgknkdjgndkng dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"

answered Feb 19, 2018 at 7:52

piRSquared


2 Comments

DYZ Over a year ago

One should not use .split(',') to parse CSV files, because a comma may be within a quoted field.

piRSquared Over a year ago

You are right. This is the best I came up with for now. If someone posts any reasonable answer, I'll likely delete this.
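To illustrate the point about not splitting on commas, here is a minimal sketch that uses Python's standard `csv` module instead. It assumes the file follows the layout shown in the question (comma delimiter, a space after each comma, double-quoted strings); the function name `flatten_newlines` and the in-memory `sample` are made up for the illustration.

```python
import csv
import io

def flatten_newlines(src, dst):
    """Copy CSV records from src to dst, replacing line breaks inside fields with a space."""
    # skipinitialspace=True lets the reader accept the space after each comma,
    # so a field written as ` "John"` is still recognised as a quoted field.
    reader = csv.reader(src, skipinitialspace=True)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # splitlines() breaks a field on any embedded line ending; join with a space.
        writer.writerow([" ".join(field.splitlines()) for field in row])

# Sample data from the question, held in memory for the demonstration.
sample = (
    '1, "asdsdsdsds", "John"\n'
    '2, "dfdhifdkinf\n'
    'dfjdfgkdnjgknkdjgndkng\n'
    'dkfdkjfnjdnf", "Roy"\n'
    '3, "dfjfdkgjfgn", "Rahul"\n'
)
out = io.StringIO()
flatten_newlines(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Because the parser tracks quoting, this works for any number of fields and for commas inside quoted fields. One trade-off: the writer re-quotes only where necessary, so the output is normalised (for example `1,asdsdsdsds,John`) rather than keeping the original spacing and quotes. With real files, open both the input and the output with `newline=""`, as the `csv` documentation recommends.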

Rakesh · Accepted Answer · 2018-02-19 08:03:28Z

0

This might help. I am using a simple for loop and negative indexing to get your required result.

s = '''1, "asdsdsdsds", "John"
2, "dfdhifdkinf
dfjdfgkdnjgknkdjgndkng
dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"'''

res = []

for i in s.split("\n"):
    if i[0].isdigit():
        # A line starting with a digit begins a new record.
        res.append(i)
    else:
        # Any other line continues the previous record.
        res[-1] = res[-1] + " " + i

for i in res:
    print(i)

Output:

1, "asdsdsdsds", "John"
2, "dfdhifdkinf dfjdfgkdnjgknkdjgndkng dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"

answered Feb 19, 2018 at 8:03

Rakesh


4 Comments

data_addict Over a year ago

Hi Rakesh, thanks for the suggested answer. I updated the question: there can be any number of fields, and the multi-line data can be in any field. Could you please suggest a solution?

Rakesh Over a year ago

Can you provide an example data?

data_addict Over a year ago

1, "asdsdsdsds", "John",3,4,5,"Hi","This is  	new line data"
2, "dfdhifdkinf dfjdfgkdnjgknkdjgndkng dkfdkjfnjdnf", "Roy",6,7,8,"Hellooo 	ooooooo 			ooooooo", "This"
3, "dfjfdkgjfgn", "Rahul",1,2,3,"Hi  this is   	new line data","This is   		another new line data"

stevenferrer Over a year ago

The assumption that every record starts with a digit fails if a continuation line of a multi-line string also starts with a digit. For example: 1, "some line\n1 plus one equals 2"
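One way to avoid the leading-digit assumption, and to handle any number of fields, is to track whether a double quote is still open when a physical line ends. The sketch below does that by counting quote characters. It assumes quotes inside a field are escaped by doubling (`""`), not with a backslash; the name `join_quoted_lines` is invented for the illustration.

```python
def join_quoted_lines(lines, sep=" "):
    """Yield logical records, merging physical lines while a double quote is still open."""
    parts = []
    in_quotes = False
    for line in lines:
        line = line.rstrip("\r\n")
        parts.append(line)
        # An odd number of quote characters flips the open/closed state.
        # A doubled quote ("") contributes two, so it leaves the state unchanged.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        if not in_quotes:
            yield sep.join(parts)
            parts = []
    if parts:
        # Input ended inside an open quote; emit what was collected.
        yield sep.join(parts)

# Sample data from the question.
sample = '''1, "asdsdsdsds", "John"
2, "dfdhifdkinf
dfjdfgkdnjgknkdjgndkng
dkfdkjfnjdnf", "Roy"
3, "dfjfdkgjfgn", "Rahul"'''

records = list(join_quoted_lines(sample.split("\n")))
for record in records:
    print(record)

# The case from the comment above: a continuation line that starts with a digit.
tricky = list(join_quoted_lines(['1, "some line', '1 plus one equals 2"']))
print(tricky)
```

Unlike a full CSV parse, this keeps the original spacing and quoting of each record, so for the sample data it reproduces the expected output from the question. Passing an open file object as `lines` works as well, since the function strips line endings itself.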
