Eliminate unnecessary spaces/comma in CSV file. Python

Question

I have a .csv file that looks like this:

 1  [AS?] [NULL] x.x.x.x 1.5ms
 2  [AS?] [NULL] x.x.x.x 2.7ms
 4  [AS?] [NULL] x.x.x.x 31.6ms
 6  [AS?] [NULL] x.x.x.x 43.5ms
 7  [6805] [TEDE-INFRA] x.x.x.x 52.8ms
 8  [6805] [TEDE-INFRA] x.x.x.x 49.2ms
 9  [12638] [TEDE-INFRA] x.x.x.x 45.9ms
10  [15169] [GOOGLE] x.x.x.x 65.4ms
11  [15169] [GOOGLE] x.x.x.x 67.3ms
12  [15169] [GOOGLE]  x.x.x.x 30.9ms

I need to remove the leading space in the first 7 lines and the extra space in the last one (between [GOOGLE] and x.x.x.x), because it breaks the processing in pandas. I tried converting the separator to a comma, but the problem persists:

,1,,[AS?],[NULL],x.x.x.x,1.5ms
,2,,[AS?],[NULL],x.x.x.x,2.7ms
,4,,[AS?],[NULL],x.x.x.x,31.6ms
,6,,[AS?],[NULL],x.x.x.x,43.5ms
,7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
,8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
,9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],,x.x.x.x,30.9ms

What I expect is something like this

1,,[AS?],[NULL],x.x.x.x,1.5ms
2,,[AS?],[NULL],x.x.x.x,2.7ms
4,,[AS?],[NULL],x.x.x.x,31.6ms
6,,[AS?],[NULL],x.x.x.x,43.5ms
7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],x.x.x.x,30.9ms

That is, the first lines and the last line without the unnecessary spaces/commas. Is this possible? How can I do it?

sjw · Accepted Answer · 2022-02-03 16:49:19Z

3

You can pass a regular expression to read_csv to tell it that 1 or more whitespace characters is to be considered the separator:

import pandas as pd
df = pd.read_csv('data.csv', sep=r'\s+', header=None)

Giving:

    0        1             2        3       4
0   1    [AS?]        [NULL]  x.x.x.x   1.5ms
1   2    [AS?]        [NULL]  x.x.x.x   2.7ms
2   4    [AS?]        [NULL]  x.x.x.x  31.6ms
3   6    [AS?]        [NULL]  x.x.x.x  43.5ms
4   7   [6805]  [TEDE-INFRA]  x.x.x.x  52.8ms
5   8   [6805]  [TEDE-INFRA]  x.x.x.x  49.2ms
6   9  [12638]  [TEDE-INFRA]  x.x.x.x  45.9ms
7  10  [15169]      [GOOGLE]  x.x.x.x  65.4ms
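To try this without a file on disk, here is a minimal sketch that feeds a few of the question's lines through the same `sep=r'\s+'` call and writes the result back out as comma-separated text. The names `raw` and `cleaned` are illustrative assumptions, not part of the original answer. This produces a single comma between fields, not the empty second field shown in the question's expected output.

```python
import io

import pandas as pd

# A few lines copied from the question, irregular spacing included.
raw = (
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "10  [15169] [GOOGLE] x.x.x.x 65.4ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)

# sep=r'\s+' treats any run of whitespace as one delimiter,
# so leading spaces and doubled spaces do not create empty columns.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Serialise back to CSV text without the index or a header row.
cleaned = df.to_csv(index=False, header=False)
print(cleaned)
```

Passing a file path instead of `io.StringIO(raw)` should behave the same way for a file with this layout.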

edited Feb 3, 2022 at 16:49

answered Feb 3, 2022 at 16:39


Elijah · Accepted Answer · 2022-02-03 16:49:29Z

0

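Are you sure you want the extra comma after the numbers? If so, just clean the lines manually. The snippet below is only a rough sketch for a single line, so you may need to adapt it.

csv_row = ' 1  [AS?] [NULL] x.x.x.x 1.5ms'  # example input line
clean_row = csv_row.split()  # split on any run of whitespace
# keep the empty second field after the number, then comma-join the rest
row = clean_row[0] + ',,' + ','.join(clean_row[1:])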
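One possible way to extend the per-line idea to a whole block of text is sketched below. The helper names `clean_line` and `clean_text` are hypothetical, and the sketch assumes every non-blank line has at least one field. It keeps the double comma after the number, matching the expected output in the question.

```python
def clean_line(line: str) -> str:
    """Turn one whitespace-separated line into the 'N,,a,b,c' layout."""
    # split() with no argument ignores leading/trailing whitespace
    # and collapses consecutive spaces into a single separator.
    fields = line.split()
    return fields[0] + ",," + ",".join(fields[1:])


def clean_text(text: str) -> str:
    """Clean every non-blank line of a multi-line string."""
    # Blank lines are skipped so fields[0] never raises IndexError.
    return "\n".join(
        clean_line(line) for line in text.splitlines() if line.strip()
    )


sample = (
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)
result = clean_text(sample)
print(result)
```

For a real file, the same function could be applied to the string returned by reading the file, with the result written to a new file so the original is preserved.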

answered Feb 3, 2022 at 16:49

