0

I have a .csv file that looks like this:

 1  [AS?] [NULL] x.x.x.x 1.5ms
 2  [AS?] [NULL] x.x.x.x 2.7ms
 4  [AS?] [NULL] x.x.x.x 31.6ms
 6  [AS?] [NULL] x.x.x.x 43.5ms
 7  [6805] [TEDE-INFRA] x.x.x.x 52.8ms
 8  [6805] [TEDE-INFRA] x.x.x.x 49.2ms
 9  [12638] [TEDE-INFRA] x.x.x.x 45.9ms
10  [15169] [GOOGLE] x.x.x.x 65.4ms
11  [15169] [GOOGLE] x.x.x.x 67.3ms
12  [15169] [GOOGLE]  x.x.x.x 30.9ms

I need to remove the leading space in the first 7 lines and the extra space in the last one (between [GOOGLE] and x.x.x.x), because it breaks the processing in pandas. I tried converting the separator to a comma, but the problem persists, like this:

,1,,[AS?],[NULL],x.x.x.x,1.5ms
,2,,[AS?],[NULL],x.x.x.x,2.7ms
,4,,[AS?],[NULL],x.x.x.x,31.6ms
,6,,[AS?],[NULL],x.x.x.x,43.5ms
,7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
,8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
,9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],,x.x.x.x,30.9ms

What I expect is something like this:

1,,[AS?],[NULL],x.x.x.x,1.5ms
2,,[AS?],[NULL],x.x.x.x,2.7ms
4,,[AS?],[NULL],x.x.x.x,31.6ms
6,,[AS?],[NULL],x.x.x.x,43.5ms
7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],x.x.x.x,30.9ms

That is, the first lines and the last line without the unnecessary spaces/commas. Is it possible to do this? How can I do it?

2 Answers

3

You can pass a regular expression as the sep argument of read_csv to tell it that one or more whitespace characters should be treated as the separator:

import pandas as pd

df = pd.read_csv('data.csv', sep=r'\s+', header=None)

Giving:

    0        1             2        3       4
0   1    [AS?]        [NULL]  x.x.x.x   1.5ms
1   2    [AS?]        [NULL]  x.x.x.x   2.7ms
2   4    [AS?]        [NULL]  x.x.x.x  31.6ms
3   6    [AS?]        [NULL]  x.x.x.x  43.5ms
4   7   [6805]  [TEDE-INFRA]  x.x.x.x  52.8ms
5   8   [6805]  [TEDE-INFRA]  x.x.x.x  49.2ms
6   9  [12638]  [TEDE-INFRA]  x.x.x.x  45.9ms
7  10  [15169]      [GOOGLE]  x.x.x.x  65.4ms
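
To try this without a file on disk, here is a minimal sketch that feeds three lines from the question's sample through the same read_csv call. It assumes pandas is installed; io.StringIO and the variable name raw are stand-ins for data.csv and are not part of the original answer.

```python
import io

import pandas as pd

# Three lines copied from the question: one with a leading space,
# one without, and one with a double space before the address.
raw = (
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "10  [15169] [GOOGLE] x.x.x.x 65.4ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)

# sep=r"\s+" treats any run of whitespace as a single separator,
# so leading spaces and doubled spaces do not create empty columns.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

print(df.shape)
print(df)
```

Every row should come out with the same five columns, which is what the comma conversion in the question failed to achieve.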

0

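Are you sure you want the extra comma after the numbers? If so, just clean the lines manually. The snippet below is a minimal sketch for a single line; you may need to adapt it to loop over your file.

csv_row = ' 1  [AS?] [NULL] x.x.x.x 1.5ms'  # one raw line from the file
clean_row = csv_row.split()  # split on runs of whitespace; leading spaces are dropped
row = clean_row[0] + ',,' + ','.join(clean_row[1:])  # number, empty field, then the rest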
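
The per-line cleanup above can be extended to a whole block of text. The following is only a sketch using the standard library; the function names clean_line and clean_text are made up for illustration, and the sample string stands in for the contents of the file.

```python
def clean_line(line: str) -> str:
    """Turn one whitespace-separated line into the 'N,,a,b,c' layout."""
    fields = line.split()  # splits on any run of whitespace
    if not fields:
        return ""
    # First field, an empty second field, then the remaining fields.
    return fields[0] + ",," + ",".join(fields[1:])


def clean_text(text: str) -> str:
    """Clean every non-blank line of a multi-line string."""
    return "\n".join(
        clean_line(line) for line in text.splitlines() if line.strip()
    )


# Two lines from the question: leading space, and a doubled space.
sample = (
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)

cleaned = clean_text(sample)
print(cleaned)
```

To process a real file, read its contents into a string, pass it to clean_text, and write the result back out.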
