
I have IP packet data in a CSV file, and I'm trying to extract the sequence numbers from the Info field into a separate column that holds only the sequence numbers. Each sequence number is a substring in the middle of the Info string. Here is my raw code. First I create a new column for the sequence numbers, then I check whether the Info field contains a Seq number, then I split the Info field so that only the sequence number remains. If I print after 'Seq = j.split...', I do get the correct values. How do I write them to the Seq column of the CSV file?

import pandas as pd

file = pd.read_csv('file.csv')

file['Seq'] = None
for i in file['Info']:
    if 'Seq' in i:
        split = i.split(' ')
        for j in split:
            if 'Seq=' in j:
                Seq = j.split('Seq=',1)[1]
                file.loc[i,'Seq'] = int(Seq)

Example CSV:

No. Time        Source      Destination Protocol    Length  Info  
1   0.000000    sourceip    192.168.0.1 TCP         54      35165 > 80 [SYN] Seq=0 Win=16384 Len=0  
2   0.000001    sourceip    192.168.0.1 TCP         54      14378 > 80 [SYN] Seq=0 Win=16384 Len=0  
3   0.000003    sourceip    192.168.0.1 TCP         54      31944 > 80 [SYN] Seq=0 Win=16384 Len=0

Desired outcome:

No. Time        Source      Destination Protocol    Length  Info                                   Seq  
1   0.000000    sourceip    192.168.0.1 TCP         54      35165 > 80 [SYN] Seq=0 Win=16384 Len=0 0  
2   0.000001    sourceip    192.168.0.1 TCP         54      14378 > 80 [SYN] Seq=0 Win=16384 Len=0 0  
3   0.000003    sourceip    192.168.0.1 TCP         54      31944 > 80 [SYN] Seq=0 Win=16384 Len=0 0
  • From a quick glance it seems OK. Did you try file.to_csv? Did you get any exceptions? Commented Apr 14, 2017 at 15:40
  • I did try that. No exceptions, and I do have the Seq column, but every value is empty. At what point should I write the file? After all the for loops or inside them? I'm really new to all this. Commented Apr 14, 2017 at 16:12
  • You should write the file at the end, after you've extracted everything you wanted. Did piRSquared's method below not produce any output? You might want to check that the file is read correctly. Try print(file.head()) to see the first 5 rows that were read. Commented Apr 14, 2017 at 16:17
  • Actually the Seq values are there but they are after the other data, not on the same rows. Commented Apr 14, 2017 at 16:19
  • Are you using the same code as before? piRSquared's code (dropping the astype(int) or replacing it with astype(float)) will solve it. [Your current code can't work: i is an Info string, not an index label, so file.loc[i,'Seq'] adds new rows instead of filling the existing ones] Commented Apr 14, 2017 at 16:56
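
The misplaced rows described in these comments can be reproduced and avoided with a small sketch. It assumes a hypothetical three-row frame in place of the real capture file (the second and third Info strings are made up for illustration). The only substantive change from the question's loop is iterating with Series.items(), so that the index label rather than the Info text is passed to .loc:

```python
import pandas as pd

# Hypothetical sample standing in for the real capture file.
file = pd.DataFrame({
    "Info": [
        "35165 > 80 [SYN] Seq=0 Win=16384 Len=0",
        "80 > 35165 [SYN, ACK] Seq=7 Ack=1 Win=5840 Len=0",
        "Who has 192.168.0.1? Tell 192.168.0.2",  # no Seq= token
    ]
})

file["Seq"] = None
# items() yields (index label, value) pairs, so .loc targets existing rows
# instead of creating new ones keyed by the Info text.
for idx, info in file["Info"].items():
    for token in info.split(" "):
        if token.startswith("Seq="):
            file.loc[idx, "Seq"] = int(token.split("Seq=", 1)[1])

print(file)
```

Rows without a Seq= token keep None, and the frame keeps its original number of rows.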

1 Answer


Use str.extract

file['Seq'] = file.Info.str.extract(r'Seq=(\d+)', expand=False).astype(float)
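
As a minimal illustration of what this call returns, the sketch below applies it to a hypothetical two-element Series (the second string is invented to show the no-match case). With expand=False and a single capture group, str.extract gives back a Series, and rows where the pattern does not match become NaN, which is why a float conversion succeeds where an int conversion would not:

```python
import pandas as pd

# Hypothetical Info values: one with a Seq= token, one without.
info = pd.Series([
    "35165 > 80 [SYN] Seq=0 Win=16384 Len=0",
    "Who has 192.168.0.1? Tell 192.168.0.2",
])

# The capture group (\d+) keeps only the digits that follow "Seq=".
seq = info.str.extract(r"Seq=(\d+)", expand=False).astype(float)
print(seq)
```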

4 Comments

This should replace all the code below read_csv in the question
Even if there might not be a Seq value at all? I tried this and got ValueError: cannot convert float NaN to integer.
What do you want there to be in the event that 'Seq' is not there?
Just an empty value
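
One possible way to get integer sequence numbers with empty cells for the missing ones is pandas' nullable Int64 dtype. This is a sketch under assumptions, not part of the original answer: the sample frame is hypothetical, the output goes to an in-memory buffer rather than a real file, and it relies on to_csv writing missing values as empty fields by default.

```python
import io

import pandas as pd

# Hypothetical sample; the second row has no Seq= token.
file = pd.DataFrame({
    "Info": [
        "35165 > 80 [SYN] Seq=0 Win=16384 Len=0",
        "Who has 192.168.0.1? Tell 192.168.0.2",
    ]
})

extracted = file["Info"].str.extract(r"Seq=(\d+)", expand=False)
# to_numeric turns the digit strings into numbers (NaN where nothing matched);
# the nullable Int64 dtype then stores integers alongside missing values.
file["Seq"] = pd.to_numeric(extracted).astype("Int64")

# Write once, after the column is complete. Missing values become empty fields.
buffer = io.StringIO()
file.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Replacing the buffer with a file path writes the same content to disk.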
