I have a huge list of data. How can I convert all the IP addresses to decimal numbers in pandas and merge each with the value of the second column?

   import pandas as pd

   filename = "/Users/sda/Desktop/file"
   # on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, warn_bad_lines=False
   pdd = pd.read_csv(filename, header=None, sep='|', on_bad_lines='skip', skiprows=[0],
                     names=['Name', 'Code', 'Ipv', 'Ip', 'Range', 'Date', 'Category'])
   pd.options.display.max_columns = None
   # Keep only IPv4 rows whose Ip field contains at least one digit
   k = pdd[pdd['Ipv'].str.contains("ipv4") & pdd['Ip'].str.contains('[0-9]')]
   print(k[['Ip','Range','Code']])

My output:

        Ip             Range    Code
     2.16.0.0          524288   EU
     200.109.100.0     1024     RU
     200.109.102.0     1024     RU

I only need the decimal value of the first IP address among rows with the same country code, merged with their values from the second column (Range) only:

       IP         range code
    3362612224    2028  RU

1 Answer

IIUC, and assuming the DataFrame that holds your output is named df, something like this should work:

import socket, struct

def ip2int(ip):
    """
    Convert an IP string to int
    """
    packedIP = socket.inet_aton(ip)
    return struct.unpack("!L", packedIP)[0]

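# Convert each dotted-quad string to its integer value
df['ip_int'] = df.Ip.apply(ip2int)
# Sum Range across all rows that share a country code
df['range_sum'] = df.groupby(['Code'])['Range'].transform('sum')
# Keep every row whose Code appears again later (drops the last row per code)
df[df.Code.duplicated(keep='last')]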

               Ip  Range Code      ip_int  range_sum
 1  200.109.100.0   1024   RU  3362612224       2048
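
As a possible alternative, here is a minimal sketch that collapses each country code to a single row with one groupby call. It is not the code from the answer above: it swaps in the standard-library ipaddress module for socket/struct, and the sample DataFrame and the name summary are assumptions made up for illustration from the rows shown in the question.

```python
import ipaddress

import pandas as pd

# Hypothetical sample data copied from the question's printed output
df = pd.DataFrame({
    "Ip": ["2.16.0.0", "200.109.100.0", "200.109.102.0"],
    "Range": [524288, 1024, 1024],
    "Code": ["EU", "RU", "RU"],
})

# int() on an IPv4Address gives the 32-bit decimal value of the address
df["ip_int"] = df["Ip"].map(lambda ip: int(ipaddress.IPv4Address(ip)))

# One row per country code: the first IP seen and the sum of all ranges
summary = (
    df.groupby("Code", sort=False)
      .agg(ip_int=("ip_int", "first"), range_sum=("Range", "sum"))
      .reset_index()
)
print(summary)
```

For the sample rows this gives ip_int 3362612224 and range_sum 2048 for RU. Unlike the duplicated(keep='last') filter, it also keeps codes that occur only once (EU here), so filter those out afterwards if you do not want them.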

8 Comments

Should I use this function after this line? k = pdd[pdd['Ipv'].str.contains("ipv4") & pdd['Ip'].str.contains('[0-9]')]
@warezers Yes, this goes after you have the output: df = k[['Ip','Range','Code']]
I got an error: "Try using .loc[row_indexer,col_indexer] = value instead. See the caveats in the documentation: pandas.pydata.org/pandas-docs/stable/…" It comes from this line: k['range_sum']=k.groupby(['Code'])['Range'].transform('sum')
@warezers That is because k is a slice. Try assigning k to a DataFrame, df = k[['Ip','Range','Code']], and before that run pd.options.mode.chained_assignment = None. After that, use the code. :)
Now there is no error. What if I only want the output of these values? # df[df.Code.duplicated(keep='last')]) print(df[['ip_int','range_sum','Code']])
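
Regarding the warning quoted in the comments: rather than silencing it with pd.options.mode.chained_assignment = None, one option is to take an explicit copy of the filtered rows before adding columns. The sketch below assumes a small made-up stand-in for the parsed file (the IPv6 row is hypothetical) and uses ipaddress.ip_address instead of the answer's ip2int.

```python
import ipaddress

import pandas as pd

# Hypothetical stand-in for the DataFrame read from the file
pdd = pd.DataFrame({
    "Code": ["EU", "RU", "RU", "RU"],
    "Ipv": ["ipv4", "ipv4", "ipv4", "ipv6"],
    "Ip": ["2.16.0.0", "200.109.100.0", "200.109.102.0", "2a00:1f00::"],
    "Range": [524288, 1024, 1024, 32],
})

# Same filter as in the question
mask = pdd["Ipv"].str.contains("ipv4") & pdd["Ip"].str.contains("[0-9]")

# .copy() makes df an independent DataFrame, so adding columns does not
# trigger the chained-assignment warning and never touches pdd
df = pdd.loc[mask, ["Ip", "Range", "Code"]].copy()

df["ip_int"] = df["Ip"].map(lambda ip: int(ipaddress.ip_address(ip)))
df["range_sum"] = df.groupby("Code")["Range"].transform("sum")

# Show only the columns asked about in the last comment
print(df[["ip_int", "range_sum", "Code"]])
```

Because df is a copy, pdd is left unchanged, and selecting df[['ip_int','range_sum','Code']] prints just those three columns.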