1

I have 2 dataframe and I need to get new column to first dataframe, using values from second FIrse df is

ID,"url","used_at","active_seconds"
8075643aab791cec7dc9d18926958b67,"sberbank.ru/ru/person/promo/10mnl?utm_source=Vesti.ru&utm_medium=html&utm_campaign=10_million_users_SBOL_dec2015&utm_term=every14_syncbanners",2016-01-01 00:03:16,183
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/system/login/rslogin.jsp?credit=false",2016-01-01 00:04:36,42
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/system/login/sms/sms.jsp?smsAuth=true",2016-01-01 00:05:18,22
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/rs/RSIndex.jspx",2016-01-01 00:05:40,14
a04a8041ffa6fe1b85471ca5af1ee575,"online.rsb.ru/hb/faces/rs/payments/PaymentReq.jspx",2016-01-01 00:05:54,22
ba880911a6d54f6ea6d3145081a0e0dd,"homecredit.ru/help/quest/feedback.php",2016-01-01 00:06:12,2

Second df looks like

URL Code
citibank\.ru\/russia\/info\/rus\/contacts_form\.htm 15
citibank\.ru\/russia\/info\/rus\/contacts\.htm  15
gazprombank\.ru\/contacts\/ 15
gazprombank\.ru\/feedback\/ 15
gazprombank\.ru\/additional_office\/    15
homecredit\.ru\/help\/quest\/feedback\.php  15
homecredit\.ru\/offices\/*  15

If I don't have a regex, I use

df1['code'] = df1.url.map(df2.set_index('URL')['Code'])

But I can't do this, because df2.URL is regex. But

df1['code'] = df1['url'].replace(df2['URL'], df2['Code'], regex=True)

doesn't work.

6
  • What error do you get? Commented Dec 12, 2016 at 9:25
  • 1
    Looking at pandas.Series.replace() I'm not sure if the method is supposed to work properly when series objects are passed to it as the to_replace and value arguments. Have you tried using Series.values, as in df1['code'] = df1['url'].replace(df2['URL'].values, df2['Code'].values, regex=True)? Commented Dec 12, 2016 at 9:25
  • @IanS it returns TypeError: replace() takes no keyword arguments Commented Dec 12, 2016 at 9:33
  • @cant I get sre_constants.error: bad character range Commented Dec 12, 2016 at 9:47
  • @cant it returns df1.url in df1.code, not df2.Code Commented Dec 12, 2016 at 9:50

1 Answer 1

1

As per my comment, the pandas.Series.replace() method doesn't allow Series objects as the to_replace and value arguments. Passing lists instead works:

df1['code'] = df1.url.replace(df2.URL.values, df2.Code.values, regex=True)
print df1[['url', 'code']]

produces the following output:

                                                 url  \
0  sberbank.ru/ru/person/promo/10mnl?utm_source=V...   
1  online.rsb.ru/hb/faces/system/login/rslogin.js...   
2  online.rsb.ru/hb/faces/system/login/sms/sms.js...   
3             online.rsb.ru/hb/faces/rs/RSIndex.jspx   
4  online.rsb.ru/hb/faces/rs/payments/PaymentReq....   
5              homecredit.ru/help/quest/feedback.php   

                                                code  
0  sberbank.ru/ru/person/promo/10mnl?utm_source=V...  
1  online.rsb.ru/hb/faces/system/login/rslogin.js...  
2  online.rsb.ru/hb/faces/system/login/sms/sms.js...  
3             online.rsb.ru/hb/faces/rs/RSIndex.jspx  
4  online.rsb.ru/hb/faces/rs/payments/PaymentReq....  
5                                                 15

In answer to your additional comments, you can't get df2.Code in df1.code in rows where df1.url doesn't match any of the regex strings, but you can provide a value (e.g. None) for these cases to be put in the column instead. This is, for example, done by adding the following line:

df1['code'] = df1.apply(lambda x: None if x.code == x.url else x.code, axis=1)

where print df1[['url', 'code']] returns the following:

                                                 url  code
0  sberbank.ru/ru/person/promo/10mnl?utm_source=V...   NaN
1  online.rsb.ru/hb/faces/system/login/rslogin.js...   NaN
2  online.rsb.ru/hb/faces/system/login/sms/sms.js...   NaN
3             online.rsb.ru/hb/faces/rs/RSIndex.jspx   NaN
4  online.rsb.ru/hb/faces/rs/payments/PaymentReq....   NaN
5              homecredit.ru/help/quest/feedback.php  15.0
Sign up to request clarification or add additional context in comments.

3 Comments

How can I get only code to df1['code]? I need get number, that df1.url is equal to df2.URL?
But if df1['url'] is a alfabank.ru/get-money/credit/credit-cash/#frame_credit_calc/buy and df2['URL'] is a alfabank.ru/get-money/credit/credit-cash/#frame_credit_calc and code from df2['Code'] is 3. When I replace I get 3/buy and after I get to `df1['code'] = '3/buy'. How can I get 3 to it?
If I'm not mistaken, adding .* to the end of that string would match any url that begins with it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.