0

I have a variable "Age" with varying measures for age stored as a string. Example:

Age = ("3 weeks" , "2 years" , "1 day", "4 weeks")

I am interested in using the time measure (weeks, years, day) to convert the variable to an integer expressing the number in the string as a fraction of a year. In other words, I want to convert 3 weeks into the equivalent of 3/52 in int form.

Any suggestions on how I can do this in pandas? Appreciate any advice that is forthcoming.

M

3
  • Could you give an example of an input and desired output? Commented May 18, 2016 at 17:44
  • I think the desired solution is clear in my original question. What I'd like is for my pandas series object to go from being a string that says "x weeks" to an int that is equal to x/52 or "x days" to become an int that is equal to x/365. Both being a fraction of a year. Commented May 18, 2016 at 18:12
  • Solved the problem with some help from you all with these lines: time = {"year" : 1, "years": 1, "days": 365, "day": 365, "month":12, "months": 12, "week": 52, "weeks": 52}; num = X_train['AgeuponOutcome'].str.split().str[0].astype(float); measure = X_train['AgeuponOutcome'].str.split().str[1].replace(time.keys(), time.values()); X_train['OutcomeAge'] = num/measure Commented May 18, 2016 at 18:43

3 Answers

2

Using parsedatetime,

import datetime as DT
import pandas as pd
import parsedatetime as pdt

today = DT.date.today()
def parse(x, p=pdt.Calendar()):
    return DT.datetime(*p.parse(x, today.timetuple())[0][:6])

age = ("3 weeks" , "2 years" , "1 day", "4 weeks")
s = pd.Series(age)
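s = s.map(parse) - pd.Timestamp(today)
# unit='Y' is rejected by recent pandas versions; divide by the mean Gregorian year instead
s = s / pd.Timedelta(days=365.2425)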
print(s)

yields

0    0.057496
1    1.998672
2    0.002738
3    0.076661
dtype: float64

1 Comment

OP wants it converted to an int
1

This should work:

d = {"weeks":52,"years":1,"day":365}
[float(i.split(" ")[0])/d[i.split(" ")[1]] for i in Age]

Note that this assumes every value has the number and the unit separated by a single space, and that the only unit spellings in the data are the ones in the dict (for example "day" but not "days"). If your data contains "days", add it to the dict.
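Since the question asks for pandas, here is a minimal sketch of the same lookup idea applied to a Series. The names `UNITS_PER_YEAR` and `age_to_years` are made up for illustration, and the conversion factors (365 days, 52 weeks, 12 months per year) are the same approximations used elsewhere on this page. It assumes each value looks like "<number> <unit>", and it returns NaN for missing or unrecognized entries rather than raising.

```python
import pandas as pd

# Approximate units per year; the regex below strips a trailing "s",
# so "week" and "weeks" share one entry.
UNITS_PER_YEAR = {"day": 365, "week": 52, "month": 12, "year": 1}


def age_to_years(ages: pd.Series) -> pd.Series:
    """Convert strings such as '3 weeks' to a float fraction of a year."""
    # Column 0 captures the number, column 1 the unit without its plural "s".
    parts = ages.str.extract(r"^\s*(\d+(?:\.\d+)?)\s+([A-Za-z]+?)s?\s*$")
    number = parts[0].astype(float)
    per_year = parts[1].str.lower().map(UNITS_PER_YEAR)
    # Rows that did not match, or whose unit is unknown, end up as NaN.
    return number / per_year


ages = pd.Series(["3 weeks", "2 years", "1 day", "4 weeks", None])
years = age_to_years(ages)
print(years)
```

The result is a float Series, because a fraction such as 3/52 cannot be stored as an int without truncating it to 0.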

5 Comments

+1 for cleanest, most flexible solution. I do note however that OP asked for a solution for pandas, though didn't mention exactly the desired output.
I get an error saying " 'set' object is not subscriptable" when attempting your solution. I think it is localized to the denominator of your proposed solution. Appreciate the help!
Have you tried the above code for the Age tuple defined in your original question? Don't see where a set would come from here, unless your actual Age variable contains one!
My bad. I had enclosed the values in the dict in quotation marks. I adjusted that so my code matches yours, but now I get an AttributeError: 'float' object has no attribute 'split'. I will see if I can figure this out. But if you have advice, it is appreciated. Thanks again.
These are the first 5 values in the series, to give you a better sense: 0: 1 year, 1: 1 year, 2: 2 years, 3: 3 weeks, 4: 2 years
0

This will do what you want, I think, using Python lists:

#function to convert each string to fraction in years    
def word2time(strVal):
   num,word = strVal.split()
   num = int(num)
   if word == 'weeks' or word == 'week':
      return float(num)/52
   elif word == 'days' or word == 'day':
      return float(num)/365
   elif word == 'years' or word == 'year':
      return num

#demonstration on the input you provided   
Age = ['3 weeks', '2 years', '1 day', '4 weeks']

ageInYrs = []
for strVal in Age:
   ageInYrs.append(word2time(strVal))

print(ageInYrs)
