Change Column values in pandas applying another function

Question

I have a data frame in pandas; one of its columns contains time intervals represented as strings like 'P1Y4M1D'.

An example of the CSV:

oci,citing,cited,creation,timespan,journal_sc,author_sc
0200100000236252421370109080537010700020300040001-020010000073609070863016304060103630305070563074902,"10.1002/pol.1985.170230401","10.1007/978-1-4613-3575-7_2",1985-04,P2Y,no,no
...

I created a parsing function that takes a string like 'P1Y4M1D' and returns an integer. How can I replace all the column values with the parsed values using that function?

def do_process_citation_data(f_path):
    global my_ocan

    my_ocan = pd.read_csv("citations.csv",
                          names=['oci', 'citing', 'cited', 'creation', 'timespan', 'journal_sc', 'author_sc'],
                          parse_dates=['creation', 'timespan'])
    my_ocan = my_ocan.iloc[1:]  # to remove the first row iloc - to select data by row numbers
    my_ocan['creation'] = pd.to_datetime(my_ocan['creation'], format="%Y-%m-%d", yearfirst=True)


    return my_ocan


def parse():
     mydict = dict()
     mydict2 = dict()
     i = 1
     r = 1
     for x in my_ocan['oci']:
        mydict[x] = str(my_ocan['timespan'][i])
        i +=1
     print(mydict)
     for key, value in mydict.items():
        is_negative = value.startswith('-')
        if is_negative:
            date_info = re.findall(r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$", value[1:])
        else:
            date_info = re.findall(r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$", value)
        year, month, day = [int(num) if num else 0 for num in date_info[0]] if date_info else [0,0,0]
        daystotal = (year * 365) + (month * 30) + day
        if not is_negative:
            #mydict2[key] = daystotal
            return daystotal
        else:
           #mydict2[key] = -daystotal
            return -daystotal
     #print(mydict2)
     #return mydict2

I probably do not even need to replace the whole column with the parsed values; the final goal is to write a new function that returns the average ['timespan'] of documents created in a particular year. Since I need the parsed values, I thought it would be easier to change the whole column and work with the new data frame.

Also, I am curious how to apply the parsing function to each ['timespan'] row without modifying the data frame. I assume it could be something like this, but I don't fully understand how to do it:

      for x in my_ocan['timespan']:
          x = parse(str(my_ocan['timespan']))

How can I get a column with new values? Thank you! Peace :)

df['timespan'].apply(parse)? You should change your parse function to work on a single value though, i.e. take a timespan string like 'P1Y4M1D' as its input.
– Dan, Commented May 19, 2020 at 11:11

Leonardo Fernandes · Accepted Answer · 2020-05-19 12:46:17Z

1

A df['timespan'].apply(parse) (as mentioned by @Dan) should work. You only need to modify the parse function so that it receives a single string as an argument and returns the parsed value at the end. Something like this:

import pandas as pd

def parse_postal_code(postal_code):
    # Splitting postal code and getting first letters
    letters = postal_code.split('_')[0]
    return letters


# Example dataframe with three columns and three rows
df = pd.DataFrame({'Age': [20, 21, 22], 'Name': ['John', 'Joe', 'Carla'], 'Postal Code': ['FF_222', 'AA_555', 'BB_111']})

# This returns a new pd.Series
print(df['Postal Code'].apply(parse_postal_code))

# Can also be assigned to another column
df['Postal Code Letter'] = df['Postal Code'].apply(parse_postal_code)

print(df['Postal Code Letter'])
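
Applied to the data in the question, the same pattern could look roughly like the sketch below. It reuses the regular expression and the 365/30-day approximation from the question. The function name parse_timespan, the column name timespan_days and the sample rows are made up for illustration, and the sketch assumes the 'timespan' column still holds plain strings (so it should not be listed in parse_dates when reading the CSV).

```python
import re

import pandas as pd

# Same pattern as in the question, extended with an optional leading minus sign.
DURATION_RE = re.compile(r"^(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def parse_timespan(value):
    """Convert a string such as 'P1Y4M1D' to an approximate number of days."""
    match = DURATION_RE.match(str(value))
    if match is None:
        # Assumption: values that do not match count as 0, as in the question.
        return 0
    sign, years, months, days = match.groups()
    total = int(years or 0) * 365 + int(months or 0) * 30 + int(days or 0)
    return -total if sign else total


# Hypothetical sample rows, not taken from the real file.
df = pd.DataFrame({
    "creation": ["1985-04", "1985-11", "1990-01"],
    "timespan": ["P2Y", "-P1M3D", "P1Y4M1D"],
})

# apply() returns a new Series; df itself is not modified by this line.
days = df["timespan"].apply(parse_timespan)

# Optionally store the result next to the original strings.
df["timespan_days"] = days

# Average timespan (in days) of the rows created in each year.
year = df["creation"].str[:4].rename("year")
avg = df.groupby(year)["timespan_days"].mean()
print(avg)
```

Keeping the parsed values in a separate column leaves the original strings available for checking, and the grouped mean covers the "average timespan per creation year" goal without a hand-written loop.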

edited May 19, 2020 at 12:46

user7571182

answered May 19, 2020 at 11:25

Leonardo Fernandes
