
I have a dataframe as such

      ID       NAME  group_id
0     205292   A     183144058824253894513539088231878865676
1     475121   B     183144058824253894513539088231878865676
2     475129   C     183144058824253894513539088231878865676

I want to transform it such that row 0 is linked to the other rows in the following way:

   LinkedBy  By_Id    LinkedTo  To_Id   group_id
1  A         205292   B         475121  183144058824253894513539088231878865676
2  A         205292   C         475129  183144058824253894513539088231878865676

Basically, I am compressing the first dataframe by linking the row at index 0 against all the others, so that an n-row df gives me an (n-1)-row df. I can accomplish this without the group id (which is of type long and stays constant) with the following code:

pd.DataFrame({"LinkedBy": df['NAME'].iloc[0],"By_Id": df['ID'].iloc[0],"LinkedTo":df['NAME'].iloc[1:],"To_Id":df['ID'].iloc[1:]})

But I am facing problems when adding the group id. When I do the following

pd.DataFrame({"LinkedBy": df['NAME'].iloc[0],"By_Id": df['ID'].iloc[0],"LinkedTo":df['NAME'].iloc[1:],"To_Id":df['ID'].iloc[1:],"GroupId":df['group_id'].iloc[0]})

I get OverflowError: long too big to convert

How do I add the group_id of type long to my new df?

  • Can it just be a str instead? That should work: basically, cast the dtype using .astype(str). Commented Sep 14, 2016 at 20:45
  • I suspect this is because you are passing arrays of different sizes. This introduces NaNs, which forces a cast to float, and a float cannot hold integers that big. If it is not crucial, I agree that str would be a better choice. Commented Sep 14, 2016 at 20:51
  • Ideally, I would like to keep them as long. I would like to know what happens in the background while creating this df that gives the error, and whether there is another way to skin the cat, so to speak. Commented Sep 14, 2016 at 20:52
  • The problem is in the dict. "LinkedBy": df['NAME'].iloc[0] has only one entry, but "LinkedTo": df['NAME'].iloc[1:] has two. Instead of [A] you need to pass [A, A], for example with 2 * [df['NAME'].iloc[0]]. Commented Sep 14, 2016 at 20:58
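Putting the comments above together, here is a minimal sketch of the repeat-the-scalar workaround. The sample frame is reconstructed from the question (the group_id value exceeds int64, so pandas keeps it in an object-dtype column as a plain Python int); casting to str with .astype(str) would work the same way.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "ID": [205292, 475121, 475129],
    "NAME": ["A", "B", "C"],
    "group_id": [183144058824253894513539088231878865676] * 3,
})

big = df["group_id"].iloc[0]

# Repeating the scalar so its length matches the sliced Series avoids
# the scalar-broadcast conversion that raised the OverflowError:
out = pd.DataFrame({
    "LinkedBy": df["NAME"].iloc[0],
    "By_Id": df["ID"].iloc[0],
    "LinkedTo": df["NAME"].iloc[1:],
    "To_Id": df["ID"].iloc[1:],
    "GroupId": [big] * (len(df) - 1),
})
```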

2 Answers


Since group_id appears to be the same in all rows, you could try this:

res = pd.merge(left=df.iloc[:1], right=df.iloc[1:], how='right', on=['group_id'])
res.columns = ['By_Id', 'LinkedBy', 'group_id', 'To_Id', 'LinkedTo']

Note that this will only work when group_id can be used as your join key.
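For reference, a self-contained version of this approach using the sample data from the question. Note the left side is sliced with df.iloc[:1] so it stays a one-row DataFrame; passing the plain row Series df.iloc[0] to merge would fail.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "ID": [205292, 475121, 475129],
    "NAME": ["A", "B", "C"],
    "group_id": [183144058824253894513539088231878865676] * 3,
})

# Right join on group_id links the first row to every other row;
# merge suffixes the overlapping ID/NAME columns as _x and _y.
res = pd.merge(left=df.iloc[:1], right=df.iloc[1:], how="right", on=["group_id"])
res.columns = ["By_Id", "LinkedBy", "group_id", "To_Id", "LinkedTo"]
```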



  • groupby everything, then apply a custom function
  • cond1 makes sure 'group_id' matches
  • cond2 makes sure 'NAME' does not match
  • subset df inside the apply function
  • rename and drop columns
  • more renaming, dropping, and index resetting

def find_grp(x):
    # x.name holds this single-row group's (ID, NAME, group_id) key tuple
    cond1 = df.group_id == x.name[2]  # same group_id
    cond2 = df.NAME != x.name[1]      # different NAME
    temp = df[cond1 & cond2]
    rnm = dict(ID='To_ID', NAME='LinkedTo')
    return temp.drop('group_id', axis=1).rename(columns=rnm)


cols = ['ID', 'NAME', 'group_id']
df1 = df.groupby(cols).apply(find_grp)
df1.index = df1.index.droplevel(-1)
df1.rename_axis(['By_ID', 'LinkedBy', 'group_id']).reset_index()



OR

df1 = df.merge(df, on='group_id', suffixes=('_By', '_To'))
df1 = df1[df1.NAME_By != df1.NAME_To]

rnm = dict(ID_By='By_ID', ID_To='To_ID', NAME_To='LinkedTo', NAME_By='LinkedBy')

df1.rename(columns=rnm)
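A runnable version of the self-merge, again using the sample data from the question. Keep in mind that a self-merge on group_id produces every ordered pair within the group in both directions, not only the links from the first row.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "ID": [205292, 475121, 475129],
    "NAME": ["A", "B", "C"],
    "group_id": [183144058824253894513539088231878865676] * 3,
})

# Self-merge on group_id gives the cross product within each group;
# filtering NAME_By != NAME_To drops the self-links.
df1 = df.merge(df, on="group_id", suffixes=("_By", "_To"))
df1 = df1[df1.NAME_By != df1.NAME_To]

rnm = dict(ID_By="By_ID", ID_To="To_ID", NAME_To="LinkedTo", NAME_By="LinkedBy")
df1 = df1.rename(columns=rnm).reset_index(drop=True)
```

If you want only the links from the first row, as in the question's expected output, filter afterwards, e.g. df1[df1.LinkedBy == "A"].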


