1

I am trying to perform a linear interpolation in Python such that the interpolated values are calculated off certain values in a certain column.

In my example below, I want to interpolate Measurements for Products "a", and "b" using the actual numerical values from the relevant cells in Height column in the interpolation. Could I call the Height column the index for the interpolation?

When I am interpolating the missing Measurements for Product "a", the index values to use in interpolation are 4, 5 and 7. For interpolation of Product "b", index values to use are 1, 2.2 and 3.

I know about dataframe.interpolate() but I am struggling to modify my code to use the correct index values.

Here is the code for the dataframe I am starting with:

import pandas as pd
testdata1 = [('Product', ['a', 'a', 'a', 'b', 'b', 'b','b']),
     ('Height', ['4', '5', '7', '1',  '2.2', '3','4']),
     ('Measurement', ['35.00', '', '55.00','10.00','','30.00','40.00']),
     ]
df = pd.DataFrame.from_items(testdata1)
df

testdata1

And here is the code for the dataframe I need:

targetdf = [('Product', ['a', 'a', 'a', 'b', 'b', 'b','b']),
     ('Height', ['4', '5', '7', '1',  '2.2', '3','4']),
     ('Measurement', ['35.00', '41.67', '55.00','10.00','22.00','30.00','40.00']),
     ]
df2 = pd.DataFrame.from_items(targetdf)
df2

enter image description here

If this cannot be done with a dataframe, I am open to other ideas to do it in Python.

Any help is greatly appreciated. I am new to Python. Thank you.

1 Answer 1

2

You can use first some data cleaning:

#convert '' to `NaN`s and all strings to numeric
df['Measurement'] = pd.to_numeric(df['Measurement'], errors='coerce')
#convert strings to numeric
df['Height'] =  df['Height'].astype(float)

Then set_index, groupby and apply custom function with interpolate, last reset_index and get original order of columns by reindex:

df1 = df.set_index('Height') \
        .groupby('Product') \
        .apply(lambda x: x.interpolate(method='index')) \
        .reset_index() \
        .reindex(columns=df.columns)
print (df1)
  Product  Height  Measurement
0       a     4.0    35.000000
1       a     5.0    41.666667
2       a     7.0    55.000000
3       b     1.0    10.000000
4       b     2.2    22.000000
5       b     3.0    30.000000
6       b     4.0    40.000000
Sign up to request clarification or add additional context in comments.

2 Comments

Please could you explain the reset index, and reindex parts of your code? It's not 100% clear to me. Thanks
sure - reset-index only create column from index and reindex change order of columns names. the best is try remove this part of code for see difference.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.