
I am seeing something really weird and am not sure whether it is a bug (hopefully not). When I call DataFrame.shift along the columns (axis=1), the columns are either shifted incorrectly or the returned values are wrong (see the output below).

Does anyone know whether I am missing something, or is this simply a bug in the library?

# Example 2
import numpy as np
import pandas as pd

ind = pd.date_range('01/01/2019', periods=5, freq='12H')
df2 = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, np.nan, 40, 50],
                   "C": [11, 22, 33, np.nan, 55],
                   "D": [-11, -24, -51, -36, -2],
                   'D1': [False] * 5,
                   'E': [True, False, False, True, True]},
                  index=ind)

print(df2)  # original frame, for comparison

print(df2.shift(periods=1, axis=1))  # shift by column -> incorrect
# print(df2.shift(periods=1, axis=0))  # shift by row -> correct

Output:

                     A     B     C   D     D1      E
2019-01-01 00:00:00  1  10.0  11.0 -11  False   True
2019-01-01 12:00:00  2  20.0  22.0 -24  False  False
2019-01-02 00:00:00  3   NaN  33.0 -51  False  False
2019-01-02 12:00:00  4  40.0   NaN -36  False   True
2019-01-03 00:00:00  5  50.0  55.0  -2  False   True

                      A   B     C    D   D1      E
2019-01-01 00:00:00 NaN NaN  10.0  1.0  NaN  False
2019-01-01 12:00:00 NaN NaN  20.0  2.0  NaN  False
2019-01-02 00:00:00 NaN NaN   NaN  3.0  NaN  False
2019-01-02 12:00:00 NaN NaN  40.0  4.0  NaN  False
2019-01-03 00:00:00 NaN NaN  50.0  5.0  NaN  False

1 Answer


You are right, it is a bug. The problem is that DataFrame.shift with axis=1 moves each column to the next column with the same dtype, not to the next column by position.

In the sample, columns A and D hold integers, so A is moved to D. Columns B and C hold floats, so B is moved to C, and the same happens with the boolean columns D1 and E.
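
To see which columns sit next to each other within one dtype, here is a minimal sketch. It rebuilds the question's columns with a default integer index, and `columns_by_dtype` is a hypothetical helper written for this illustration, not a pandas function.

```python
import numpy as np
import pandas as pd

# Same columns as df2 in the question; the default index is enough here.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, np.nan, 40, 50],
                   "C": [11, 22, 33, np.nan, 55],
                   "D": [-11, -24, -51, -36, -2],
                   "D1": [False] * 5,
                   "E": [True, False, False, True, True]})


def columns_by_dtype(frame):
    """Hypothetical helper: map each dtype name to its columns, in column order."""
    groups = {}
    for name, dtype in frame.dtypes.items():
        groups.setdefault(str(dtype), []).append(name)
    return groups


groups = columns_by_dtype(df)
print(groups)
```

Within each list, the first column is the one that gets blanked out and the others receive their same-dtype neighbour's values, which matches the wrong output in the question.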

A workaround is to convert all columns to object dtype, shift, and then restore the dtypes with DataFrame.infer_objects:

df3 = df2.astype(object).shift(1, axis=1).infer_objects()
print(df3)
                      A  B     C     D  D1      E
2019-01-01 00:00:00 NaN  1  10.0  11.0 -11  False
2019-01-01 12:00:00 NaN  2  20.0  22.0 -24  False
2019-01-02 00:00:00 NaN  3   NaN  33.0 -51  False
2019-01-02 12:00:00 NaN  4  40.0   NaN -36  False
2019-01-03 00:00:00 NaN  5  50.0  55.0  -2  False

print(df3.dtypes)
A     float64
B       int64
C     float64
D     float64
D1      int64
E        bool
dtype: object
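
If you want to confirm that the result really is a shift by position, something like the following sketch may help. Both `shift_columns` and `is_positional_shift` are hypothetical names introduced here. The check compares every column with its left-hand neighbour in the original frame and treats two missing values as equal.

```python
import numpy as np
import pandas as pd


def shift_columns(frame, periods=1):
    """Hypothetical wrapper around the object-dtype workaround above."""
    return frame.astype(object).shift(periods, axis=1).infer_objects()


def is_positional_shift(original, shifted):
    """Return True if every column equals its left neighbour in `original`."""
    cols = list(original.columns)
    for left, right in zip(cols[:-1], cols[1:]):
        expected = original[left].astype(object)
        actual = shifted[right].astype(object)
        # NaN != NaN, so treat "both missing" as a match explicitly.
        same = (expected.isna() & actual.isna()) | (expected == actual)
        if not same.all():
            return False
    # The first column has no left neighbour, so it must be all missing.
    return bool(shifted[cols[0]].isna().all())


df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, np.nan, 40, 50],
                   "C": [11, 22, 33, np.nan, 55],
                   "D": [-11, -24, -51, -36, -2],
                   "D1": [False] * 5,
                   "E": [True, False, False, True, True]})

shifted = shift_columns(df)
print(is_positional_shift(df, shifted))
```

The same check can be run against a plain `df.shift(1, axis=1)` to find out whether the installed pandas version is affected.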

With axis=0, shift moves values within each column, where the dtype is always the same, so it works correctly.
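
For comparison, a small sketch of the row-wise case on a made-up two-column frame (not the data from the question): the values move down inside their own column, and only the dtype may be upcast so that the new first row can hold a missing value.

```python
import pandas as pd

# Illustrative frame with one integer and one boolean column.
df = pd.DataFrame({"A": [1, 2, 3], "E": [True, False, True]})

down = df.shift(1, axis=0)  # shift rows down by one
print(down)
print(down.dtypes)
```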


3 Comments

Not something I was hoping for, and apparently it happens with any data type, not just object dtype. Thanks for the info.
@JieJenn - the answer was edited with a better solution ;)
That worked like a charm. Appreciated. Answer marked as solved.
