I have a dataframe that looks like this:

x        frames
0         7729.00  
0         7730.00     
0         7731.00
1         7735.00
1         7736.00
1         7737.00
1         7738.00
2         7741.00
2         7742.00

As you can see, frames is sequential within each value of x, but it jumps whenever x changes. I want to fill in the missing frames so that the column always increases by 1, and set x to NaN for the added rows. Like this:

x        frames
0         7729.00  
0         7730.00     
0         7731.00
NaN       7732.00
NaN       7733.00
NaN       7734.00
1         7735.00
1         7736.00
1         7737.00
1         7738.00
NaN       7739.00
NaN       7740.00
2         7741.00
2         7742.00

EDIT

Here is the error I get using the first solution.

    df = df.set_index('frames').reindex(range(s.min(), s.max() + 1)).reset_index()
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 227, in wrapper
    return func(*args, **kwargs)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3856, in reindex
    return super().reindex(**kwargs)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4544, in reindex
    axes, level, limit, tolerance, method, fill_value, copy
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3744, in _reindex_axes
    index, method, copy, level, fill_value, limit, tolerance
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3766, in _reindex_index
    allow_dups=False,
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4613, in _reindex_with_indexers
    copy=copy,
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1251, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/Users/asi/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3099, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

If I have another column in the dataframe, like this:

x     y       frames
0     yes    7729.00  
0     yes    7730.00     
0     yes    7731.00
1     no     7735.00
1     no     7736.00
1     no     7737.00
1     no     7738.00
2     yes    7741.00
2     yes    7742.00

Then the solution turns all other columns (x and y) to NaN.

2 Answers

Use DataFrame.reindex with a range that spans the minimum and maximum values of frames:

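# frames is stored as float, so cast to int before building the range
s = df['frames'].astype(int)
# reindex on frames over the full range; frames missing from df become NaN rows
df = df.set_index('frames').reindex(range(s.min(), s.max() + 1)).reset_index()
print(df)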
    frames    x
0     7729  0.0
1     7730  0.0
2     7731  0.0
3     7732  NaN
4     7733  NaN
5     7734  NaN
6     7735  1.0
7     7736  1.0
8     7737  1.0
9     7738  1.0
10    7739  NaN
11    7740  NaN
12    7741  2.0
13    7742  2.0
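
For the follow-up about extra columns, here is a self-contained sketch of the same reindex approach on the sample data with the y column included. The names full_range and filled are only illustrative, and the int cast plus rename_axis are defensive choices of this sketch rather than part of the original one-liner. With this data, x and y end up NaN only in the rows that were added for missing frames:

```python
import pandas as pd

# Sample data from the question, including the extra column y.
df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 1, 1, 2, 2],
    "y": ["yes", "yes", "yes", "no", "no", "no", "no", "yes", "yes"],
    "frames": [7729.0, 7730.0, 7731.0, 7735.0, 7736.0,
               7737.0, 7738.0, 7741.0, 7742.0],
})

# Cast frames to int so the index and the range share a dtype.
df["frames"] = df["frames"].astype(int)
full_range = range(df["frames"].min(), df["frames"].max() + 1)

# Reindex over every frame in the range; rename_axis keeps the column
# name explicit before reset_index turns the index back into a column.
filled = (
    df.set_index("frames")
      .reindex(full_range)
      .rename_axis("frames")
      .reset_index()
)
print(filled)
```

If every row comes back NaN on real data, one thing worth checking is whether the dtype of frames actually matches the values in the range (for example strings instead of numbers); that is a guess, since the question does not show the real data.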

Or use a right join in DataFrame.merge with a helper DataFrame:

s = df['frames'].astype(int)
# right join against a helper DataFrame that holds every frame in the range
df = df.merge(pd.DataFrame({'frames': range(s.min(), s.max() + 1)}), how='right')
print(df)
      x  frames
0   0.0  7729.0
1   0.0  7730.0
2   0.0  7731.0
3   NaN  7732.0
4   NaN  7733.0
5   NaN  7734.0
6   1.0  7735.0
7   1.0  7736.0
8   1.0  7737.0
9   1.0  7738.0
10  NaN  7739.0
11  NaN  7740.0
12  2.0  7741.0
13  2.0  7742.0

2 Comments

I get an error using the first solution. See edit. Thanks!
@connor449 - That means frames contains duplicates. To remove them, use df = df.drop_duplicates(subset=['frames']) before the first or second solution (the second should also work with duplicates, but I am not sure whether you need that).
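
A minimal sketch of the duplicate case discussed in these comments, using made-up data in which frame 7730 appears twice (the real data is not shown in the question, so this is an assumption). It shows reindex raising ValueError on the duplicated index, the drop_duplicates fix, and that the merge variant runs with the duplicates left in place:

```python
import pandas as pd

# Hypothetical data: frame 7730 appears twice, so the index is not unique.
df = pd.DataFrame({
    "x": [0, 0, 0, 1],
    "frames": [7729, 7730, 7730, 7733],
})
full_range = range(df["frames"].min(), df["frames"].max() + 1)

# reindex refuses to work on an index with duplicate labels.
try:
    df.set_index("frames").reindex(full_range)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# Keep the first row per frame, then reindex as before.
deduped = df.drop_duplicates(subset=["frames"])
filled = (
    deduped.set_index("frames")
           .reindex(full_range)
           .rename_axis("frames")
           .reset_index()
)

# The right join does not need unique frames; duplicate rows are kept.
merged = df.merge(pd.DataFrame({"frames": list(full_range)}), how="right")
```

Note that drop_duplicates keeps the first row for each frame by default, so whether that is acceptable depends on which of the duplicated rows you actually need.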

You could use the complete function from pyjanitor to make the missing rows explicit. In this case, we pass in a dictionary that pairs the column with a callable, which generates the values from minimum to maximum:

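# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import numpy as np
import janitor  # registers the complete method on DataFrame

# the callable receives the frames column and returns every value from min to max
new_values = {"frames": lambda frames: np.arange(frames.min(), frames.max() + 1)}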

df.complete([new_values])
 
    frames    x
0   7729.0  0.0
1   7730.0  0.0
2   7731.0  0.0
3   7732.0  NaN
4   7733.0  NaN
5   7734.0  NaN
6   7735.0  1.0
7   7736.0  1.0
8   7737.0  1.0
9   7738.0  1.0
10  7739.0  NaN
11  7740.0  NaN
12  7741.0  2.0
13  7742.0  2.0
