For my DataFrame, I want to pair every unique value in one column with the same set of datetime entries. The new column should hold those dates, so that each unique ID ends up with one row per date.

Example:

Original Df:

ID  
1             
2               
3

New Column DF:

Date
2015/01/01
2015/02/01
2015/03/01

Resulting Df:

ID    Date
1     2015/01/01
      2015/02/01
      2015/03/01
2     2015/01/01
      2015/02/01
      2015/03/01
3     2015/01/01
      2015/02/01
      2015/03/01

I tried to follow this solution: https://stackoverflow.com/a/12394122/3856569 but it gives me the following error: Length of values does not match length of index

Does anyone have a simple solution for this? Thanks a lot!

2 Answers

UPDATE: replicating ids 6 times:

In [172]: %paste
import io
import pandas as pd

data = """\
id
1
2
3
"""
df = pd.read_csv(io.StringIO(data))
# repeat each ID 6 times
df = pd.DataFrame(df['id'].tolist()*6, columns=['id'])

start_date = pd.to_datetime('2015-01-01')

df['date'] = start_date
df['date'] = df.groupby('id', as_index=False)\
               .transform(lambda x: pd.date_range(start_date,
                                                  freq='1D',
                                                  periods=len(x)))
df.sort_values(by=['id','date'])
## -- End pasted text --
Out[172]:
    id       date
0    1 2015-01-01
3    1 2015-01-02
6    1 2015-01-03
9    1 2015-01-04
12   1 2015-01-05
15   1 2015-01-06
1    2 2015-01-01
4    2 2015-01-02
7    2 2015-01-03
10   2 2015-01-04
13   2 2015-01-05
16   2 2015-01-06
2    3 2015-01-01
5    3 2015-01-02
8    3 2015-01-03
11   3 2015-01-04
14   3 2015-01-05
17   3 2015-01-06

OLD more generic answer:

prepare sample DF:

start_date = pd.to_datetime('2015-01-01')

data = """\
id
1
2
2
3
1
2
3
2
1
"""
df = pd.read_csv(io.StringIO(data))

In [200]: df
Out[200]:
   id
0   1
1   2
2   2
3   3
4   1
5   2
6   3
7   2
8   1

Solution:

In [201]: %paste
df['date'] = start_date
df['date'] = df.groupby('id', as_index=False)\
               .transform(lambda x: pd.date_range(start_date,
                                                  freq='1D',
                                                  periods=len(x)))
## -- End pasted text --

In [202]: df
Out[202]:
   id       date
0   1 2015-01-01
1   2 2015-01-01
2   2 2015-01-02
3   3 2015-01-01
4   1 2015-01-02
5   2 2015-01-03
6   3 2015-01-02
7   2 2015-01-04
8   1 2015-01-03

Sorted:

In [203]: df.sort_values(by='id')
Out[203]:
   id       date
0   1 2015-01-01
4   1 2015-01-02
8   1 2015-01-03
1   2 2015-01-01
2   2 2015-01-02
5   2 2015-01-03
7   2 2015-01-04
3   3 2015-01-01
6   3 2015-01-02
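The per-group numbering that the transform above produces can also be sketched with groupby(...).cumcount(), which counts rows within each id in order of appearance. This is only an alternative sketch, not the answer's original method; the variable name offsets is introduced here for illustration, and the sample ids are the same as in the frame above.

```python
import pandas as pd

# Same sample ids as in the answer's DataFrame
df = pd.DataFrame({'id': [1, 2, 2, 3, 1, 2, 3, 2, 1]})
start_date = pd.to_datetime('2015-01-01')

# cumcount numbers the rows of each id group 0, 1, 2, ... in order of appearance
offsets = df.groupby('id').cumcount()

# Shift the start date by that many days, so each id gets consecutive dates
df['date'] = start_date + pd.to_timedelta(offsets, unit='D')
```

As with the transform version, an id only receives as many dates as it has rows, so this does not by itself give every id the same number of dates.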

4 Comments

Thanks for the reply, but the result is not exactly what I want. In your final DataFrame, id '2' got one extra date (2015-01-04) assigned, whereas id '3' is missing one date. Also, is there a way to do this without preparing the original DataFrame with several occurrences of each id, as you did? Each id occurs only once in my DataFrame, so I cannot assign several dates before grouping.
@TheDude, so you just want to replicate each ID three times and add three consecutive dates to them - correct?
Yes, I have consecutive dates (6 different dates in total), which should be assigned to every ID (~50,000 unique values).
@TheDude, so you want to have 6 * 50,000 = 300,000 rows at the end?

A rather straightforward numpy approach, making use of repeat and tile:

import numpy as np
import pandas as pd

N     = 3  # arbitrary number of IDs/dates
ID    = np.arange(N) + 1
dates = pd.date_range('20160101', periods=N)

df = pd.DataFrame({'ID'    : np.repeat(ID, N),
                   'dates' : np.tile(dates, N)})

Resulting DataFrame:

In [1]: df
Out[1]:
   ID      dates
0   1 2016-01-01
1   1 2016-01-02
2   1 2016-01-03
3   2 2016-01-01
4   2 2016-01-02
5   2 2016-01-03
6   3 2016-01-01
7   3 2016-01-02
8   3 2016-01-03
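In the snippet above, N serves as both the number of IDs and the number of dates. When the two counts differ (the question mentions 6 dates for many IDs), the repeat and tile counts have to be swapped accordingly. A minimal sketch, assuming four hypothetical IDs and six consecutive dates:

```python
import numpy as np
import pandas as pd

ids = np.array([1, 2, 3, 4])                     # hypothetical IDs
dates = pd.date_range('2016-01-01', periods=6)   # six consecutive dates

df = pd.DataFrame({
    # each ID is repeated once per date: 1,1,1,1,1,1,2,2,...
    'ID': np.repeat(ids, len(dates)),
    # the whole block of dates is repeated once per ID
    'dates': np.tile(dates.to_numpy(), len(ids)),
})
```

The result has len(ids) * len(dates) rows, already ordered by ID and then by date, so no sorting step is needed.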

Update

Assuming you already have a DataFrame of IDs, as pointed out by MaxU, you can tile the IDs and repeat each date once per ID, so that every ID is paired with every date:

# N is both the number of IDs and the number of dates here
df = pd.DataFrame({'ID'    : np.tile(df['ID'], N),
                   'dates' : np.repeat(dates, N)})
# now df needs sorting
df = df.sort_values(by=['ID', 'dates'])

Resulting DataFrame:

In [5]: df
Out[5]:
   ID      dates
0   1 2016-01-01
3   1 2016-01-02
6   1 2016-01-03
1   2 2016-01-01
4   2 2016-01-02
7   2 2016-01-03
2   3 2016-01-01
5   3 2016-01-02
8   3 2016-01-03
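Pairing every ID with every date is a Cartesian product, so it can also be sketched as a cross join of the two frames from the question. This assumes pandas 1.2 or later, where merge accepts how='cross'; the variable names ids, dates and result are chosen here for illustration.

```python
import pandas as pd

# The two frames from the question
ids = pd.DataFrame({'ID': [1, 2, 3]})
dates = pd.DataFrame(
    {'Date': pd.to_datetime(['2015/01/01', '2015/02/01', '2015/03/01'])}
)

# how='cross' pairs every row of ids with every row of dates
# (assumption: pandas >= 1.2)
result = ids.merge(dates, how='cross')
```

The row order follows the left frame, so the rows come out grouped by ID without an extra sort.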

1 Comment

I guess your solution will be much faster than mine :). I would also use np.tile(df['id'], N) instead of np.repeat(ID, N), because the OP already has a DataFrame containing the IDs.
