Tricky duplicate rows based on condition and add a counter in Python

Question

I have a dataframe where if a certain condition is met, I'd like to essentially create a duplicate of that row. Row should be duplicated IF 'Date' = Q4.22 or > AND type = 'live' Also, for every duplicate created the 'unit' count should be updated to reflect this (grouped by id and Date) Once the duplicate is established, the unit count should reflect the new count based on the same id and Date.

Data

   id   Date    set     type    unit    energy
   bb   Q4.22   l       live    l01     20
   bb   Q4.22   l       live    l02     20
   ba   Q3.22   l       non     l01     20
   aa   Q4.22   l       non     l01     20
   aa   Q4.22   l       live    l01     20
   cc   Q3.22   l       non     l01     20
   aa   Q4.22   l       live    l02     20

Desired

   id   Date    set     type    unit    energy
   bb   Q4.22   l       live    l01     20
   bb   Q4.22   l       live    l02     20
   bb   Q4.22   l       live    l03     20
   bb   Q4.22   l       live    l04     20
   ba   Q3.22   l       non     l01     20
   aa   Q4.22   l       non     l01     20
   aa   Q4.22   l       live    l01     20
   aa   Q4.22   l       live    l02     20
   cc   Q3.22   l       non     l01     20
   aa   Q4.22   l       live    l03     20
   aa   Q4.22   l       live    l04     20

Doing

pd.concat([df, df.loc[(df['Date'] == > 'Q4.22') & (df['live'] == 'live')]])

However, I still need to add counter to the new duplicates that are created. Any suggestion is appreciated.

In order to create the duplicate the following condition should be met. The Data column should be Q4.22 or later and the type column should contain 'live' string — Lynn
– Lynn, Commented May 11, 2022 at 20:28
how is 'unit' obtained in the first place and why is it a string f'l{i:02d}'? — Pierre D
– Pierre D, Commented May 11, 2022 at 20:51
@PierreD it is combination of the set 'l' the type and then the count for a given id and quarter (updated to lower case l) — Lynn
– Lynn, Commented May 11, 2022 at 20:52
there are several problems with this. I would have left unit to be a numerical value. Likewise, I would have left the Date to be a real date, or a Period. The way you select ... > 'Q4.22' does not generalize, because it means "lexicographically greater than or equal to". Imagine your comparison period was Q1 2022 instead. 'Q3.1900' >= 'Q1.2022', but Period('1900Q3') < Period('2022Q1'). — Pierre D
– Pierre D, Commented May 11, 2022 at 21:03

not_speshal · Accepted Answer · 2022-05-11 21:33:58Z

Try:

Convert your date column to timestamps
concat your original data with the filtered data
groupby to get the cumcount of "id" and "Date" and set the "unit" accordingly

df["Date"] = pd.to_datetime(df["Date"].str.replace(r"(Q\d).(\d+)", r"\2-\1",regex=True))

output = pd.concat([df, df[df["Date"].ge(pd.Timestamp("2022-10-01"))&df["type"].eq("live")]], ignore_index=True)
output["unit"] = output["set"]+output.groupby(["id", "Date"]).cumcount().add(1).astype(str).str.zfill(2)
output = output.sort_values("id", ignore_index=True)

#convert Date back to original format if needed
output["Date"] = output["Date"].dt.to_period("Q").astype(str).str.replace(r"\d\d(\d+)(Q\d)",r"\2.\1",regex=True)

>>> output
    id   Date set  type unit  energy
0   aa  Q4.22   l   non  l01      20
1   aa  Q4.22   l  live  l02      20
2   aa  Q4.22   l  live  l03      20
3   aa  Q4.22   l  live  l04      20
4   aa  Q4.22   l  live  l05      20
5   ba  Q3.22   l   non  l01      20
6   bb  Q4.22   l  live  l01      20
7   bb  Q4.22   l  live  l02      20
8   bb  Q4.22   l  live  l03      20
9   bb  Q4.22   l  live  l04      20
10  cc  Q3.22   l   non  l01      20

Pierre D · Accepted Answer · 2022-05-11 21:30:20Z

First, as noted in the comments, we need to convert some of the df columns into more convenient types:

int for unit (stripping any chars),
pd.Period for the Date.

df2 = df.assign(
    unit=df['unit'].str.extract(r'(\d+)').astype(int),
    period=df['Date'].str.replace(r'^(Q\d)\D*(\d+)$', r'\2\1', regex=True).apply(pd.Period)
)

>>> df2
   id   Date set  type  unit  energy  period
0  bb  Q4.22   l  live     1      20  2022Q4
1  bb  Q4.22   l  live     2      20  2022Q4
2  ba  Q3.22   l   non     1      20  2022Q3
3  aa  Q4.22   l   non     1      20  2022Q4
4  aa  Q4.22   l  live     1      20  2022Q4
5  cc  Q3.22   l   non     1      20  2022Q3
6  aa  Q4.22   l  live     2      20  2022Q4

>>> df2.dtypes
id               object
Date             object
set              object
type             object
unit              int64
energy            int64
period    period[Q-DEC]
dtype: object

With this done, now we can proceed with the logic of the question itself.

ix_repeat = (df2['period'] >= pd.Period('2022-Q4')) & (df2['type'] == 'live')
r = df2.loc[ix_repeat]
r.assign(unit=r['unit'] + r.groupby(['id', 'period'])['unit'].transform(max))

>>> r
   id   Date set  type  unit  energy  period
0  bb  Q4.22   l  live     3      20  2022Q4
1  bb  Q4.22   l  live     4      20  2022Q4
4  aa  Q4.22   l  live     3      20  2022Q4
6  aa  Q4.22   l  live     4      20  2022Q4

# finally
df2 = pd.concat([df2, r])

Optional: bringing back unit into its weird string version:

df2 = df2.assign(unit=df2['set'] + df2['unit'].astype(str).str.zfill(2))

>>> df2
   id   Date set  type unit  energy  period
0  bb  Q4.22   l  live  l01      20  2022Q4
1  bb  Q4.22   l  live  l02      20  2022Q4
2  ba  Q3.22   l   non  l01      20  2022Q3
3  aa  Q4.22   l   non  l01      20  2022Q4
4  aa  Q4.22   l  live  l01      20  2022Q4
5  cc  Q3.22   l   non  l01      20  2022Q3
6  aa  Q4.22   l  live  l02      20  2022Q4
0  bb  Q4.22   l  live  l03      20  2022Q4
1  bb  Q4.22   l  live  l04      20  2022Q4
4  aa  Q4.22   l  live  l03      20  2022Q4
6  aa  Q4.22   l  live  l04      20  2022Q4

Collectives™ on Stack Overflow

Tricky duplicate rows based on condition and add a counter in Python

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related