I have a large data frame with ~20 years of data. I would like to group it by YEAR and then add the same set of new X values to each group. I'm having trouble figuring out how to use pd.concat with groupby. How can I use pd.concat and df.groupby together?

Below is a subset of my data frame (I deleted a bunch of rows, keeping just enough to show that I have multiple years that I would like to group by).

My data frame:
     XSNO    YEAR     X     Z
5     LOL001  1978   0.22 -0.44
6     LOL001  1978   0.95 -0.55
7     LOL001  1978   1.70 -1.01
8     LOL001  1978   2.10 -1.22
9     LOL001  1978   2.68 -1.34
10    LOL001  1978   3.27 -1.41
48    LOL001  1978  17.60 -1.86
49    LOL001  1978  18.21 -1.77
50    LOL001  1978  18.41 -1.65
51    LOL001  1978  18.67 -1.54
52    LOL001  1978  19.00 -1.5
68    LOL001  1978  23.60 -0.31
78    LOL001  1980   0.40 -0.56
79    LOL001  1980   1.50 -0.91
80    LOL001  1980   2.50 -1.25
81    LOL001  1980   3.20 -1.43
82    LOL001  1980   3.90 -1.44
83    LOL001  1980   4.50 -1.55
84    LOL001  1980   5.80 -1.22
101   LOL001  1980  21.50 -0.96
102   LOL001  1980  22.50 -0.69
103   LOL001  1980  23.60 -0.43
104   LOL001  1980  25.10 -0.09
107   LOL001  1981   0.30 -0.40
108   LOL001  1981   0.60 -0.56
109   LOL001  1981   2.40 -1.20
110   LOL001  1981   4.40 -1.34
111   LOL001  1981   7.00 -1.10
112   LOL001  1981   8.60 -1.49

What I would like the output to be (just a subset of the added values for one year):
XSNO    YEAR    X      Z
LOL004  1978    0     NaN
LOL003  1978    0.05  NaN
LOL002  1978    0.1   NaN
LOL001  1978    0.15  NaN
LOL000  1978    0.2   NaN
LOL001  1978    0.22  -0.44
LOL002  1978    0.25  NaN
LOL003  1978    0.3   NaN
LOL004  1978    0.35  NaN
LOL005  1978    0.4   NaN
LOL006  1978    0.45  NaN
LOL007  1978    0.5   NaN
LOL008  1978    0.55  NaN
LOL009  1978    0.6   NaN
LOL010  1978    0.65  NaN
LOL011  1978    0.7   NaN
LOL012  1978    0.75  NaN
LOL013  1978    0.8   NaN
LOL014  1978    0.85  NaN
LOL001  1978    0.95  -0.55


What I have tried:

max = df.X.max()
x = np.arange(0, max, 0.05)
x = pd.DataFrame({'X': x})

# Attempt 1: this doesn't work and gives me an error
concat_df = df.groupby(['YEAR']).apply(lambda x: x.concat([df1, x]))

# Attempt 2: this doesn't give me what I want, it just tacks all the 'x' values (new values) on at the end
concat = pd.concat([df1, x])

I'm not sure how to use the merge/join/concat functions with a grouped pandas data frame. I can't seem to find any other questions or answers on Stack Overflow that get at what I'm looking for.

Comments

pd.concat() is like .append(): it just adds the new data at the end of the first dataframe. Commented Jan 29, 2021 at 19:59

Gotcha. Would a better approach be to create a dataframe with the x values I'm looking for as well as the year values, and use pd.concat() with that dataframe and my original dataframe? Commented Jan 29, 2021 at 20:04
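
The approach suggested in that last comment could look roughly like the sketch below: build one frame holding every (YEAR, X) combination, then concatenate it with the original once. The cut-down data, the names new_x, grid and combined, and the 0.25 step are assumptions made for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's data frame.
df = pd.DataFrame({
    "XSNO": ["LOL001"] * 3,
    "YEAR": [1978, 1978, 1980],
    "X": [0.22, 0.95, 0.40],
    "Z": [-0.44, -0.55, -0.56],
})

# New X values to add to every year (coarse step to keep the example small).
new_x = np.arange(0, 1.0, 0.25)

# One row per (YEAR, X) pair: the Cartesian product of years and new X values.
grid = pd.MultiIndex.from_product(
    [df["YEAR"].unique(), new_x], names=["YEAR", "X"]
).to_frame(index=False)

# A single concat, then sort so the new rows interleave with the measured ones.
combined = (
    pd.concat([df, grid], ignore_index=True)
      .sort_values(["YEAR", "X"])
      .reset_index(drop=True)
)
print(combined)
```

The added rows carry NaN in XSNO and Z, since grid only supplies YEAR and X.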

1 Answer

Not a solution, I'm just not allowed to comment yet.

It should be pd.concat rather than x.concat, I think: concat is a top-level pandas function, not a DataFrame method. Also, the lambda function in your groupby uses x as its parameter, and so hides the x DataFrame. Name them differently, for example:

concat_df = df.groupby(['YEAR']).apply(lambda y: pd.concat([y, x]))
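
For what it's worth, here is a minimal sketch of the same per-group idea on a cut-down copy of the sample data. It swaps apply for an explicit loop over the groups, which sidesteps differences between pandas versions in how apply treats the grouping column. The name x_new, the 0.5 step, and the choice to fill YEAR on the added rows are my assumptions, not something taken from the question.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's data frame.
df = pd.DataFrame({
    "XSNO": ["LOL001"] * 4,
    "YEAR": [1978, 1978, 1980, 1980],
    "X": [0.22, 0.95, 0.40, 1.50],
    "Z": [-0.44, -0.55, -0.56, -0.91],
})

# New X values to add to every year; a distinct name avoids any shadowing.
x_new = pd.DataFrame({"X": np.arange(0, df["X"].max(), 0.5)})

pieces = []
for year, group in df.groupby("YEAR"):
    # group holds only the rows for this one year.
    piece = pd.concat([group, x_new], ignore_index=True)
    piece["YEAR"] = year  # the appended rows have no YEAR yet, so fill it in
    pieces.append(piece)

# Stitch the per-year pieces back together and order them by YEAR, then X.
concat_df = (
    pd.concat(pieces, ignore_index=True)
      .sort_values(["YEAR", "X"])
      .reset_index(drop=True)
)
print(concat_df)
```

The added rows have NaN in XSNO and Z; whether to fill XSNO is up to you.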

2 Comments

This doesn't work for me, can you be more specific? Don't I need two data frames to concat? That's why I used pd.concat([df1, x]). I struggle to understand lambda formatting, so I don't understand what the "y" is doing in the pd.concat function. In my mind, it should be something like lambda y: pd.concat([y.df, y.x])?
The lambda function takes as parameters the groups produced by the groupby, so the y-s are DataFrames of only one year. Do you get an error when you run this, or just a different output?
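
To make that concrete, here is a small sketch (with made-up sample rows and a hypothetical group_sizes dict) showing what groupby produces. Iterating over the groupby yields one sub-frame per year, which is the same kind of object the lambda receives as y.

```python
import pandas as pd

# Made-up rows covering three years.
df = pd.DataFrame({
    "YEAR": [1978, 1978, 1980, 1981],
    "X": [0.22, 0.95, 0.40, 0.30],
})

# Each iteration gives the group key and a DataFrame with only that year's rows.
group_sizes = {}
for year, group in df.groupby("YEAR"):
    assert isinstance(group, pd.DataFrame)
    group_sizes[int(year)] = len(group)

print(group_sizes)
```

So y is not a container with .df and .x attributes; it is already the one-year slice of df.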
