I have a dataframe (adjusted for simplicity) as follows:

                Location Code  Technology  Latitude  Longitude  ... Frequency
0                    ABLSERVP      Type A      11.1       11.1  ...       850
2                    ABLSERVP      Type A      11.1       11.1  ...       700
4                    ABLSERVP      Type B      11.1       11.1  ...       850
...                       ...         ...       ...        ...  ...       ...
1300                    CSEY3      Type A      22.2       22.2  ...      2100
1301                    CSEY3      Type A      22.2       22.2  ...       700
...                       ...         ...       ...        ...  ...       ...
265064                  CSEY1      Type A      33.3       33.3  ...       750
265065                  CSEY3      Type B      22.2       22.2  ...       850

What I'm trying to achieve:

                Location Code  Technologies  Latitude  Longitude  ...  Type A's  Type B's  ...  
0                    ABLSERVP      Type A,B      11.1       11.1  ...   700,850       850  ...
...                       ...         ...         ...       ...        
265064                  CSEY1        Type A      33.3       33.3  ...       750       n/a  ...
265065                  CSEY3      Type A,B      22.2       22.2  ...  700,2100       850  ...

Since the real dataframe has many more columns and rows, I used ellipses to stand in for them. Is there any way to do this without looping through the entire dataframe? (I've read that looping is inefficient and should be a last resort.)

My attempt: I first sorted based on location code as follows:

x=x.sort_values(by='Location Code')

I thought I could get the required result by doing: df = x.groupby(['Location Code', 'Technology']).sum()

This obviously doesn't work as it sums the frequencies instead of listing them. Any help?

Since I don't want you guys to type everything out, I created a replica of the dataframe:

# creating lists
l1 = ["ABLSERVP", "ABLSERVP", "ABLSERVP", "CSEY3", "CSEY3", "CSEY1", "CSEY3"]
l2 = ["Type A", "Type A", "Type B", "Type A", "Type A", "Type A", "Type B"]
l3 = [850, 700, 850, 2100, 700, 750, 850]
# creating the DataFrame
x = pd.DataFrame(list(zip(l1, l2, l3)))
x.columns = ['Location Code', 'Technology', 'Frequency']
x = x.sort_values(by='Location Code')

Commented Nov 18, 2021 at 20:06

1 Answer

Try with groupby, pivot and join:

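# One comma-separated string of the distinct technologies per location
# (the lambda argument is named s so it does not shadow the DataFrame x)
tech = x.groupby(["Location Code", "Latitude", "Longitude"])["Technology"].agg(lambda s: ", ".join(s.unique()))
# One column per technology, holding that technology's frequencies as a string
pivoted = x.pivot_table(index=["Location Code", "Latitude", "Longitude"],
                        columns="Technology",
                        values="Frequency",
                        aggfunc=lambda s: ", ".join(s.astype(str)))
# Both results share the same index, so they can be joined directly
output = tech.to_frame().join(pivoted)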

>>> output
                                      Technology     Type A Type B
Location Code Latitude Longitude                                  
ABLSERVP      11.1     11.1       Type A, Type B   850, 700    850
CSEY1         33.3     33.3               Type A        750    NaN
CSEY3         22.2     22.2       Type A, Type B  2100, 700    850
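
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question's comment and fills in `Latitude` and `Longitude` with the values shown in the question's table. The variable name `keys` is only introduced here for brevity, and the lambda argument is called `s` rather than `x`; otherwise it is the same groupby, pivot and join approach.

```python
import pandas as pd

# Sample data from the question's comment, plus the coordinates
# shown in the question's table (one pair per location code).
x = pd.DataFrame({
    "Location Code": ["ABLSERVP", "ABLSERVP", "ABLSERVP", "CSEY3", "CSEY3", "CSEY1", "CSEY3"],
    "Technology": ["Type A", "Type A", "Type B", "Type A", "Type A", "Type A", "Type B"],
    "Frequency": [850, 700, 850, 2100, 700, 750, 850],
    "Latitude": [11.1, 11.1, 11.1, 22.2, 22.2, 33.3, 22.2],
    "Longitude": [11.1, 11.1, 11.1, 22.2, 22.2, 33.3, 22.2],
})

# Columns that identify a location; used for both the groupby and the pivot index.
keys = ["Location Code", "Latitude", "Longitude"]

# Distinct technologies per location, joined into one string.
tech = x.groupby(keys)["Technology"].agg(lambda s: ", ".join(s.unique()))

# Frequencies per location and technology, joined into one string per cell.
pivoted = x.pivot_table(
    index=keys,
    columns="Technology",
    values="Frequency",
    aggfunc=lambda s: ", ".join(s.astype(str)),
)

output = tech.to_frame().join(pivoted)
print(output)
```

Frequencies appear in the order they occur in the input, and a location with no rows for a given technology gets `NaN` in that column.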

3 Comments

The answer should work on a DataFrame of any size as long as the structure is the same :)
I didn't realize this, but I forgot to include some extra columns (for example latitude and longitude, which I've now added to the original question). My apologies for that. They are not being included in the output. Is there any way to include these columns too?
@tareenmj - Add all the extra columns to the groupby and index of the pivot. See the edit.
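
If there are many extra columns, listing them by hand gets tedious. Here is a rough sketch of one way to generalize the advice above, assuming every column other than `Technology` and `Frequency` has the same value for all rows of a given location. The helper name `collapse_by_location` and its parameters are made up for illustration. It also sorts the frequencies numerically, which is my reading of the desired output in the question (`700,850`), not something the answer does.

```python
import pandas as pd

def collapse_by_location(df, tech_col="Technology", freq_col="Frequency"):
    """Hypothetical helper: one row per location, one frequency column per technology."""
    # Assumption: every other column is constant within a location,
    # so all of them can serve as grouping keys.
    keys = [c for c in df.columns if c not in (tech_col, freq_col)]

    # Distinct technologies per group, in alphabetical order.
    tech = df.groupby(keys)[tech_col].agg(lambda s: ", ".join(sorted(s.unique())))

    pivoted = df.pivot_table(
        index=keys,
        columns=tech_col,
        values=freq_col,
        # Sort as numbers before converting to text, so 700 comes before 2100.
        aggfunc=lambda s: ", ".join(str(v) for v in sorted(s)),
    )
    # reset_index turns the grouping keys back into ordinary columns.
    return tech.to_frame().join(pivoted).reset_index()

x = pd.DataFrame({
    "Location Code": ["ABLSERVP", "ABLSERVP", "ABLSERVP", "CSEY3", "CSEY3", "CSEY1", "CSEY3"],
    "Technology": ["Type A", "Type A", "Type B", "Type A", "Type A", "Type A", "Type B"],
    "Frequency": [850, 700, 850, 2100, 700, 750, 850],
    "Latitude": [11.1, 11.1, 11.1, 22.2, 22.2, 33.3, 22.2],
    "Longitude": [11.1, 11.1, 11.1, 22.2, 22.2, 33.3, 22.2],
})

result = collapse_by_location(x)
print(result)
```

One thing to watch for: by default `groupby` and `pivot_table` drop rows whose key columns contain `NaN`, so locations with missing values in any of the extra columns would silently disappear from the result.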
