After importing data from a .csv file, I have data that looks similar to this (though the real data has on the order of hundreds of columns and thousands of rows):

         4        5        6      7       8       9        10     11    12      13         14         15   16
0   302255Z  09005KT        1  1/4SM      BR     CLR  M00/M00  A3044   RMK    AO2A     SLP311  T10021002   $;
1   302232Z  08003KT        1    1/4      BR     CLR  M00/M00  A3044   RMK    AO2A     SLP310         $;  NaN
2   302225Z  09005KT        1  1/2SM      BR     CLR  M00/M00  A3044   RMK    AO2A     SLP309         $;  NaN
3   302155Z  08003KT        2  1/2SM      BR     CLR  M00/M00  A3043   RMK    AO2A     SLP306  T10001000   $;
4   302055Z  09004KT      3SM     BR     CLR   00/00    A3042    RMK  AO2A  SLP304  T00020002      56001   $;
5   301955Z  00000KT      3SM     BR     CLR   01/01    A3042    RMK  AO2A  SLP304  T00080008         $;  NaN
6   301855Z  09006KT      3SM     BR  FEW055   01/01    A3042    RMK  AO2A  SLP303  T00110011         $;  NaN
7   301655Z  10004KT        2  1/2SM      BR  FEW050  M00/M00  A3041   RMK    AO2A     SLP301  T10031003   $;
8   301610Z  09004KT        2  1/2SM      BR     CLR    00/00  A3041   RMK    AO2A     SLP301         $;  NaN
9   301555Z     AUTO  08005KT   4800      BR     CLR    01/01  A3041   RMK     AO2     SLP300  T00070007   $;
10  301509Z     AUTO  06003KT   4800      BR     CLR    01/01  A3041   RMK     AO2     SLP300         $;  NaN
11  301449Z     AUTO  10003KT   4000      BR     CLR    01/01  A3041   RMK     AO2     SLP300         $;  NaN
12  301355Z     AUTO  07004KT   6000      BR     CLR    02/02  A3041   RMK     AO2     SLP300  T00230023   $;
13  301255Z     AUTO  07003KT   6000      BR     CLR    02/02  A3041   RMK     AO2     SLP299  T00200020   $;
14  301055Z     AUTO  00000KT   9000      BR     CLR    04/04  A3040   RMK     AO2     SLP298  T00360036   $;

I abandoned trying to shift everything so that the columns line up. Instead, I'm trying to create a new column that combines the entries from columns 5 and 6 that end in KT, and a second new column for the values that start with T.

To start, I attempted to pull out all of the data that satisfied my criterion in columns 5 and 6 like so:

df1=df[df[5].str.contains("KT")].iloc[:,[0,5]]
df2=df[df[6].str.contains("KT")].iloc[:,[0,6]]

The .iloc selection was an attempt to make the results easier to merge together. There has to be a slicker way to get this formatted. Any thoughts?

If helpful, here's a simpler data set:

import pandas as pd

row1=['a','b','c1K','d','e','foo','foo','f1111T','g','$']
row2=['a','b','foo','c2K','d','e','f4321T','g','$','$']
row3=['a','b','c3K','d','e','f1234T','g','$']
# zip() stops at the shortest row, so each row is cut to 8 entries
df=pd.DataFrame(zip(row1,row2,row3)).T
df1=df[df[2].str.contains("K")].iloc[:,[0,2]]
df2=df[df[3].str.contains("K")].iloc[:,[0,3]]

Trying pd.concat([df1,df2],axis=0,join='outer') doesn't give what I'd like; it gives

   0    2    3
0  a  c1K  NaN
2  a  c3K  NaN
1  a  NaN  c2K

Something like this would be prettier:

      0   
1  a  c1K  
2  a  c3K 
3  a  c2K

1 Answer

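The selection can be done in one pass with loc, since iloc does not accept a boolean Series as a mask. Note that with loc the list [0, 5] refers to column labels rather than positions. Instead of:

df1 = df[df[5].str.contains("KT")].iloc[:,[0,5]]

use:

df1 = df.loc[df[5].str.contains("KT"), [0, 5]]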
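
As a rough check of the one-pass form, here is a minimal sketch on the simpler data set from the question. It assumes the `pd` alias for pandas, and the `na=False` argument is my addition so that any missing values count as non-matches:

```python
import pandas as pd

# Toy rows from the question; zip() stops at the shortest row (8 entries).
row1 = ['a', 'b', 'c1K', 'd', 'e', 'foo', 'foo', 'f1111T', 'g', '$']
row2 = ['a', 'b', 'foo', 'c2K', 'd', 'e', 'f4321T', 'g', '$', '$']
row3 = ['a', 'b', 'c3K', 'd', 'e', 'f1234T', 'g', '$']
df = pd.DataFrame(list(zip(row1, row2, row3))).T

# The boolean Series picks the rows; [0, 2] are column labels
# (on this frame the labels happen to equal the positions).
mask = df[2].str.contains("K", na=False)
df1 = df.loc[mask, [0, 2]]
print(df1)
```

On a frame whose column labels are not 0, 1, 2, ..., the label list passed to loc would need to be adjusted accordingly.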

To get the end result, you could either concatenate these as Series (to avoid aligning the columns), or rename the columns to something more descriptive before concatenating:

df1.columns = ['letter', 'code']
df2.columns = ['letter', 'code']
pd.concat([df1, df2], axis=0, ignore_index=True)
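
The Series alternative mentioned above might look something like the following. This is only a sketch on a cut-down version of the question's toy data; the names `s1`, `s2` and `codes` are made up for illustration, and `na=False` is an assumption about how missing values should be treated:

```python
import pandas as pd

# Cut-down toy data: the "K" code sits in column 2 or column 3.
df = pd.DataFrame([
    ['a', 'b', 'c1K', 'd'],
    ['a', 'b', 'foo', 'c2K'],
    ['a', 'b', 'c3K', 'd'],
])

# Select a single column each time, which yields a Series rather than a DataFrame.
s1 = df.loc[df[2].str.contains("K", na=False), 2]
s2 = df.loc[df[3].str.contains("K", na=False), 3]

# Series are stacked end to end, so no column alignment (and no NaN padding) occurs.
codes = pd.concat([s1, s2]).sort_index()
print(codes)
```

Because the original row labels are kept, `sort_index()` puts the codes back in the order of the source rows.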
1 Comment

This worked well. By using ignore_index=False, I could then add the combined data to the original DataFrame.
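
One way the step described in this comment could look is sketched below. It is a guess at what was done, not the commenter's actual code: it assumes each row matches in only one of the two columns, so the combined index has no duplicates, and the new column name `code` is just illustrative.

```python
import pandas as pd

# Cut-down toy data: the "K" code sits in column 2 or column 3.
df = pd.DataFrame([
    ['a', 'b', 'c1K', 'd'],
    ['a', 'b', 'foo', 'c2K'],
    ['a', 'b', 'c3K', 'd'],
])

df1 = df.loc[df[2].str.contains("K", na=False), [0, 2]]
df2 = df.loc[df[3].str.contains("K", na=False), [0, 3]]
df1.columns = ['letter', 'code']
df2.columns = ['letter', 'code']

# ignore_index=False keeps the original row labels (here 0, 2, 1).
combined = pd.concat([df1, df2], axis=0, ignore_index=False)

# Assignment aligns on the index, so each code lands on its source row.
df['code'] = combined['code']
print(df)
```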
