Getting columns by list of substring values

Question

I have the dataframe shown below. My real data is large, and I want to create different dataframes by selecting columns whose names contain given substrings.

df

 ID     ex_srr123  ex2_srr124  ex3_srr125  ex4_srr1234  ex23_srr5323
 san      12           43          0           34           0
 mat      53           0           34          76          656
 jon      82           223         23          32          21
 jack      0           12          2            0           0

I have lists of column-name substrings:

coln1=['srr123', 'srr124']
coln2=['srr1234','srr5323']

I wanted

df2=

ID     ex_srr123  ex2_srr12
san      12           43 
mat      53           0
jon      82           223 
jack      0           12

I tried

df2=df[coln1]

This did not give me what I wanted. How can I get the desired output?

Hello, thanks for your reply @QuangHoang. No, it's not an index column.
– hemant c naik, Commented May 15, 2020 at 19:57
When you say you are not getting what you want, do you mean that the code provided works for the example but not on the whole dataset, or that it does not work even on this example?
– Onyambu, Commented May 15, 2020 at 20:01

Kurt Kline · Accepted Answer · 2020-05-15 19:42:38Z

1

Statically

df2 = df.filter(regex="srr123$|srr124$").copy()

Dynamically

coln1 = ['srr123', 'srr124']
df2 = df.filter(regex=f"{coln1[0]}$|{coln1[1]}$").copy()

The $ signifies the end of the string, so that the column ex4_srr1234 isn't also included in your result.
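For a list of arbitrary length, the same end-of-string anchoring can be built programmatically. The following is a minimal sketch, assuming the substrings should match literally at the end of the column name. The helper name suffix_pattern is made up for illustration, and re.escape is added in case a substring contains regex metacharacters.

```python
import re
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    [['san', 12, 43, 0, 34, 0],
     ['mat', 53, 0, 34, 76, 656],
     ['jon', 82, 223, 23, 32, 21],
     ['jack', 0, 12, 2, 0, 0]],
    columns=['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125',
             'ex4_srr1234', 'ex23_srr5323'])


def suffix_pattern(suffixes):
    """Hypothetical helper: regex matching names that end with any suffix."""
    # Escape each suffix so it is treated as literal text, then anchor
    # the whole alternation at the end of the string with a single $.
    alternatives = '|'.join(re.escape(s) for s in suffixes)
    return f'(?:{alternatives})$'


coln1 = ['srr123', 'srr124']
df2 = df.filter(regex=suffix_pattern(coln1)).copy()
print(df2.columns.tolist())
```

Like the one-liners above, this selects only the matching data columns; the ID column is not part of the result unless it is added to the pattern.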

edited May 15, 2020 at 19:42

answered May 15, 2020 at 19:32

Kurt Kline


2 Comments

Onyambu Over a year ago

You should consider building the pattern in a loop, since you do not know in advance how many columns will be chosen, e.g. df.filter(regex = '|'.join([col+ '$' for col in coln1]))

hemant c naik Over a year ago

Hello, thanks @Onyambu. I tried the loop but did not get the result I wanted.

Francisco · Accepted Answer · 2020-05-15 19:31:54Z

0

Look into the filter method

df.filter(regex="srr123|srr124").copy()
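Note that without an end anchor this pattern can match anywhere in a column name, so on the question's sample data it would also pick up ex4_srr1234 (which contains srr123). A quick check, assuming the sample frame from the question; the variable names loose and strict are only for illustration:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    [['san', 12, 43, 0, 34, 0],
     ['mat', 53, 0, 34, 76, 656],
     ['jon', 82, 223, 23, 32, 21],
     ['jack', 0, 12, 2, 0, 0]],
    columns=['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125',
             'ex4_srr1234', 'ex23_srr5323'])

# Unanchored: matches the substring anywhere in the column name
loose = df.filter(regex="srr123|srr124").copy()

# Anchored with $: the substring must sit at the end of the column name
strict = df.filter(regex="srr123$|srr124$").copy()

print(loose.columns.tolist())
print(strict.columns.tolist())
```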

edited May 15, 2020 at 19:31

answered May 15, 2020 at 19:22

Francisco


Michael Hodel · Accepted Answer · 2020-05-15 19:50:12Z

I am making a few assumptions:

'ID' is a column and not the index.
The third column in df2 should read 'ex2_srr124' instead of 'ex2_srr12'.
You do not want to include columns of 'df' in 'df2' if the substring does not match everything after the underscore (since 'srr123' is a substring of 'ex4_srr1234' but you did not include it in 'df2').

import pandas as pd

# set the provided data frame
df = pd.DataFrame([['san', 12, 43, 0, 34, 0],
                   ['mat', 53, 0, 34, 76, 656],
                   ['jon', 82, 223, 23, 32, 21],
                   ['jack', 0, 12, 2, 0, 0]],
                  columns = ['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125', 'ex4_srr1234', 'ex23_srr5323'])

# set the list of column-substrings
coln1=['srr123', 'srr124']
coln2=['srr1234','srr5323']

I suggest solving this as follows:

# create df2 and add the ID column
df2 = pd.DataFrame()
df2['ID'] = df['ID']

# iterate over each substring in a list of column-substrings
for substring in coln1:

    # iterate over each column name in the df columns
    for column_name in df.columns:

        # check if column name ends with substring
        if column_name.endswith(substring):

            # assign the new column to df2
            df2[column_name] = df[column_name]

This yields the desired dataframe df2:

    ID      ex_srr123   ex2_srr124
0   san     12          43
1   mat     53          0
2   jon     82          223
3   jack    0           12
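The same suffix check can also be written without the nested loops, since Python's str.endswith accepts a tuple of candidate suffixes. This is a sketch of an alternative, not a change to the approach above, and it assumes 'ID' is a regular column that should always be kept:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    [['san', 12, 43, 0, 34, 0],
     ['mat', 53, 0, 34, 76, 656],
     ['jon', 82, 223, 23, 32, 21],
     ['jack', 0, 12, 2, 0, 0]],
    columns=['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125',
             'ex4_srr1234', 'ex23_srr5323'])

coln1 = ['srr123', 'srr124']

# str.endswith takes a tuple and returns True if any suffix matches
suffixes = tuple(coln1)

# Keep 'ID' plus every column whose name ends with one of the suffixes
keep = [c for c in df.columns if c == 'ID' or c.endswith(suffixes)]
df2 = df[keep].copy()
print(df2)
```

One difference worth knowing: here the selected columns follow their order in df, whereas the nested loops order them by the position of each substring in coln1. For the sample data both orders are the same.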

Onyambu · Accepted Answer · 2020-05-15 20:04:09Z

0

df.filter(regex = '|'.join(['ID'] + [col+ '$' for col in coln1])).copy()

     ID  ex_srr123  ex2_srr124
0   san         12          43
1   mat         53           0
2   jon         82         223
3  jack          0          12
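Since the question has more than one list of substrings, the same join can be applied to each list in turn. A minimal sketch under a few assumptions: the dictionary keys 'df2' and 'df3' are hypothetical names, and 'ID' is written as '^ID$' so that it only matches a column named exactly ID rather than any column whose name contains those letters.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame(
    [['san', 12, 43, 0, 34, 0],
     ['mat', 53, 0, 34, 76, 656],
     ['jon', 82, 223, 23, 32, 21],
     ['jack', 0, 12, 2, 0, 0]],
    columns=['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125',
             'ex4_srr1234', 'ex23_srr5323'])

# Hypothetical names for the output frames, mapped to the question's lists
substring_lists = {
    'df2': ['srr123', 'srr124'],
    'df3': ['srr1234', 'srr5323'],
}

frames = {}
for name, substrings in substring_lists.items():
    # Exact match for ID, end-anchored match for every substring
    pattern = '|'.join(['^ID$'] + [s + '$' for s in substrings])
    frames[name] = df.filter(regex=pattern).copy()

print(frames['df2'].columns.tolist())
print(frames['df3'].columns.tolist())
```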

answered May 15, 2020 at 20:04

Onyambu
