0

I have the dataframe shown below. The real data is large, and I want to create different dataframes by selecting the columns whose names contain given substrings.

df

 ID     ex_srr123  ex2_srr124  ex3_srr125  ex4_srr1234  ex23_srr5323
 san      12           43          0           34           0
 mat      53           0           34          76          656
 jon      82           223         23          32          21
 jack      0           12          2            0           0

I have lists of column-name substrings:

coln1=['srr123', 'srr124']
coln2=['srr1234','srr5323']

I want

df2=

ID     ex_srr123  ex2_srr12
san      12           43 
mat      53           0
jon      82           223 
jack      0           12 

I tried

df2=df[coln1]   

This did not give me what I wanted. How can I get the desired output?

3
  • Is ID the index column? Commented May 15, 2020 at 19:19
  • Hello, thanks for your reply @QuangHoang. No, it is not the index column. Commented May 15, 2020 at 19:57
  • When you say you are not getting what you want, do you mean that the code works for the example but not on the whole dataset, or that it does not work even on this example? Commented May 15, 2020 at 20:01

4 Answers 4

1

Statically

df2 = df.filter(regex="srr123$|srr124$").copy()

Dynamically

coln1 = ['srr123', 'srr124']
df2 = df.filter(regex=f"{coln1[0]}$|{coln1[1]}$").copy()

The $ signifies the end of the string, so that the column ex4_srr1234 isn't also included in your result.
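To see the effect of the anchor on the question's data, here is a minimal sketch that compares the unanchored and anchored patterns and builds the pattern for a list of any length. The re.escape call is an addition that is not in the answer above; it only matters if a substring contains regex metacharacters. Note that neither pattern selects the ID column.

```python
import re

import pandas as pd

# Data frame from the question
df = pd.DataFrame(
    [["san", 12, 43, 0, 34, 0],
     ["mat", 53, 0, 34, 76, 656],
     ["jon", 82, 223, 23, 32, 21],
     ["jack", 0, 12, 2, 0, 0]],
    columns=["ID", "ex_srr123", "ex2_srr124", "ex3_srr125",
             "ex4_srr1234", "ex23_srr5323"],
)
coln1 = ["srr123", "srr124"]

# Unanchored: "srr123" is also found inside "ex4_srr1234".
unanchored = df.filter(regex="|".join(coln1))

# Anchored: each alternative must sit at the end of the column name.
pattern = "|".join(re.escape(s) + "$" for s in coln1)
anchored = df.filter(regex=pattern).copy()

print(list(unanchored.columns))
print(list(anchored.columns))
```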


2 Comments

You should consider building the pattern in a loop, since you do not know in advance how many columns are chosen, e.g. df.filter(regex='|'.join([col + '$' for col in coln1]))
Hello, thanks @Onyambu. I tried it with a loop but did not get the result I wanted.
0

Look into the filter method

df.filter(regex="srr123|srr124").copy()


0

I am making a few assumptions:

  • 'ID' is a column and not the index.
  • The third column in df2 should read 'ex2_srr124' instead of 'ex2_srr12'.
  • You do not want to include columns of 'df' in 'df2' if the substring does not match everything after the underscore (since 'srr123' is a substring of 'ex4_srr1234' but you did not include it in 'df2').

import pandas as pd

# set up the data frame provided in the question
df = pd.DataFrame([['san', 12, 43, 0, 34, 0],
                   ['mat', 53, 0, 34, 76, 656],
                   ['jon', 82, 223, 23, 32, 21],
                   ['jack', 0, 12, 2, 0, 0]],
                  columns = ['ID', 'ex_srr123', 'ex2_srr124', 'ex3_srr125', 'ex4_srr1234', 'ex23_srr5323'])

# set the list of column-substrings
coln1=['srr123', 'srr124']
coln2=['srr1234','srr5323']

I suggest solving this as follows:

# create df2 and add the ID column
df2 = pd.DataFrame()
df2['ID'] = df['ID']

# iterate over each substring in a list of column-substrings
for substring in coln1:

    # iterate over each column name in the df columns
    for column_name in df.columns.values:

        # check if column name ends with substring
        if column_name.endswith(substring):

            # assign the new column to df2
            df2[column_name] = df[column_name]

This yields the desired dataframe df2:

    ID      ex_srr123   ex2_srr124
0   san     12          43
1   mat     53          0
2   jon     82          223
3   jack    0           12
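The same selection can also be written without the nested loop. The sketch below follows one possible reading of the third assumption: it compares everything after the last underscore with the substring list, so a hypothetical column such as 'ex_xsrr123' would not match 'srr123'. The function name select_by_suffix and its id_col parameter are made up for illustration.

```python
import pandas as pd


def select_by_suffix(df, substrings, id_col="ID"):
    """Return id_col plus every column whose text after the last '_' is in substrings."""
    wanted = set(substrings)
    cols = [c for c in df.columns
            if c != id_col and c.rsplit("_", 1)[-1] in wanted]
    # .copy() so that later edits to the result do not touch df
    return df[[id_col] + cols].copy()


# Data frame from the question
df = pd.DataFrame(
    [["san", 12, 43, 0, 34, 0],
     ["mat", 53, 0, 34, 76, 656],
     ["jon", 82, 223, 23, 32, 21],
     ["jack", 0, 12, 2, 0, 0]],
    columns=["ID", "ex_srr123", "ex2_srr124", "ex3_srr125",
             "ex4_srr1234", "ex23_srr5323"],
)

df2 = select_by_suffix(df, ["srr123", "srr124"])
print(df2)
```

Columns come back in the order they appear in df, not in the order of the substring list.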


0
df.filter(regex = '|'.join(['ID'] + [col+ '$' for col in coln1])).copy()

     ID  ex_srr123  ex2_srr124
0   san         12          43
1   mat         53           0
2   jon         82         223
3  jack          0          12
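One caveat: 'ID' in this pattern is unanchored, so it would also match any other column whose name contains 'ID'. A hedged variant, assuming the identifier column is named exactly 'ID', anchors it with ^ and $ and applies the same pattern to both lists from the question. The helper name build_pattern and the dict layout are illustrative choices, not part of the answer above.

```python
import re

import pandas as pd

# Data frame from the question
df = pd.DataFrame(
    [["san", 12, 43, 0, 34, 0],
     ["mat", 53, 0, 34, 76, 656],
     ["jon", 82, 223, 23, 32, 21],
     ["jack", 0, 12, 2, 0, 0]],
    columns=["ID", "ex_srr123", "ex2_srr124", "ex3_srr125",
             "ex4_srr1234", "ex23_srr5323"],
)

groups = {
    "coln1": ["srr123", "srr124"],
    "coln2": ["srr1234", "srr5323"],
}


def build_pattern(substrings, id_col="ID"):
    """Match id_col exactly, or any column name ending in one of the substrings."""
    parts = ["^" + re.escape(id_col) + "$"]
    parts += [re.escape(s) + "$" for s in substrings]
    return "|".join(parts)


# One filtered data frame per substring list
frames = {name: df.filter(regex=build_pattern(subs)).copy()
          for name, subs in groups.items()}

for name, frame in frames.items():
    print(name, list(frame.columns))
```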
