I have an application that generates data frames with varying numbers of columns, where each cell contains two values separated by "|".

gene_1             gene_2             ...
ashb|ESNT00011     wsefsf|ENST0008
adecasd|ENST0001   uibib|ENST0008

How can I iterate over the columns and split each one into two columns, for example gene_1_name and gene_1_ID for gene_1?

gene_1_name    gene_1_ID           gene_2_name     gene_2_ID         ...
ashb           ESNT00011           wsefsf          ENST0008
adecasd        ENST0001            uibib           ENST0008

1 Answer

Use stack and unstack: stack the frame into one long Series, split the strings, then unstack back to wide format:

result = (
    df.stack().str.split('|', expand=True)     # split the strings
        .rename(columns={0: 'name', 1: 'id'})  # rename the columns
        .unstack()                             # unstack
)

# Merge the two column levels (part, gene) into flat names such as gene_1_name
result.columns = [f'{gene}_{col}' for col, gene in result.columns]
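
For reference, here is a self-contained sketch of the same approach run on the sample data from the question. Two things in it are my own assumptions rather than part of the answer: the second part is labelled "ID" (to match the column names the question asks for), and the columns are reordered at the end, because unstack groups them by part (all names first, then all IDs) rather than by gene.

```python
import pandas as pd

# Sample frame built from the values shown in the question.
df = pd.DataFrame({
    "gene_1": ["ashb|ESNT00011", "adecasd|ENST0001"],
    "gene_2": ["wsefsf|ENST0008", "uibib|ENST0008"],
})

result = (
    df.stack()                              # long Series indexed by (row, gene)
      .str.split("|", expand=True)          # two columns labelled 0 and 1
      .rename(columns={0: "name", 1: "ID"})  # "ID" is an assumed label
      .unstack()                            # columns become (part, gene) pairs
)

# Flatten the two column levels into names such as gene_1_name.
result.columns = [f"{gene}_{part}" for part, gene in result.columns]

# Reorder so each gene's name and ID sit next to each other,
# following the column order of the original frame.
ordered = [f"{gene}_{part}" for gene in df.columns for part in ("name", "ID")]
result = result[ordered]

print(result)
```

Building the `ordered` list from `df.columns` means the same code should work for any number of gene columns, as long as every cell contains exactly one "|".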
1 Comment

You can also use result.columns.to_flat_index().str.join('_')
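
One thing to watch with the flat-index approach: it joins the levels in the order they already have, so with (part, gene) columns it produces names like name_gene_1. A minimal sketch, using a hand-built frame (hypothetical, standing in for the unstacked result) to show the difference and one way to get gene_1_name instead by swapping the levels first:

```python
import pandas as pd

# Hypothetical frame with the same two-level column layout that unstack()
# produces: outer level is the part ("name"/"id"), inner level is the gene.
columns = pd.MultiIndex.from_product([["name", "id"], ["gene_1", "gene_2"]])
result = pd.DataFrame(
    [["ashb", "wsefsf", "ESNT00011", "ENST0008"]],
    columns=columns,
)

# Joining as-is keeps the (part, gene) order.
as_is = result.columns.to_flat_index().str.join("_")

# Swapping the two levels first puts the gene in front.
swapped = result.columns.swaplevel().to_flat_index().str.join("_")

print(list(as_is))
print(list(swapped))
```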
