I have the following dataset:

Chr     Position       Name      AD                                 
1       866511          A       13,21
1       881627          A       28,33
2       1599812         B       67,25 

I need to split the AD column into three columns: REF, ALT1, and ALT2. When every row of AD has only two values, I still need the ALT2 column, filled with NaN.

The following code works when AD contains rows with three values:

df['REF'], df['ALT1'], df['ALT2'] = df['AD'].str.split(',', 2).str

However, in some datasets every row has only two values in column AD, and running the same line gives the following error:

ValueError: not enough values to unpack (expected 3, got 2)

In this case, I would still like to have the third column, ALT2, filled with NaN values. Any suggestions? Thank you to anyone willing to help.

  • If you know that AD will only ever have either 2 or 3 values, you could use an if statement (or a try/except) to handle the two cases. Commented Jun 25, 2019 at 16:30

3 Answers

Append an extra ',' to each value, so that every split yields at least three parts:

df['REF'], df['ALT1'], df['ALT2'] = zip(*df.AD.add(',').str.split(',').str[:3])

df

   Chr  Position Name        AD REF ALT1 ALT2
0    1    866511    A     13,21  13   21     
1    1    881627    A  28,33,31  28   33   31
2    2   1599812    B     67,25  67   25     

Or, without altering df:

df.assign(**dict(zip('REF ALT1 ALT2'.split(), zip(*df.AD.add(',').str.split(',').str[:3]))))

   Chr  Position Name        AD REF ALT1 ALT2
0    1    866511    A     13,21  13   21     
1    1    881627    A  28,33,31  28   33   31
2    2   1599812    B     67,25  67   25     
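Note that the padding approach leaves an empty string, not NaN, in ALT2 for the two-value rows. A minimal sketch of the follow-up step, assuming the same three-row example and using the regex replace suggested in the comment on this answer (the `cols` name is only illustrative):

```python
import numpy as np
import pandas as pd

# Example data: the middle row has three values, the others have two.
df = pd.DataFrame({"AD": ["13,21", "28,33,31", "67,25"]})

# Pad with a trailing comma so every split has at least three parts.
df["REF"], df["ALT1"], df["ALT2"] = zip(
    *df["AD"].add(",").str.split(",").str[:3]
)

# Turn the empty (or whitespace-only) strings into NaN.
cols = ["REF", "ALT1", "ALT2"]
df[cols] = df[cols].replace(r"^\s*$", np.nan, regex=True)
```

The values in REF, ALT1 and ALT2 are still strings at this point; converting them to numbers would be a separate step.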

1 Comment

piRSquared thank you so much, you fixed it so quickly and it worked great for me. I've only added print(df.replace(r'^\s*$', np.nan, regex=True)) to fill in the empty spaces with NaN.

You can set the parameter expand to True and then do the job with:

df['REF'], df['ALT1'], df['ALT2'] = df.AD.str.split(',', n=2, expand=True).values.T

I added a row with three values in the column AD using df.loc[3,:] = [3,5432,'C', '32,45,65'], which gives:

   Chr   Position Name        AD REF ALT1  ALT2
0  1.0   866511.0    A     13,21  13   21  None
1  1.0   881627.0    A     28,33  28   33  None
2  2.0  1599812.0    B     67,25  67   25  None
3  3.0     5432.0    C  32,45,65  32   45    65
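One caveat: with expand=True the result has only as many columns as the longest split, so if every row of AD has two values, only two columns come back and unpacking into three names fails. A possible sketch for that case, assuming the original three-row data, is to reindex the split result to three columns before naming them (`parts` is just an illustrative name):

```python
import pandas as pd

# Original data: every row of AD has exactly two values.
df = pd.DataFrame({
    "Chr": [1, 1, 2],
    "Position": [866511, 881627, 1599812],
    "Name": ["A", "A", "B"],
    "AD": ["13,21", "28,33", "67,25"],
})

# Split into columns 0 and 1, then force columns 0..2 to exist;
# reindex fills the missing column with NaN.
parts = df["AD"].str.split(",", n=2, expand=True).reindex(columns=range(3))
parts.columns = ["REF", "ALT1", "ALT2"]

df = df.join(parts)
```

This keeps ALT2 present whether or not any row has a third value.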

You can use rename and concat:

df = pd.concat((df, df['AD'].str.split(',', expand=True)
                            .rename(columns={0:'REF',1:'ALT1',2:'ALT2'})
               ), axis=1)

Output (no ALT2 column appears here, because no row has a third value):

   Chr  Position Name     AD REF ALT1
0    1    866511    A  13,21  13   21
1    1    881627    A  28,33  28   33
2    2   1599812    B  67,25  67   25

1 Comment

Quang Hoang, I would still need the ALT2 column. piRSquared's answer worked really well. Thank you all!
