I have a dataframe

        IDs            Types
0      1001            {251}
1      1013       {251, 101}
2      1004       {251, 701}
3      3011           {251}
4      1014            {701}
5      1114            {251}
6      1015            {251}

where each row of df['Types'] holds a set. I want to split this column into multiple columns so that I get the following output:

        IDs    Type1   Type2  
0      1001     251      -
1      1013     251     101
2      1004     251     701
3      3011     251      -
4      1014     701      -     
5      1114     251      -
6      1015     251      -

Currently, I am using the following code to achieve this

pd.concat([df['Types'].apply(pd.Series), df['IDs']], axis = 1)

But it raises the following error:

  Traceback (most recent call last):
  File "C:/Users/PycharmProjects/test/test.py", line 48, in <module>
    df = pd.concat([df['Types'].apply(pd.Series), df['IDs']], axis = 1)
  File "C:\Python\Python35\lib\site-packages\pandas\core\series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
  File "C:\Python\Python35\lib\site-packages\pandas\core\series.py", line 223, in __init__
    "".format(data.__class__.__name__))
TypeError: 'set' type is unordered

How can I get the desired output? Thanks.

4 Answers

2

I think you need the DataFrame constructor first, then rename the columns, and finally call fillna.

However, if you use fillna with a string such as '-', the column ends up mixing numeric and string values, which can break some pandas functions.

df1 = pd.DataFrame(df['Types'].values.tolist()) \
        .rename(columns = lambda x: 'Type{}'.format(x+1)) \
        .fillna('-')
print (df1)
   Type1 Type2
0    251     -
1    251   101
2    251   701

df2 = pd.concat([df['IDs'], df1], axis = 1)
print (df2)
    IDs  Type1 Type2
0  1001    251     -
1  1013    251   101
2  1004    251   701
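
For readers who want to run this end to end, here is a minimal sketch of the same idea. It rests on two assumptions that are not in the answer above: each set is passed through sorted() so the column order is deterministic (row 1013 therefore yields Type1 = 101 and Type2 = 251, unlike the desired output in the question), and the missing values are kept as <NA> in the nullable "Int64" dtype instead of being filled with '-', which sidesteps the mixed-type issue mentioned above.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "IDs": [1001, 1013, 1004, 3011, 1014, 1114, 1015],
    "Types": [{251}, {251, 101}, {251, 701}, {251}, {701}, {251}, {251}],
})

# Assumption: ascending order is acceptable for Type1, Type2, ...
# sorted() turns each set into a list with a deterministic order.
expanded = pd.DataFrame([sorted(s) for s in df["Types"]], index=df.index)
expanded.columns = [f"Type{i + 1}" for i in expanded.columns]

# Nullable integers keep the columns numeric; missing entries become <NA>.
expanded = expanded.astype("Int64")

result = pd.concat([df["IDs"], expanded], axis=1)
print(result)
```

If the '-' placeholder is needed for display only, it can probably be applied at the very end, once no further numeric work is done on the columns.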

Another, slower solution:

df1 = df['Types'].apply(lambda x: pd.Series(list(x))) \
                 .rename(columns =lambda x: 'Type{}'.format(x+1)) \
                 .fillna('-')

df2 = pd.concat([df['IDs'], df1], axis = 1)
print (df2)
    IDs  Type1 Type2
0  1001  251.0     -
1  1013  251.0   101
2  1004  251.0   701
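
As for why the second variant wraps each set in list(...): the traceback in the question suggests that the Series constructor rejects sets because they have no defined order. A small sketch of that behaviour follows; the variable names are illustrative, and the exact message may differ between pandas versions.

```python
import pandas as pd

types = {251, 101}

# Passing a set directly is rejected, as in the traceback from the question.
try:
    pd.Series(types)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# A list has a defined order, so the constructor accepts it.
s = pd.Series(list(types))
print(sorted(s.tolist()))
```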

2 Comments

Thanks. I was wondering why I need to convert the set into a list?
I am not sure, but this solution is faster than .apply(pd.Series); .apply(lambda x: pd.Series(list(x))) also works.
2

This should work:

temp = pd.DataFrame(df.Types.values.tolist()).add_prefix('Types_').fillna('-').rename(columns={'Types_0':'Type1','Types_1':'Type2'})

df = pd.concat([df.drop('Types',axis=1), temp], axis=1)

    IDs  Types_0  Types_1
0  1001      251      NaN
1  1013      251    101.0
2  1004      251    701.0

Edit: I had missed the '-' for missing values; it should be good now.

Edit2: Fixed the column names, as @jezrael pointed out.

3 Comments

I think your output is a bit different from what the OP wants; please check it.
I mean the column names Types_0 and Types_1.
You are correct. I would simply rename the columns. I'll change mine, but your answer already provides this (thumbs up).
0

Another approach:

df['Type1'] = df['Types'].apply(lambda x: list(x)[0])
df['Type2'] = df['Types'].apply(lambda x: list(x)[1] if len(x) > 1 else '-')
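
The two lines above hard-code two columns. A possible generalisation, sketched here under some assumptions, creates as many columns as the largest set needs. The helper nth_or_dash is hypothetical (not part of pandas), and it sorts each set so the result is deterministic, which means the smallest value always lands in Type1.

```python
import pandas as pd

df = pd.DataFrame({
    "IDs": [1001, 1013, 1014],
    "Types": [{251}, {251, 101}, {701}],
})

def nth_or_dash(values, n, fill="-"):
    """Return the n-th smallest element of `values`, or `fill` if there is none."""
    ordered = sorted(values)
    return ordered[n] if n < len(ordered) else fill

# Create as many columns as the largest set needs.
width = int(df["Types"].map(len).max())
for n in range(width):
    df[f"Type{n + 1}"] = df["Types"].apply(lambda s, n=n: nth_or_dash(s, n))

print(df.drop(columns="Types"))
```

Note that this still stores '-' next to integers, so the caveat about mixed types applies here as well.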

0

One liner (very similar to @DmitryPolonskiy's solution):

In [96]: df.join(pd.DataFrame(df.pop('Types').values.tolist(), index=df.index)
                   .add_prefix('Type_')) \
           .fillna('-')
Out[96]:
    IDs  Type_0 Type_1
0  1001     251      -
1  1013     251    101
2  1004     251    701
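
One thing to be aware of: df.pop('Types') removes the column from df in place. If the original frame should stay untouched, a variant along these lines might work instead. It is only a sketch, with df.drop swapped in for pop and the names types and out chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "IDs": [1001, 1013, 1004],
    "Types": [{251}, {251, 101}, {251, 701}],
})

# Build the expanded columns from the sets, aligned on the original index.
types = pd.DataFrame([list(s) for s in df["Types"]], index=df.index)

# drop() returns a new frame, so `df` keeps its "Types" column.
out = df.drop(columns="Types").join(types.add_prefix("Type_")).fillna("-")

print(out)
print("Types" in df.columns)
```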
