I have a DataFrame like this, and I want to turn the lists in the device column into indicator columns:

```

    uid   device
0   000 [1.0, 3.0]
1   001 [3.0]
2   003 [nan]
3   004 [2.0, 3.0]
4   005 [1.0]
5   006 [1.0]
6   006 [nan]
7   007 [2.0]
```

The result should be:

```

    uid  device      NA  just_1  just_2or3  Both
0   000 [1.0, 3.0]   0     0         0        1
1   001 [3.0]        0     0         1        0
2   003 [nan]        1     0         0        0
3   004 [2.0, 3.0]   0     0         1        0
4   005 [1.0]        0     1         0        0
5   006 [1.0]        0     1         0        0
6   006 [nan]        1     0         0        0
7   007 [2.0]        0     0         1        0
8   008 [1.0, 2.0]   0     0         0        1

```

I want to create dummy variables: if device contains only 1.0, set just_1 = 1; if it is [2.0], [3.0], or [2.0, 3.0], set just_2or3 = 1.

If the list contains 1.0 together with 2.0 or 3.0, such as [1.0, 3.0] or [1.0, 2.0], set Both = 1.

How can I do that? Thank you.

3 Answers

1

You can use a custom function f that evaluates each condition as a boolean, then cast the boolean values to int with astype:

import numpy as np
import pandas as pd

df = pd.DataFrame({'uid':['000','001','002','003','004','005','006','007'],
                   'device':[[1.0,3.0],[3.0],[np.nan],[2.0,3.0],
                             [1.0],[1.0],[np.nan],[2.0]]})

print (df)
       device  uid
0  [1.0, 3.0]  000
1       [3.0]  001
2       [nan]  002
3  [2.0, 3.0]  003
4       [1.0]  004
5       [1.0]  005
6       [nan]  006
7       [2.0]  007

def f(x):
    # pd.isna checks each value; `np.nan in x` would depend on object identity
    NA = any(pd.isna(v) for v in x)
    has_1 = 1 in x
    has_2or3 = 2 in x or 3 in x
    just_1 = has_1 and not has_2or3
    both = has_1 and has_2or3
    just_2or3 = has_2or3 and not has_1
    return pd.Series([NA, just_1, just_2or3, both],
                     index=['NA','just_1','just_2or3', 'both'])

print (df.set_index('uid').device.apply(f).astype(int).reset_index())
   uid  NA  just_1  just_2or3  both
0  000   0       0          0     1
1  001   0       0          1     0
2  002   1       0          0     0
3  003   0       0          1     0
4  004   0       1          0     0
5  005   0       1          0     0
6  006   1       0          0     0
7  007   0       0          1     0
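One caveat on detecting the missing values: a membership test such as `np.nan in x` only succeeds when the list holds the very same NaN object, because NaN never compares equal to anything, itself included. A minimal sketch of the difference, where `has_nan` is a hypothetical helper name and not something from pandas:

```python
import numpy as np
import pandas as pd

def has_nan(values):
    """Return True if any element is NaN, regardless of object identity."""
    return any(pd.isna(v) for v in values)

same_object = [np.nan]          # holds the np.nan object itself
other_object = [float("nan")]   # holds a different NaN object

# `in` checks identity first, then equality; NaN != NaN, so only identity can match.
print(np.nan in same_object)
print(np.nan in other_object)

# pd.isna looks at the value, so both lists are detected.
print(has_nan(same_object), has_nan(other_object))
```

If the lists come from parsing a file or from arithmetic instead of being typed in with np.nan, the value-based check is probably the safer choice.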
1 Comment

nice one, I completely forgot about astype(int)
0

You can create such columns by expressing each condition as a boolean and converting it to int, all wrapped in a list comprehension:

df['just_1'] = [int(1 in x and not(2 in x or 3 in x)) for x in df.device]

and

df['both'] = [int(1 in x and (2 in x or 3 in x)) for x in df.device]

and

df['just_2or3'] = [int(1 not in x and (2 in x or 3 in x)) for x in df.device]

and

df['NA'] = [int(any(pd.isna(v) for v in x)) for x in df.device]

and so on.
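The repeated membership tests can also be collected in one place. This is only a sketch under a few assumptions: `flags` is a hypothetical helper name, the sample frame is a shortened version of the data, and NaN is detected with pd.isna.

```python
import numpy as np
import pandas as pd

# Shortened sample data, for illustration only.
df = pd.DataFrame({
    "uid": ["000", "001", "002", "003"],
    "device": [[1.0, 3.0], [3.0], [np.nan], [1.0]],
})

def flags(devices):
    """Hypothetical helper: return one dict of 0/1 flags for a device list."""
    has_1 = 1 in devices
    has_2or3 = 2 in devices or 3 in devices
    return {
        "NA": int(any(pd.isna(v) for v in devices)),
        "just_1": int(has_1 and not has_2or3),
        "just_2or3": int(has_2or3 and not has_1),
        "both": int(has_1 and has_2or3),
    }

# One pass over the column; each dict becomes one row of the new frame.
dummies = pd.DataFrame([flags(x) for x in df["device"]], index=df.index)
result = df.join(dummies)
print(result)
```

Each list is scanned once per condition in one function, so changing a rule means touching a single line.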

0

You can use a custom function with pandas.Series.apply and the pandas.get_dummies function:

def worker(x):
    ch1 = 1 in x
    ch23 = any(i in x for i in [2,3])
    if ch1 and ch23:
        return 'both'
    elif ch1:
        return 'just_1'
    elif ch23:
        return 'just_2or3'
    else:
        return 'NA'

>>> res = pd.get_dummies(df.device.apply(worker), dtype=int)
>>> res
   NA  both  just_1  just_2or3
0   0     1       0          0
1   0     0       0          1
2   1     0       0          0
3   0     0       0          1
4   0     0       1          0
5   0     0       1          0
6   1     0       0          0
7   0     0       0          1
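Two things to watch with get_dummies: since pandas 2.0 it returns boolean columns unless a dtype is passed, and it only creates columns for labels that actually occur. A small sketch that pins both down, where the label values are made up for illustration and deliberately contain no 'both' row:

```python
import pandas as pd

# Hypothetical labels, as a worker-style function might return them.
labels = pd.Series(["just_1", "NA", "just_2or3", "just_1"])

expected = ["NA", "just_1", "just_2or3", "both"]

res = (
    pd.get_dummies(labels, dtype=int)              # 0/1 instead of True/False
      .reindex(columns=expected, fill_value=0)     # add any missing column as zeros
)
print(res)
```

The reindex step also fixes the column order, which otherwise follows the sorted labels.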

Old answer:

def worker(x):
    ch1 = 1 in x
    ch23 = any(i in x for i in [2,3])
    if ch1 and ch23:
        return {'both':1}
    elif ch1:
        return {'just_1':1}
    elif ch23:
        return {'just_2or3':1}
    else:
        return {'NA':1}

>>> res = df.device.apply(worker).apply(pd.Series).fillna(0).astype(int)
>>> res
   NA  both  just_1  just_2or3
0   0     1       0          0
1   0     0       0          1
2   1     0       0          0
3   0     0       0          1
4   0     0       1          0
5   0     0       1          0
6   1     0       0          0
7   0     0       0          1

If you need the merged dataset:

>>> pd.concat([df, res], axis=1)
       device  uid  NA  both  just_1  just_2or3
0  [1.0, 3.0]  000   0     1       0          0
1       [3.0]  001   0     0       0          1
2       [nan]  002   1     0       0          0
3  [2.0, 3.0]  003   0     0       0          1
4       [1.0]  004   0     0       1          0
5       [1.0]  005   0     0       1          0
6       [nan]  006   1     0       0          0
7       [2.0]  007   0     0       0          1
