replace strings in every column with numbers

Question

This question is an extension of this question. Consider the pandas DataFrame visualized in the table below.

	respondent	brand	engine	country	aware	aware_2	aware_3	age	tesst	set
0	a	volvo	p	swe	1	0	1	23	set	set
1	b	volvo	None	swe	0	0	1	45	set	set
2	c	bmw	p	us	0	0	1	56	test	test
3	d	bmw	p	us	0	1	1	43	test	test
4	e	bmw	d	germany	1	0	1	34	set	set
5	f	audi	d	germany	1	0	1	59	set	set
6	g	volvo	d	swe	1	0	0	65	test	set
7	h	audi	d	swe	1	0	0	78	test	set
8	i	volvo	d	us	1	1	1	32	set	set

To convert a column with String entries, one should do a map and then pandas.replace().

For example:

mapping = {'set': 1, 'test': 2}
df.replace({'set': mapping, 'tesst': mapping})

This would lead to the following DataFrame (table):

	respondent	brand	engine	country	aware	aware_2	aware_3	age	tesst	set
0	a	volvo	p	swe	1	0	1	23	1	1
1	b	volvo	None	swe	0	0	1	45	1	1
2	c	bmw	p	us	0	0	1	56	2	2
3	d	bmw	p	us	0	1	1	43	2	2
4	e	bmw	d	germany	1	0	1	34	1	1
5	f	audi	d	germany	1	0	1	59	1	1
6	g	volvo	d	swe	1	0	0	65	2	1
7	h	audi	d	swe	1	0	0	78	2	1
8	i	volvo	d	us	1	1	1	32	1	1

As seen above, the last two column's strings are replaced with numbers representing these strings.

The question is then: Is there a faster and not so hands-on approach to replace all the strings into a number? Can one automatically create a mapping (and output it somewhere for human reference)?

Something that makes the DataFrame end up like:

	respondent	brand	engine	country	aware	aware_2	aware_3	age	tesst	set
0	1	1	1	1	1	0	1	23	1	1
1	2	1	2	1	0	0	1	45	1	1
2	3	2	1	2	0	0	1	56	2	2
3	4	2	1	2	0	1	1	43	2	2
4	5	2	3	3	1	0	1	34	1	1
5	6	3	3	3	1	0	1	59	1	1
6	7	1	3	1	1	0	0	65	2	1
7	8	3	3	1	1	0	0	78	2	1
8	9	1	3	2	1	1	1	32	1	1

Also output:

[{'volvo': 1, 'bmw': 2, 'audi': 3}, {'p': 1, 'None': 2, 'd': 3}, {'swe': 1, 'us': 2, 'germany': 3}]

Note that the output list of maps (dicts) should not be hard-coded but instead produced by the code.

Netim · Accepted Answer · 2021-05-18 14:24:15Z

0

You can adapte the code given in this response https://stackoverflow.com/a/39989896/15320403 (inside the post you linked) to generate a mapping for each column of your choice and apply replace as you suggested

all_brands = df.brand.unique()
brand_dic = dict(zip(all_brands, range(len(all_brands))))

answered May 18, 2021 at 14:24

Netim

1338 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Shivam Roy · Accepted Answer · 2021-05-18 14:41:39Z

0

You will need to first change the type of the columns to Categorical and then create a new column or overwrite the existing column with codes:

df['brand'] = pd.Categorical(df['brand'])
df['brand_codes'] = df['brand'].cat.codes

If you need the mapping:

dict(enumerate(df['brand'].cat.categories )) #This will work only after you've converted the column to categorical

edited May 18, 2021 at 14:41

answered May 18, 2021 at 14:29

Shivam Roy

2,0713 gold badges12 silver badges25 bronze badges

Comments

IntiM · Accepted Answer · 2021-05-19 07:09:57Z

From the other answers, I've written this function to do solve the problem:

import pandas as pd

def convertStringColumnsToNum(data):
    columns = data.columns
    columns_dtypes = data.dtypes
    maps = []
    
    for col_idx in range(0, len(columns)):
        # don't change columns already comprising of numbers
        if(columns_dtypes[col_idx] == 'int64'): # can be extended to more dtypes
            continue
        # inspired from Shivam Roy's answer 
        col = columns[col_idx]
        tmp = pd.Categorical(data[col])
        data[col] = tmp.codes
        maps.append(tmp.categories)

    return maps

This function returns the mapss used to replace strings with a numeral code. The code is the index in which a string resides inside the list. This function works, yet it comes with the SettingWithCopyWarning.

if it ain't broke don't fix it, right? ;)

*but if anyone has a way to adapt this function so that the warning is no longer shown, feel free to comment on it. Yet it works *shrugs* *

Collectives™ on Stack Overflow

replace strings in every column with numbers

3 Answers 3

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

	respondent	brand	engine	country	aware	aware_2	aware_3	age	tesst	set
0	1	1	1	1	1	0	1	23	1	1
1	2	1	2	1	0	0	1	45	1	1
2	3	2	1	2	0	0	1	56	2	2
3	4	2	1	2	0	1	1	43	2	2
4	5	2	3	3	1	0	1	34	1	1
5	6	3	3	3	1	0	1	59	1	1
6	7	1	3	1	1	0	0	65	2	1
7	8	3	3	1	1	0	0	78	2	1
8	9	1	3	2	1	1	1	32	1	1

	respondent	brand	engine	country	aware	aware_2	aware_3	age	tesst	set
0	1	1	1	1	1	0	1	23	1	1
1	2	1	2	1	0	0	1	45	1	1
2	3	2	1	2	0	0	1	56	2	2
3	4	2	1	2	0	1	1	43	2	2
4	5	2	3	3	1	0	1	34	1	1
5	6	3	3	3	1	0	1	59	1	1
6	7	1	3	1	1	0	0	65	2	1
7	8	3	3	1	1	0	0	78	2	1
8	9	1	3	2	1	1	1	32	1	1

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related

	respondent	brand	engine	country	aware	aware_2	aware_3	age	tesst	set
0	1	1	1	1	1	0	1	23	1	1
1	2	1	2	1	0	0	1	45	1	1
2	3	2	1	2	0	0	1	56	2	2
3	4	2	1	2	0	1	1	43	2	2
4	5	2	3	3	1	0	1	34	1	1
5	6	3	3	3	1	0	1	59	1	1
6	7	1	3	1	1	0	0	65	2	1
7	8	3	3	1	1	0	0	78	2	1
8	9	1	3	2	1	1	1	32	1	1