Removing integer values from an alphanumeric column in Python

Question

I am new to Python and struggling with a seemingly trivial task. I have an alphanumeric column called region. It contains both entries beginning with /, such as /health/blood pressure, and integer values. A few typical observations look like this:

/health/blood pressure
/health/diabetes
7867
/fitness
9087
/health/type1 diabetes

Now I want to remove all the rows with integer values. After importing the data set into the Python shell, the dtype of region is shown as object. I intended to solve this with a regular expression, so I did the following:

from pandas import Series

pattern = '/'
data.region = Series(data.region)
matches = data.region.str.match(pattern)
matches

This gives a boolean Series indicating whether each entry matches the pattern. So I get something like this:

0    True
1    False
2    True
3    True
...
and so on.

Now I am stuck on how to remove the rows where the matches Series is False. An if statement is not working. If anyone can offer some assistance, that would be great!

Thanks!!

just do data[matches]

– behzad.nouri, Commented Jul 14, 2014 at 11:42
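
The comment above relies on pandas boolean indexing: indexing a DataFrame with a boolean Series keeps only the rows where the mask is True. A minimal sketch follows, assuming a small made-up frame that mirrors the rows in the question (the sample data and the name `filtered` are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical sample frame mirroring the rows shown in the question.
data = pd.DataFrame({
    "region": [
        "/health/blood pressure",
        "/health/diabetes",
        "7867",
        "/fitness",
        "9087",
        "/health/type1 diabetes",
    ]
})

pattern = "/"

# str.match anchors the pattern at the start of each string.
# astype(str) is a guard in case some entries were parsed as real integers.
matches = data["region"].astype(str).str.match(pattern)

# Boolean indexing: keep only the rows where the mask is True.
filtered = data[matches]
print(filtered)
```

The original index labels are kept on the filtered rows; calling `reset_index(drop=True)` afterwards would renumber them if that is preferred.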

Serbitar · Accepted Answer · 2014-07-15 08:02:30Z

1

It seems like you are using the pandas framework, so I am not completely sure whether this works:

You can try:

import re
matches = [i for i in data.region if re.match(pattern, str(i))]

In Python this is called a list comprehension: it goes through every entry in data.region, checks it against your pattern, and puts the entry in the list if the pattern matches (that is, if the expression after 'if' is true).

See: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
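
As a rough illustration of the list comprehension described above, here is a self-contained sketch on a plain Python list. The sample values are assumed (they copy the rows from the question, with the numeric entries stored as real integers), and `re.match` on `str(r)` is used because individual elements are ordinary Python objects without a `.str` accessor:

```python
import re

# Hypothetical sample values; the integers stand in for the numeric rows.
regions = [
    "/health/blood pressure",
    "/health/diabetes",
    7867,
    "/fitness",
    9087,
    "/health/type1 diabetes",
]

pattern = "/"

# Keep an entry only if its string form matches the pattern at the start.
matches = [r for r in regions if re.match(pattern, str(r))]
print(matches)
```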

If you want to build such a list for every region, you can create a dictionary that maps the regions to the lists with the following dict comprehension:

import re
matches = {region: [i for i in data.region if re.match(pattern, str(i))] for region in data}

See: https://docs.python.org/2/tutorial/datastructures.html#dictionaries

However, you are definitely leaving the realm of the pandas framework. This could possibly fail if a region is not an integer/string but a list itself (as I said, I don't know pandas well enough to judge).

In that case you could try:

import re
matches = {}
for region in list_of_regions:
    matches[region] = [i for i in data.region if re.match(pattern, str(i))]

which is basically the same, just with a given list of regions and the dict comprehension written out explicitly as a for loop.

edited Jul 15, 2014 at 8:02

answered Jul 14, 2014 at 11:55

Serbitar


3 Comments

user2906657 Over a year ago

Thanks!! That worked. I did it both ways! data[matches], as suggested in the comment above, also solved my task. Additionally, I want to create a mapping variable for each region, such as region 1, region 2, ..., region n, for each non-integer value of region. Is that feasible using a list comprehension as well?

user2906657 Over a year ago

I want these two columns together. I mean, after getting, say, 100 valid values of region, I want to add one more column, region_map, with the values region 1, region 2, and so on against the first, second, and subsequent entries of the region variable. Would the above code yield both variables together?

Serbitar Over a year ago

This is getting too complex. Consider opening a new question where you give a sample of what the input and the output should look like.
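
For what it is worth, one possible reading of the follow-up (keep the non-integer rows, then label them region 1, region 2, ... in order) could be sketched as below. This is only an assumption about the intended output; the sample data and the name `valid` are made up, while `region_map` is the column name mentioned in the comment:

```python
import pandas as pd

# Hypothetical sample frame mirroring the rows shown in the question.
data = pd.DataFrame({
    "region": [
        "/health/blood pressure",
        "/health/diabetes",
        "7867",
        "/fitness",
        "9087",
        "/health/type1 diabetes",
    ]
})

# Keep only rows whose region starts with "/" and renumber them from 0.
mask = data["region"].astype(str).str.startswith("/")
valid = data[mask].reset_index(drop=True)

# Label the remaining rows "region 1", "region 2", ... in order of appearance.
valid["region_map"] = ["region {}".format(n) for n in range(1, len(valid) + 1)]
print(valid)
```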
