I am new to Python and struggling with a seemingly trivial task. I have an alphanumeric column called region. It contains both entries beginning with /, such as /health/blood pressure, and integer values. A few typical observations look like this:

/health/blood pressure
/health/diabetes
7867
/fitness
9087
/health/type1 diabetes

Now I want to remove all rows with integer values. After importing the data set in the Python shell, the dtype of region is shown as object. I tried to solve this with a regular expression, so I did the following:

pattern='/'
data.region=Series(data.region)
matches=data.region.str.match(pattern)
matches

This gives a boolean Series indicating whether each entry matches the pattern. I get something like this:

0     True
1    False
2     True
3     True
...
and so on.

Now I am stuck on how to remove the rows for which matches is False. An if statement does not work. Any help would be appreciated!

Thanks!!

  • just do data[matches] Commented Jul 14, 2014 at 11:42
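
The comment's suggestion is boolean-mask indexing: passing a boolean Series to data[...] keeps only the rows where the mask is True. A minimal sketch, assuming a hypothetical DataFrame built from the sample rows in the question; the astype(str) cast is an addition here so that the integer entries are compared as text instead of producing missing values in the mask:

```python
import pandas as pd

# Hypothetical sample frame mirroring the rows shown in the question.
data = pd.DataFrame({
    "region": [
        "/health/blood pressure",
        "/health/diabetes",
        7867,
        "/fitness",
        9087,
        "/health/type1 diabetes",
    ]
})

# Cast to str first (an assumption, not part of the original code) so the
# integer entries yield False rather than a missing value in the mask.
matches = data["region"].astype(str).str.match("/")

# Boolean-mask indexing: only rows where the mask is True are kept.
filtered = data[matches]
print(filtered)
```

The original row labels are preserved in filtered; calling filtered.reset_index(drop=True) renumbers them from zero if that is preferred.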

1 Answer

It looks like you are using pandas, which I do not know well, so I am not completely sure this will work.

You can try:

import re
matches = [i for i in data.region if re.match(pattern, str(i))]

In Python this is called a list comprehension. It goes through every entry in data.region, checks it against your pattern, and puts the entry in the list if the pattern matches (that is, if the expression after 'if' is true).

See: https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
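
A minimal sketch of that filtering idea on plain Python values, with no pandas involved. The values list is a hypothetical stand-in for data.region, and str(v) is used so that the integer entries can be passed to re.match:

```python
import re

pattern = "/"

# Hypothetical values standing in for the entries of data.region.
values = [
    "/health/blood pressure",
    "/health/diabetes",
    7867,
    "/fitness",
    9087,
    "/health/type1 diabetes",
]

# Keep an entry only when the pattern matches at the start of its text form.
kept = [v for v in values if re.match(pattern, str(v))]
print(kept)
```

Because re.match anchors at the beginning of the string, an entry with a slash only in the middle would not be kept.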

If you want to map those for every region, you can try creating a dictionary that maps each region to such a list with the following dict comprehension:

import re
matches = {region: [i for i in data.region if re.match(pattern, str(i))] for region in data}

See: https://docs.python.org/2/tutorial/datastructures.html#dictionaries

However, you are definitely leaving the realm of pandas here. This could fail if a region is not an integer/string but a list itself (as I said, I don't know pandas well enough to judge).

In that case you could try:

import re

matches = {}
for region in list_of_regions:
    matches[region] = [i for i in data.region if re.match(pattern, str(i))]

This is basically the same, just with a given list of regions and the dict comprehension written out as an explicit for loop.

3 Comments

Thanks!! That worked. I did it both ways! The data[matches] suggestion in the comment above also solved my task. Additionally, I now want to create a mapping variable for each region, such as region 1, region 2, ..., region n, for each non-integer value of region. Is that feasible using a list comprehension as well?
I want these two columns together. That is, after getting 100 valid values of region, I want to add one more column, region_map, with the values region 1, region 2, and so on against the first, second, and subsequent entries of the region variable. Would the above code yield both variables together?
This is getting too complex. Consider opening a new question with a sample of what the input and the output should look like.
