
I have a dataframe

name    col1
satya    12
satya    abc
satya    109.12
alex     apple
alex     1000

Now I need to display the rows where column 'col1' has an integer value. The output should look like:

name    col1
satya    12
alex     1000

And if I search for string values:

name    col1
satya    abc
alex     apple

Likewise. Please suggest some code (maybe using regex).
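For reference, here is one way the dataframe above could be recreated, with every col1 value stored as a string (which is how pd.read_clipboard() or a CSV load would produce it):

```python
import pandas as pd

# All col1 values kept as strings, mimicking a clipboard/CSV load
df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': ['12', 'abc', '109.12', 'apple', '1000']},
                  columns=['name', 'col1'])
print(df)
```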

  • Usually column values share one type in pandas. Your data would more likely be stored as col1 and col2, with col1 holding the ints and col2 holding the strs, and NaN at the appropriate locations to fill the holes. Commented Apr 3, 2016 at 6:52

5 Answers


Let's start with a simple regex that will evaluate to True if you have an integer and False otherwise:

import re
regexp = re.compile('^-?[0-9]+$')
bool(regexp.match('1000'))
True
bool(regexp.match('abc'))
False

Once you have such a regex you can proceed as follows:

mask = df['col1'].map(lambda x: bool(regexp.match(x)))
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

To search for strings you'll do:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple

EDIT

The above code would work if the dataframe were created by:

df = pd.read_clipboard()

(or, alternatively, all variables were supplied as strings).

Whether the regex approach works depends on how the df was created. E.g., if it were created with:

df = pd.DataFrame({'name': ['satya','satya','satya', 'alex', 'alex'],
                   'col1': [12,'abc',109.12,'apple',1000] },
                   columns=['name','col1'])

the above code would fail with TypeError: expected string or bytes-like object

To make it work in any case, one would need to explicitly coerce the type to str:

mask = df['col1'].astype('str').map(lambda x: bool(regexp.match(x)))
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

and the same for strings:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].astype('str').map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple

EDIT2

To find a float:

regexp_float = re.compile(r'^[-+]?[0-9]*(\.[0-9]+)$')
mask_float = df['col1'].astype('str').map(lambda x: bool(regexp_float.match(x)))
df.loc[mask_float]

    name    col1
2   satya   109.12
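As a side note (not part of the original answer), the same masks can be built without `map` by using the vectorized `Series.str.match` on the string-coerced column:

```python
import pandas as pd

# Rebuild the example frame with mixed types in col1
df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]},
                  columns=['name', 'col1'])

s = df['col1'].astype(str)                    # coerce everything to str once
ints = df[s.str.match(r'^-?\d+$')]            # integer-looking values
words = df[s.str.match(r'^[a-zA-Z]+$')]       # purely alphabetic values
floats = df[s.str.match(r'^[-+]?\d*\.\d+$')]  # float-looking values
```

The `$` anchors matter here: `str.match` only anchors at the start of the string, so without them '109.12' would match the integer pattern through its leading digits.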

2 Comments

Creating the mask throws an error, something like "TypeError: expected string or buffer". I am using pandas 0.16.1, Python 3.4. If it has anything to do with versions, please mention it. I have imported the re module successfully as well.
@Sergey what will be the regexp for creating a mask for the float type? Thanks for the explanations, helped me a lot.

In pandas you would do something like this:

mask = df.col1.apply(lambda x: type(x) == int)
print(df[mask])

Which would yield your expected output.
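Extending the same idea (a sketch, assuming the dataframe was built from a mixed Python list so col1 holds real Python objects), `isinstance` lets you pick out each type, strings included:

```python
import pandas as pd

# Mixed Python objects in col1: the column gets object dtype and
# each element keeps its original Python type
df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]},
                  columns=['name', 'col1'])

int_rows = df[df.col1.apply(lambda x: isinstance(x, int))]
str_rows = df[df.col1.apply(lambda x: isinstance(x, str))]
```

Note that the float 109.12 lands in neither mask; add `float` to the isinstance tuple if you want it grouped with the numbers.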

3 Comments

I might disappoint you, but this "would NOT yield your expected output."
@Sergey--can you please explain in which case Primer's code can fail to reproduce expected.(just curious to know).
@Satya An integer entered as a string would not be identified as an integer. I happened to generate your df with pd.read_clipboard(); in that case this did not work either. Whether the suggested solution produces the desired output depends on how the df was created.

You can check whether the value contains only digits:

In [104]: df
Out[104]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

Integers:

In [105]: df[~df.col1.str.contains(r'\D')]
Out[105]:
    name  col1
0  satya    12
4   alex  1000

Non-integers:

In [106]: df[df.col1.str.contains(r'\D')]
Out[106]:
    name    col1
1  satya     abc
2  satya  109.12
3   alex   apple

If you want to filter all numeric values (integers/floats/decimals), you can use pd.to_numeric(..., errors='coerce'):

In [75]: df
Out[75]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

In [76]: df[pd.to_numeric(df.col1, errors='coerce').notnull()]
Out[76]:
    name    col1
0  satya      12
2  satya  109.12
4   alex    1000

In [77]: df[pd.to_numeric(df.col1, errors='coerce').isnull()]
Out[77]:
    name   col1
1  satya    abc
3   alex  apple
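Building on the same `pd.to_numeric` trick (a sketch, not from the original answer, assuming col1 holds strings as in the question), integers can also be separated from floats with a modulo check:

```python
import pandas as pd

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': ['12', 'abc', '109.12', 'apple', '1000']},
                  columns=['name', 'col1'])

# Non-numeric values become NaN; NaN fails both comparisons below
num = pd.to_numeric(df.col1, errors='coerce')
int_rows = df[num.notnull() & (num % 1 == 0)]    # whole numbers only
float_rows = df[num.notnull() & (num % 1 != 0)]  # values with a fractional part
```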

Comments

def is_integer(element):
    try:
        int(element)  # raises ValueError for non-numeric strings like 'abc' or '109.12'
        return 1
    except (ValueError, TypeError):
        return 0

You can simply define functions as below, then list your items with a for loop.

def list_str(list_of_data):
    str_list=[]
    for item in list_of_data:  # each item is a row with col1 at index 2; if rows are just (name, col1) pairs, replace item[2] with item[1]
        if not is_integer(item[2]):
            str_list.append(item)
    return str_list

def list_int(list_of_data):
    int_list=[]
    for item in list_of_data:
        if is_integer(item[2]):
            int_list.append(item)
    return int_list

Hope this can help you
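A hypothetical usage sketch of the same approach, assuming the rows are (name, col1) pairs so col1 sits at index 1. Note that float-like strings such as '109.12' fail int() and therefore land in the string list:

```python
def is_integer(element):
    try:
        int(element)  # raises ValueError for non-numeric strings
        return 1
    except (ValueError, TypeError):
        return 0

rows = [('satya', '12'), ('satya', 'abc'), ('satya', '109.12'),
        ('alex', 'apple'), ('alex', '1000')]

# col1 sits at index 1 of each (name, col1) pair
int_rows = [r for r in rows if is_integer(r[1])]
str_rows = [r for r in rows if not is_integer(r[1])]
```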

Comments


You can use df.applymap(np.isreal):

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [12,'abc',109.12,'apple',1000], 'name': ['satya','satya','satya', 'alex', 'alex']})
df
col1    name
0   12  satya
1   abc     satya
2   109.12  satya
3   apple   alex
4   1000    alex

df2 = df[df.applymap(np.isreal)]
df2
col1    name
0   12  NaN
1   NaN     NaN
2   109.12  NaN
3   NaN     NaN
4   1000    NaN

df2 = df2[df2.col1.notnull()]
df2
col1    name
0   12  NaN
2   109.12  NaN
4   1000    NaN

index_list = df2.index.tolist()
index_list
[0, 2, 4]

df = df.iloc[index_list]
df
col1    name
0   12  satya
2   109.12  satya
4   1000    alex

Comments
