
How could I delete the rows which have '0' as a value in the 5th column? Or even better, can we choose a range (i.e. remove the rows which have values between -50 and 30 in the 5th column)?

The data looks like this:

 0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
 0  4028.50  3455014.50    -5.86  0        0.0003   0.39
 0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
 0  8828.62  4543414.50    -3.05  0        0.0021   0.61
 0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
 0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
 0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25
 0  8828.62  4543414.50    -3.05  0        0.0021   0.61
2 Comments
  • operator.itemgetter(4)... then compare it. Commented Aug 9, 2011 at 1:15
  • @Chad: Did you get this working yet? Commented Aug 11, 2011 at 22:41
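
A minimal sketch of the itemgetter approach suggested in the first comment, assuming the rows are read as strings from a file (the file name data.txt is illustrative, not from the question):

 from operator import itemgetter

 fifth = itemgetter(4)                    # pulls index 4 from a split row
 rows = open('data.txt').readlines()      # assumed file of whitespace-separated rows
 goodrows = [row for row in rows if fifth(row.split()) != '0']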

3 Answers

goodrows = [row for row in data if row.split()[4] != '0']

or

goodrows = [row for row in data if not (-50 <= float(row.split()[4]) <= 30)]

Edit:

If your data is actually in a NumPy array, which your comment seems to indicate even though your post didn't say so:

goodrows = [row for row in data if row[4] != 0]

or

goodrows = [row for row in data if not (-50 <= row[4] <= 30)]

should work. There is also a NumPy-native way to do this with boolean mask indexing, sketched below.
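
For completeness, a sketch of that boolean-mask approach, assuming data is already a 2-D NumPy array (this is an illustration, not the poster's code; see also the NumPy answer below):

 # `data` is assumed to be a 2-D NumPy array, e.g. loaded with np.loadtxt
 goodrows = data[data[:, 4] != 0]                         # rows whose 5th column is not 0
 goodrows = data[(data[:, 4] < -50) | (data[:, 4] > 30)]  # rows whose 5th column is outside [-50, 30]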


4 Comments

I've just tested this to see if they are identical: they're not. int(row.split()[4]) raises when it encounters -117.00. That may explain the -1...
@Johnsyweb absolutely right, good catch. +1 to your answer. Note: I was not one of the downvoters.
I get 'AttributeError: 'numpy.ndarray' object has no attribute 'split'' error with this one too.
Ok, if it's already in an array, not in a list of strings in a file, just do row[4]. See my edit. Next time, make sure to say in your question if the data is in a NumPy array. We all assumed it was in a file in the format you posted.

You can use NumPy to do this quickly:

data="""
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  0        0.0003   0.39
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61
"""
from StringIO import StringIO
import numpy as np
d = np.loadtxt(StringIO(data))  # load the text into a 2D NumPy array

print d[d[:,4] != 0]                     # keep rows whose 5th column is not 0
print d[(d[:,4] < -50) | (d[:,4] > 30)]  # keep rows whose 5th column is outside [-50, 30]

3 Comments

I don't know if NumPy is the right tool, as it's not in the standard library... A list comprehension seems better
I got this error: File "<stdin>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 796, in loadtxt items = [conv(val) for (conv, val) in zip(converters, vals)] ValueError: could not convert string to float: [[
The program above can only parse whitespace-separated numbers. From the error message, it seems you are passing it data in some other format.
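
If the numbers live in a whitespace-separated text file rather than in a string, a sketch of the same approach (the file name data.txt is assumed, matching the answer below):

 import numpy as np

 d = np.loadtxt('data.txt')                  # parses whitespace-separated columns
 print(d[d[:, 4] != 0])                      # keep rows whose 5th column is not 0
 print(d[(d[:, 4] < -50) | (d[:, 4] > 30)])  # keep rows whose 5th column is outside [-50, 30]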

Assuming your data is in a plain text file like this:

$ cat data.txt 
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  0        0.0003   0.39
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61

And assuming you are not using any external libraries, the following will read the data into a list of strings, omitting the undesirable lines. You can feed these lines into any other function you choose; I call print merely to demonstrate. N.B.: the fifth column has index 4, since list indices are zero-based.

$ cat data.py 
#!/usr/bin/env python

print "1. Delete the rows which have '0' as a value on 5th column:"

def zero_in_fifth(row):
    return row.split()[4] == '0'

required_rows = [row for row in open('./data.txt') if not zero_in_fifth(row)]
print ''.join(required_rows)

print '2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):'

def should_ignore(row):
    return -50 <= float(row.split()[4]) <= 30

required_rows = [row for row in open('./data.txt') if not should_ignore(row)]
print ''.join(required_rows)

When you run this you will get:

$ python data.py 
1. Delete the rows which have '0' as a value on 5th column:
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25

2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25

12 Comments

Don't you think lambdas are overkill for this?
What's the point of naming a lambda function? That's just wrong. Just use the def keyword.
@JBernardo: A named function would probably be better, you're right. I just extracted the lambda from the generator expression to reduce the line-length.
As said above, that's not the place to use lambdas. Wrong on many levels. Try reading that...
@Johnsyweb: I loaded the data from a text file via pylab.loadtxt and try your code but I got the same error with the same line. what am I missing here?
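
To make the lambda-versus-def exchange above concrete, a sketch of the two spellings being compared (the lambda version is reconstructed for illustration; it is not the code shown in this answer):

 zero_in_fifth = lambda row: row.split()[4] == '0'   # a named lambda: works, but flagged as unidiomatic

 def zero_in_fifth(row):                             # the equivalent, preferred form
     return row.split()[4] == '0'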