
How could I delete the rows which have '0' as a value in the 5th column? Or even better, can we choose a range (i.e. remove the rows which have values between -50 and 30 in the 5th column)?

The data looks like this:

 0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
 0  4028.50  3455014.50    -5.86  0        0.0003   0.39
 0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
 0  8828.62  4543414.50    -3.05  0        0.0021   0.61
 0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
 0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
 0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25
 0  8828.62  4543414.50    -3.05  0        0.0021   0.61
2 Comments
  • operator.itemgetter(4)... then compare it. Commented Aug 9, 2011 at 1:15
  • @Chad: Did you get this working yet? Commented Aug 11, 2011 at 22:41
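
A minimal sketch of the itemgetter approach suggested in the first comment, assuming the rows are read as strings from a file (the file name data.txt is illustrative, not from the question):

 from operator import itemgetter

 fifth = itemgetter(4)                    # pulls index 4 from a split row
 rows = open('data.txt').readlines()      # assumed file of whitespace-separated rows
 goodrows = [row for row in rows if fifth(row.split()) != '0']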

3 Answers

goodrows = [row for row in data if row.split()[4] != '0']

or

goodrows = [row for row in data if not (-50 <= float(row.split()[4]) <= 30)]

Edit:

If your data is actually in a NumPy array, which your comment seems to indicate even though your post didn't say so:

goodrows = [row for row in data if row[4] != 0]

or

goodrows = [row for row in data if not (-50 <= row[4] <= 30)]

should work. There is also a NumPy-native way to do this with boolean mask indexing, sketched below.
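
For completeness, a sketch of that boolean-mask approach, assuming data is already a 2-D NumPy array (this is an illustration, not the poster's code; see also the NumPy answer below):

 # `data` is assumed to be a 2-D NumPy array, e.g. loaded with np.loadtxt
 goodrows = data[data[:, 4] != 0]                         # rows whose 5th column is not 0
 goodrows = data[(data[:, 4] < -50) | (data[:, 4] > 30)]  # rows whose 5th column is outside [-50, 30]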


4 Comments

I've just tested this to see if they are identical: they're not. int(row.split()[4]) raises when it encounters -117.00. That may explain the -1...
@Johnsyweb absolutely right, good catch. +1 to your answer. Note: I was not one of the downvoters.
I get 'AttributeError: 'numpy.ndarray' object has no attribute 'split'' error with this one too.
Ok, if it's already in an array, not in a list of strings in a file, just do row[4]. See my edit. Next time, make sure to say in your question if the data is in a NumPy array. We all assumed it was in a file in the format you posted.

You can use NumPy to do this quickly:

data="""
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  0        0.0003   0.39
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61
"""
from StringIO import StringIO
import numpy as np
d = np.loadtxt(StringIO(data))  # load the text into a 2D NumPy array

print d[d[:,4] != 0]                     # keep rows whose 5th column is not 0
print d[(d[:,4] < -50) | (d[:,4] > 30)]  # keep rows whose 5th column is outside [-50, 30]

3 Comments

I don't know if NumPy is the right tool, as it's not in the standard library... A list comprehension seems better
I got this error: File "<stdin>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 796, in loadtxt items = [conv(val) for (conv, val) in zip(converters, vals)] ValueError: could not convert string to float: [[
The program above can only parse whitespace-separated numbers. From the error message, it seems you are passing it data in some other format.
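
If the numbers live in a whitespace-separated text file rather than in a string, a sketch of the same approach (the file name data.txt is assumed, matching the answer below):

 import numpy as np

 d = np.loadtxt('data.txt')                  # parses whitespace-separated columns
 print(d[d[:, 4] != 0])                      # keep rows whose 5th column is not 0
 print(d[(d[:, 4] < -50) | (d[:, 4] > 30)])  # keep rows whose 5th column is outside [-50, 30]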

Assuming your data is in a plain text file like this:

$ cat data.txt 
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  0        0.0003   0.39
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25
0  8828.62  4543414.50    -3.05  0        0.0021   0.61

And assuming you are not using any external libraries, the following will read the data into a list of strings, omitting the undesirable lines. You can feed these lines into any other function you choose; I call print merely to demonstrate. N.B.: the fifth column has index 4, since list indices are zero-based.

$ cat data.py 
#!/usr/bin/env python

print "1. Delete the rows which have '0' as a value on 5th column:"

def zero_in_fifth(row):
    return row.split()[4] == '0'

required_rows = [row for row in open('./data.txt') if not zero_in_fifth(row)]
print ''.join(required_rows)

print '2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):'

def should_ignore(row):
    return -50 <= float(row.split()[4]) <= 30

required_rows = [row for row in open('./data.txt') if not should_ignore(row)]
print ''.join(required_rows)

When you run this you will get:

$ python data.py 
1. Delete the rows which have '0' as a value on 5th column:
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  4028.50  3455014.50    -5.86  -11.00   0.0003   0.39
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25

2. Choose the range (i.e. remove the rows which have values between -50 and 30 on 5th column):
0  4028.44  4544434.50    -6.76  -117.00  0.0002   0.12
0  7028.56  4523434.50    -4.95  -137.00  0.0005   0.25
0  4028.44  4544434.50    -6.76  -107.00  0.0002   0.12
0  7028.56  4523434.50    -4.95  -127.00  0.0005   0.25

12 Comments

Don't you think lambdas are overkill for this?
What's the point of naming a lambda function? That's just wrong. Just use the def keyword.
@JBernardo: A named function would probably be better, you're right. I just extracted the lambda from the generator expression to reduce the line-length.
As said above, that's not the place to use lambdas. Wrong on many levels. Try reading that...
@Johnsyweb: I loaded the data from a text file via pylab.loadtxt and try your code but I got the same error with the same line. what am I missing here?
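
To make the lambda-versus-def exchange above concrete, a sketch of the two spellings being compared (the lambda version is reconstructed for illustration; it is not the code shown in this answer):

 zero_in_fifth = lambda row: row.split()[4] == '0'   # a named lambda: works, but flagged as unidiomatic

 def zero_in_fifth(row):                             # the equivalent, preferred form
     return row.split()[4] == '0'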