Selecting specific elements and finding their median using numpy

Question

I have the following three data sets.

2 3
4 5
6 6

5 7
7 4
9 9

1 8
2 3
3 2

Basically, I want to create a column whose elements are the medians of the corresponding elements of the second column across the data sets. The first elements of the second column of each set are (3,7,8), with median 7; the second elements are (5,4,3), with median 4; and the third elements are (6,9,2), with median 6. So I want my output to be a numpy array like [(7,4,6)].

I tried the following approach:

import numpy as np
filelist=[]
for i in range (1,4):
    filelist.append("/Users/Hrihaan/Desktop/A_%s.txt" %i)
for fname in filelist:
    data=np.loadtxt(fname)
    x=data[:,1]
    for j in range (0,3):
        y=np.median(x[j,1]) # tried this method and thought would get the arrays i want (3,7,8) , (5,4,3) and (6,9,2) and their medians
        print(y)

I received the following error: IndexError: too many indices for array

Any suggestion would mean a lot.

Divakar · Accepted Answer · 2017-09-16 20:50:47Z

Slice the second columns and use np.median along the appropriate axis -

np.median([a[:,1],b[:,1],c[:,1]],axis=0)

Or wrap as an array, then slice and finally use np.median -

np.median(np.asarray([a,b,c])[...,1], axis=0)

Or use np.median directly, which handles the conversion to an array under the hood, and then slice -

np.median([a,b,c],axis=0)[:,1]

So, if the inputs are already arrays, go with the first method for efficiency; the latter two work just as well whether the inputs are arrays or lists.
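
As a quick check, the three variants can be run side by side on the question's data. This is only a minimal sketch: the arrays a, b and c are typed in by hand from the question, and the names first, second and third are illustrative, not part of the original answer.

```python
import numpy as np

# The three data sets from the question, entered by hand.
a = np.array([[2, 3], [4, 5], [6, 6]])
b = np.array([[5, 7], [7, 4], [9, 9]])
c = np.array([[1, 8], [2, 3], [3, 2]])

# 1) Slice the second columns first, then take the median across data sets.
first = np.median([a[:, 1], b[:, 1], c[:, 1]], axis=0)

# 2) Stack into a (3, 3, 2) array, pick the second column, then reduce.
second = np.median(np.asarray([a, b, c])[..., 1], axis=0)

# 3) Reduce over the whole stack, then pick the second column of the result.
third = np.median([a, b, c], axis=0)[:, 1]

print(first, second, third)
```

All three should print the same values. The third variant also computes medians for the first column and then discards them, which is why slicing first tends to be the cheaper choice.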

Sample run -

In [10]: a = np.array([[2,3],[4,5],[6,6]])
    ...: b = np.array([[5,7],[7,4],[9,9]])
    ...: c = np.array([[1,8],[2,3],[3,2]])
    ...: 

In [11]: np.median([a[:,1],b[:,1],c[:,1]],axis=0)
Out[11]: array([ 7.,  4.,  6.])

To make it work with the code posted in the question:

import numpy as np

# Grab filenames
filelist=[]
for i in range (1,4):
    filelist.append("/Users/Hrihaan/Desktop/A_%s.txt" %i)

# Grab second columns off each
data_list = []
for fname in filelist:
    data=np.loadtxt(fname)
    data_list.append(data[:,1])

desired_output = np.median(data_list,axis=0)
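
For anyone who wants to try the loop above without the original files, here is a self-contained sketch. It assumes the files are plain whitespace-separated text; the temporary directory and the contents list are hypothetical stand-ins for A_1.txt to A_3.txt. It also shows why the question's attempt failed: data[:,1] is already one-dimensional, so x[j,1] asks for two indices on a 1-D array and raises the IndexError.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for A_1.txt, A_2.txt and A_3.txt from the question.
contents = ["2 3\n4 5\n6 6\n", "5 7\n7 4\n9 9\n", "1 8\n2 3\n3 2\n"]

with tempfile.TemporaryDirectory() as tmpdir:
    # Write the sample files and collect their paths.
    filelist = []
    for i, text in enumerate(contents, start=1):
        path = os.path.join(tmpdir, "A_%s.txt" % i)
        with open(path, "w") as fh:
            fh.write(text)
        filelist.append(path)

    # Load each file and keep only its second column (a 1-D array).
    second_columns = [np.loadtxt(fname)[:, 1] for fname in filelist]

# One entry per data set; axis=0 reduces across the data sets.
desired_output = np.median(second_columns, axis=0)
print(desired_output)
```

Nothing here depends on the number of rows, so the same code should handle 100-row files as long as every file has the same number of rows.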

edited Sep 16, 2017 at 20:50

answered Sep 16, 2017 at 19:39

Divakar

14 Comments

Hrihaan Over a year ago

Could you suggest an approach for doing the same thing with, say, 100 rows instead of only 3 as in this case? It would be difficult to define the numpy arrays a, b and c by hand as you showed.

Divakar Over a year ago

@Hrihaan Three datasets each with 100 rows, you mean?

Hrihaan Over a year ago

Yes. It would be very kind if you could show me that approach so that I can apply it in any situation. @Divakar

Divakar Over a year ago

@Hrihaan So, filelist holds the three input datasets, right? Then, just replace [a,b,c] with filelist. Should work. That is : np.median(filelist,axis=0)[:,1] or np.median(np.asarray(filelist)[...,1], axis=0).

Hrihaan Over a year ago

I tried the following: for fname in filelist: data=np.loadtxt(fname) x=np.median(filelist,axis=0)[:,1] print(x). I got this error: TypeError: cannot perform reduce with flexible type
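
A likely reading of that TypeError is that filelist holds file-name strings rather than loaded arrays, so np.median is being asked to reduce over strings. The sketch below loads the data first and only then applies the two expressions from the comment above. The io.StringIO objects are hypothetical in-memory stand-ins for the text files, and the names sources, arrays, x and y are illustrative.

```python
import io

import numpy as np

# Hypothetical in-memory stand-ins for the three text files.
sources = [
    io.StringIO("2 3\n4 5\n6 6\n"),
    io.StringIO("5 7\n7 4\n9 9\n"),
    io.StringIO("1 8\n2 3\n3 2\n"),
]

# Load first: np.median needs numeric arrays, not file names.
arrays = [np.loadtxt(src) for src in sources]

# Both expressions from the comment now operate on the loaded arrays.
x = np.median(arrays, axis=0)[:, 1]
y = np.median(np.asarray(arrays)[..., 1], axis=0)
print(x, y)
```

With real files, the same pattern would be to append each np.loadtxt(fname) result to a list inside the loop and call np.median once after the loop finishes.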
