I have the following three data sets.

  1. 2 3
  2. 4 5
  3. 6 6

  1. 5 7
  2. 7 4
  3. 9 9

  1. 1 8
  2. 2 3
  3. 3 2

Basically, I want to create a column whose elements are the medians of the corresponding elements of the second column. The first elements of the second column across the three sets are (3,7,8), with median 7; the second elements are (5,4,3), with median 4; and the third elements are (6,9,2), with median 6. So I want my output to be a numpy array like [(7,4,6)].

I tried the following approach:

import numpy as np
filelist=[]
for i in range (1,4):
    filelist.append("/Users/Hrihaan/Desktop/A_%s.txt" %i)
for fname in filelist:
    data=np.loadtxt(fname)
    x=data[:,1]
    for j in range (0,3):
        y=np.median(x[j,1]) # tried this method and thought would get the arrays i want (3,7,8) , (5,4,3) and (6,9,2) and their medians
        print(y)

I received the following error: IndexError: too many indices for array

Any suggestion would mean a lot.

1 Answer

Slice the second columns and use np.median along the appropriate axis -

np.median([a[:,1],b[:,1],c[:,1]],axis=0)

Or wrap as an array, then slice and finally use np.median -

np.median(np.asarray([a,b,c])[...,1], axis=0)

Or call np.median directly on the list, which takes care of the conversion to an array under the hood, and then slice -

np.median([a,b,c],axis=0)[:,1]

So, if your inputs are already arrays, go with the first method for efficiency; otherwise, the latter two work just as well with either arrays or lists.

Sample run -

In [10]: a = np.array([[2,3],[4,5],[6,6]])
    ...: b = np.array([[5,7],[7,4],[9,9]])
    ...: c = np.array([[1,8],[2,3],[3,2]])
    ...: 

In [11]: np.median([a[:,1],b[:,1],c[:,1]],axis=0)
Out[11]: array([ 7.,  4.,  6.])
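
As a quick cross-check, the sketch below runs all three variants on the data from the question and prints the results side by side. The names m1, m2 and m3 are only illustrative and are not part of the original answer.

```python
import numpy as np

# The three datasets from the question, one row per line of each file
a = np.array([[2, 3], [4, 5], [6, 6]])
b = np.array([[5, 7], [7, 4], [9, 9]])
c = np.array([[1, 8], [2, 3], [3, 2]])

# Variant 1: slice the second column first, then take the median across datasets
m1 = np.median([a[:, 1], b[:, 1], c[:, 1]], axis=0)

# Variant 2: stack into a (3, 3, 2) array, slice the last axis, then take the median
m2 = np.median(np.asarray([a, b, c])[..., 1], axis=0)

# Variant 3: take the median of every column, then keep only the second one
m3 = np.median([a, b, c], axis=0)[:, 1]

print(m1, m2, m3)
```

Variant 3 computes medians for both columns before discarding the first, which is one reason to prefer slicing first when only one column is needed.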

To make it work with the code posted in the question:

import numpy as np

# Build the list of input filenames
filelist=[]
for i in range (1,4):
    filelist.append("/Users/Hrihaan/Desktop/A_%s.txt" %i)

# Load each file and keep only its second column
data_list = []
for fname in filelist:
    data=np.loadtxt(fname)
    data_list.append(data[:,1])

# Element-wise median across the datasets
desired_output = np.median(data_list,axis=0)
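
The same idea can be wrapped in a small helper that does not depend on how many rows each dataset has. This is only a sketch: column_median is a hypothetical name, and the io.StringIO objects stand in for the A_1.txt, A_2.txt and A_3.txt files so the example runs without touching the disk. It assumes every dataset has the same number of rows and at least two columns.

```python
import io
import numpy as np

def column_median(sources, col=1):
    """Element-wise median of one column across several datasets.

    `sources` holds anything np.loadtxt accepts (file paths or open
    file-like objects). Hypothetical helper, not part of NumPy.
    """
    # ndmin=2 keeps a single-row file two-dimensional so [:, col] still works
    columns = [np.loadtxt(src, ndmin=2)[:, col] for src in sources]
    # np.stack raises ValueError if the datasets differ in row count
    return np.median(np.stack(columns), axis=0)

# In-memory stand-ins for the three files (illustrative only)
texts = ["2 3\n4 5\n6 6\n", "5 7\n7 4\n9 9\n", "1 8\n2 3\n3 2\n"]
result = column_median([io.StringIO(t) for t in texts])
print(result)
```

With real files, the call would look like column_median(filelist). Note that the loaded arrays, not the filename strings, are what np.median operates on. The loop in the question failed for a related reason: data[:,1] is already one-dimensional, so x[j,1] asks for a second index that does not exist.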
14 Comments

Would you please suggest what the approach should be if I have to do the same thing with, say, 100 rows instead of only 3 as in this case? It would be difficult to define numpy arrays a, b and c as you instructed.
@Hrihaan Three datasets each with 100 rows, you mean?
Yes, it would be very kind if you could show me that approach so that I can do this in any situation. @Divakar
@Hrihaan So, filelist holds the three input datasets, right? Then, just replace [a,b,c] with filelist. Should work. That is : np.median(filelist,axis=0)[:,1] or np.median(np.asarray(filelist)[...,1], axis=0).
I tried this: for fname in filelist: data=np.loadtxt(fname); x=np.median(filelist,axis=0)[:,1]; print(x). I got this error: TypeError: cannot perform reduce with flexible type