
I have a Pandas DataFrame, read from a CSV, that contains X and Y coordinates and a value that I need to put into a matrix and save to a text file. So I created a NumPy array with max(X) by max(Y) extent.

I have this file:

fid,x,y,agblongo_tch_alive
2368458,1,1,45.0126083457747
2368459,1,2,44.8996854102889
2368460,2,2,45.8565022933761
2358154,3,1,22.6352522929758
2358155,3,3,23.1935887499899

And I need this one:

   45.01    44.89 -9999.00    
-9999.00    45.85 -9999.00
   22.63 -9999.00    23.19

To do that, I'm using a loop like this:

for row in data.iterrows():
    # row[1] is the row as a Series: positions 0..3 are fid, x, y, value
    p[int(row[1][2]), int(row[1][1])] = row[1][3]

and then I save it to disk using np.array2string. It works.

As the original CSV has 68 million lines, it's taking a long time to process, so I wonder if there's a more Pythonic and faster way to do it.
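For reference, here is a self-contained sketch of the current approach on the five sample rows. It is an assumption-laden illustration: the loop below shifts the 1-based coordinates to 0-based and indexes p[x-1, y-1] so the result matches the 3×3 grid above (the question's loop indexes p[y, x] instead), and np.savetxt stands in for np.array2string for the fixed-width output:

```python
import io
import numpy as np
import pandas as pd

# The five sample rows from the question
csv = """fid,x,y,agblongo_tch_alive
2368458,1,1,45.0126083457747
2368459,1,2,44.8996854102889
2368460,2,2,45.8565022933761
2358154,3,1,22.6352522929758
2358155,3,3,23.1935887499899
"""
data = pd.read_csv(io.StringIO(csv))

# Matrix sized from the maximum coordinates, pre-filled with the nodata value
p = np.full((data['x'].max(), data['y'].max()), -9999.0)

# Row-by-row loop, as in the question -- slow at 68M rows because every
# iteration builds a pandas Series
for _, r in data.iterrows():
    p[int(r['x']) - 1, int(r['y']) - 1] = r['agblongo_tch_alive']

# Fixed-width text output
buf = io.StringIO()
np.savetxt(buf, p, fmt='%8.2f')
print(buf.getvalue())
```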

  • Could you provide a minimal reproducible example? It's not really clear what you're trying to do here Commented Jun 11, 2018 at 21:14
  • I think you want stackoverflow.com/questions/45640852/… and then just write the output to a text file using array2string Commented Jun 11, 2018 at 21:16
  • What are you actually trying to solve with the matrix? The sensible way might be to keep the matrix in memory rather than write to disk. Commented Jun 11, 2018 at 21:21
  • if the 68M rows are a flattened representation of your matrix, then it's roughly 8250×8250, which is already pretty huge. Commented Jun 11, 2018 at 21:26
  • I edited the question. I need to write it to disk because I need the file in a specific format, not the matrix itself. I'll check the solution, user3483203. Commented Jun 11, 2018 at 21:31

1 Answer


Assuming the columns of your df are 'x', 'y', 'value', you can use advanced indexing:

>>> x, y, value = data['x'].values, data['y'].values, data['value'].values
>>> result = np.zeros((y.max()+1, x.max()+1), value.dtype)
>>> result[y, x] = value

This will, however, not work properly if the coordinates are not unique (the last write wins, in unspecified order). In that case it is safer (but slower) to use np.add.at, which sums the values at duplicate coordinates:

>>> result = np.zeros((y.max()+1, x.max()+1), value.dtype)
>>> np.add.at(result, (y, x), value)
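Applied to the five sample rows from the question (with the value column renamed to 'value' to match the snippets above), both approaches produce the same 4×4 array; row and column 0 stay unused because the coordinates are 1-based:

```python
import io
import numpy as np
import pandas as pd

# Sample data from the question, value column renamed to 'value'
csv = """fid,x,y,agblongo_tch_alive
2368458,1,1,45.0126083457747
2368459,1,2,44.8996854102889
2368460,2,2,45.8565022933761
2358154,3,1,22.6352522929758
2358155,3,3,23.1935887499899
"""
data = pd.read_csv(io.StringIO(csv)).rename(columns={'agblongo_tch_alive': 'value'})

x, y, value = data['x'].values, data['y'].values, data['value'].values

# One vectorized write instead of a Python-level loop
result = np.zeros((y.max() + 1, x.max() + 1), value.dtype)
result[y, x] = value

# np.add.at gives the same result here because the coordinates are unique
result2 = np.zeros((y.max() + 1, x.max() + 1), value.dtype)
np.add.at(result2, (y, x), value)
```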

Alternatively, you can create a sparse matrix, since your data happen to be in sparse COO format. Using the .A property (an alias for .toarray()) you can then convert it to a normal (dense) array as needed:

>>> from scipy import sparse
>>> spM = sparse.coo_matrix((value, (y, x)), (y.max()+1, x.max()+1))
>>> (spM.A == result).all()
True
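Worth noting: like np.add.at, the sparse route sums values at duplicate coordinates when converting to dense. A minimal sketch with made-up coordinates (not from the question's data) showing that behaviour:

```python
import numpy as np
from scipy import sparse

# Duplicate coordinate: (y, x) = (1, 1) appears twice, so the dense cell
# holds 10 + 5, matching what np.add.at would produce.
y = np.array([1, 1, 2])
x = np.array([1, 1, 2])
value = np.array([10.0, 5.0, 7.0])

spM = sparse.coo_matrix((value, (y, x)), (y.max() + 1, x.max() + 1))
dense = spM.toarray()  # .A is an alias for .toarray()
```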

Update: if the fill value is not zero, the above must be modified.

Method 1: replace the second line with (remember this should only be used if the coordinates are unique):

>>> result = np.full((y.max()+1, x.max()+1), fillvalue, value.dtype)

Method 2: does not work, because np.add.at would add the values on top of the fill value instead of replacing it.

Method 3: after creating spM, do:

>>> spM.sum_duplicates()
>>> assert spM.has_canonical_format
>>> spM.data -= fillvalue
>>> result2 = spM.A + fillvalue
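Putting Method 3 together on the sample data, a sketch under a couple of assumptions: the 1-based coordinates are shifted to 0-based, x is used as the row and y as the column so the array matches the 3×3 grid in the question, and .toarray() stands in for .A:

```python
import numpy as np
from scipy import sparse

fillvalue = -9999.0

# Coordinates/values from the sample CSV, shifted to 0-based
x = np.array([1, 1, 2, 3, 3]) - 1
y = np.array([1, 2, 2, 1, 3]) - 1
value = np.array([45.0126083457747, 44.8996854102889, 45.8565022933761,
                  22.6352522929758, 23.1935887499899])

spM = sparse.coo_matrix((value, (x, y)), (x.max() + 1, y.max() + 1))
spM.sum_duplicates()                 # ensures canonical format
assert spM.has_canonical_format
spM.data -= fillvalue                # stored entries become value - fillvalue
result = spM.toarray() + fillvalue   # empty cells come out exactly at fillvalue
```

Something like np.savetxt('out.txt', result, fmt='%8.2f') should then give a fixed-width text layout close to the one shown in the question.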

3 Comments

Paul, thanks! It's almost there; however, I can't sum the duplicate values because that doubles the corresponding value. Instead, I need to drop one of the duplicate lines.
@MauroAssis if all rows with the same coordinates also have the same value and you don't want to count their multiplicity you can actually use Method 1. Because in that case it doesn't matter that the order of assignment is undefined.
Paul, in fact, I checked and there's only one duplicated value, so I think I will keep method three. Thank you very much!
