I have a speed/efficiency-related question about Python:
I need to write a large number of very large R dataframe-ish files, about 0.5-2 GB each. Each file is basically a large tab-separated table, where each line can contain floats, integers and strings.
Normally, I would just put all my data in a numpy array and use np.savetxt to save it, but since there are different data types it can't really be put into one array.
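For reference, the homogeneous case I'd normally handle with np.savetxt looks something like this (just an illustrative sketch with made-up data):

```python
import numpy as np

# All-float case: np.savetxt handles this fine.
data = np.random.rand(1000, 5)  # toy example, real tables are much bigger
np.savetxt("table.tsv", data, delimiter="\t", fmt="%.6g")
```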
Therefore I have resorted to simply assembling the lines as strings manually, but this is a tad slow. So far I'm doing:
1) Assemble each line as a string
2) Concatenate all the lines into a single huge string
3) Write that string to a file
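Roughly, it looks like this (a simplified sketch; get_rows() is just a placeholder for wherever the mixed-type rows come from):

```python
rows = get_rows()  # placeholder: yields tuples of mixed ints/floats/strings

big_string = ""
for row in rows:
    # 1) assemble each line as a string
    line = "\t".join(str(value) for value in row) + "\n"
    # 2) concatenate all lines into one huge string
    big_string += line

# 3) write the whole string to disk in one go
with open("output.tsv", "w") as f:
    f.write(big_string)
```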
I have several problems with this:
1) The large number of string concatenations ends up taking a lot of time
2) I run out of RAM trying to keep the strings in memory
3) ...which in turn leads to more separate file.write calls, which are very slow as well
So my question is: what is a good routine for this kind of problem, one that balances speed against memory consumption so that both the string assembly and the writing to disk are as efficient as possible?
... or maybe this strategy is simply bad and I should do something completely different?
Thanks in advance!