
I have a .dat file of coordinates (x, y, and z), separated by a marker (an integer). Here's a snippet of it:

500
0.14166    0.09077      0
0.11918    0.08461      0
0.09838    0.07771      0
0.07937    0.07022      0
0.06223    0.06222      0
0.04705    0.05386      0
0.03388    0.04528      0
0.02281    0.03663      0
0.01391    0.02808      0
42
0.00733    0.01969      0
0.00297    0.01152      0
0.01809    -0.01422     0
0.03068    -0.01687     0
0.14166    0.09077      0
0.11918    0.08461      0
0.09838    0.07771      0
0.07937    0.07022      0
42
0.14166    0.09077      0
0.11918    0.08461      0
0.09838    0.07771      0
0.07937    0.07022      0

What's the best way to separate it into chunks (preferably one array per interval between markers)?

This is just a fraction of the data; in reality there are a few thousand points.

  • Read it line by line, adding each line to a chunk until there is a line that contains only one number. Then start a new chunk. Represent each chunk as a list (see the sketch after these comments). Commented Jan 16, 2023 at 20:30
  • @mkrieger1 is it really the only alternative? Does read time increase with file size? Commented Jan 16, 2023 at 20:31
  • No, there are infinitely many alternatives. Of course the time increases with the file size. Commented Jan 16, 2023 at 20:31
  • 1
    @LucasPelizzarim, are those chunks uniformly consist of 9 lines? Does the input file always start with a marker line? Commented Jan 16, 2023 at 20:35
  • No, they aren't always 9 lines. The file always starts with 500, and the blocks of coordinates start and end with 42, except for the last one. Commented Jan 16, 2023 at 20:46
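
For reference, here's a minimal pure-Python sketch of the line-by-line approach from the first comment (assuming the file is named test.dat and that marker lines contain a single number):

chunks = []
current = []
with open('test.dat') as f:
    for line in f:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) == 1:              # marker line: start a new chunk
            if current:
                chunks.append(current)
            current = []
        else:
            current.append([float(v) for v in fields])
if current:                               # keep the trailing chunk too
    chunks.append(current)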

1 Answer


I would suggest applying the power of the pandas and numpy libraries.

We start by loading the input file into a dataframe, skipping the first row (skiprows=1) and explicitly specifying the number of columns via column names (names=['x','y','z']). This means that marker lines are treated as 1-column rows with NaN values (like 42.00000 NaN NaN):

import pandas as pd
import numpy as np

# skip the leading marker line ("500") and name 3 columns so that
# single-number marker lines become rows with NaN in 'y' and 'z'
coords = pd.read_table('test.dat', delim_whitespace=True, header=None,
                       engine='python', skiprows=1, names=['x','y','z'])

Then we find the positions of the marker lines, at which the coords dataframe will be split into chunks:

na_markers = coords.loc[coords['y'].isna()].index  # row indices of the marker lines

Finally, we split and get the needed numpy arrays:

# split at the marker rows, drop the NaN marker row from each chunk, convert to numpy
coords = [chunk.dropna().to_numpy() for chunk in np.split(coords, na_markers)]

That's it. Now coords contains a list of the needed coordinate "chunks":

[array([[0.14166, 0.09077, 0.     ],
       [0.11918, 0.08461, 0.     ],
       [0.09838, 0.07771, 0.     ],
       [0.07937, 0.07022, 0.     ],
       [0.06223, 0.06222, 0.     ],
       [0.04705, 0.05386, 0.     ],
       [0.03388, 0.04528, 0.     ],
       [0.02281, 0.03663, 0.     ],
       [0.01391, 0.02808, 0.     ]]), array([[ 0.00733,  0.01969,  0.     ],
       [ 0.00297,  0.01152,  0.     ],
       [ 0.01809, -0.01422,  0.     ],
       [ 0.03068, -0.01687,  0.     ],
       [ 0.14166,  0.09077,  0.     ],
       [ 0.11918,  0.08461,  0.     ],
       [ 0.09838,  0.07771,  0.     ],
       [ 0.07937,  0.07022,  0.     ]]), array([[0.14166, 0.09077, 0.     ],
       [0.11918, 0.08461, 0.     ],
       [0.09838, 0.07771, 0.     ],
       [0.07937, 0.07022, 0.     ]])]
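
Each chunk is an ordinary (n, 3) numpy array, so (as an illustrative sketch) you can inspect the sizes or pull out individual columns directly:

# illustrative only: print each chunk's shape and take its x/y columns
for i, chunk in enumerate(coords):
    print(i, chunk.shape)          # (9, 3), (8, 3), (4, 3) for the snippet above
    x, y = chunk[:, 0], chunk[:, 1]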

1 Comment

this worked flawlessly!
