Selecting data based on index values

Question

I have three-column data saved in inp.dat:

1.0   2.0   5.0 
2.0   3.0   6.0 
3.0   4.0   8.0 
4.0   1.0   7.0 
5.0   2.0   8.0 
5.0   2.0   8.0
1.0   2.0   5.0 
2.0   3.0   6.0

Additionally, an index value is provided for each column: 3 for column 1, 4 for column 2, and 4 for column 3, i.e. index_value=[3,4,4]. For each column, I want to keep the values from 2 rows before to 2 rows after the given (zero-based) index, and set all other values to zero.

The expected output should be saved as file.out as shown below.

0.0   0.0   0.0 
2.0   0.0   0.0 
3.0   4.0   8.0 
4.0   1.0   7.0 
5.0   2.0   8.0 
5.0   2.0   8.0
0.0   2.0   5.0 
0.0   0.0   0.0

My code:

import numpy as np
import pandas as pd
data=np.loadtxt("inp.dat")
print(data.shape)

index_value=[3,4,4]

for i,data in enumerate(data):
    print(i,data)
    data=data[index_value[0]-2:index_value[0]+2]
np.savetxt('file.out',data)

My trial code does not produce the expected output. Moreover, I want to apply this to many columns of data in the future. As I am a beginner, I hope experts can help me overcome this problem. Thanks in advance.

Rabinzel · Accepted Answer · 2022-07-10 22:25:52Z


You can process the data column by column, looping over the columns and your index list together with zip. For each column, build a boolean mask of the rows you want to keep, then set every value outside the mask to 0.

import pandas as pd

df = pd.read_csv('data.dat', header=None, sep=r'\s+')  # raw string avoids an invalid-escape warning
# this is just how I read the data to reproduce the example you showed us
print(df)

     0    1    2
0  1.0  2.0  5.0
1  2.0  3.0  6.0
2  3.0  4.0  8.0
3  4.0  1.0  7.0
4  5.0  2.0  8.0
5  5.0  2.0  8.0
6  1.0  2.0  5.0
7  2.0  3.0  6.0

index_list = [3, 4, 4]

for target_idx, col in zip(index_list, df.columns):
    
    mask = (df.index >= target_idx-2) & (df.index < target_idx + 3)
    # for the first column mask looks like this:
    # [False  True  True  True  True  True False False]
    
    df.loc[~mask, col] = 0 # set all values NOT in the mask to 0

print(df)

     0    1    2
0  0.0  0.0  0.0
1  2.0  0.0  0.0
2  3.0  4.0  8.0
3  4.0  1.0  7.0
4  5.0  2.0  8.0
5  5.0  2.0  8.0
6  0.0  2.0  5.0
7  0.0  0.0  0.0

# If you'd like to save it:
df.to_csv('file.out',header=False, index=False, sep='\t')
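
For thousands of columns, the same masking idea can also be written without a Python loop. The following is only a sketch of an alternative using NumPy broadcasting, not part of the original answer; the function name keep_window and the half_width parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def keep_window(data, centers, half_width=2):
    """Zero every entry farther than half_width rows from its column's center row.

    keep_window and half_width are illustrative names, not from the answer above.
    """
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers)
    if centers.shape != (data.shape[1],):
        raise ValueError("need exactly one center index per column")
    rows = np.arange(data.shape[0])[:, None]               # shape (n_rows, 1)
    mask = np.abs(rows - centers[None, :]) <= half_width   # shape (n_rows, n_cols)
    return np.where(mask, data, 0.0)

# Same example data as in the question
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
    [3.0, 4.0, 8.0],
    [4.0, 1.0, 7.0],
    [5.0, 2.0, 8.0],
    [5.0, 2.0, 8.0],
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
])
result = keep_window(data, [3, 4, 4])
print(result)
```

The result could then be written with np.savetxt('file.out', result, fmt='%.1f'). The condition abs(row - center) <= 2 selects the same rows as the mask in the loop above.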

edited Jul 10, 2022 at 22:25

answered Jul 10, 2022 at 22:06


4 Comments

user19520518 Over a year ago

In reality I have thousands of columns, so it is not possible to manually write out the line names='col1 col2 col3.............'.split(). Can you make it automatic?

Rabinzel Over a year ago

No need for that; it was just how I read in the data. If header=None and no names are defined, pandas simply labels the columns numerically, starting with 0, 1, 2 and so on. In the loop itself we just use df.columns, so you don't need to know or write down any column names. I can change my answer if you want?
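
A small sketch of the default labelling described in this comment. The two-row sample string is made up for illustration and is read from memory with io.StringIO instead of from a file.

```python
import io
import pandas as pd

# Made-up two-row sample standing in for the real data file
sample = "1.0 2.0 5.0\n2.0 3.0 6.0\n"

# With header=None and no names given, pandas labels the columns 0, 1, 2, ...
df = pd.read_csv(io.StringIO(sample), header=None, sep=r"\s+")
labels = list(df.columns)
print(labels)  # [0, 1, 2]
```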

user19520518 Over a year ago

If possible, please update it, @Rabinzel. Thanks.

Rabinzel Over a year ago

I did. Since you go through all the columns one after another, you don't need to worry about their names. Be aware that your index_list should be as long as the number of columns: if the list is shorter, the mask is not applied to all columns.
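
Because zip stops silently at the shorter of its inputs, a length mismatch would go unnoticed. One possible guard, as a sketch only: the wrapper name apply_windows and the half_width parameter are hypothetical, and the small two-column frame is made-up test data.

```python
import pandas as pd

def apply_windows(df, index_list, half_width=2):
    """Apply the per-column window mask, refusing a list of the wrong length.

    apply_windows and half_width are illustrative names, not from the answer.
    """
    if len(index_list) != len(df.columns):
        raise ValueError(
            f"index_list has {len(index_list)} entries, "
            f"but the data has {len(df.columns)} columns"
        )
    out = df.copy()  # leave the caller's DataFrame untouched
    for target_idx, col in zip(index_list, out.columns):
        mask = (out.index >= target_idx - half_width) & (out.index <= target_idx + half_width)
        out.loc[~mask, col] = 0  # zero everything outside the window
    return out

# Made-up two-column example
df = pd.DataFrame({0: [1.0, 2.0, 3.0, 4.0, 5.0], 1: [2.0, 3.0, 4.0, 1.0, 2.0]})

try:
    apply_windows(df, [3])  # one index for two columns
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

result = apply_windows(df, [3, 0])
print(result)
```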
