
I have three-column data saved in inp.dat:

1.0   2.0   5.0 
2.0   3.0   6.0 
3.0   4.0   8.0 
4.0   1.0   7.0 
5.0   2.0   8.0 
5.0   2.0   8.0
1.0   2.0   5.0 
2.0   3.0   6.0

Additionally, each column has its own index value: 3 for column 1, 4 for column 2, and 4 for column 3, written as index_value=[3,4,4] (row numbers counted from 0). For each column, I want to keep the values from 2 rows before to 2 rows after its index value, and set all other values to zero.

The expected output, which should be saved as file.out, is:

0.0   0.0   0.0 
2.0   0.0   0.0 
3.0   4.0   8.0 
4.0   1.0   7.0 
5.0   2.0   8.0 
5.0   2.0   8.0
0.0   2.0   5.0 
0.0   0.0   0.0
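As a quick sanity check of the rule against the expected output, here is a small sketch that lists which rows each column keeps. It assumes 0-based row numbers and an inclusive window of index-2 to index+2, which is what the expected output implies; `windows` and `n_rows` are illustrative names only.

```python
# Which 0-based rows each column keeps, assuming an inclusive window
# of index-2 .. index+2 and 8 rows of data (as in inp.dat).
index_value = [3, 4, 4]
n_rows = 8

windows = {}
for col, idx in enumerate(index_value):
    start = max(idx - 2, 0)          # do not go above the first row
    stop = min(idx + 2, n_rows - 1)  # do not go past the last row
    windows[col] = list(range(start, stop + 1))
    print(f"column {col + 1}: index {idx} keeps rows {windows[col]}")

# prints:
# column 1: index 3 keeps rows [1, 2, 3, 4, 5]
# column 2: index 4 keeps rows [2, 3, 4, 5, 6]
# column 3: index 4 keeps rows [2, 3, 4, 5, 6]
```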

My code:

import numpy as np
import pandas as pd
data=np.loadtxt("inp.dat")
print(data.shape)

index_value=[3,4,4]

for i,data in enumerate(data):
    print(i,data)
    data=data[index_value[0]-2:index_value[0]+2]
np.savetxt('file.out',data)

My trial code does not give the expected output. I also want to apply this to many more columns in the future. I am a beginner, so I hope someone can help me solve this. Thanks in advance.

1 Answer

You can apply the task column by column, looping through your index_list at the same time with zip. For each column, build a boolean mask of the rows within 2 positions of the target index, then set every value outside that mask to 0.

import pandas as pd

df = pd.read_csv('inp.dat', header=None, sep=r'\s+')
# whitespace-separated file without a header row; pandas names the columns 0, 1, 2, ...
print(df)

     0    1    2
0  1.0  2.0  5.0
1  2.0  3.0  6.0
2  3.0  4.0  8.0
3  4.0  1.0  7.0
4  5.0  2.0  8.0
5  5.0  2.0  8.0
6  1.0  2.0  5.0
7  2.0  3.0  6.0

index_list = [3, 4, 4]

for target_idx, col in zip(index_list, df.columns):
    
    mask = (df.index >= target_idx-2) & (df.index < target_idx + 3)
    # for the first column mask looks like this:
    # [False  True  True  True  True  True False False]
    
    df.loc[~mask, col] = 0 # set all values NOT in the mask to 0

print(df)

     0    1    2
0  0.0  0.0  0.0
1  2.0  0.0  0.0
2  3.0  4.0  8.0
3  4.0  1.0  7.0
4  5.0  2.0  8.0
5  5.0  2.0  8.0
6  0.0  2.0  5.0
7  0.0  0.0  0.0

#If you like to save it:
df.to_csv('file.out',header=False, index=False, sep='\t')
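For comparison, the NumPy loop from the question can be fixed along the same lines. As far as I can tell, the original loop iterates over rows rather than columns (and rebinds `data` while doing so), only ever uses `index_value[0]`, uses the exclusive slice end `index+2` so the last row of the window is dropped, and finally saves only the slice from the last iteration. A minimal sketch that works column by column, with the sample data inlined instead of read from inp.dat:

```python
import numpy as np

# Same numbers as inp.dat; with the real file use data = np.loadtxt("inp.dat")
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
    [3.0, 4.0, 8.0],
    [4.0, 1.0, 7.0],
    [5.0, 2.0, 8.0],
    [5.0, 2.0, 8.0],
    [1.0, 2.0, 5.0],
    [2.0, 3.0, 6.0],
])
index_value = [3, 4, 4]

out = np.zeros_like(data)  # start from all zeros, same shape as the input
for j, idx in enumerate(index_value):
    start = max(idx - 2, 0)  # a negative start would wrap around, so clip at 0
    stop = idx + 3           # slice end is exclusive, so +3 keeps row idx+2
    out[start:stop, j] = data[start:stop, j]  # copy only the window of column j

print(out)
# To write the file: np.savetxt("file.out", out, fmt="%.1f")
```

Slicing past the last row is safe in NumPy (the slice is simply shortened), so only the start needs clipping.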

4 Comments

In reality I have thousands of columns, so it is not practical to write the line names='col1 col2 col3.............'.split() by hand. Can you make it automatic?
No need for it; that was just how I read in the data. If header=None and no names are defined, pandas names the columns numerically: 0, 1, 2, and so on. The loop only uses df.columns, so you don't need to know or write down any column names. I can change my answer if you want.
If possible, please update it, @Rabinzel. Thanks!
I did. Since the loop goes through all columns one after another, you don't need to worry about their names. Be aware that index_list must have one entry per column: zip stops at the shorter of the two sequences, so if the list is shorter, the remaining columns are not masked at all.
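Regarding the length caveat and the thousands of columns: a loop-free NumPy variant can build the whole mask at once with broadcasting and raise an error on mismatched input instead of silently skipping columns. This is only a sketch; the function name `keep_windows` and its `half_width` parameter are hypothetical, not part of NumPy or pandas.

```python
import numpy as np


def keep_windows(data, index_list, half_width=2):
    """Zero every value more than `half_width` rows away from its column's index.

    Hypothetical helper: index_list[j] is the 0-based center row for column j.
    """
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    if len(index_list) != n_cols:
        # Fail loudly instead of leaving some columns unmasked
        raise ValueError(
            f"index_list has {len(index_list)} entries but data has {n_cols} columns"
        )
    rows = np.arange(n_rows)[:, None]            # shape (n_rows, 1)
    centers = np.asarray(index_list)[None, :]    # shape (1, n_cols)
    mask = np.abs(rows - centers) <= half_width  # broadcasts to (n_rows, n_cols)
    return np.where(mask, data, 0.0)             # keep masked values, zero the rest
```

Usage with the files from the question would be `data = np.loadtxt("inp.dat", ndmin=2)` (`ndmin=2` keeps the result 2-D even for a single column) followed by `np.savetxt("file.out", keep_windows(data, [3, 4, 4]), fmt="%.1f")`. If you prefer to keep the pandas loop, Python 3.10+ also accepts `zip(index_list, df.columns, strict=True)`, which raises a ValueError when the two lengths differ.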
