I have three CSV files. The first (csv1) can be considered a positive dataset: its first column (column A) consists of certain IDs, and so does its second column (column B). The data in csv1 are paired, meaning the two entries in each row form a pair. Ex:

colA    colB
 A.1     B.1
 C.1     D.1

Here, A.1 and B.1 can be considered a pair, and the same goes for C.1 and D.1. The second file (csv2) contains data only for the entries in column A of csv1. Ex:

Col   X1    X2    X3    X4 
A.1  0.1   0.2   0.3   0.4
C.1  0.2   0.3   0.4   0.5

Similarly, the third file (csv3) contains data only for the entries in column B of csv1. Ex:

Col   X1    X2    X3    X4 
B.1  0.1   0.2   0.3   0.4
D.1  0.2   0.3   0.4   0.5

I am writing code that first imports all three files, then iterates over the rows of csv1 and assigns the values in column A and column B of the current row to x and y respectively. After assigning x and y, I want to check whether x is in csv2 and y is in csv3. If both are present, I want to extract the corresponding rows, concatenate them, and save the result in a separate CSV.

So, if "x" is assigned the string A.1 and "y" is assigned the string B.1, I want my code to first check that A.1 is in csv2 and B.1 is in csv3. If both are there, I want to extract the corresponding row values for A.1 (0.1,0.2,0.3,0.4...) and B.1 (0.2,0.3,0.4,0.5...) and concatenate their values:

  col     x1    x2    x3    x4   x5   x6
A.1_B.1   0.1   0.2   0.3   0.4  0.2  0.3  

This is what I have written, but I am getting a KeyError, even though the ID is present when I check my CSV file. Any help would be much appreciated.

import pandas as pd

file1 = pd.read_csv("/home/file1.csv")
file2 = pd.read_csv("/home/file2.csv")
file3 = pd.read_csv("/home/file3.csv")

for i in range(len(file1['ID'])):
    x = ID_A[i]
    y = ID_B[i]
    if x in CT_ID_A:
        if y in CT_ID_B:
            d1 = file2.loc[x]
            d2 = file3.loc[y]
            d3 = pd.concat([d1,d2],axis=1)

Here, ID_A and ID_B hold the IDs from the two columns of file1, and CT_ID_A and CT_ID_B hold the IDs from file2 and file3, that is:

ID_A = ['A.1','C.1']
ID_B = ['B.1','D.1']
CT_ID_A = ['A.1','C.1']
CT_ID_B = ['B.1','D.1']

1 Answer

If the KeyError is raised for 'ID', then it is likely that your CSV file's header does not contain a column named ID.
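One way to check this is to print the header names pandas actually parsed. The following is only a minimal sketch: it uses an in-memory stand-in for file1.csv built from the colA/colB example in the question, which is an assumption, since the real file's header may differ.

```python
import io

import pandas as pd

# Stand-in for file1.csv (assumed header: colA, colB, as in the question's example).
csv1 = io.StringIO("colA,colB\nA.1,B.1\nC.1,D.1\n")
file1 = pd.read_csv(csv1)

# Show the column names that pandas read from the header row.
print(list(file1.columns))

# Test for the column before indexing, so file1['ID'] cannot raise KeyError.
has_id = "ID" in file1.columns
print(has_id)

# Stray whitespace in header names is a common cause of such mismatches;
# stripping it is harmless when there is none.
file1.columns = file1.columns.str.strip()
```

If 'ID' is not in the printed list, file1['ID'] will raise KeyError regardless of what the data rows contain.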

1 Comment

This answer is right. Also, you are using ID_A, ID_B, CT_ID_A and CT_ID_B without having declared them.
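Putting both points together, here is a possible sketch of the whole lookup. It assumes the headers shown in the question (colA/colB in file1, and a first column named Col in file2 and file3) and uses in-memory stand-ins for the three files. Note that .loc looks up index labels, so file2.loc['A.1'] also raises KeyError unless the ID column has been made the index, which is done here with index_col. The output column names x1..x8 and the csv_text variable are illustrative choices, not something the question specifies.

```python
import io

import pandas as pd

# In-memory stand-ins for the three files (headers assumed from the question).
file1 = pd.read_csv(io.StringIO("colA,colB\nA.1,B.1\nC.1,D.1\n"))
file2 = pd.read_csv(
    io.StringIO("Col,X1,X2,X3,X4\nA.1,0.1,0.2,0.3,0.4\nC.1,0.2,0.3,0.4,0.5\n"),
    index_col="Col",  # make the IDs the index so .loc[x] can find them
)
file3 = pd.read_csv(
    io.StringIO("Col,X1,X2,X3,X4\nB.1,0.1,0.2,0.3,0.4\nD.1,0.2,0.3,0.4,0.5\n"),
    index_col="Col",
)

labels = []
records = []
# Walk the pairs row by row instead of indexing undeclared lists.
for x, y in zip(file1["colA"], file1["colB"]):
    # Keep the pair only if both IDs are present in their respective files.
    if x in file2.index and y in file3.index:
        labels.append(f"{x}_{y}")
        # Row values from file2 followed by row values from file3.
        records.append(file2.loc[x].tolist() + file3.loc[y].tolist())

n_cols = file2.shape[1] + file3.shape[1]
result = pd.DataFrame(
    records,
    index=pd.Index(labels, name="col"),
    columns=[f"x{i + 1}" for i in range(n_cols)],
)

# With no path, to_csv returns the CSV text; pass a file path to write to disk.
csv_text = result.to_csv()
print(csv_text)
```

Collecting the rows in plain lists and building the DataFrame once at the end avoids calling pd.concat inside the loop. This sketch also assumes the IDs are unique in file2 and file3; with duplicate IDs, .loc would return several rows and the concatenation would need adjusting.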