0

I have a CSV file with two columns and no header. The first column holds a single value and the second column holds a list. For each row, I want to keep only the values in the column 2 list that also appear somewhere in column 1, write the result to a new CSV file (output.csv), and delete the whole row if none of its column 2 values match a column 1 value. For example,

Input.csv:

1,"[0, 10, 12, 13, 16, 25, 32, 35, 60, 86, 98, 108, 168, 172, 222, 251, 275, 278, 325, 365]"
60,"[12014, 25665, 28278]"
86,"[0, 6, 7, 10, 12, 25, 76, 156, 174, 176, 181, 188, 365, 392, 438]"
108,"[1, 16, 21, 32, 35, 61, 81, 83, 95, 138, 153, 204, 222]"
438,"[30549]"
28278,"[60, 120, 140, 505, 3939, 4034, 7213, 7308, 8784, 14126, 14147, 15197, 16842, 20022, 28229]"

output.csv:

1,"[60, 108]"
60,"[28278]"
108,"[1]"
28278,"[60]"

I have tried this code,

    import csv

    with open('input.csv', 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter='\t')

    nodes_in_1 = set()
    nodes_in_2 = set()

    for line in csvreader:
        nodes_in_1.add(line[0])
        nodes_in_2.add(line[1])

    nodes_in_both = nodes_in_1.intersection(nodes_in_2)

    with open('output.csv', 'w') as f_out:
        f_out.write(nodes_in_both + '\n')

I am a beginner. Thank you for the help.

3
  • Welcome to SO. This isn't a discussion forum or tutorial. Please take the tour and take the time to read How to Ask and the other links found on that page. Invest some time with the Tutorial, practicing the examples. It will give you an idea of the tools Python offers to help you solve your problem. "Can someone help me?" is not an actual question. Commented Feb 22, 2021 at 20:13
  • This is not a hard problem. What have you done so far? You will need to process the file twice. First pass, you gather the contents of column 1. Second pass, you filter based on column 2. Commented Feb 22, 2021 at 20:31
  • @TimRoberts I have tried that code, but it does not work. I have tried to work with Pandas, which I have read is easier, but I have not been able to find a way to compare to the second column, as I get an empty output file or inverted columns. Commented Feb 22, 2021 at 20:52
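The two-pass approach suggested in the comment above (gather column 1 first, then filter column 2) can be sketched with only the standard library. This is a minimal sketch, not a definitive implementation: the names `filter_rows` and `filter_csv` are hypothetical, and it assumes the file is comma-delimited, that column 1 holds integers, and that column 2 holds a quoted Python-style list that `ast.literal_eval` can parse.

```python
import csv
from ast import literal_eval


def filter_rows(rows):
    """Keep, per row, only the list values that also occur in column 1."""
    # Materialise the rows and skip blank lines (csv.reader yields [] for them).
    rows = [r for r in rows if r]
    # First pass: gather every column-1 value (assumed to be integers).
    first_col = {int(key) for key, _ in rows}
    # Second pass: filter each column-2 list against that set.
    kept = []
    for key, raw_list in rows:
        matches = [v for v in literal_eval(raw_list) if v in first_col]
        if matches:  # drop rows with no matching values
            kept.append((int(key), matches))
    return kept


def filter_csv(in_path, out_path):
    # Hypothetical helper: reads in_path and writes the filtered rows to out_path.
    with open(in_path, newline="") as f_in:
        kept = filter_rows(csv.reader(f_in))
    with open(out_path, "w", newline="") as f_out:
        # QUOTE_NONNUMERIC quotes the list string but leaves the integer key bare.
        writer = csv.writer(f_out, quoting=csv.QUOTE_NONNUMERIC)
        for key, matches in kept:
            writer.writerow([key, str(matches)])
```

Calling `filter_csv('input.csv', 'output.csv')` would apply this to the file from the question. Using a set for the column 1 values keeps each membership check cheap, and parsing the list with `literal_eval` yields integers, so the comparison is numeric rather than string-based.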

2 Answers 2

1

This can indeed be done in pandas:

    import pandas as pd
    from ast import literal_eval

    # Load the CSV; literal_eval parses the second column as lists instead of strings
    df = pd.read_csv("test.csv", header=None, converters={1: literal_eval})
    # Collect the first-column values once, as a set, for fast membership checks
    first_col = set(df[0])
    # Keep only the list values in the second column that match a value in the first column
    df[1] = df[1].apply(lambda x: [i for i in x if i in first_col])
    # Drop rows with empty lists
    df = df[df[1].map(len) > 0]
    # Write the result to CSV without index or header
    df.to_csv('output.csv', index=False, header=False)

Output df:

|    |     0 | 1             |
|---:|------:|:--------------|
|  0 |     1 | [60, 86, 108] |
|  1 |    60 | [28278]       |
|  2 |    86 | [438]         |
|  3 |   108 | [1]           |
|  5 | 28278 | [60]          |

1 Comment

It works perfectly!! Thank you for the help.
0

    import re

    def run():
        c1, c3 = set(), []
        with open('stack1.txt') as f:
            # First pass: c1 holds the first-column values
            for line in f:
                c1.add(line.split(',')[0].replace(' ', ''))
            # Rewind the file for the second pass
            f.seek(0)
            for line in f:
                cx = line.split(',')[0].replace(' ', '')
                # Get the list stored in column 2, as a list of strings
                c2 = re.search(r'\[.*\]', line).group()[1:-1].replace(' ', '').split(',')
                # Keep the elements that also appear in c1 (first column)
                c2 = [i for i in c2 if i in c1]
                if c2:
                    c3.append(cx + ',"[{}]"'.format(','.join(c2)))
        with open('stack1.out', 'w') as f:
            f.write('\n'.join(c3))

    if __name__ == '__main__':
        run()

The code above does what you want. I hope I am not doing your class homework for you ;-)

1 Comment

Thank you for the help.
