I have three CSV files, each with three named columns: 'Genus', 'Species', and 'Source'. I merged the files into a new document, and now I need to sort the rows alphabetically, first by genus and then by species. I figured I could do this by alphabetizing by species first and then by genus, which should leave everything in the proper order, but I haven't been able to find anything online that addresses how to sort named columns of strings. I tried lots of different ways of sorting, but each one either changed nothing or replaced every string in the first column with the last string.

Here's my code for merging the files:

import csv, sys

with open('Footit_aphid_list_mod.csv', 'r') as inny:
    reader = csv.DictReader(inny)

    with open('Favret_aphid_list_mod.csv', 'r') as inny:
        reader1 = csv.DictReader(inny)

        with open ('output_al_vonDohlen.csv', 'r') as inny:
            reader2 = csv.DictReader(inny)

            with open('aphid_list_complete.csv', 'w') as outty:
                fieldnames = ['Genus', 'Species', 'Source']
                writer = csv.DictWriter(outty, fieldnames = fieldnames)
                writer.writeheader() 

                for record in reader:
                    writer.writerow(record)
                for record in reader1:
                    writer.writerow(record)
                for record in reader2:
                    writer.writerow(record)

                for record in reader:
                    g = record['Genus']
                    g = sorted(g)
                    writer.writerow(record)

inny.closed
outty.closed

  • First store all the data in a list of rows, then sort it, then write it back to the file. Commented Nov 21, 2017 at 21:35
  • You may find this page useful: stackoverflow.com/questions/4233476/… Commented Nov 21, 2017 at 21:39

1 Answer

If your files aren't too large to fit in memory, read all the rows into a single list, sort it, then write it back:

#!python2
import csv

# Collect every row from all three input files in one list.
rows = []

# Python 2: the csv module expects files opened in binary mode ('rb'/'wb').
with open('Footit_aphid_list_mod.csv','rb') as inny:
    reader = csv.DictReader(inny)
    rows.extend(reader)

with open('Favret_aphid_list_mod.csv','rb') as inny:
    reader = csv.DictReader(inny)
    rows.extend(reader)

with open('output_al_vonDohlen.csv','rb') as inny:
    reader = csv.DictReader(inny)
    rows.extend(reader)

# Sort by genus; species breaks ties within the same genus.
rows.sort(key=lambda d: (d['Genus'],d['Species']))

# Write the header and the sorted rows to the merged file.
with open('aphid_list_complete.csv','wb') as outty:
    fieldnames = ['Genus','Species','Source']
    writer = csv.DictWriter(outty,fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
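
To see why the tuple key works, and why the `sorted(g)` attempt in the question did not, here is a small sketch. The sample rows are made up for illustration; only the column names come from the question. It also shows that the two-pass idea from the question (species first, then genus) gives the same order, because Python's sort is stable.

```python
from operator import itemgetter

# Hypothetical sample rows, shaped like what csv.DictReader yields.
rows = [
    {'Genus': 'Myzus', 'Species': 'persicae', 'Source': 'A'},
    {'Genus': 'Aphis', 'Species': 'gossypii', 'Source': 'B'},
    {'Genus': 'Aphis', 'Species': 'fabae', 'Source': 'C'},
]

# sorted() on one string sorts its characters; it never reorders the rows.
letters = sorted('Myzus')

# One pass with a tuple key: genus first, species breaks ties.
by_tuple = sorted(rows, key=lambda d: (d['Genus'], d['Species']))

# Two passes using the stable sort: secondary key first, primary key last.
two_pass = sorted(rows, key=itemgetter('Species'))
two_pass = sorted(two_pass, key=itemgetter('Genus'))

print(by_tuple == two_pass)
```

Note that string comparison here is case-sensitive, so uppercase letters sort before lowercase ones; if the data mixes cases, a key such as `d['Genus'].lower()` may be preferable.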

2 Comments

This worked! The only thing is that, because I'm using 2.7, I had to remove all the 'newline=' arguments from 'open', but everything worked fine without them.
@birdoptera Updated. Note use of binary mode instead of newline='' for Python 2 per csv documentation.
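
For anyone on Python 3, where the csv documentation recommends opening files with newline='' rather than in binary mode, the same approach might look roughly like the sketch below. The names `merge_sorted` and `FIELDNAMES` are hypothetical, and the sketch assumes every input file has exactly the three columns from the question.

```python
import csv

# Column names taken from the question.
FIELDNAMES = ['Genus', 'Species', 'Source']

def merge_sorted(in_paths, out_path):
    """Merge CSV files that share FIELDNAMES, sorted by genus then species."""
    rows = []
    for path in in_paths:
        # newline='' lets the csv module handle line endings itself.
        with open(path, newline='') as infile:
            rows.extend(csv.DictReader(infile))

    # Genus is the primary key; species breaks ties.
    rows.sort(key=lambda d: (d['Genus'], d['Species']))

    with open(out_path, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With the files from the question, this would be called as `merge_sorted(['Footit_aphid_list_mod.csv', 'Favret_aphid_list_mod.csv', 'output_al_vonDohlen.csv'], 'aphid_list_complete.csv')`.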
