I've got a small issue. I'm trying to create a script that takes large (~2 GB) CSV files (id, integer, integer), sorts them by the first integer, and then writes the top X rows (as defined by the user) to a new file.

The sort function works as required and extracting the top X rows also works, but I can't work out how to write this output to a CSV file. To check that it has been working, I have included a print function and it all seems to work out fine.

I feel like I'm missing a really basic concept in the csv module but I can't work out what it is!

import csv
import operator

def csv_to_list(csv_file, delimiter=','):

    with open(csv_file, 'r') as csv_con:
        reader = csv.reader(csv_con, delimiter=delimiter)
        return list(reader)

def sort_by_column(csv_cont, col, reverse=True):

    header = csv_cont[0]  # the first row holds the column names
    body = csv_cont[1:]
    if isinstance(col, str):  
        col_index = header.index(col)
    else:
        col_index = col
    # csv.reader yields strings, so convert to int to sort numerically
    body = sorted(body,
           key=lambda row: int(row[col_index]),
           reverse=reverse)
    #body.insert(0, header)
    return body

def print_csv(csv_content):
    for row in csv_content:
        row = [str(e) for e in row]
        print('\t'.join(row))

def write_csv(dest, csv_cont):
    with open(dest, 'w', newline='') as out_file:
        writer = csv.writer(out_file, delimiter=',')
        for row in csv_cont:
            writer.writerow(row)

csv_cont = csv_to_list('input_hep.csv')
row_count = sum(1 for row in csv_cont)
num_rows = int(input("Skim size?: "))
output_file = input("Output: ")

csv_sorted = sort_by_column(csv_cont, 1)
for row in range(num_rows):
    print(csv_sorted[row])

My main idea was to try:

with open(output_file+'.csv','w') as f:
    writer = csv.writer(f, delimiter =',')
    for row in range(num_rows):
        writer.writerow(row)

But then I get a "_csv.Error: iterable expected, not int" error. I understand why (each row here is just an integer from range), but I'm struggling to understand how I can write the output (as it is printed) to a CSV file. Any tips or pointers would be appreciated.

2 Answers

If your data is a list of lists (one inner list per row), you can use writerows directly without iterating:

with open(output_file+'.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    # csv_sorted is the sorted list from the question; slice off the top rows
    writer.writerows(csv_sorted[:num_rows])

This assumes your list is in the following format:

[
  ["R1_C1","R1_C2"],
  ["R2_C1","R2_C2"]
]
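
As a rough, self-contained sketch of the whole sort-then-write step: the helper name top_rows_by_column and the sample data below are made up for illustration, and io.StringIO stands in for a real output file. It assumes the sort column always holds integers and that the header row has already been removed.

```python
import csv
import io

def top_rows_by_column(rows, col_index, num_rows):
    """Return the num_rows rows with the largest integer value in col_index."""
    # csv.reader yields strings, so convert to int for a numeric sort.
    ordered = sorted(rows, key=lambda row: int(row[col_index]), reverse=True)
    return ordered[:num_rows]

# Made-up sample data in (id, integer, integer) form, without a header row.
sample_rows = [["a", "9", "1"], ["b", "10", "2"], ["c", "2", "3"]]
top = top_rows_by_column(sample_rows, 1, 2)

# io.StringIO stands in for a real file opened with open(path, 'w', newline='').
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',')
writer.writerows(top)  # one call writes every row in the list

print(buffer.getvalue())
```

The point is that writerows (or writerow) needs the row lists themselves, not the loop index. The attempt in the question passed the integer from range(num_rows) to writerow, which is what triggers "iterable expected, not int".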

I just write to CSV like this:

with open(filepath, "w") as hs:
    for mline in rows:
        # join() needs strings, so convert each item first
        hs.write(",".join(str(item) for item in mline) + "\n")

Note that I load the CSV as a multi-dimensional list: each inner list represents a row in the CSV, and each item in that list represents a field in the row.
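
One caveat with joining by hand, shown as a small sketch with made-up data: if a field itself contains a comma, a plain join produces a line that reads back with the wrong number of fields, whereas csv.writer quotes the field. For purely numeric columns like the ones in the question this should not matter.

```python
import csv
import io

# Made-up row whose middle field contains a comma.
rows = [["id1", "hello, world", "3"]]

# Manual join: the embedded comma is indistinguishable from a delimiter.
joined = ",".join(rows[0])
fields_from_join = next(csv.reader([joined]))

# csv.writer quotes the field, so it survives a round trip.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
fields_from_writer = next(csv.reader(io.StringIO(buffer.getvalue())))

print(len(fields_from_join), len(fields_from_writer))
```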
