Sorting a CSV-file into different CSVs by column value

Question

I'm still very much a beginner when it comes to programming and I've run into some problems with my code. I searched here for solutions, but sadly nothing helped.

What I'm trying to do: I have a CSV file (which I imported from multiple .txt files). One of my columns lists years from 1991 to 2015, and I want to sort all rows of my file into different CSVs depending on the year. My current code looks something like this (though I have changed it around quite a bit while trying to work in tips I found on this site):

einzel = pd.read_csv("501-1000.csv", sep='\t',header=0,index_col=False,usecols = ("TI","AB","PY","DI"),dtype = str)

with open("501-1000.csv", "r",encoding="utf_8"):

    for row in einzel:
        if einzel["PY"] == ["2015","2014","2013","2012","2011"]:
            with open("a.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["2010","2009","2008","2007","2006"]:
            with open("b.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["2005","2004","2003","2002","2001"]:
            with open("c.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["2000","1999","1998","1997","1996"]:
            with open("d.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["1995","1994","1993","1992","1991"]:
            with open("e.csv","a") as out:
                writer.writerow(row)

This does not work, and I get an error:

ValueError: Arrays were different lengths: 489 vs 5

The traceback is:

ValueError                                Traceback (most recent call last)
<ipython-input-10-72280961cb7d> in <module>()
 19    # writer = csv.writer(out)
 20     for row in einzel:
---> 21         if einzel["PY"] == ["2015","2014","2013","2012","2011"]:
 22             with open("a.csv","a") as out:
 23                 writer.writerow(row)

~\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
859 
860             with np.errstate(all='ignore'):
--> 861                 res = na_op(values, other)
862             if is_scalar(res):
863                 raise TypeError('Could not compare %s type with Series' %

~\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
763 
764         if is_object_dtype(x.dtype):
--> 765             result = _comp_method_OBJECT_ARRAY(op, x, y)
766         else:
767 

~\Anaconda3\lib\site-packages\pandas\core\ops.py in _comp_method_OBJECT_ARRAY(op, x, y)
741             y = y.values
742 
--> 743         result = lib.vec_compare(x, y, op)
744     else:
745         result = lib.scalar_compare(x, y, op)

pandas\_libs\lib.pyx in pandas._libs.lib.vec_compare()

ValueError: Arrays were different lengths: 489 vs 5

I searched for the error around here, but sadly none of the solutions worked or I did not understand them. I started out using something like this instead, which didn't work either:

with open("501-1000.csv", "r",encoding="utf_8") as inp:
#reader = csv.reader(inp)
#writer = csv.writer(out)

I'd be really glad about any hints or corrections. If there's anything wrong with the way I've asked the question, I'll correct it. This is my first post.

Are you using pandas? Also: post the full traceback, not only the error. The error alone is not enough to understand what causes the problem.
– Francesco Montesano, Commented Jan 24, 2018 at 13:39
Yes, I'm using pandas. Will edit the traceback in, sorry for missing that!
– Seeamoebe, Commented Jan 24, 2018 at 13:47
Next time, add the pandas tag for pandas questions. Also: once you edit the question, make sure that the code indentation is correct (none of the if statements in the for loop are indented).
– Francesco Montesano, Commented Jan 24, 2018 at 13:48
Corrected the indentation, not sure why it was wrong. Thanks for pointing out the pandas tag, forgot about that one.
– Seeamoebe, Commented Jan 24, 2018 at 13:57

jpp · Accepted Answer · 2018-01-24 14:03:22Z

1

Here is a pandas solution.

import pandas as pd

filemap_dict = {'a': set(range(2011, 2016)),
                'b': set(range(2006, 2011)),
                'c': set(range(2001, 2006)),
                'd': set(range(1996, 2001)),
                'e': set(range(1991, 1996))}

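# check your mappings are mutually exclusive: the union of all sets must be
# as large as the sum of their sizes (intersecting all five sets at once
# would miss an overlap between just two of them)
all_years = set.union(*filemap_dict.values())
assert len(all_years) == sum(len(v) for v in filemap_dict.values()), "Year ranges are not mutually exclusive!"

# load data; the question reads the file with header=0, so keep the header row
# and select columns by name
# note dtype not set to str since there appear to be numeric columns
cols = ['TI', 'AB', 'PY', 'DI']
df = pd.read_csv('501-1000.csv', sep='\t', header=0, index_col=False, usecols=cols)

# cycle through filemap_dict, slice and export to csv
for k, v in filemap_dict.items():
    df[df['PY'].isin(v)].to_csv(k+'.csv', index=False)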
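
To try the approach without the original file, here is a minimal self-contained sketch. The sample rows, the `split_by_year` helper name, and the temporary output directory are assumptions made for illustration. Only the column names and the year-to-file mapping come from the question and the answer. It converts `PY` with `pd.to_numeric` before filtering, which is one possible way to guard against the column being read as strings.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical tab-separated sample standing in for "501-1000.csv".
SAMPLE = (
    "TI\tAB\tPY\tDI\n"
    "Paper A\tAbstract A\t2014\t10.1/a\n"
    "Paper B\tAbstract B\t2007\t10.1/b\n"
    "Paper C\tAbstract C\t1993\t10.1/c\n"
    "Paper D\tAbstract D\t2011\t10.1/d\n"
)

# Same year-to-file mapping as in the answer.
FILEMAP = {
    "a": set(range(2011, 2016)),
    "b": set(range(2006, 2011)),
    "c": set(range(2001, 2006)),
    "d": set(range(1996, 2001)),
    "e": set(range(1991, 1996)),
}


def split_by_year(df, filemap, out_dir):
    """Write one CSV per key in filemap; return {key: rows written}."""
    # Coerce PY to numbers so the comparison against integer sets works
    # even if the column was loaded as strings.
    years = pd.to_numeric(df["PY"], errors="coerce")
    counts = {}
    for name, year_set in filemap.items():
        subset = df[years.isin(year_set)]
        subset.to_csv(os.path.join(out_dir, name + ".csv"), index=False)
        counts[name] = len(subset)
    return counts


df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", header=0,
                 usecols=["TI", "AB", "PY", "DI"])
out_dir = tempfile.mkdtemp()
counts = split_by_year(df, FILEMAP, out_dir)
print(counts)  # {'a': 2, 'b': 1, 'c': 0, 'd': 0, 'e': 1}
```

Returning the row counts makes it easy to spot the "all output files are empty" symptom right away, instead of reading each CSV back in.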

edited Jan 24, 2018 at 14:03

answered Jan 24, 2018 at 13:50


6 Comments

Seeamoebe Over a year ago

Do I have to import something else? Currently it tells me filemap_dict is not defined.

Seeamoebe Over a year ago

Yes, I can see that. Still, I get an error on the line after the comment "# cycle through filename_dict, slice and export to csv": ---> 17 for k, v in file_map_dict.items():. It tells me the name is not defined.

jpp Over a year ago

I just corrected a typo, from file_map_dict to filemap_dict.

Seeamoebe Over a year ago

Correcting this solves the error (could've seen that myself), but the output CSVs come back empty when I read them. (I tried that with test = pd.read_csv("b.csv") and test.tail().)

jpp Over a year ago

You have to check that the year column is being read in as integers rather than strings. If it is read in as strings, you may need to do df['PY'].astype(int).isin(v). The code should work, but you will have to do some sanity checks on data types.
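
The dtype issue described in this comment can be reproduced in a few lines. This is a sketch with made-up values, not the asker's data: a year column held as strings never matches a set of integers, so every mask is False and the exported file is empty. Converting with `astype(int)` first, as suggested above, gives the expected matches.

```python
import pandas as pd

# Integer years for one output file, as in the answer's mapping for 'b'.
years_wanted = set(range(2006, 2011))

# Year column read as strings (as happens with dtype=str in the question).
py_as_str = pd.Series(["2008", "2015", "2006"])
# The same data converted to integers.
py_as_int = py_as_str.astype(int)

mask_str = py_as_str.isin(years_wanted)
mask_int = py_as_int.isin(years_wanted)

print(mask_str.tolist())  # [False, False, False]
print(mask_int.tolist())  # [True, False, True]
```

Checking `df['PY'].dtype` after loading is a quick way to tell which of the two cases applies.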
