
I'm still very much a beginner when it comes to programming, and I've run into some problems with my code. I searched here for solutions, but sadly nothing helped.

What I'm trying to do: I have a CSV file (which I imported from multiple .txt files). One of its columns lists years from 2015 down to 1991, and I want to sort all rows of the file into different CSVs depending on the corresponding year. My current code looks something like this (even though I have changed it around quite a bit, trying to work in tips I found on this site):

einzel = pd.read_csv("501-1000.csv", sep='\t',header=0,index_col=False,usecols = ("TI","AB","PY","DI"),dtype = str)

with open("501-1000.csv", "r",encoding="utf_8"):

    for row in einzel:
        if einzel["PY"] == ["2015","2014","2013","2012","2011"]:
            with open("a.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["2010","2009","2008","2007","2006"]:
            with open("b.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["2005","2004","2003","2002","2001"]:
            with open("c.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["2000","1999","1998","1997","1996"]:
            with open("d.csv","a") as out:
                writer.writerow(row)
        elif einzel["PY"] ==  ["1995","1994","1993","1992","1991"]:
            with open("e.csv","a") as out:
                writer.writerow(row)

Now... this does not work, and I get an error:

ValueError: Arrays were different lengths: 489 vs 5

The full traceback is:

ValueError                                Traceback (most recent call last)
<ipython-input-10-72280961cb7d> in <module>()
 19    # writer = csv.writer(out)
 20     for row in einzel:
---> 21         if einzel["PY"] == ["2015","2014","2013","2012","2011"]:
 22             with open("a.csv","a") as out:
 23                 writer.writerow(row)

~\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
859 
860             with np.errstate(all='ignore'):
--> 861                 res = na_op(values, other)
862             if is_scalar(res):
863                 raise TypeError('Could not compare %s type with Series' %

~\Anaconda3\lib\site-packages\pandas\core\ops.py in na_op(x, y)
763 
764         if is_object_dtype(x.dtype):
--> 765             result = _comp_method_OBJECT_ARRAY(op, x, y)
766         else:
767 

~\Anaconda3\lib\site-packages\pandas\core\ops.py in _comp_method_OBJECT_ARRAY(op, x, y)
741             y = y.values
742 
--> 743         result = lib.vec_compare(x, y, op)
744     else:
745         result = lib.scalar_compare(x, y, op)

pandas\_libs\lib.pyx in pandas._libs.lib.vec_compare()

ValueError: Arrays were different lengths: 489 vs 5
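For reference, the error can be reproduced with a tiny made-up Series standing in for einzel["PY"] (the values below are assumptions, not data from the real file). Using == between a Series and a list compares position by position, so pandas requires both sides to have the same length; what the code intends is a membership test, which is what Series.isin does. Note also that `for row in einzel` iterates over the DataFrame's column labels, not its rows.

```python
import pandas as pd

# Made-up stand-in for einzel["PY"]: one year string per row of the file.
py = pd.Series(["2015", "2003", "1999", "2012"])
years_a = ["2015", "2014", "2013", "2012", "2011"]

# Series == list is an element-wise comparison, so the lengths must match.
# 4 vs 5 raises ValueError just like 489 vs 5 (the wording varies by pandas version).
error = None
try:
    py == years_a
except ValueError as exc:
    error = exc

# isin() tests membership instead and returns one True/False per row.
mask = py.isin(years_a)
print(mask.tolist())  # [True, False, False, True]
```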

I searched for the error here, but none of the solutions worked, or I did not understand them. I started out with something like this instead, which didn't work either:

with open("501-1000.csv", "r",encoding="utf_8") as inp:
#reader = csv.reader(inp)
#writer = csv.writer(out)
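Since the question started out with the csv module, here is a minimal sketch of that route too. It assumes a tab-separated file whose header row contains a PY column holding four-digit years; bucket_for_year and split_by_year are hypothetical helper names. The arithmetic (2015 - year) // 5 maps 2011 to 2015 onto a, 2006 to 2010 onto b, and so on down to 1991 to 1995 onto e. Unlike the question's code (where `writer` is never created), each output file is opened once in "w" mode with its own csv.DictWriter, so re-running the script does not append duplicate rows.

```python
import csv
import os


def bucket_for_year(year):
    """Map 2011-2015 -> 'a', 2006-2010 -> 'b', ..., 1991-1995 -> 'e'; else None."""
    if 1991 <= year <= 2015:
        return "abcde"[(2015 - year) // 5]
    return None


def split_by_year(src_path, out_dir=".", fields=("TI", "AB", "PY", "DI")):
    """Copy each row of a tab-separated file into a.csv ... e.csv by its PY year."""
    files, writers = {}, {}
    try:
        with open(src_path, newline="", encoding="utf-8") as inp:
            reader = csv.DictReader(inp, delimiter="\t")
            for row in reader:
                try:
                    bucket = bucket_for_year(int(row["PY"]))
                except (TypeError, ValueError):
                    continue  # empty or non-numeric year: skip the row
                if bucket is None:
                    continue  # year outside 1991-2015
                if bucket not in writers:
                    # open each output file once and write its header row
                    path = os.path.join(out_dir, bucket + ".csv")
                    files[bucket] = open(path, "w", newline="", encoding="utf-8")
                    writers[bucket] = csv.DictWriter(
                        files[bucket], fieldnames=fields, extrasaction="ignore")
                    writers[bucket].writeheader()
                # extrasaction="ignore" drops columns not listed in `fields`
                writers[bucket].writerow(row)
    finally:
        for f in files.values():
            f.close()
```

Usage would be something like split_by_year("501-1000.csv"); only buckets that actually receive rows get a file.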

I'd be really glad about any hints or corrections. If there's anything wrong with the way I've asked the question, I'll correct it; first post and all that.

  • Are you using pandas? Also, post the full traceback, not only the error. The error alone is not enough to understand what causes the problem. Commented Jan 24, 2018 at 13:39
  • Yes, I'm using pandas. Will edit the traceback in, sorry for missing that! Commented Jan 24, 2018 at 13:47
  • Next time, add the pandas tag for pandas questions. Also: once you edit the question, make sure that the code indentation is correct (the if statements inside the for loop are not indented). Commented Jan 24, 2018 at 13:48
  • Corrected the indentation; not sure why it was lost. Thanks for pointing out the pandas tag, I forgot about that one. Commented Jan 24, 2018 at 13:57

1 Answer

Here is a pandas solution.

import pandas as pd

filemap_dict = {'a': set(range(2011, 2016)),
                'b': set(range(2006, 2011)),
                'c': set(range(2001, 2006)),
                'd': set(range(1996, 2001)),
                'e': set(range(1991, 1996))}

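# check your mappings are mutually exclusive (no year may appear in more than one range)
all_years = [year for years in filemap_dict.values() for year in years]
assert len(all_years) == len(set(all_years)), "Year ranges are not mutually exclusive!"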

# load data; the file has a header row (the question reads it with header=0), so select columns by name
cols = ['TI', 'AB', 'PY', 'DI']
df = pd.read_csv('501-1000.csv', sep='\t', header=0, index_col=False, usecols=cols)

# make sure the year column is numeric; isin() with integer sets never matches strings
df['PY'] = pd.to_numeric(df['PY'], errors='coerce')

# cycle through filemap_dict, slice and export to csv
for k, v in filemap_dict.items():
    df[df['PY'].isin(v)].to_csv(k + '.csv', index=False)
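An alternative sketch, not part of the original answer: instead of a dictionary of year sets, pd.cut can assign each row to a 5-year bin, and groupby then writes one file per bin. The bin edges and labels below are assumptions chosen to reproduce the a to e mapping above, split_with_cut is a hypothetical helper name, and the PY column must already be numeric.

```python
import os

import pandas as pd

# Right-closed bins: (1990, 1995] -> 'e', (1995, 2000] -> 'd', ..., (2010, 2015] -> 'a'.
BINS = [1990, 1995, 2000, 2005, 2010, 2015]
LABELS = ["e", "d", "c", "b", "a"]


def split_with_cut(df, out_dir="."):
    """Write one CSV per 5-year bin of df['PY']; return {label: row count}."""
    bucket = pd.cut(df["PY"], bins=BINS, labels=LABELS)
    counts = {}
    # years outside 1991-2015 fall into no bin (NaN) and are dropped by groupby;
    # observed=True skips bins that contain no rows
    for label, group in df.groupby(bucket, observed=True):
        group.to_csv(os.path.join(out_dir, f"{label}.csv"), index=False)
        counts[label] = len(group)
    return counts
```

The trade-off is that the bins are defined by edges rather than explicit year sets, so they cannot overlap by construction, but irregular groupings are easier to express with the dictionary approach.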

6 Comments

Do I have to import something else? Currently it tells me filemap_dict is not defined.
Yes, I can see that. Still, I get an error on this line: 16 # cycle through filename_dict, slice and export to csv ---> 17 for k, v in file_map_dict.items():. It tells me the name is not defined.
I just corrected a typo, from file_map_dict to filemap_dict.
Correcting this fixes the error (I could have seen that myself), but the output CSVs turn out empty when I read one of them back. (I tried that with test = pd.read_csv("b.csv") and test.tail().)
You have to check that the year column is being read in as integers rather than strings. If it is read in as strings, you may need df['PY'].astype(int).isin(v). The code should work, but you will have to do some sanity checks on data types.
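A minimal sketch of that data-type check, using a made-up frame rather than the real file: a string year never equals an integer year, so isin with a set of ints silently selects nothing until the column is converted. pd.to_numeric with errors="coerce" is used here as a more forgiving variant of astype(int), turning unparseable entries into NaN instead of raising.

```python
import pandas as pd

# Made-up frame whose year column holds strings, as when a column is read
# with dtype=str or still contains a stray header line.
df = pd.DataFrame({"TI": ["x", "y", "z"], "PY": ["2012", "1999", "2007"]})
years_a = set(range(2011, 2016))

# The string "2012" is not equal to the integer 2012, so nothing matches.
no_match = df[df["PY"].isin(years_a)]

# Convert first; errors="coerce" turns anything unparseable into NaN.
py = pd.to_numeric(df["PY"], errors="coerce")
match = df[py.isin(years_a)]
```

Checking df.dtypes right after read_csv is a quick way to spot this before filtering.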
