I have a list of integers saved in a CSV file; the rows are not all the same length, as in the following example:

22,-14,-24,2,-26,18,20,-4,12,16,8,-6,-10
20,12,-16,18,28,24,4,-22,26,8,-10,-14,2,6
10,-26,-20,30,24,-22,18,-28,12,14,-6,-2,8,-16,-4
16, 22, 30, -18, -26, -28, 24, -8, 32, -14, 12, 4, 20, -10, 2, 6
32, 10, -14, 20, -22, 24, -4, -26, 34, 28, -30, 2, 12, 18, 6, -8, 16
8, -20, 34, 18, 30, 24, -4, 6, 28, -32, -12, -36, 10, 16, -38, 2, 14, -22, -26

I need to call a function where the input is an array consisting of one such row. So I need exactly the following.

input = [22,-14,-24,2,-26,18,20,-4,12,16,8,-6,-10]

Using the standard approach

import csv
with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    for line in reader:
        print(line)

yields the output

['22', '-14', '-24', '2', '-26', '18', '20', '-4', '12', '16', '8', '-6', '-10']

which I can't use since the elements are strings, not integers. I have tried different formatting parameters, such as csv.QUOTE_NONE, but nothing works. As far as I know this makes sense, since CSV files carry no integer data type.

My files have between 100'000 and 1'000'000 rows, so any solution must be efficient. Since the number of columns is not fixed I also was not able to cast manually: I couldn't figure out how to loop through the columns of one row. Does anyone have an idea how I could solve this problem? I don't know if it helps, but I am not bound to CSV files; I could probably use something else.

  • line = [int(x) for x in line] (Commented Jan 3, 2023 at 15:39)
  • Related: stackoverflow.com/questions/31537187/… (Commented Jan 3, 2023 at 15:39)
  • "Since the number of columns is not fixed I also was not able to cast manually" I don't understand how this would stop you from casting manually. (Commented Jan 3, 2023 at 16:02)
  • @JohnGordon Thanks, I didn't mean that this would stop you, it just stopped me since I used bad code for casting. (Commented Jan 3, 2023 at 16:34)

2 Answers

2

You can just convert them to int:

elems = ['22', '-14', '-24', '2', '-26', '18', '20', '-4', '12', '16', '8', '-6', '-10']
elems = [int(i) for i in elems]

Output: [22, -14, -24, 2, -26, 18, 20, -4, 12, 16, 8, -6, -10]
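
The same conversion can be applied to each row as csv.reader produces it. The following is only a minimal sketch, assuming a comma-separated file; the helper name read_int_rows and the in-memory sample are illustrative and not part of the question:

```python
import csv
import io

def read_int_rows(lines):
    """Yield each CSV row as a list of ints (hypothetical helper name)."""
    # skipinitialspace=True drops the blank after each comma, as in "16, 22"
    for row in csv.reader(lines, skipinitialspace=True):
        yield [int(field) for field in row]

# In-memory stand-in for open('file.csv', newline='')
sample = io.StringIO("22,-14,-24\n16, 22, 30, -18\n")
rows = list(read_int_rows(sample))
print(rows)  # [[22, -14, -24], [16, 22, 30, -18]]
```

Because the helper is a generator, rows are converted one at a time, so a large file does not have to be held in memory.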

To better handle the CSV, you could also use pandas:

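import pandas as pd

# sep=';' assumes a semicolon-delimited file
df = pd.read_csv('line.csv', header=None, sep=';')
df = df.T
# DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement
for row, col in df.items():
    line = col.dropna().tolist()
    print(line)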

and the output is (floats, because pandas pads the shorter rows with NaN):

[22.0, -14.0, -24.0, 2.0, -26.0, 18.0, 20.0, -4.0, 12.0, 16.0, 8.0, -6.0, -10.0]
[20.0, 12.0, -16.0, 18.0, 28.0, 24.0, 4.0, -22.0, 26.0, 8.0, -10.0, -14.0, 2.0, 6.0]
[10.0, -26.0, -20.0, 30.0, 24.0, -22.0, 18.0, -28.0, 12.0, 14.0, -6.0, -2.0, 8.0, -16.0, -4.0]
[16.0, 22.0, 30.0, -18.0, -26.0, -28.0, 24.0, -8.0, 32.0, -14.0, 12.0, 4.0, 20.0, -10.0, 2.0, 6.0]
[32.0, 10.0, -14.0, 20.0, -22.0, 24.0, -4.0, -26.0, 34.0, 28.0, -30.0, 2.0, 12.0, 18.0, 6.0, -8.0, 16.0]
[8.0, -20.0, 34.0, 18.0, 30.0, 24.0, -4.0, 6.0, 28.0, -32.0, -12.0, -36.0, 10.0, 16.0, -38.0, 2.0, 14.0, -22.0, -26.0]

1

As your CSV doesn't have any column names you don't really need the csv module (let alone pandas). You could just do this:

FILENAME = 'file.csv'

def parse(filename):
    with open(filename) as data:
        for line in map(str.strip, data):
            yield list(map(int, line.rstrip(',').split(',')))

for line in parse(FILENAME):
    print(line)

Output:

[22, -14, -24, 2, -26, 18, 20, -4, 12, 16, 8, -6, -10]
[20, 12, -16, 18, 28, 24, 4, -22, 26, 8, -10, -14, 2, 6]
[10, -26, -20, 30, 24, -22, 18, -28, 12, 14, -6, -2, 8, -16, -4]
[16, 22, 30, -18, -26, -28, 24, -8, 32, -14, 12, 4, 20, -10, 2, 6]
[32, 10, -14, 20, -22, 24, -4, -26, 34, 28, -30, 2, 12, 18, 6, -8, 16]
[8, -20, 34, 18, 30, 24, -4, 6, 28, -32, -12, -36, 10, 16, -38, 2, 14, -22, -26]
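
The rstrip(',') call above takes care of trailing commas, but a blank line or an empty field in the middle of a row would still make int() raise ValueError. If that can occur in the data, a more forgiving variant might look like the sketch below. The name parse_lenient and the sample input are illustrative assumptions, and silently dropping empty fields is a design choice that may not suit every dataset:

```python
import io

def parse_lenient(lines):
    """Yield one list of ints per non-empty line (hypothetical helper name)."""
    for line in lines:
        # Keep only fields with content, so "1,2," and "1,,2" both parse
        fields = [f for f in line.split(',') if f.strip()]
        if fields:  # skip blank lines instead of raising ValueError
            # int() ignores surrounding whitespace, including the newline
            yield [int(f) for f in fields]

# In-memory stand-in for an open file; any iterable of lines works
sample = io.StringIO("22,-14,-24,\n\n16, 22, 30\n")
print(list(parse_lenient(sample)))  # [[22, -14, -24], [16, 22, 30]]
```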

3 Comments

Thanks, this is exactly the (hopefully) lightweight and fast solution I was looking for.
@bolsch I created a file with 1 million rows and 20 columns per row. This code parses the entire file in 2.54s so it's quite efficient
Your solution worked perfectly, thank you. But sadly I've made a mistake in my data sheet. Now my rows end with a comma, an example would be 22,-14,-24,2,-26,18,20,-4,12,16,8,-6,-10, . Is there an easy way to account for this?
