I really want to read the following csv file:

ID; First Name; Last Name; Phone;
123; Max; Smith; 0193849843
124; John; Doe; 0012943843

...and extract it into the following format:

[OrderedDict([('ID', '123'), ('Last Name', 'Smith')]), OrderedDict([('ID', '124'), ("Last Name", "Doe")])]

However, with my code displayed below, I'm only able to get the OrderedDict with all keys inside. How is it possible to access only certain columns within the csv file? I need the exact output in order to later transform the data into JSON.

import csv

csvfilepath = r"csvpath"
jsonfilepath = r"jsonpath"

data = []

with open(csvfilepath) as csvfile:
    csvReader = csv.DictReader(csvfile,delimiter=";")

    for csvRow in csvReader:
        ID = csvRow["ID"]
        data.append(csvRow) 

Thanks a lot! Jonas

1 Answer

The short answer is yes, you can read specific columns, but with a caveat. It is usually simpler to read all the columns and then build a dictionary from only the ones you need, and that might even perform better.
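
A minimal sketch of the "read everything, keep what you need" approach. The helper name read_columns and the inline sample are assumptions made for illustration; the sample mirrors the file from the question, with io.StringIO standing in for open(csvfilepath). skipinitialspace=True is used because the question's file has a space after each semicolon.

```python
import csv
import io
import json

# Hypothetical sample mirroring the question's file (assumed layout).
SAMPLE = (
    "ID; First Name; Last Name; Phone\n"
    "123; Max; Smith; 0193849843\n"
    "124; John; Doe; 0012943843\n"
)


def read_columns(csvfile, wanted, delimiter=";"):
    """Read every column, then keep only the keys listed in `wanted`."""
    reader = csv.DictReader(csvfile, delimiter=delimiter, skipinitialspace=True)
    return [{key: row[key] for key in wanted} for row in reader]


data = read_columns(io.StringIO(SAMPLE), ["ID", "Last Name"])
print(data)
# [{'ID': '123', 'Last Name': 'Smith'}, {'ID': '124', 'Last Name': 'Doe'}]
print(json.dumps(data))
```

The result is a list of plain dicts, which json.dumps accepts directly. If an OrderedDict is specifically required, the comprehension can be wrapped in collections.OrderedDict.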


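You can use the fieldnames argument to explicitly name the columns you are interested in. The caveats: fieldnames are assigned to columns by position, so this can only pick out the leading columns of the file; the header row is then returned as an ordinary data row; and the remaining columns are still present in each dictionary as a list under the None key (unless you provide another key with the restkey argument).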

From the docs:

The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames. Regardless of how the fieldnames are determined, the ordered dictionary preserves their original ordering.

If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by restkey (which defaults to None). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with None.
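
The behaviour described in the quoted passage can be checked with a small sketch. The field names "first", "second" and "extra" and the two-line sample are made up for illustration.

```python
import csv
import io

# Row 1 has one field too many, row 2 has one field too few.
sample = io.StringIO("a,b,c\nd\n")

reader = csv.DictReader(
    sample,
    fieldnames=["first", "second"],
    restkey="extra",  # surplus fields are collected in a list under this key
    restval="",       # missing fields are filled with this value instead of None
)
rows = [dict(row) for row in reader]
print(rows)
# [{'first': 'a', 'second': 'b', 'extra': ['c']}, {'first': 'd', 'second': ''}]
```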

You can use fieldnames to name the leading columns you want and then use .pop to remove the None key (and its values).

Consider the following file:

header1,header2
a,b
c,d
e,f

Then:

import csv

with open('test.csv') as csvfile:
    csvReader = csv.DictReader(csvfile, fieldnames=['header1'])
    print([row for row in csvReader])
    # Output on Python 3.6/3.7 (Python 3.8+ prints plain dicts instead):
    # [OrderedDict([('header1', 'header1'), (None, ['header2'])]),
    #  OrderedDict([('header1', 'a'), (None, ['b'])]),
    #  OrderedDict([('header1', 'c'), (None, ['d'])]),
    #  OrderedDict([('header1', 'e'), (None, ['f'])])]

If we pop the None key:

# Read the rows into a list while the file is still open.
with open('test.csv') as csvfile:
    rows = list(csv.DictReader(csvfile, fieldnames=['header1']))

# Drop the extra columns that were collected under the None key.
for row in rows:
    row.pop(None, None)

print(rows)
# [OrderedDict([('header1', 'header1')]), OrderedDict([('header1', 'a')]),
#  OrderedDict([('header1', 'c')]), OrderedDict([('header1', 'e')])]
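
The header row shows up as the first dictionary because passing fieldnames tells DictReader that the file has no header line to consume. One way to avoid that, sketched here with io.StringIO standing in for the test.csv file above, is to skip the first line before handing the file to the reader:

```python
import csv
import io

# Stand-in for open('test.csv'); same content as the example file.
csvfile = io.StringIO("header1,header2\na,b\nc,d\ne,f\n")

next(csvfile)  # discard the header line so it is not returned as data
reader = csv.DictReader(csvfile, fieldnames=["header1"])

rows = []
for row in reader:
    row.pop(None, None)  # remove the surplus columns stored under None
    rows.append(dict(row))

print(rows)
# [{'header1': 'a'}, {'header1': 'c'}, {'header1': 'e'}]
```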

2 Comments

Hey, thanks for your answer. The only thing is that my dict looks as follows when I select the columns "ID" and "Last Name". As you can see, Last Name does not match. Do you know why? Also, is there a way to exclude the first part of the dict containing only the headers? Output: [OrderedDict([('ID', 'ID'), ('Last Name', 'First Name')]), OrderedDict([('ID', '123'), ('Last Name', 'Max')]), OrderedDict([('ID', '124'), ('Last Name', 'John')])]. Thanks again!
@J.Weiser This has more to do with the format of the csv file you are trying to read than the code
