How to exclude columns within csv DictReader using Python?

Question

I want to read the following csv file:

ID; First Name; Last Name; Phone;
123; Max; Smith; 0193849843
124; John; Doe; 0012943843

...and extract it into the following format:

[OrderedDict([('ID', '123'), ('Last Name', 'Smith')]), OrderedDict([('ID', '124'), ("Last Name", "Doe")])]

However, with the code shown below, I am only able to get an OrderedDict that contains all of the keys. How can I access only certain columns of the csv file? I need this exact output so that I can later convert the data to JSON.

import csv

csvfilepath = r"csvpath"
jsonfilepath = r"jsonpath"

data = []

with open(csvfilepath) as csvfile:
    csvReader = csv.DictReader(csvfile,delimiter=";")

    for csvRow in csvReader:
        ID = csvRow["ID"]
        data.append(csvRow)

Thanks a lot! Jonas

DeepSpace · Accepted Answer · 2018-10-08 12:10:04Z

The short answer is yes, you can read specific columns (but with a caveat). However, it is going to be much simpler if you just read all the columns and then build a dictionary from the columns that you need, and that approach might even perform better.
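As a minimal sketch of the "read everything, then keep what you need" approach: the helper name read_selected_columns and the in-memory sample are illustrative assumptions, not part of the original question or answer. The sample assumes a ';' delimiter followed by a space, which is why skipinitialspace=True is passed.

```python
import csv
import io

# Assumed sample mirroring the question's file: ';' delimiter with a space after it.
SAMPLE = (
    "ID; First Name; Last Name; Phone\n"
    "123; Max; Smith; 0193849843\n"
    "124; John; Doe; 0012943843\n"
)


def read_selected_columns(csvfile, wanted, delimiter=";"):
    """Read every column, then keep only the keys listed in `wanted` (hypothetical helper)."""
    # skipinitialspace=True drops the space that follows each delimiter,
    # so the header becomes 'Last Name' rather than ' Last Name'.
    reader = csv.DictReader(csvfile, delimiter=delimiter, skipinitialspace=True)
    return [{key: row[key] for key in wanted} for row in reader]


data = read_selected_columns(io.StringIO(SAMPLE), ["ID", "Last Name"])
print(data)
# [{'ID': '123', 'Last Name': 'Smith'}, {'ID': '124', 'Last Name': 'Doe'}]
```

With a real file, the io.StringIO object would be replaced by the object returned from open(csvfilepath, newline=''). The resulting list of plain dicts can be passed to json.dump directly.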

You can use the fieldnames argument to explicitly define the columns you are interested in. The caveat is that the other columns will still be present in the dictionary under the None key (unless you provide another key with the restkey argument).

From the docs:

The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames. Regardless of how the fieldnames are determined, the ordered dictionary preserves their original ordering.

If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by restkey (which defaults to None). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with None.
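The behaviour quoted above can be sketched roughly as follows. The sample data and the key name "extra" are made up for illustration. The first row has more fields than fieldnames, the second has fewer.

```python
import csv
import io

# Made-up sample: first row has three fields, second row has only one.
SAMPLE = "a,b,c\nd\n"

reader = csv.DictReader(
    io.StringIO(SAMPLE),
    fieldnames=["first", "second"],
    restkey="extra",  # surplus fields are collected under this key instead of None
)
rows = [dict(row) for row in reader]
print(rows)
# [{'first': 'a', 'second': 'b', 'extra': ['c']}, {'first': 'd', 'second': None}]
```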

You can use fieldnames to specify the columns you want and then use .pop to remove the None key (and its values).

Consider the following file:

header1,header2
a,b
c,d
e,f

Then:

import csv

with open('test.csv', newline='') as csvfile:
    csvReader = csv.DictReader(csvfile, fieldnames=['header1'])
    print([row for row in csvReader])
    # Output on Python 3.6/3.7 (3.8+ prints plain dicts with the same contents):
    # [OrderedDict([('header1', 'header1'), (None, ['header2'])]),
    #  OrderedDict([('header1', 'a'), (None, ['b'])]),
    #  OrderedDict([('header1', 'c'), (None, ['d'])]),
    #  OrderedDict([('header1', 'e'), (None, ['f'])])]

If we pop the None key:

import csv

with open('test.csv', newline='') as csvfile:
    # materialize the rows while the file is still open
    rows = list(csv.DictReader(csvfile, fieldnames=['header1']))

for row in rows:
    row.pop(None, None)  # drop the extra columns collected under the None key
print(rows)
# Output on Python 3.6/3.7 (3.8+ prints plain dicts with the same contents):
# [OrderedDict([('header1', 'header1')]), OrderedDict([('header1', 'a')]),
#  OrderedDict([('header1', 'c')]), OrderedDict([('header1', 'e')])]
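Note that when fieldnames is passed explicitly, the header line of the file is returned as an ordinary data row, as the output above shows. The names are also assigned by position rather than matched against the header. A rough sketch of one way to skip the header, using an in-memory copy of the sample file (the io.StringIO stand-in is an assumption for illustration):

```python
import csv
import io

# In-memory stand-in for the test.csv file shown above.
SAMPLE = "header1,header2\na,b\nc,d\ne,f\n"

csvfile = io.StringIO(SAMPLE)
next(csvfile)  # consume the header line so it is not returned as a data row

reader = csv.DictReader(csvfile, fieldnames=["header1"])
rows = []
for row in reader:
    row.pop(None, None)  # discard the columns that were not named
    rows.append(dict(row))
print(rows)
# [{'header1': 'a'}, {'header1': 'c'}, {'header1': 'e'}]
```

Because the names are positional, fieldnames=['header2'] would label the first column as header2. To pick a column that is not the first one, every column before it has to be named as well and then dropped afterwards.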

edited Oct 8, 2018 at 12:10

answered Oct 8, 2018 at 12:04


2 Comments

J.Weiser Over a year ago

Hey, thanks for your answer. The only thing is that my dict looks as follows when I select the columns "ID" and "Last Name". As you can see, Last Name does not match. Do you know why? Also, is there a way to exclude the first part of the dict containing only the headers? Output: [OrderedDict([('ID', 'ID'), ('Last Name', 'First Name')]), OrderedDict([('ID', '123'), ('Last Name', 'Max')]), OrderedDict([('ID', '124'), ('Last Name', 'John')])]. Thanks again!

DeepSpace Over a year ago

@J.Weiser This has more to do with the format of the csv file you are trying to read than the code
