
I have the basics of parsing .csv files and putting certain lines into lists and/or dictionaries, but this one I can't crack.

There are 9 lines with general information like

  • client name
  • invoice number
  • invoice date
  • ...etc

After that there is a detailed listing of products and prices. What I wish to do is:

  1. get 'Invoice #', 'Issue date', 'Due date' and 'Amount due' from the first 9 lines
  2. Get just the 'Description' and 'Amount' from the remaining lines

into a dictionary. I will then write this data into a MySQL database. Can someone suggest how I can start adding items to a dictionary after this "header" (line 9)?

Thanks.

ExampleCSV:

Bill to Client                          
Billing ID  xxxx-xxxx-xxxx                          
Invoice number  3359680287                          
Issue date  1/31/2016                           
Due Date    3/1/2016                            
Currency    EUR                         
Invoice subtotal    2,762,358.40                            
VAT (0%)    0                           
Amount due  2,762,358.40                            

Account ID  Account Order   Purchase Order  Product Description Quantity    Units   Amount
xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client- Google Search       Google AdWords  Belgium_GDN_january_(FR)    1   Impressions 0.04
xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client- Google Search       Google AdWords  UK_GDN_january  392 Impressions 2.92
xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client- Google Search       Google AdWords  Poland_GDN_january  12  Impressions 0.05    

xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client      Google AdWords  Switzerland Family vacation 251 Clicks  4,718.91
xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client      Google              
xxx-xxx-xxxx    Client - Search, GDN, Youtube   Client      Google AdWords  Invalid activity            -16.46

When I try this code:

import csv

with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=",")
    for row in readCSV:
        print(row[0])

I get this in terminal:

Bill to
Billing ID
Invoice number
Issue date
Due Date
Currency
Invoice subtotal
VAT (0%)
Amount due
Traceback (most recent call last):
  File "xlwings_test.py", line 7, in <module>
    print(row[0])
IndexError: list index out of range

3 Answers

You could use the csv module and enumerate the reader object.

import csv

dict1 = {}
dict2 = {}

with open("test.csv", "rb") as f:
    # Use delimiter="," instead if the file is comma-separated
    reader = csv.reader(f, delimiter="\t")
    # Count lines from 1 so that 3, 4, 5 and 9 match the header lines wanted
    for i, line in enumerate(reader, 1):
        if i in [3, 4, 5, 9] and len(line) > 1:
            prop_name = line[0]
            prop_value = line[1]
            dict1[prop_name] = prop_value  # Invoice number, Issue date, Due date or Amount due
        elif i > 11 and len(line) > 5:  # skip blank or incomplete rows
            # Fetch other information like 'description' and 'amount'
            print("Description: " + line[5])
            print("Amount: " + line[-1])
            dict2[line[5]] = line[-1]

print(dict1)
print(dict2)
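
As a variant that does not depend on exact line numbers, the header rows can be matched by their label and the detail rows by the column names. The sketch below is only an illustration and makes several assumptions that are not confirmed in the question: it is written for Python 3, the file is comma-separated, amounts containing commas are quoted, and the labels match the sample exactly. `parse_invoice` and `WANTED` are hypothetical names.

```python
import csv
import io

# Header labels to keep (assumed to match the sample file exactly)
WANTED = {"Invoice number", "Issue date", "Due Date", "Amount due"}

def parse_invoice(stream, delimiter=","):
    """Return (header, items) from an invoice export shaped like the sample."""
    header = {}
    items = []
    columns = None  # set once the "Account ID ..." row has been seen
    for row in csv.reader(stream, delimiter=delimiter):
        # Blank lines come back as [] and rows of empty cells as ['', '', ...]
        if not any(cell.strip() for cell in row):
            continue
        if columns is None:
            if row[0].strip() == "Account ID":
                columns = [c.strip() for c in row]
            elif row[0].strip() in WANTED and len(row) > 1:
                header[row[0].strip()] = row[1].strip()
            continue
        record = dict(zip(columns, (c.strip() for c in row)))
        items.append({"Description": record.get("Description", ""),
                      "Amount": record.get("Amount", "")})
    return header, items

# Made-up data in the shape of the sample, used instead of a real file
sample = io.StringIO(
    "Bill to,Client\n"
    "Invoice number,3359680287\n"
    "Issue date,1/31/2016\n"
    "Due Date,3/1/2016\n"
    'Amount due,"2,762,358.40"\n'
    "\n"
    "Account ID,Account,Order,Purchase Order,Product,Description,Quantity,Units,Amount\n"
    "xxx-xxx-xxxx,Client,Client- Google Search,,Google AdWords,UK_GDN_january,392,Impressions,2.92\n"
    "\n"
    "xxx-xxx-xxxx,Client,Client,,Google AdWords,Invalid activity,,,-16.46\n"
)
header, items = parse_invoice(sample)
print(header)
print(items)
```

With a real file, the open file object would be passed in place of `sample`. Because blank rows are skipped before any indexing happens, they cannot raise the IndexError.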

3 Comments

How would I store this into a dictionary? Dict1 {key:value pairs from lines 3, 4, 5, 9}, Dict2 {key:value pairs from line 11 to the end of the file, just 'Description' and 'Amount'}
@AlexStarbuck - I've edited the answer. Do have a look.
I still get the same error: Traceback (most recent call last): File "xlwings_test.py", line 11, in <module> dict1[line[0]] = line[1] # Invoice number IndexError: list index out of range

The simplest solution is to split the relevant rows on commas and read the description and amount by counting from the end of the resulting list. You probably got the error because your file contains blank rows, which must be skipped rather than split. Here is the code:

general_info=dict()
rest_of_file_list=[]

row_counter=0
with open('test.csv', 'rb') as file:
    # iterate over the raw lines; each one is split on commas below
    for row in file:
        if row_counter==2:
            #invoice row
            general_info['Invoice number'] = row.split(',')[1].rstrip()
        elif row_counter==3:
            #issue date row
            general_info['Issue date'] = row.split(',')[1].rstrip()
        elif row_counter==4:
            #due date row
            general_info['Due date'] = row.split(',')[1].rstrip()
        elif row_counter==8:
            #amount due row
            general_info['Amount due'] = row.split(',')[1].rstrip()
        elif row_counter > 10:
            #last and 4th item from the end of the list are amount and description
            if row and not row.isspace():
                item=dict()
                lista=row.split(',')

                item['Description']=lista[len(lista)-4].rstrip()
                item['Amount']=lista[len(lista)-1].rstrip()
                rest_of_file_list.append(item)
        row_counter+=1

print(general_info)
print(rest_of_file_list)    
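
One caveat with splitting on commas: an amount such as 4,718.91 contains a comma itself. If the export quotes such values (an assumption, the question does not show the raw file), `csv.reader` keeps them in one field while `str.split` does not. The sketch below, written for Python 3, illustrates the difference and converts the amount to a `Decimal`, which may be convenient before inserting into a database. `to_decimal` is a hypothetical helper.

```python
import csv
from decimal import Decimal

def to_decimal(text):
    """Convert '2,762,358.40' or '-16.46' to Decimal; empty text gives None."""
    cleaned = text.replace(",", "").strip()
    return Decimal(cleaned) if cleaned else None

# One detail row with a quoted amount (quoting is assumed, not confirmed)
line = ('xxx-xxx-xxxx,Client,Client,,Google AdWords,'
        'Switzerland Family vacation,251,Clicks,"4,718.91"')

naive = line.split(",")           # the quoted amount is cut in two
parsed = next(csv.reader([line])) # the quoted amount stays in one field

print(len(naive), naive[-1])
print(len(parsed), parsed[-1], to_decimal(parsed[-1]))
```

Counting from the end of the list still works with `parsed`: the description is `parsed[-4]` and the amount is `parsed[-1]`.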


What I recommend is to read the general information separately and then parse the remaining lines, read into a string, with the csv module. For the first part I build the header_attributes dictionary; the rest is read with a csv.DictReader instance.

import csv
from StringIO import StringIO

CLIENT_PROPERTY_LINE_COUNT = 10

f = open("test.csv")
header_attributes = {}  # property name -> value from the general information lines

# When reading the file, headers are comma separated in the following format: Property, Value.
# The if inside the for loop is used to ignore blank lines or lines with only one attribute.
for i in xrange(CLIENT_PROPERTY_LINE_COUNT):
    splitted_line = f.readline().rsplit(",", 2)

    if len(splitted_line) == 2:
        property_name, property_value = splitted_line
        stripped_property_name = property_name.strip()
        stripped_property_value = property_value.strip()
        header_attributes[stripped_property_name] = stripped_property_value

print(header_attributes)
account_data = f.read()

account_data_memory_file = StringIO()
account_data_memory_file.write(account_data)
account_data_memory_file.seek(0)

account_reader = csv.DictReader(account_data_memory_file)

for account in account_reader:
    print(account['Units'], account['Amount'])
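
The same idea can also be sketched without copying the rest of the file into a StringIO: consume the first lines with `itertools.islice`, then hand the same stream to `csv.DictReader`. This is a minimal Python 3 sketch under assumptions not confirmed by the question: the file is comma-separated, amounts containing commas are quoted, and the column-header row is exactly the 11th line. `split_invoice` and `HEADER_LINE_COUNT` are hypothetical names.

```python
import csv
import io
from itertools import islice

HEADER_LINE_COUNT = 10  # 9 property lines plus the blank separator (assumed)

def split_invoice(stream):
    """Return (header, details) read from a single pass over the stream."""
    header = {}
    # islice stops after HEADER_LINE_COUNT lines without reading further
    for row in csv.reader(islice(stream, HEADER_LINE_COUNT)):
        if len(row) >= 2 and row[0].strip():
            header[row[0].strip()] = row[1].strip()
    # The stream is now positioned at the column-header row;
    # DictReader skips blank rows, and rows without a description are dropped.
    details = [row for row in csv.DictReader(stream) if row.get("Description")]
    return header, details

# Made-up data in the shape of the sample, used instead of a real file
sample = io.StringIO(
    "Bill to,Client\n"
    "Billing ID,xxxx-xxxx-xxxx\n"
    "Invoice number,3359680287\n"
    "Issue date,1/31/2016\n"
    "Due Date,3/1/2016\n"
    "Currency,EUR\n"
    'Invoice subtotal,"2,762,358.40"\n'
    "VAT (0%),0\n"
    'Amount due,"2,762,358.40"\n'
    "\n"
    "Account ID,Account,Order,Purchase Order,Product,Description,Quantity,Units,Amount\n"
    "xxx-xxx-xxxx,Client,Client,,Google AdWords,Belgium_GDN_january_(FR),1,Impressions,0.04\n"
    "\n"
    "xxx-xxx-xxxx,Client,Client,,Google\n"
    "xxx-xxx-xxxx,Client,Client,,Google AdWords,Invalid activity,,,-16.46\n"
)
header, details = split_invoice(sample)
for account in details:
    print(account["Description"], account["Amount"])
```

The four values wanted from the general information can then be picked out of `header` by key, for example `header["Invoice number"]`.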

2 Comments

Sorry, forgot to mention: I'm using Python 2.7. I would like to disregard the general info in the first 9 lines, but I need 4 out of these 9.
@AlexStarbuck I have updated my answer, you can check it.
