At the end of each billing cycle, Amazon generates a raw transaction file for my store's orders in that cycle. I convert that raw transaction file into a .csv file to be imported into my accounting software. My converter program takes two input files.

The first input, the Amazon data file, contains hundreds of lines, and looks like this:

settlement-id   settlement-start-date   settlement-end-date deposit-date    total-amount    currency    transaction-type    order-id    merchant-order-id   adjustment-id   shipment-id marketplace-name    amount-type amount-description  amount  fulfillment-id  posted-date posted-date-time    order-item-code merchant-order-item-id  merchant-adjustment-item-id sku quantity-purchased  promotion-id
11774871501 2019-04-01 13:09:26 UTC 2019-04-07 14:54:57 UTC 2019-04-09 14:54:57 UTC 11591.38    USD                                                                     
11774871501                     Order   111-3282062-5204245 111-3282062-5204245     DW6NY7djJ   Amazon.com  ItemPrice   Principal   29.95   AFN 2019-04-03  2019-04-03 17:13:29 UTC 04346910081818          D0-FMT7-C3G9    1   
11774871501                     Order   111-3282062-5204245 111-3282062-5204245     DW6NY7djJ   Amazon.com  ItemPrice   Shipping    4.42    AFN 2019-04-03  2019-04-03 17:13:29 UTC 04346910081818          D0-FMT7-C3G9    1   
11774871501                     Order   111-3282062-5204245 111-3282062-5204245     DW6NY7djJ   Amazon.com  ItemFees    FBAPerUnitFulfillmentFee    -3.19   AFN 2019-04-03  2019-04-03 17:13:29 UTC 04346910081818          D0-FMT7-C3G9    1   
11774871501                     Order   111-3282062-5204245 111-3282062-5204245     DW6NY7djJ   Amazon.com  ItemFees    Commission  -4.49   AFN 2019-04-03  2019-04-03 17:13:29 UTC 04346910081818          D0-FMT7-C3G9    1   
11774871501                     Order   111-3282062-5204245 111-3282062-5204245     DW6NY7djJ   Amazon.com  ItemFees    ShippingChargeback  -4.42   AFN 2019-04-03  2019-04-03 17:13:29 UTC 04346910081818          D0-FMT7-C3G9    1   
11774871501                     Order   114-8130626-1298654 114-8130626-1298654     D7RCVz0SP   Amazon.com  ItemPrice   Principal   173.08  AFN 2019-04-03  2019-04-03 22:32:57 UTC 50221749590266          E6-0OOH-4ASK    1   
11774871501                     Order   114-8130626-1298654 114-8130626-1298654     D7RCVz0SP   Amazon.com  ItemFees    FBAPerUnitFulfillmentFee    -9.02   AFN 2019-04-03  2019-04-03 22:32:57 UTC 50221749590266          E6-0OOH-4ASK    1   
11774871501                     Order   114-8130626-1298654 114-8130626-1298654     D7RCVz0SP   Amazon.com  ItemFees    Commission  -25.96  AFN 2019-04-03  2019-04-03 22:32:57 UTC 50221749590266          E6-0OOH-4ASK    1   
…

The second input is a CSV file that stores the necessary accounting codes, which contains a few dozen lines, and looks like this:

combined-type,AccountCode,Description
Non-AmazonOrderItemFeesFBAPerUnitFulfillmentFee,501,Non-Amazon - Order - ItemFees - FBAPerUnitFulfillmentFee
Amazon.comOrderItemPricePrincipal,400,Amazon.com - Order - ItemPrice - Principal
Amazon.comOrderItemPriceShipping,403,Amazon.com - Order - ItemPrice - Shipping
Amazon.comOrderItemPriceShippingTax,202,Amazon.com - Order - ItemPrice - ShippingTax
…

I used Python 3 and Pandas. Can you help me make this code bulletproof? My concern is that the code doesn't catch errors and/or doesn't do what I intend.

Landmark 1: Read in the settlement file (raw transaction file). Read in the account codes file to match transactions to accounting codes.

Landmark 2: Clean up the settlement file by collecting the summary data and converting the four columns into a combined column for comparison and matching to the account codes file.

Landmark 3: Find the matching account code for each transaction and sum the total. So now we should have a summary amount for each account code.

Landmark 4: Clean up the new combined dataframe into the proper template format for our accounting software. Export new file as .csv for uploading into accounting software.

#!/usr/bin/env python3.6
# -*- coding: utf-8 -*-

import pandas as pd
import numpy as np

#Read in Amazon settlement - can be picked up by Python script later
az_data = pd.read_csv('/Users/XXX/Desktop/az_data.txt', sep='\t', header=0, parse_dates=['settlement-start-date', 'settlement-end-date'])
df = pd.DataFrame(az_data)

#Read in Account codes - this can be SQL storage later
acct_codes = pd.read_csv('/Users/XXX/Desktop/acct_codes.csv', sep=',', header=0)
df_accts = pd.DataFrame(acct_codes)

#Take summary data from first row of Amazon settlement and use as check-data
settlement_id = df.iloc[0,0]
settlement_start_date = df.iloc[0,1]
settlement_end_date = df.iloc[0,2]
deposit_date = df.iloc[0,3]
invoice_total = df.iloc[0,4]

#Drop summary row as it is no longer needed and doesn't match
df.drop(df.index[0], inplace=True)

#Replace blank values in 'marketplace-name' column with 'alt-transaction' so groupby doesn't skip those values. And replace all other blank values with 'NA' value
fillvalues = {'marketplace-name': 'alt-transaction'}
df.fillna(value=fillvalues, inplace=True)
df.fillna('NA', inplace=True)

#Create combined column to use as a key
df['combined-type'] = df['marketplace-name'] + df['transaction-type'] + df['amount-type']+ df['amount-description']

#Groupby combined column and take sum of categories
df_mod = df.groupby(['combined-type'])[['amount']].sum()

#Merge dataframes to get account codes and descriptions
df_results = df_mod.merge(df_accts, on='combined-type', how='left')

#Drop row from un-used account
df_results = df_results[df_results['combined-type'] != 'Non-AmazonOrderItemPricePrincipal']

#Rename columns to match Xero template
df_results.rename(columns={'amount':'UnitAmount'}, inplace=True)

#Drop the now un-needed combined-type column
df_results.drop(columns=['combined-type'], inplace=True)

#Add invoice template columns with data
df_results['ContactName'] = 'Amazon.com'
df_results['InvoiceNumber'] = 'INV_' + str(settlement_id)
df_results['Reference'] = 'AZ_Xero_Py_' + str(settlement_id)
df_results['InvoiceDate'] = settlement_start_date
df_results['DueDate'] = settlement_end_date
df_results['Quantity'] = 1
df_results['Currency'] = 'USD'
df_results['TaxType'] = 'Tax on Sales'
df_results['TaxAmount'] = 0
df_results['TrackingName1'] = 'Channel'
df_results['TrackingOption1'] = 'Amazon'

#Re-order columns to match template
all_column_list = ['ContactName','EmailAddress','POAddressLine1','POAddressLine2','POAddressLine3','POAddressLine4','POCity','PORegion','POPostalCode','POCountry','InvoiceNumber','Reference','InvoiceDate','DueDate','InventoryItemCode','Description','Quantity','UnitAmount','Discount','AccountCode','TaxType','TrackingName1','TrackingOption1','TrackingName2','TrackingOption2','Currency','BrandingTheme']
df_final = df_results.reindex(columns=all_column_list)

print (df_final.dtypes)

#Re-format the datetimes to be dates
df_final['InvoiceDate'] = df_final['InvoiceDate'].dt.date
df_final['DueDate'] = df_final['DueDate'].dt.date

#Export final df to csv
df_final.to_csv('/Users/XXX/Desktop/invoice' + str(settlement_id) + '.csv', index=False)

1 Answer

This is already decent. Points of feedback -

# -*- coding: utf-8 -*- is not necessary (see e.g. Working with UTF-8 encoding in Python source). In Python 3, UTF-8 is already the default source encoding, so a PEP 263 declaration is redundant.

It's usually not a good idea to hard-code absolute paths like your /Users/ in scripts. Either write and document the script to assume that the current directory makes sense for relative paths, or control paths through command-line options.
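
A minimal sketch of what the command-line approach could look like (the argparse option names and defaults here are placeholders I've picked for illustration, not part of the original script):

import argparse

import pandas as pd

parser = argparse.ArgumentParser(description='Convert an Amazon settlement file into an accounting CSV')
parser.add_argument('--settlement', default='az_data.txt',
                    help='tab-separated Amazon settlement file')
parser.add_argument('--acct-codes', default='acct_codes.csv',
                    help='CSV mapping combined-type to AccountCode')
args = parser.parse_args()

# Paths now come from the command line instead of being hard-coded
df = pd.read_csv(args.settlement, sep='\t',
                 parse_dates=['settlement-start-date', 'settlement-end-date'])
df_accts = pd.read_csv(args.acct_codes)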

PEP8 asks for a space between the hash # and comment content.

sep=',' is the default and can be removed.

Remove df = pd.DataFrame(az_data) and df_accts = pd.DataFrame(acct_codes). The output of read_csv is already a DataFrame.

The summary row assignment can be combined to a single unpack expression over df.iloc[0, :5].

I consider .iloc[1:] to be simpler than a .drop() on the first index.

You've indexed the groupby result with a one-item list ([['amount']]), which makes df_mod a DataFrame, but it needn't be one - a plain Series will do. And rather than the member .merge(), just call pd.merge with left and right parameters.

The section Add invoice template columns with data should do multi-column assignment with an index list on the left and a value tuple on the right:

df_results[[
    'ContactName', 'InvoiceNumber', 'Reference',
    'InvoiceDate', 'DueDate', 'Quantity', 'Currency',
    'TaxType', 'TaxAmount', 'TrackingName1', 'TrackingOption1',
]] = (
    'Amazon.com', 'INV_' + str(settlement_id), 'AZ_Xero_Py_' + str(settlement_id),
    settlement_start_date, settlement_end_date, 1, 'USD',
    'Tax on Sales', 0, 'Channel', 'Amazon',
)

or probably more legibly, a call to assign().

all_column_list should be broken up to multiple lines.

df.fillna('NA', inplace=True) is dtype-incompatible and throws warnings. Pay attention to those warnings! In this case it doesn't affect the output, so I removed it.
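
If the 'NA' placeholder ever does turn out to be needed downstream, a dtype-safe alternative (just a sketch I'm suggesting, not part of the original script) would restrict the fill to the string columns only:

# Fill blanks only in object (string) columns, leaving numeric dtypes alone
obj_cols = df.select_dtypes(include='object').columns
df[obj_cols] = df[obj_cols].fillna('NA')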

Re-format the datetimes to be dates is not an accurate description; it's not a reformat - it's effectively a cast from the datetime64 dtype to an object dtype holding datetime.date values.
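
For illustration, the dtype change is easy to see on a toy Series (the value here is made up):

import pandas as pd

s = pd.Series(pd.to_datetime(['2019-04-01 13:09:26']))
print(s.dtype)          # datetime64[ns]
print(s.dt.date.dtype)  # object -- each element is a plain datetime.date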

'/Users/XXX/Desktop/invoice' + str(settlement_id) + '.csv' should use f-string interpolation.

At the bottom I demonstrate how to call into the Pandas built-in testing framework to make sure that changes don't introduce regressions.

#!/usr/bin/env python3

import pandas as pd

# Read in Amazon settlement - can be picked up by Python script later
df = pd.read_csv(
    'az_data.txt', sep='\t', header=0,
    parse_dates=['settlement-start-date', 'settlement-end-date'],
)

# Read in Account codes - this can be SQL storage later
df_accts = pd.read_csv('acct_codes.csv', header=0)

# Take summary data from first row of Amazon settlement and use as check-data
(
    settlement_id, settlement_start_date, settlement_end_date, deposit_date, invoice_total,
) = df.iloc[0, :5]

# Drop summary row as it is no longer needed and doesn't match
df = df.iloc[1:]

# Replace blank values in 'marketplace-name' column with 'alt-transaction' so
# groupby doesn't skip those values.
fillvalues = {'marketplace-name': 'alt-transaction'}
df.fillna(value=fillvalues, inplace=True)

# Create combined column to use as a key
df['combined-type'] = (
    df['marketplace-name'] + df['transaction-type']
    + df['amount-type'] + df['amount-description']
)

# Groupby combined column and take sum of categories
df_mod = df.groupby(['combined-type'])['amount'].sum()

# Merge dataframes to get account codes and descriptions
df_results = pd.merge(left=df_mod, right=df_accts, on='combined-type', how='left')

# Drop row from un-used account
df_results = df_results[df_results['combined-type'] != 'Non-AmazonOrderItemPricePrincipal']

# Rename columns to match Xero template
df_results.rename(columns={'amount': 'UnitAmount'}, inplace=True)

# Drop the now un-needed combined-type column
df_results.drop(columns=['combined-type'], inplace=True)

# Add invoice template columns with data
df_results = df_results.assign(
    ContactName='Amazon.com',
    InvoiceNumber='INV_' + str(settlement_id),
    Reference='AZ_Xero_Py_' + str(settlement_id),
    InvoiceDate=settlement_start_date,
    DueDate=settlement_end_date,
    Quantity=1,
    Currency='USD',
    TaxType='Tax on Sales',
    TaxAmount=0,
    TrackingName1='Channel',
    TrackingOption1='Amazon',
)

# Re-order columns to match template
all_column_list = [
    'ContactName', 'EmailAddress', 'POAddressLine1', 'POAddressLine2',
    'POAddressLine3', 'POAddressLine4', 'POCity', 'PORegion', 'POPostalCode',
    'POCountry', 'InvoiceNumber', 'Reference', 'InvoiceDate', 'DueDate',
    'InventoryItemCode', 'Description', 'Quantity', 'UnitAmount', 'Discount',
    'AccountCode', 'TaxType', 'TrackingName1', 'TrackingOption1', 'TrackingName2',
    'TrackingOption2', 'Currency', 'BrandingTheme',
]
df_final = df_results.reindex(columns=all_column_list)

# Map the datetimes to dates
df_final['InvoiceDate'] = df_final['InvoiceDate'].dt.date
df_final['DueDate'] = df_final['DueDate'].dt.date

# Export final df to csv
filename = f'invoice{settlement_id}.csv'
# Original: df_final.to_csv(filename, index=False)
# Writing for reference: df_final.to_csv(filename, index=True)
reference = pd.read_csv(
    filename, index_col=0, parse_dates=['InvoiceDate', 'DueDate'],
)
reference['InvoiceDate'] = reference['InvoiceDate'].dt.date
reference['DueDate'] = reference['DueDate'].dt.date
pd.testing.assert_frame_equal(reference, df_final)