
I'm loading data from a CSV file into DynamoDB using a Lambda function. The data is added, but there's an error: in my DynamoDB table I can see the CSV headers inserted as a row. Here's my code:

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('maysales')

def lambda_handler(event, context):
    bucketna = event['Records'][0]['s3']['bucket']['name']
    s3_name = event['Records'][0]['s3']['object']['key']
    response = s3.get_object(Bucket=bucketna, Key=s3_name)
    data = response['Body'].read().decode("utf-8")
    salesnbs = data.split("\n")
    for ko in salesnbs:
        kos = ko.split(",")
        table.put_item(
            Item={
                "Date": kos[0],
                "name": kos[1],
                "fam": kos[2],
                "locati": kos[3],
                "adress": kos[4],
                "country": kos[5],
                "city": kos[6]
            })

My table ends up containing the header row as an item.
  • You might want to consider using the csv module, which has a DictReader class. This automatically converts your CSV into dictionaries where the headers are used as dictionary keys. This means your header is also skipped when parsing the rows. Commented Apr 17, 2020 at 15:02

4 Answers

The first row of most CSV files contains the header labels. If you don't want to add that row to your DynamoDB table, you need to skip past it before you start doing your insertions, i.e.:

row = 0
for ko in salesnbs:
    if row == 0:
        row = row + 1
        continue  # don't process the header line

    row = row + 1
    kos = ko.split(",")
    table.put_item(
        Item={
            "Date": kos[0],
            "name": kos[1],
            "fam": kos[2],
            "locati": kos[3],
            "adress": kos[4],
            "country": kos[5],
            "city": kos[6]
        })

(syntax might not be 100% correct, but that is the idea)
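The same idea reads a bit more cleanly with enumerate, which replaces the manual counter. This is a sketch using a hard-coded sample list in place of the salesnbs parsed from S3, and it collects the items in a list instead of calling table.put_item so the logic can be checked in isolation:

```python
# Sample rows standing in for salesnbs (header first, then data)
salesnbs = [
    "Date,name,fam,locati,adress,country,city",
    "2020-05-01,John,Doe,NY,5th Ave,USA,New York",
]

items = []
for row, ko in enumerate(salesnbs):
    if row == 0:
        continue  # enumerate gives 0 for the header line, so it is skipped
    kos = ko.split(",")
    items.append({"Date": kos[0], "name": kos[1], "fam": kos[2]})

# Each dict in items would then be passed to table.put_item(Item=...)
```

With enumerate there is no counter to forget to increment, which is exactly the bug the comment below points out.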


1 Comment

row = row + 1 needs to run before the continue (or inside the if block as well); otherwise row stays at 0 and every line is skipped.

It's not entirely clear what the problem is from the description, but I suggest using Python's built-in csv module to handle CSV data. That way you won't need to worry about headers or about splitting the file into columns, since the module provides tools for both.

import csv
import io
...

# The S3 body is a byte stream, so decode it to text first.
# A delimiter can also be passed to DictReader if need be.
body = response['Body'].read().decode("utf-8")
reader = csv.DictReader(io.StringIO(body))
for row in reader:
    table.put_item(
        Item={
            "Date": row["Date"],
            "name": row["name"],
            "fam": row["fam"],
            ...
        })

The module uses the first row of the file as the column names, so the header is never inserted as data.
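To see this behavior in isolation, here is a minimal sketch with an in-memory string standing in for the decoded S3 object body:

```python
import csv
import io

# Sample CSV text; the first line becomes the dictionary keys
data = "Date,name,fam\n2020-05-01,John,Doe\n"
rows = list(csv.DictReader(io.StringIO(data)))

# rows[0] == {"Date": "2020-05-01", "name": "John", "fam": "Doe"}
```

Note that only the data row comes back; the header line is consumed to build the keys.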

2 Comments

If you provide an example using the CSV module, this would be a good answer.
@OleksiiDonoha nice

The modified version of @E.J. Brennan's code below skips the header while pushing a CSV file from S3 into DynamoDB. Replace the body of your Lambda function with this:

import boto3

s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('yourdynamodbtablename')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3_file_name = event['Records'][0]['s3']['object']['key']
    response = s3_client.get_object(Bucket=bucket, Key=s3_file_name)
    fileData = response['Body'].read().decode("utf-8")
    print(fileData)
    modelData = fileData.split("\n")
    header = 0
    for row in modelData:
        print(row)
        if header == 0:
            header = header + 1
            continue  # skip the header row
        row_data = row.split(",")
        try:
            table.put_item(
                Item={
                    'ID': row_data[0],
                    'NAME': row_data[1],
                    'SUBJECT': row_data[2]
                }
            )
        except Exception as e:
            # A trailing blank line from split("\n") ends up here
            print('Failed to insert row:', e)
    return 'New rows were inserted successfully, without the header, into the db'
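The header-skipping and blank-line handling above can be factored into a small helper that is easy to test without AWS (csv_to_items is a hypothetical name, not part of boto3):

```python
def csv_to_items(file_data):
    """Turn raw CSV text into a list of DynamoDB item dicts,
    skipping the header row and any blank (trailing) lines."""
    items = []
    for i, row in enumerate(file_data.split("\n")):
        if i == 0 or not row.strip():
            continue  # skip the header and empty lines
        row_data = row.split(",")
        items.append({
            'ID': row_data[0],
            'NAME': row_data[1],
            'SUBJECT': row_data[2],
        })
    return items

# Each returned dict can then be passed to table.put_item(Item=...)
```

Separating the parsing from the put_item calls also avoids relying on the except block to swallow the trailing empty line.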

Comments


Use the boto3 client instead of the resource, and install dynamodb-json:

from dynamodb_json import json_util as dynamo_json
import json
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    bucketna = event['Records'][0]['s3']['bucket']['name']
    s3_name = event['Records'][0]['s3']['object']['key']
    response = s3.get_object(Bucket=bucketna, Key=s3_name)
    data = response['Body'].read().decode("utf-8")
    salesnbs = data.split("\n")
    for ko in salesnbs:
        kos = ko.split(",")
        item = {
            "Date": kos[0],
            "name": kos[1],
            "fam": kos[2],
            "locati": kos[3],
            "adress": kos[4],
            "country": kos[5],
            "city": kos[6]
        }
        dynamodb.put_item(TableName='maysales',
                          Item=json.loads(dynamo_json.dumps(item)))
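For context on what dynamodb-json is doing here: the low-level client expects every attribute wrapped in a type marker (e.g. {"S": ...} for strings). For rows where every value is a string, that wrapping can be sketched with the standard library alone (to_dynamo_item is a hypothetical helper name):

```python
def to_dynamo_item(plain):
    """Wrap each plain string value in DynamoDB's {'S': ...} type
    marker, the shape the low-level client's put_item expects."""
    return {key: {"S": str(value)} for key, value in plain.items()}

# to_dynamo_item({"Date": "2020-05-01"}) == {"Date": {"S": "2020-05-01"}}
```

Note this only covers string attributes; dynamodb-json (or the resource-level Table API, which does the conversion for you) handles numbers, lists, and maps as well.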

2 Comments

How does this answer the OP's problem?
Your answer could be improved by elaborating on why using boto3.client over boto3.resource will answer the question. stackoverflow.com/help/how-to-answer
