CSV to Struct Advice

Question

I am reading a CSV into a struct. How can I manipulate the input so the struct is built the way I want? I keep going in circles following various tutorials; this is the closest I have come.

I essentially want to open a CSV, read selected columns, and make sure the values I pick for each record all come from the same row. The resulting data should then be in a format that can be put into a database.

Example CSV:

Ignore,Customer,Fruit,Number
123,A,Apple,1
123,A,Apple,3
123,B,Orange,4
123,C,Melon,5

Example Code:

package main
import (
    "bufio"
    "encoding/csv"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "os"
)

type Account struct {
    Customer string `json:"Customer"`
    LineItem *LineItem  `json:"LineItem"`
}

type LineItem struct {
    ProductName string `json:"ProductName"`
    Count string `json:"Count"`
}


func main() {
    csvFile, _ := os.Open("/home/frank/gocode/src/local/billing/fruit.csv")

    reader := csv.NewReader(bufio.NewReader(csvFile))
    var billData []Account
    for {
        line, error := reader.Read()
        if error == io.EOF {
            break
        } else if error != nil {
            log.Fatal(error)
        }
        billData = append(billData, Account{
            Customer: line[1],
            LineItem: &LineItem{
                ProductName:   line[2],
                Count: line[3],
            },
        })
    }

    billingJson, _ := json.Marshal(billData)
    fmt.Println(string(billingJson))
}

The current output is:

[{"Customer":"Customer","LineItem":{"ProductName":"Fruit","Count":"Number"}},{"Customer":"A","LineItem":{"ProductName":"Apple","Count":"1"}},{"Customer":"A","LineItem":{"ProductName":"Apple","Count":"3"}},{"Customer":"B","LineItem":{"ProductName":"Orange","Count":"4"}},{"Customer":"C","LineItem":{"ProductName":"Melon","Count":"5"}}]

I would like to get rid of the first record so the headers are not kept, e.g.

[{"Customer":"A","LineItem":{"ProductName":"Apple","Count":"1"}},{"Customer":"A","LineItem":{"ProductName":"Apple","Count":"3"}},{"Customer":"B","LineItem":{"ProductName":"Orange","Count":"4"}},{"Customer":"C","LineItem":{"ProductName":"Melon","Count":"5"}}]

I would also like to consolidate the records so that Customer A is one record with both LineItems, e.g.

[{"Customer":"A","LineItem":{"ProductName":"Apple","Count":"1"},"LineItem":{"ProductName":"Apple","Count":"3"}},{"Customer":"B","LineItem":{"ProductName":"Orange","Count":"4"}},{"Customer":"C","LineItem":{"ProductName":"Melon","Count":"5"}}]

Any best practices or alternative methods are welcome (I am not sure whether a map is better here). Hopefully this is enough info for someone to give me a hand.

Kaedys · Accepted Answer · 2018-02-20 18:20:34Z

Getting rid of the first entry is as easy as billData = billData[1:]. That, or do an initial read to pull the column names.

On the second part, your current data structure does not support a one-to-many relationship (each Account has one and only one LineItem). You'll need to do some processing on the list afterwards. CSV files are necessarily 1:1, as each line is considered a single independent record. The easiest way to make it one-to-many is with a map, but you can also simply loop over a slice (which stays closer to your existing code):

https://play.golang.org/p/3uevo0taKR5

package main

import (
    "bytes"
    "encoding/csv"
    "encoding/json"
    "fmt"
    "io"
    "log"
)

var data = `Ignore,Customer,Fruit,Number
123,A,Apple,1
123,A,Apple,3
123,B,Orange,4
123,C,Melon,5`

type Account struct {
    Customer  string     `json:"Customer"`
    LineItems []LineItem `json:"LineItems"`
}

type LineItem struct {
    ProductName string `json:"ProductName"`
    Count       string `json:"Count"`
}

func main() {
    reader := csv.NewReader(bytes.NewBufferString(data))

    // Read column label data and discard
    if _, err := reader.Read(); err != nil {
        log.Fatal(err)
    }

    var billData []Account
    for {
        line, err := reader.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        found := false
        for i := range billData {
            if billData[i].Customer == line[1] {
                found = true
                billData[i].LineItems = append(billData[i].LineItems, LineItem{
                    ProductName: line[2],
                    Count:       line[3],
                })
                break
            }
        }
        if !found {
            billData = append(billData, Account{
                Customer: line[1],
                LineItems: []LineItem{
                    {
                        ProductName: line[2],
                        Count:       line[3],
                    },
                },
            })
        }
    }

    // Indent with four spaces, matching the output shown below
    billingJson, err := json.MarshalIndent(billData, "", "    ")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(billingJson))
}

Output:

[
    {
        "Customer": "A",
        "LineItems": [
            {
                "ProductName": "Apple",
                "Count": "1"
            },
            {
                "ProductName": "Apple",
                "Count": "3"
            }
        ]
    },
    {
        "Customer": "B",
        "LineItems": [
            {
                "ProductName": "Orange",
                "Count": "4"
            }
        ]
    },
    {
        "Customer": "C",
        "LineItems": [
            {
                "ProductName": "Melon",
                "Count": "5"
            }
        ]
    }
]

Lastly, I recommend using err or similar for your error variable. error is the name of the built-in error type, so by naming your variable that, you're shadowing the type and making it impossible to declare a variable of that type within the same scope. While this doesn't affect your current code, it's still bad practice and liable to get you into trouble eventually.
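To make the shadowing point concrete, here is a small sketch. The function names shadowed and preferred are made up for illustration; the commented-out line is the kind of declaration that stops compiling once error names a variable in that scope:

```go
package main

import (
	"errors"
	"fmt"
)

// shadowed names its variable "error", which hides the built-in type
// for the rest of the function body.
func shadowed() string {
	error := errors.New("boom")
	// var other error // would not compile here: error now names a variable, not a type
	return error.Error()
}

// preferred uses the conventional name, so the error type stays usable.
func preferred() string {
	err := errors.New("boom")
	var other error = errors.New("still usable")
	return err.Error() + ", " + other.Error()
}

func main() {
	fmt.Println(shadowed())  // prints boom
	fmt.Println(preferred()) // prints boom, still usable
}
```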

Brilliant. Thank you for the detailed response, notes and improvements. Is there a performance benefit using one method or the other?
For an extremely large list, the map is likely going to be more efficient, since it has amortized constant insertion and search time, while the slice version has O(n) search time. The map version loops over the whole set once at the end to build the slice, while the slice version loops over the partial set on every insertion.

mpromonet · Accepted Answer · 2021-07-07 19:20:59Z

Another option is to use csvutil to read the CSV into a slice of structs, aggregate the lines for each customer in a map, and then convert the map into the output structure.

This could look like:

package main

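import (
    "encoding/json"
    "fmt"
    "log"
    "os"
    "sort"

    "github.com/jszwec/csvutil"
)

type Account struct {
    Customer  string     `json:"Customer"`
    LineItems []LineItem `json:"LineItems"`
}

type LineItem struct {
    ProductName string `json:"ProductName"`
    Count       string `json:"Count"`
}

// CsvEntry lists the CSV columns to keep. csvutil matches fields to
// columns by header name, so the unused "Ignore" column is skipped.
type CsvEntry struct {
    Customer string
    Fruit    string
    Number   string
}

func main() {
    // read csv (os.ReadFile needs Go 1.16+; it replaces ioutil.ReadFile)
    content, err := os.ReadFile("./fruits.csv")
    if err != nil {
        log.Fatal(err)
    }
    var entries []CsvEntry
    if err := csvutil.Unmarshal(content, &entries); err != nil {
        log.Fatal(err)
    }

    // aggregate by customer name
    customersMap := map[string][]LineItem{}
    for _, bill := range entries {
        customersMap[bill.Customer] = append(customersMap[bill.Customer], LineItem{ProductName: bill.Fruit, Count: bill.Number})
    }

    // build output structure
    accounts := []Account{}
    for customer, items := range customersMap {
        accounts = append(accounts, Account{Customer: customer, LineItems: items})
    }
    // map iteration order is not fixed, so sort for a stable result
    sort.Slice(accounts, func(i, j int) bool {
        return accounts[i].Customer < accounts[j].Customer
    })

    // print json
    billingJson, err := json.Marshal(accounts)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(billingJson))
}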

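This supports column permutation: csvutil matches columns to struct fields by header name, so the order of the columns in the file does not matter.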
