CSV Helper: write a csv with multiple nested lists of complex objects

Question

I've been puzzling with this one for a while. Given these classes:

public class Person
{
    public int Id { get; set; }
    public string Name { get; set;}
    public List<Book> Books { get; set; }
}
public class Book
{
    public Guid Id { get; set; }
    public string Name { get; set;}
    public List<Chapter> Chapters{ get; set; }
    public List<Author> Authors { get; set; }
}

public class Author 
{
    public string Name { get; set; }
}

public class Chapter
{
    public string Name { get; set;}
}

I'd like to spit out a csv that doesn't repeat the top level relationship, and starts new rows in the correct column. For Example

personId | Name  | BookName  | AuthorName  | ChapterName |

   1     | name1 | Book1Name | AuthorName1 | chapterName1|   
         |       |           | AuthorName2 |             |
         |       | Book2Name | AuthorName1 | chapterName1|
   2     | name2 | Book3Name | AuthorName3 | chapterName3|   
   3     | name3 | Book1Name |             | chapterName4|
         |       |           |             | chapterName5|

I've tried messing around with the csv maps and calling references, and also using csv.WriteField(null) to skip to the column I want, but nothing works exactly like I need it to.

You probably will need to project your data model to some sort of simpler intermediate DTO. Also, the logic you wan to use for flattening your Book model is unclear. When there are multiple authors and multiple chapters, how do you want to pair them up? In the example shown you have multiple authors or multiple chapters, but never both. — dbc
– dbc, Commented Apr 27, 2023 at 17:12
This is not a well formed CSV and you are better suited persisting this to JSON — Narish
– Narish, Commented Apr 27, 2023 at 17:18
Do you absolutely need to use a csv for this for some reason? Seems like a poor fit. — Broots Waymb
– Broots Waymb, Commented Apr 27, 2023 at 17:19
I agree that csv is not a good format for this, it supposed to be an exported high level summary of a lot of data. I was overruled by the business team. The classes/data above is just generic for the use of illustrating the concept of what I'm looking for, I've edited it to be correct. These are already projected into simple DTO's but the relationships prevent anymore flattening I think — Morks
– Morks, Commented Apr 27, 2023 at 18:20

dbc · Accepted Answer · 2023-04-27 19:29:37Z

The difference in structure between your your data model object graph and your desired final output model is too extreme for CsvHelper (or any other serializer) to do the mapping "on the fly". Instead, you will need to create some DTO that represents your desired final CSV row format, then create some elaborate LINQ query that projects your object graph to an enumerable of DTOs. Having done that, you will be able to serialize the DTOs using CsvHelper.

First, for a DTO, you can use a simple record like the following:

public record PersonAndBookDTO(int? personId, string? Name, string? BookName, string? AuthorName, string? ChapterName);

Next, define the following query:

var query = 
    from p in people
    from b in p.Books.DefaultIfEmpty().Select((book, index) => (book, index))
    let firstBookOfPerson = b.index == 0
    from pair in (b.book?.Authors?.Select(a => a.Name)).DefaultIfEmptyOrNull()
        .Merge((b.book?.Chapters?.Select(c => c.Name)).DefaultIfEmptyOrNull(), 
               (author, chapter) => (author, chapter))
        .Select((pair, index) => (pair.author, pair.chapter, index))
    let firstPairOfBook = pair.index == 0
    select new PersonAndBookDTO(firstBookOfPerson && firstPairOfBook? p.Id : (int?)null,
                                firstBookOfPerson && firstPairOfBook? p.Name : null,
                                firstPairOfBook ? b.book?.Name : null,
                                pair.author, pair.chapter);

Using the following extension methods:

public static class EnumerableExtensions
{
    // Copied from this anser https://stackoverflow.com/a/6624756/3744182 by https://stackoverflow.com/users/203499/damian
    // To https://stackoverflow.com/questions/1190657/add-two-lists-of-different-length-in-c-sharp
    public static IEnumerable<TResult> Merge<TFirst,TSecond,TResult>(this IEnumerable<TFirst> first,
                IEnumerable<TSecond> second, Func<TFirst, TSecond, TResult> operation) {
        using (var iter1 = first.GetEnumerator()) {
            using (var iter2 = second.GetEnumerator()) {
                while (iter1.MoveNext()) {
                    if (iter2.MoveNext()) {
                        yield return operation(iter1.Current, iter2.Current);
                    } else {
                        yield return operation(iter1.Current, default(TSecond));
                    }
                }
                while (iter2.MoveNext()) {
                    yield return operation(default(TFirst),  iter2.Current);
                }
            }
        }
    }
    
    public static IEnumerable<TSource?> DefaultIfEmptyOrNull<TSource>(this IEnumerable<TSource>? source) => (source ?? Enumerable.Empty<TSource>()).DefaultIfEmpty();
}

And now you will be able to serialize to CSV as follows:

var sb = new StringBuilder();
using (var writer = new StringWriter(sb)) // Or use new StreamWriter(filename) if serializing to a file
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteRecords(query);
}

Notes:

Books.DefaultIfEmpty() is used to ensure one row is generated for each person even if that person has no books.
Authors.DefaultIfEmptyOrNull() and Chapters.DefaultIfEmptyOrNull() are used to ensure one row is generated per book even if the book has no authors or chapters.
Enumerable.Select((item, index) => (item, index)) is used to keep track of whether a given book or author/chapter pair is the first one encountered in its parent. This is necessary because you don't want to duplicate person or book names.
Your question doesn't indicate how to pair together items from the list of authors and list of chapters when there are more than one of both. I adopted the Merge() solution from this answer by damian to Add two Lists of different length in C# which pairs them by index and uses a default value when one list is longer than the other. This matches the required output for the simple cases shown in your question.

If you require a different pairing algorithm, adjust EnumerableExtensions.Merge() as necessary.

This generates the following CSV:

personId,Name,BookName,AuthorName,ChapterName
1,name1,Book1Name,AuthorName1,chapterName1
,,,AuthorName2,
,,Book2Name,AuthorName1,chapterName1
2,name2,Book3Name,AuthorName3,chapterName3
3,name3,Book1Name,,chapterName4
,,,,chapterName5

Which results in the following table when imported into Excel:

Demo fiddle here.

Wow. That is insane and pretty much exactly what I needed. It's not fun to look at and the solution pretty much shows why I was right to push back in the first place, but thank you.

Collectives™ on Stack Overflow

CSV Helper: write a csv with multiple nested lists of complex objects

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related