
I am tasked with scraping data from a webpage and writing it, along with other information, to a CSV file. I currently use JSoup to scrape the website, but I am not sure how to write the data to a CSV file.

I store the data of each scraped page in an object called CSVObject:

import java.util.ArrayList;

public class CSVObject {
    String name;
    String title;
    String description;
    ArrayList<String> color;
    ArrayList<String> size;
    ArrayList<Float> price;  // generics need the wrapper type Float, not the primitive float
}

I store these objects in an ArrayList<CSVObject>.

The name, title, and description come from the scraped data, but the color, size, and price come from user input. The user can choose multiple values, and each one is added to the corresponding ArrayList in the object.

The desired file output is something like this:

Name         Title           Description         Color         Size         Price
Shirt        Holiday Shirt   Shirt Description   Black         S            15.99
Shirt                                            Black         M            19.99
Shirt                                            Black         L            24.99
Shirt                                            Green         S            15.99
Shirt                                            Green         M            19.99
Shirt                                            Green         L            24.99
Pants        Movie Pants     Pants Description   Red           S            17.99
...

I did some digging and found that the Java CSV Library mentioned in "How to serialize object to CSV file?" can help write a CSV file, but I am not sure how to format the data into the desired output. What should I do to write the file as intended?

2 Answers


Flat file

Comma-Separated Values (CSV) and Tab-Delimited formats are for flat files, a single table in each. That means one set of rows that all share the same set of columns.

To export the data as seen in your example data, repeat the values in the first columns that you have suppressed. Then you would have a set of rows all sharing the same set of columns.
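
As a rough illustration of that flat layout, here is a minimal sketch in which every row repeats the name, title, and description. The class `FlatCsvSketch`, the nested `Product` type, and the method `toFlatCsv` are hypothetical names made up for this example, and the three lists are assumed to be parallel and of equal length. Field quoting is left out to keep the sketch short.

```java
import java.util.List;

class FlatCsvSketch {
    // Hypothetical stand-in for the question's CSVObject.
    // Assumption: colors, sizes, and prices are parallel lists of equal length.
    static final class Product {
        final String name, title, description;
        final List<String> colors, sizes;
        final List<Double> prices;

        Product(String name, String title, String description,
                List<String> colors, List<String> sizes, List<Double> prices) {
            this.name = name;
            this.title = title;
            this.description = description;
            this.colors = colors;
            this.sizes = sizes;
            this.prices = prices;
        }
    }

    // Builds a header line plus one line per variant; no column is ever left blank.
    static String toFlatCsv(List<Product> products) {
        StringBuilder sb = new StringBuilder("Name,Title,Description,Color,Size,Price\n");
        for (Product p : products) {
            for (int i = 0; i < p.colors.size(); i++) {
                sb.append(String.join(",",
                        p.name, p.title, p.description,
                        p.colors.get(i), p.sizes.get(i),
                        String.valueOf(p.prices.get(i))))
                  .append('\n');
            }
        }
        return sb.toString();
    }
}
```

Because every row carries the full set of columns, any generic CSV reader can load the result without knowing which rows belong together.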

Hierarchy

According to your Java class, you have a hierarchy of data. That does not fit CSV format. Square peg, round hole.

To match the structure of your Java class, you should be serializing your data in a hierarchical format such as XML or JSON.

Not-really-CSV

If you insist on using that not-really-CSV format you showed, you need nested loops.

Loop over your set of objects. For each of those objects, loop over the lists contained within.

On the first pass through the inner loop, write out all columns. On subsequent passes, suppress the repeated values, writing only a COMMA character for each to maintain the column count.

Straightforward logic, nothing tricky, following the same steps you would take if writing these values by hand on paper.
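
The nested loops described above might be sketched as follows. This is only one possible reading: it follows the sample output in the question, where the name appears on every row and only the title and description are blanked after the first row of each object. `SparseCsvSketch`, `Item`, and `toCsv` are hypothetical names, and the three lists are assumed to have the same length.

```java
import java.util.List;

class SparseCsvSketch {
    // Hypothetical stand-in for the question's CSVObject.
    // Assumption: colors, sizes, and prices all have the same length.
    static final class Item {
        final String name, title, description;
        final List<String> colors, sizes;
        final List<Double> prices;

        Item(String name, String title, String description,
             List<String> colors, List<String> sizes, List<Double> prices) {
            this.name = name;
            this.title = title;
            this.description = description;
            this.colors = colors;
            this.sizes = sizes;
            this.prices = prices;
        }
    }

    static String toCsv(List<Item> items) {
        StringBuilder sb = new StringBuilder();
        for (Item item : items) {                           // outer loop: one pass per object
            for (int i = 0; i < item.colors.size(); i++) {  // inner loop: one row per variant
                boolean first = (i == 0);
                sb.append(item.name).append(',')
                  // After the first row, write an empty field so the column count stays at six.
                  .append(first ? item.title : "").append(',')
                  .append(first ? item.description : "").append(',')
                  .append(item.colors.get(i)).append(',')
                  .append(item.sizes.get(i)).append(',')
                  .append(item.prices.get(i)).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Each row still has exactly five commas, so the file parses as six columns even though some fields are empty.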

Of course, any field values containing your delimiter character (COMMA, etc.) must be enclosed within quotes. Or just enclose all fields in quotes.
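
A small quoting helper could look like the sketch below. The names `CsvQuoting`, `quote`, and `quoteIfNeeded` are hypothetical. One detail goes beyond what is stated above: a quote character inside a quoted field is doubled, which is the convention described in RFC 4180.

```java
class CsvQuoting {
    // Always wraps the field in double quotes, doubling any embedded quote characters.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Quotes only fields that contain a comma, a double quote, or a line break.
    static String quoteIfNeeded(String field) {
        boolean needsQuoting = field.contains(",")
                || field.contains("\"")
                || field.contains("\n")
                || field.contains("\r");
        return needsQuoting ? quote(field) : field;
    }
}
```

Calling `quote` on every field is the simpler "just enclose all fields" option; `quoteIfNeeded` keeps the file smaller and easier to read by eye.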


4 Comments

I understand that. But the requirement is for a CSV file, since it is used for importing into a Shopify website. The file format is like that, so I cannot change it.
@user3417256 I added the last section.
Thanks for the answer, but I still can't picture it clearly. So for the outermost loop I loop over the objects in the CSVObject ArrayList, inside that I loop over the color ArrayList, and in that loop I loop over the sizes, and so on?
@user3417256 Yes. Two loops, nested. The outer loop is based on count of objects. Inner loop is based on size of any of your lists (your three lists are the same size).

Here's a quick and dirty version. It assumes your lists of colors, sizes, and prices always have the same length.

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

interface CSVObject {
    String name();
    String title();
    String description();
    List<String> color();
    List<String> size();
    List<Double> price();
}

List<CSVObject> data = List.of();

// One CSV line per index i; name, title, and description are repeated on every line.
String csv = data.stream()
        .flatMap(co -> IntStream.range(0, co.color().size())
                .mapToObj(i -> new String[] {
                        co.name(), co.title(), co.description(),
                        co.color().get(i), co.size().get(i), co.price().get(i).toString()
                }))
        .map(sa -> String.join(",", sa))
        .collect(Collectors.joining("\n"));
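
Once the CSV text is assembled in a string, it still has to be written to disk. A minimal sketch, assuming Java 11 or later for `Files.writeString` and UTF-8 as the encoding; the class name `CsvFileWriter` and the method `write` are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CsvFileWriter {
    // Writes the whole CSV text to the target path, replacing any existing file.
    static void write(Path target, String csv) throws IOException {
        Files.writeString(target, csv, StandardCharsets.UTF_8);
    }
}
```

A call such as `CsvFileWriter.write(Path.of("products.csv"), csv)` would then produce the file, where `products.csv` is just a placeholder name. For very large exports, streaming rows through a `BufferedWriter` instead of building one big string may be the better choice.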
