0

I have a comma-separated CSV file containing NASDAQ symbols. I use a Scanner to read the file:

  s = new Scanner(new File("C:\\nasdaq_companylist.csv")).useDelimiter("\\s*,\\s*");    

I'm getting an exception on the second field. The problem is that this field, like some other fields in the file, contains commas too, for example "1-800 FLOWERS.COM, Inc.":

FLWS,"1-800 FLOWERS.COM, Inc.",2.8,76022800,n/a,1999,Consumer Services,Other Specialty Stores,http://www.nasdaq.com/symbol/flws    

How can I avoid this problem? My code is:

List<Stock> theList = new ArrayList<Stock>();
StringBuilder sb = new StringBuilder();

// get the title
String title = s.nextLine();
System.out.println("title: " + title);

while (s.hasNext())
{
    String symbol = s.next();
    String name = s.next();
    double lastSale = s.nextDouble();
    long marketCap = s.nextLong();
    String adr = s.next();
    String ipoYear = s.next();
    String sector = s.next();
    String industry = s.next();
    String summaryQuote = s.next();
    theList.add(new Stock(symbol, lastSale));
}

Thanks

2

4 Answers

3

Unless this is homework, you should not parse CSV yourself. Use one of the existing libraries, for example this one: http://commons.apache.org/sandbox/csv/

Or google "java csv parser" and choose another.

But if you wish to implement the logic yourself, you should use the negative lookahead feature of regular expressions (see http://download.oracle.com/javase/1,5.0/docs/api/java/util/regex/Pattern.html).
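To make the lookahead idea concrete, here is a minimal sketch of one way it could look. It assumes well-formed input where every quote is paired, and it treats a comma as a separator only when it is not followed by an odd number of quote characters. The class name LookaheadSplit and the method splitLine are made up for illustration, and the quotes are left in the returned fields.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class LookaheadSplit {

    // A comma is a separator unless an odd number of double quotes follows it
    // up to the end of the line (which would mean the comma sits inside quotes).
    private static final Pattern SEPARATOR =
            Pattern.compile(",(?![^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] splitLine(String line) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        return SEPARATOR.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8,76022800,n/a,1999,"
                + "Consumer Services,Other Specialty Stores,"
                + "http://www.nasdaq.com/symbol/flws";
        for (String field : splitLine(line)) {
            System.out.println(field);
        }
    }
}
```

Note that the lookahead rescans the rest of the line for every comma, so this is fine for short lines but can get slow on very long ones. It also works on one line at a time, so fields containing line breaks are not handled.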


1 Comment

Thanks Alex, I found some libraries for parsing CSV files. In the end I used opencsv.
1

Your safest bet is to use a CSV parsing library. Your comma is enclosed in quotes, so you would need to implement logic to look for quoted commas. However, you would also need to plan for other situations, such as a quote within a quoted field, escape sequences, etc. It is better to use a ready-made, tested solution; search online and you'll find some. CSV files can be tricky to parse on your own.
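To give an idea of what that logic involves, here is a rough sketch of a hand-written single-line parser. It assumes the common convention that a quote inside a quoted field is written as two quotes (""), and it does not handle fields that span several lines or backslash escapes. The class name SimpleCsvLine and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, a sketch rather than a complete CSV parser.
class SimpleCsvLine {

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // unquoted comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8,76022800"));
    }
}
```

Even this small version shows how many cases there are to get right, which is the argument for a tested library.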


1

As others have correctly pointed out, rolling your own CSV parser is not a good idea, as it is easy to mishandle malformed or malicious input and leave security holes in your system.

That said, I use this regex:

"((?:\"[^\"]*?\")*|[^\"][^,]*?)([,]|$)"

which does a good job with well-formed CSV data. You will need to use a Pattern and a Matcher with it.

This is what it does:

/*
 ( - Field Group
   (?: - Non-capturing (because the outer group will do the capturing) consume of quoted strings
    \"  - Start with a quote
    [^\"]*? - Non-greedy match on anything that is not a quote
    \" - End with a quote
   )* - And repeat
  | - Or
   [^\"] - Not starting with a quote
   [^,]*? - Non-greedy match on anything that is not a comma
 ) - End field group
 ( - Separator group
  [,]|$ - Comma separator or end of line
 ) - End separator group 
*/

Note that it parses the data into two groups, the field and the separator. It also leaves the quote characters in the field; you may wish to remove them, replace "" with ", and so on.
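For completeness, here is a sketch of how the regex could be driven with a Pattern and a Matcher, including the quote clean-up mentioned above. The class and method names (RegexCsvLine, parse, unquote) are made up for illustration, and the loop stops once the separator group is empty, because otherwise the pattern also matches the empty string at the end of the line and produces a spurious extra field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from this answer.
class RegexCsvLine {

    private static final Pattern FIELD =
            Pattern.compile("((?:\"[^\"]*?\")*|[^\"][^,]*?)([,]|$)");

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.add(unquote(m.group(1)));
            // An empty separator group means end of line was reached.
            if (m.group(2).isEmpty()) {
                break;
            }
        }
        return fields;
    }

    // Strip the surrounding quotes and turn "" back into a single quote.
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(parse("FLWS,\"1-800 FLOWERS.COM, Inc.\",2.8"));
    }
}
```

As with the regex itself, this only targets well-formed data, one line at a time.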

1 Comment

Paul, thank you for sharing the regex. Since I saw your answer, I have started to work on my own regex.
0

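I suggest removing \\s* from your delimiter regular expression, so that the delimiter is a plain comma. Then rejoin the pieces of a quoted field:

while (s.hasNext()) {
    String symbol = s.next();
    if (symbol.startsWith("\"")) {
        // Keep appending pieces until the field ends with its closing quote
        while ((!symbol.endsWith("\"") || symbol.length() == 1) && s.hasNext()) {
            symbol += "," + s.next();
        }
    }
...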

1 Comment

Thank you, Joop, for sharing this.
