
I have database outputs like the following:

$row = '(one,"pika chu",,"")';

If I pass this string (without the enclosing parentheses) to str_getcsv, it outputs ['one', 'pika chu', '', '']. The third element, despite being absent, has been turned into an empty string. This is very annoying because I must distinguish missing values (no value) from empty strings. The output I would expect is ['one', 'pika chu', null, ''].
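A minimal sketch to reproduce the behaviour described above. It assumes the enclosing parentheses have already been stripped from the row, and it passes the delimiter, enclosure and escape characters explicitly rather than relying on defaults:

```php
<?php
// The row from above, with the enclosing parentheses removed (assumption).
$row = 'one,"pika chu",,""';

// Delimiter, enclosure and escape are spelled out instead of left to defaults.
$fields = str_getcsv($row, ',', '"', '\\');

// The absent third field and the quoted empty fourth field both come back as ''.
var_dump($fields === ['one', 'pika chu', '', '']); // bool(true)
```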

The inputs I get are from a PostgreSQL database and are represented as composite values.

For example, if a table is pokemon_id => int4, name => text, then a query will output strings like '(1, "pika chu")'. A unique constraint on the name field, for example, still allows both of the following records to exist: (100, '') and (101, null).

When fetched, they are formatted as raw values like:

'98,whatever'
'99,"pika chu"'
'100,""'
'101,'
'102,","'

I need to read those strings, and this example must output the following arrays:

['98', 'whatever']
['99', 'pika chu']
['100', '']
['101', null]
['102', ',']

Is there a way to do that in PHP?

Update 1: @deceze kindly sent me this link stating that there are no NULLs in CSV (TL;DR: basically because there were no nulls in XML; that problem has been tackled since then). How can I parse CSV with NULLs, then?

Update 2: It was suggested that I write a dedicated parser in PHP using the preg_match_* functions. I am a bit reluctant to go that way because of 1) the performance impact compared to str_getcsv and 2) the fact that preg_match used to segfault if the string passed was over 8 kB (which can happen in a CSV context).

Update 3: I looked at the str_getcsv source code to see whether I could propose a patch adding parsing options, as some other languages have. I now understand PHP’s underlying philosophy better. @daniel-vérité raised the idea of implementing a state machine to parse CSV strings. Even though the input can have thousands of lines that weigh dozens of kilobytes, with embedded CSV structures, it might be the best way.

Thank you for your help.

  • 2
    CSV doesn't have any type information, there isn't even any distinction between a string and a number. No value and an empty string mean the same thing in CSV. If you want to differentiate those, you'll need to write your own CSV parser. Commented Feb 20, 2017 at 14:38
  • 1
    Or assign special values instead of nulls (-1 for example) before sending them as a parameter Commented Feb 20, 2017 at 14:41
  • 1
    It is about types. "No value" and "empty string" are indistinguishable in CSV because the only type it has is the string. Since "no value" isn't a representable type in CSV, you cannot differentiate it from "empty string". Commented Feb 20, 2017 at 14:46
  • 1
    It "should" only if you redefine how CSV works and/or specify your own flavour of CSV. Standard CSV has no null. See garretwilson.com/blog/2009/04/23/csvnull.xhtml, stackoverflow.com/a/5968530/476 Commented Feb 20, 2017 at 15:38
  • 2
    str_getcsv supposedly conforms to RFC 4180, and the distinction you want does not conform to this RFC. Like @deceze, I think you need to write your own parser. It's about 100 lines of code for a hand-made finite state machine that implements the full spec. As for the performance in pure PHP, well... you'll see :) Commented Feb 21, 2017 at 15:13
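
The hand-made state machine suggested in the comments could look roughly like the sketch below. It is not a full RFC 4180 implementation and not a drop-in replacement for str_getcsv: the function name parse_row_with_nulls is hypothetical, and the sketch assumes a comma delimiter, a double-quote enclosure, a doubled quote ("") as the only escape inside quoted fields, and one record per input string. Backslash escapes and whitespace trimming are deliberately left out.

```php
<?php
/**
 * Hypothetical helper: split one comma-separated row, returning null for
 * fields that are absent and '' for fields written as "".
 */
function parse_row_with_nulls(string $row): array
{
    $fields = [];
    $buffer = '';
    $quoted = false;   // true once the current field has contained a quoted section
    $inQuotes = false; // true while the cursor is between an opening and a closing quote
    $length = strlen($row);

    // Byte-wise scan; safe for UTF-8 because ',' and '"' never occur inside
    // a multibyte sequence.
    for ($i = 0; $i < $length; $i++) {
        $char = $row[$i];

        if ($inQuotes) {
            if ($char !== '"') {
                $buffer .= $char;
            } elseif ($i + 1 < $length && $row[$i + 1] === '"') {
                $buffer .= '"'; // a doubled quote stands for one literal quote
                $i++;
            } else {
                $inQuotes = false; // closing quote
            }
            continue;
        }

        if ($char === '"') {
            $inQuotes = true;
            $quoted = true;
        } elseif ($char === ',') {
            // End of field: empty and never quoted means "no value".
            $fields[] = ($buffer === '' && !$quoted) ? null : $buffer;
            $buffer = '';
            $quoted = false;
        } else {
            $buffer .= $char;
        }
    }

    // Flush the last field; it follows the same null-versus-empty rule.
    $fields[] = ($buffer === '' && !$quoted) ? null : $buffer;

    return $fields;
}
```

With the rows from the question, parse_row_with_nulls('101,') gives ['101', null] while parse_row_with_nulls('100,""') gives ['100', '']. For a parenthesised composite value, the parentheses would have to be removed first, for example with substr($row, 1, -1).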
