I need to split a column that contains comma-separated values into separate columns in a new view or table.
At the moment, the best working solution I have is:

CREATE VIEW clearTable as select ID,Timestamp,s1,s3,s4,s5,s6,s7,s8,substr(s8r,1,instr(s8r,",")-1) as s9 from
    (select ID,Timestamp,s1,s3,s4,s5,s6,s7,substr(s7r,1,instr(s7r,",")-1) as s8,substr(s7r,instr(s7r,",")+1) as s8r from
        (select ID,Timestamp,s1,s3,s4,s5,s6,substr(s6r,1,instr(s6r,",")-1) as s7,substr(s6r,instr(s6r,",")+1) as s7r from
            (select ID,Timestamp,s1,s3,s4,s5,substr(s5r,1,instr(s5r,",")-1) as s6,substr(s5r,instr(s5r,",")+1) as s6r from
                (select ID,Timestamp,s1,s3,s4,substr(s4r,1,instr(s4r,",")-1) as s5,substr(s4r,instr(s4r,",")+1) as s5r from
                    (select ID,Timestamp,s1,s3,substr(s3r,1,instr(s3r,",")-1) as s4,substr(s3r,instr(s3r,",")+1) as s4r from
                        (select ID,Timestamp,s1,substr(s2r,1,instr(s2r,",")-1) as s3,substr(s2r,instr(s2r,",")+1) as s3r from
                            (select ID,Timestamp,s1,substr(s1r,1,instr(s1r,",")-1) as s2,substr(s1r,instr(s1r,",")+1) as s2r from
                                (select ID,Timestamp,cast(substr(payload,1,instr(payload,",")-1) as TIME) as s1,substr(payload,instr(payload,",")+1) as s1r from thebasetable))))))))

As you can see, each separator character needs a new level of subquery.
The result is what I want, but I'm looking for better ways to get there, maybe a more efficient solution.
As a working example you can use this SQL Fiddle.
Furthermore, I'd like to mention that the data is currently stored in SQLite, but that might change, so the optimization does not need to target SQLite only.
All hints are welcome.

  • SQLite is designed to be embedded into another language. Commented May 28, 2015 at 10:06

1 Answer

Let me start with an error in your current solution (regardless of whether it is efficient): for the third row it returns 0 in column s1. As far as I understand your intention, you wanted to return the first element of the payload, which for row 3 is A, not 0.
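The 0 most likely comes from the cast(... as TIME) around the first element. SQLite has no TIME storage class, so a type name like TIME gets NUMERIC affinity and the cast tries to read the text as a number. A minimal sketch of that behaviour, using Python's built-in sqlite3 module and made-up literal values (not the data from the fiddle):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# TIME is not a real SQLite type; as a cast target it behaves like NUMERIC,
# so the text is converted to a number using its leading numeric prefix.
# Both literals below are made-up sample values.
cast_letter, cast_clock = conn.execute(
    "SELECT CAST('A' AS TIME), CAST('12:30:00' AS TIME)"
).fetchone()

print(cast_letter, cast_clock)
conn.close()
```

If s1 should keep values such as A unchanged, dropping the cast (or casting to TEXT) avoids the conversion.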

It also does not return s2. I don't know whether that was intentional. My solution does return it.

Now, to answer your question: I have worked out a query that runs a bit faster (on my local SQLite it took 3 ms, while your original query took 11 ms on average) and does not nest selects as deeply. It's a bit complicated, so I will explain it afterwards. Here's the query:

SELECT id,
       timestamp,
       max(CASE WHEN col = 1 THEN item ELSE '' END) AS s1,
       max(CASE WHEN col = 2 THEN item ELSE '' END) AS s2,
       max(CASE WHEN col = 3 THEN item ELSE '' END) AS s3,
       max(CASE WHEN col = 4 THEN item ELSE '' END) AS s4,
       max(CASE WHEN col = 5 THEN item ELSE '' END) AS s5,
       max(CASE WHEN col = 6 THEN item ELSE '' END) AS s6,
       max(CASE WHEN col = 7 THEN item ELSE '' END) AS s7,
       max(CASE WHEN col = 8 THEN item ELSE '' END) AS s8,
       max(CASE WHEN col = 9 THEN item ELSE '' END) AS s9
  FROM (
       WITH RECURSIVE tmp (
               id,
               timestamp,
               item,
               data,
               col
           )
           AS (
               SELECT id,
                      timestamp,
                      substr(payload, 1, instr(payload, ',') - 1),
                      payload,
                      1
                 FROM thebasetable
               UNION ALL
               SELECT id,
                      timestamp,
                      substr(substr(data, instr(data, ',') + 1), 1, instr(substr(data, instr(data, ',') + 1), ',') - 1),
                      substr(data, instr(data, ',') + 1),
                      col + 1
                 FROM tmp
                WHERE instr(data, ',') > 0 AND 
                      col < 9
                ORDER BY 1
           )
           SELECT id,
                  timestamp,
                  item,
                  col
             FROM tmp
       )
 GROUP BY id,
          timestamp;

The query uses a Common Table Expression (CTE). You can read more about it in SQLite's SQL syntax documentation (look for the WITH clause).

The CTE part is this one:

   WITH RECURSIVE tmp (
           id,
           timestamp,
           item,
           data,
           col
       )
       AS (
           SELECT id,
                  timestamp,
                  substr(payload, 1, instr(payload, ',') - 1),
                  payload,
                  1
             FROM thebasetable
           UNION ALL
           SELECT id,
                  timestamp,
                  substr(substr(data, instr(data, ',') + 1), 1, instr(substr(data, instr(data, ',') + 1), ',') - 1),
                  substr(data, instr(data, ',') + 1),
                  col + 1
             FROM tmp
            WHERE instr(data, ',') > 0 AND 
                  col < 9
            ORDER BY 1
       )
       SELECT id,
              timestamp,
              item,
              col
         FROM tmp

It first reads every row with its initial payload, extracts the first element of the payload and assigns it a col value of 1. It then passes the payload on to the next iteration of the CTE with the first element cut off, so the next iteration sees the second element as if it were the first. Each iteration also increments col by 1.

It walks through the whole payload recursively, cutting off the first element in each iteration, until it reaches the end of the payload (WHERE instr(data, ',') > 0).
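To watch the iterations, the CTE can be run on its own against a single row. The sketch below uses Python's built-in sqlite3 module with one made-up row; the sample values and the trailing comma in the payload are assumptions, and the final ORDER BY col is only there to print the steps in order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thebasetable (id INTEGER, timestamp TEXT, payload TEXT)")
# Made-up sample row; the trailing comma is an assumption about the payload format.
conn.execute("INSERT INTO thebasetable VALUES (1, '2015-05-28 10:00:00', 'A,B,C,')")

trace_sql = """
WITH RECURSIVE tmp (id, timestamp, item, data, col) AS (
    -- first step: first element of the full payload, col = 1
    SELECT id, timestamp,
           substr(payload, 1, instr(payload, ',') - 1),
           payload,
           1
      FROM thebasetable
    UNION ALL
    -- later steps: drop the first element, then take the new first element
    SELECT id, timestamp,
           substr(substr(data, instr(data, ',') + 1), 1,
                  instr(substr(data, instr(data, ',') + 1), ',') - 1),
           substr(data, instr(data, ',') + 1),
           col + 1
      FROM tmp
     WHERE instr(data, ',') > 0 AND col < 9
)
SELECT col, item, data FROM tmp ORDER BY col
"""

trace_rows = conn.execute(trace_sql).fetchall()
for row in trace_rows:
    print(row)
conn.close()
```

For this input the first three rows carry A, B and C with col 1 to 3, and data shrinks by one element per step. One more row is produced for the empty remainder after the last comma.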

I have also added a second condition to the WHERE clause: col < 9. This one controls how many columns are extracted from the payload. The number should equal the number of columns you want to read from the payload. If you set it to a smaller number, the remaining columns will be empty in the results. If you set it to a bigger number, it does no harm, except that the query will be a tiny bit slower for no benefit.

Finally, the CTE is enclosed in a SELECT that groups the CTE results by ID and Timestamp. For each output column it then picks the item whose col number matches that column, or an empty string if the row has no such item. It's hard to explain. It is easier to execute the CTE part by itself and look at what it returns; then you will see what the outer SELECT does.
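The grouping step can also be tried in isolation. The sketch below (Python's built-in sqlite3 module) feeds a few made-up (id, col, item) rows, shaped like the CTE output, into the same max(CASE ...) pattern; the table name long_rows and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the CTE output: one row per (id, col) pair.
conn.execute("CREATE TABLE long_rows (id INTEGER, col INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO long_rows VALUES (?, ?, ?)",
    [(1, 1, "A"), (1, 2, "B"), (1, 3, "C"), (2, 1, "X"), (2, 2, "Y")],
)

pivot_sql = """
SELECT id,
       -- within a group, only the row with the matching col contributes its item;
       -- every other row contributes '', which sorts below any non-empty text
       max(CASE WHEN col = 1 THEN item ELSE '' END) AS s1,
       max(CASE WHEN col = 2 THEN item ELSE '' END) AS s2,
       max(CASE WHEN col = 3 THEN item ELSE '' END) AS s3
  FROM long_rows
 GROUP BY id
 ORDER BY id
"""

pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
conn.close()
```

Each id collapses into a single row. The second id has no item for col 3, so its s3 stays an empty string.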

Note: this solution requires SQLite 3.8.3 or later, as that's the version in which CTEs were introduced to SQLite.
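If SQLite is used through a host language, the library version can be checked before relying on the CTE. A small sketch with Python's built-in sqlite3 module; the names REQUIRED and supports_cte are only illustrative:

```python
import sqlite3

# Minimum SQLite library version that understands WITH / WITH RECURSIVE.
REQUIRED = (3, 8, 3)

# sqlite_version_info is the version of the linked SQLite library as a tuple.
supports_cte = sqlite3.sqlite_version_info >= REQUIRED

print(sqlite3.sqlite_version, supports_cte)
```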

CTEs are a common feature in databases. They are supported by most popular databases (I've just looked it up: MS SQL, Oracle and PostgreSQL have them, and MySQL does from version 8.0 on), so it looks quite nice.

1 Comment

Thank you, a quite fine solution.
