2

I have a multi-million row table with a varchar field. Some of the rows of the varchar are floating point numbers. There is no constraint on other fields that can completely isolate which rows are numbers and which are not. I want to create queries with an ORDER BY on the rows with the numeric value in the varchar field (ignoring non-numerics). I can't just simply call MyField::numeric because it pukes on the non-numeric rows.

I thought of two basic options:
(a) Use a regex to determine if the row is numeric.
(b) Catch the cast exception and return a null for all non-numeric values.

Speed is crucial. I tried option (a) and it is painfully slow. I created a stored procedure to use a regex to check the value before the cast. Non-numerics return as null. I created an index using that stored procedure. The regex is so expensive. But I'm wondering if catching an exception is even worth the effort.

Is there a simple way to have MyField::numeric return nulls for non-numeric data? Any suggestions to make this run faster?

Thanks

2 Answers 2

1

Is the schema fixed, or can you change it? In particular could you add another (nullable) column to store the floating point value, if any? Then a trigger for insert/update could make sure that the numeric column always has the right value. This assumes that you're going to be querying more often than inserting/updating of course.

Sign up to request clarification or add additional context in comments.

3 Comments

Yeah, I thought about that. Would that be a lot of wasted space for the nulls? I was also thinking of a table with a OneToOne relation that would be populated on a trigger. There are criteria I could use on other fields to tell if a value will never be numeric.
I don't know how Postgres stores nulls... you should probably read up on the exact table structure or just experiment.
Storing a numeric datatype seems to have worked. I think it's worth the extra storage. Partial indexes the most. Now my queries went from 7 minutes to 20 seconds..not bad.
1

For speed, I would go with the solution of keeping a separate column with a numeric type that is updated by a trigger. Nulls don't waste any space. Otherwise, the solution with a stored procedure (or a case expression should be enough) that checks the value and then casts it sounds right. Catching exceptions will probably be the most expensive solution of them all.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.