I'm trying to determine whether a user already exists in my database. I know one way to do this is by running:

SELECT * FROM users WHERE email = $1

and checking whether the number of rows is greater than 0 or not. However, I know that a more efficient way to run this command is by using the "EXISTS" keyword, because it doesn't need to run through all the rows in the table. Running

EXISTS (SELECT 1 FROM users WHERE email = $1)

yields

error: syntax error at or near "EXISTS"

I've also tried simply running

SELECT 1 FROM users WHERE email = $1

as this should have the same efficiency optimizations but it doesn't output any row data.

I'm using the "pg" driver. Any help is greatly appreciated. Thank you in advance!

  • Thank you @wildplasser! Saved me so much time. Commented Feb 17, 2021 at 15:04

2 Answers

EXISTS(...) is an operator, not a command, so it has to appear inside a statement such as SELECT:


SELECT EXISTS (SELECT 1 FROM users WHERE email = $1) AS it_does_exist; 

EXISTS(...) yields a boolean; its argument is a subquery, most often a SELECT over a table.

EXISTS(...) only checks its argument for non-emptiness; EXISTS(SELECT * FROM users WHERE email = $1); would give the same result.
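
Since the question mentions the "pg" driver, here is a minimal sketch of how the query above might be called from Node.js. The function name userExists and the db parameter are assumptions for illustration; db stands for a pg Pool or Client, or anything else exposing query(text, values) that resolves to an object with a rows array.

```javascript
// Minimal sketch, not a definitive implementation.
// `db` is assumed to be a pg Pool or Client (hypothetical parameter name).
async function userExists(db, email) {
  const result = await db.query(
    "SELECT EXISTS (SELECT 1 FROM users WHERE email = $1) AS it_does_exist",
    [email]
  );
  // SELECT EXISTS (...) always returns exactly one row with one boolean column,
  // so there is no need to count rows.
  return result.rows[0].it_does_exist === true;
}
```

Because the outer SELECT always returns one row, the caller reads the boolean from the aliased column instead of checking the row count.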

5 Comments

This is absolutely the correct answer! Do you know why "SELECT 1 FROM users WHERE email = $1" doesn't return an object with the row that contains that email?
It selects 1, which is just a numeric literal. EXISTS() only checks its argument for non-emptiness; EXISTS(SELECT * FROM ...) would give the same result.
I don't seem to understand. To be clear, SELECT 1 returns an array with "?column?" as the only property whereas SELECT * returns the actual row.
That is a different question. Maybe ask another question? (or do some reading)
1 is a literal; it does not have a column name. If you want a name, try SELECT 1 AS one;

You won't gain much performance by omitting columns from the select list unless your table has many columns or some of them contain a lot of data. A little benchmark on localhost using Python, averaged over 1000 query executions:

57 µs SELECT * FROM foo WHERE login=%s
48 µs SELECT EXISTS(SELECT * FROM foo WHERE login=%s)
40 µs SELECT 1 FROM foo WHERE login=%s
26 µs EXECUTE myplan(%s) -- using a prepared statement

And the same over a gigabit network:

499 µs SELECT * FROM foo WHERE login=%s
268 µs SELECT EXISTS(SELECT * FROM foo WHERE login=%s)
272 µs SELECT 1 FROM foo WHERE login=%s
278 µs EXECUTE myplan(%s)

Most of that is network latency, which is variable, which means it's difficult to tell which query executes fastest on the database itself when benchmarking it over a network with such very small queries. If the queries took longer, that would be another story.
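
For anyone who wants to repeat this kind of measurement from Node.js, a rough timing loop could look like the sketch below. It is not the harness used for the numbers above (those came from Python); averageMicros and its parameters are hypothetical names, and db is assumed to be a pg Pool or Client.

```javascript
// Rough sketch of a per-query timing loop; names are hypothetical.
// Runs one parameterized query `iterations` times, one after another, and
// returns the mean wall-clock time per execution in microseconds.
async function averageMicros(db, text, values, iterations = 1000) {
  const start = process.hrtime.bigint(); // monotonic clock, nanoseconds
  for (let i = 0; i < iterations; i++) {
    await db.query(text, values);
  }
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1000 / iterations;
}
```

The result includes driver overhead and network latency, so over a network it mostly measures the round trip rather than the query itself.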

It also shows that ORMs and libraries that use prepared statements and need several network round trips to prepare and then execute each query should be avoided like the plague. Postgres's client library, libpq, has a specific function for this, PQexecParams, which sends the parse, bind, and execute steps together in a single round trip, so you get the advantages of parameterized queries (no SQL injection) with no speed penalty. The client library should use it. If you do a lot of small queries, this could be significant.
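
With the "pg" driver from the question, a parameterized query can be described as a plain config object. The sketch below assumes pg's documented query-config form, where an optional name field asks the driver to reuse a prepared statement on that connection; the helper userExistsQuery and the statement name "user-exists" are made up for illustration.

```javascript
// Sketch only: builds a query config object for client.query(...).
// The `name` field is assumed to make pg prepare the statement once per
// connection and reuse it afterwards; "user-exists" is a hypothetical name.
function userExistsQuery(email) {
  return {
    name: "user-exists",
    text: "SELECT EXISTS (SELECT 1 FROM users WHERE email = $1) AS it_does_exist",
    values: [email], // sent separately from the SQL text, never interpolated
  };
}
```

It would be used as client.query(userExistsQuery("someone@example.com")). Whether the named form is actually faster than an unnamed parameterized query is something to measure on your own setup.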
