How do SQL databases query non-indexed columns?

Question

I'm looking for some resources on what happens on the low-level with databases.

I've found that indexed data is looked up using b+ trees, but could not find anything on non-indexed columns.

Is there any algorithm better than fetching values of every row?

orlp · Accepted Answer · 2022-03-22 12:22:52Z

You do in principle have to fetch every row, yes, often called a (linear) scan. In analytics focused database systems there are a couple more common optimizations though:

Most analytical systems nowadays are columnar, instead of storing rows contiguously they store each column contiguously. This helps for scans because you only have to fetch the data for the columns your query is interested in from disk (or out-of-cache RAM).
Queries are optimized to minimize how often a linear scan needs to happen. For example when joining two tables on non-indexed columns instead of scanning over the second table for each element in the first, it builds a temporary hash table on one table before scanning the second in one go. Essentially building the optimal index on-the-fly.
Predicate push-down allows one to reduce the amount of data being passed around by 'pushing down' predicates such as (WHERE age > 12) into the scan, where the scan operator will only return the relevant values.
It is common to store columns in groups/chunks of data, where each group can be compressed and read individually. Compression reduces the amount of very expensive disk I/O in exchange for some cheap CPU time.
In addition, it's common to store cheap statistics for each group, such as average, sum, median, min, max, etc. Then for example if you're checking WHERE age > 12, and you see that the maximum for a group is 11, you can skip the whole group, or if you're aggregating a sum you can simply take the sum attribute for the whole group.

There's many more, but these are some from the top of my head.

Rinkesh P · Accepted Answer · 2022-03-22 10:13:42Z

Theoretically speaking, I am not aware of any such algorithm. A linear search or Full Table Scan is performed when the rdbms doesn't find an index. This is what MySQL documentation says

Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs.

However, there are steps taken to try optimizing this issue. MySQL uses InnoDB storage engine by default, which requires a table to have a primary key. If you don't specify one, it will automatically create one for you as a part of extended index(indexes managed by MySQL).

Another scenario arises when a query uses ORDER BY. When an index is available and can satisfy the order, it is used. In other cases, MySQL performs a filesort, which is a version of quicksort, as an extra step to satisfy the order by clause.

I have taken MySQL for reference but it should be similar in case of other databases.

Stack Exchange Network

How do SQL databases query non-indexed columns?

2 Answers 2

Your Answer

Hot Network Questions

How do SQL databases query non-indexed columns?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions