1
$\begingroup$

I'm looking for some resources on what happens on the low-level with databases.

I've found that indexed data is looked up using b+ trees, but could not find anything on non-indexed columns.

Is there any algorithm better than fetching values of every row?

$\endgroup$

2 Answers 2

2
$\begingroup$

You do in principle have to fetch every row, yes, often called a (linear) scan. In analytics focused database systems there are a couple more common optimizations though:

  • Most analytical systems nowadays are columnar, instead of storing rows contiguously they store each column contiguously. This helps for scans because you only have to fetch the data for the columns your query is interested in from disk (or out-of-cache RAM).

  • Queries are optimized to minimize how often a linear scan needs to happen. For example when joining two tables on non-indexed columns instead of scanning over the second table for each element in the first, it builds a temporary hash table on one table before scanning the second in one go. Essentially building the optimal index on-the-fly.

  • Predicate push-down allows one to reduce the amount of data being passed around by 'pushing down' predicates such as (WHERE age > 12) into the scan, where the scan operator will only return the relevant values.

  • It is common to store columns in groups/chunks of data, where each group can be compressed and read individually. Compression reduces the amount of very expensive disk I/O in exchange for some cheap CPU time.

  • In addition, it's common to store cheap statistics for each group, such as average, sum, median, min, max, etc. Then for example if you're checking WHERE age > 12, and you see that the maximum for a group is 11, you can skip the whole group, or if you're aggregating a sum you can simply take the sum attribute for the whole group.

There's many more, but these are some from the top of my head.

$\endgroup$
0
$\begingroup$

Theoretically speaking, I am not aware of any such algorithm. A linear search or Full Table Scan is performed when the rdbms doesn't find an index. This is what MySQL documentation says

Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs.

However, there are steps taken to try optimizing this issue. MySQL uses InnoDB storage engine by default, which requires a table to have a primary key. If you don't specify one, it will automatically create one for you as a part of extended index(indexes managed by MySQL).

Another scenario arises when a query uses ORDER BY. When an index is available and can satisfy the order, it is used. In other cases, MySQL performs a filesort, which is a version of quicksort, as an extra step to satisfy the order by clause.

I have taken MySQL for reference but it should be similar in case of other databases.

$\endgroup$

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.