Database index - Basic understanding

Question

Trying to clarify my knowledge on databases and indexes. I'd like to know how exactly it works. So I have a few questions:

When indexing a table over column or set of columns, a new table is created that holds "indexes". What exactly is stored in that table? A pointer to the row related to the index & some integer value calculated from the column values (similar to hashcodes)? I have bolded integer, because I want to know how exactly does an index look like.
Are updating, deleting & inserting the only operations that trigger recalculating an index?
Is only the index related to the row recalculated and the whole table sorted afterwards, or are there more calculations made?

One thing that most definitely doesn't happen is that the whole table is sorted. The entire point of indexing is to be able to find specific rows quickly without having to put them into specific places. — Kilian Foth
– Kilian Foth, Commented Feb 22, 2022 at 22:16
Something to be clear: Tables are a high level abstraction. The list of rows is not the data structure stored directly on disk, with various other details involved to enable performant data manipulation. Once you understand the difference between the abstraction and the storage format, hopefully that will help with the conceptual issue behind this question. — user1937198
– user1937198, Commented Feb 22, 2022 at 22:27
@user1937198, surely there is an equally feasible abstraction that describes indexes? — Steve
– Steve, Commented Feb 23, 2022 at 6:56
@Steve Indexes are arguably a part of the table abstraction (especially on systems with clustered index support) — user1937198
– user1937198, Commented Feb 23, 2022 at 8:19

user10489 · Accepted Answer · 2022-02-23 05:56:49Z

2

An index isn't a table, it is an index. There are multiple kinds of indexes, some ordered, some not. An index might internally be a hash table, a btree, a balanced tree, or something completely different. An index will not be a sorted linear list, as those are too expensive to update. Tables are not sorted, because there's no point. That's what indexes are for.

What is stored in an index is likely the key used to generate the index and a reference to the row it is from. Tables are not arrays, so indexes probably won't be storing integers to reference the rows. How an index references rows in a table is an implementation detail that is different in every database and will be unique to the way the database stores its tables.

Also, an index is unlikely to store a hash, as the hash can be generated from the key and the hash is only used to find the key in the index. If it is an ordered index, it wouldn't need a hash.

Any operation that affects the fields in a table that are the keys the index uses will cause the index to be updated. Generally, indexes are NOT recalculated -- they are just incrementally updated, and they use data structures that support that. However, there may also be some other index specific and whole table operations (like create index or repair table) that may cause an index to be calculated for every row. Also, some databases support temporarily disabling indexes for mass data inserts, and then recalculate the index once when the operation completes.

answered Feb 23, 2022 at 5:56

user10489

3902 silver badges11 bronze badges

I haven't put much thought into this before now, but is there such a thing as a "disordered index"? Surely the fundamental point of an index is to improve efficiency by maintaining a particular order?

Steve
– Steve

2022-02-23 07:04:53 +00:00
Commented Feb 23, 2022 at 7:04
@Steve An index implemented as a hash table would not provide a useful order. You would need an ordering for efficient range queries (e.g. SELECT * WHERE foo > 0) but a hash index is still perfectly suitable for indexing a unique or primary key (e.g. SELECT * WHERE id = 123) and consequently also for joins. But in databases, hash tables rarely have advantages over btrees which are sorted.

amon
– amon

2022-02-23 08:02:53 +00:00
Commented Feb 23, 2022 at 8:02
Hash is extremely useful for unique() because it is O(1) where a btree is going to be O(log(N)) at best. Unique doesn't care about sort order.

user10489
– user10489

2022-02-23 12:26:37 +00:00
Commented Feb 23, 2022 at 12:26
@amon, the hash order is useful for the hash itself, presumably? I don't know the full details of how hashes work, but again I can't see how the index could possibly work effectively if its entries were fundamentally disordered.

Steve
– Steve

2022-02-23 16:50:50 +00:00
Commented Feb 23, 2022 at 16:50
@Steve: You need to look up hashes somewhere else (like wikipedia) and then maybe you will understand both why an unordered index is useful and why hashes are not useful for an index that needs to be ordered.

user10489
– user10489

2022-02-24 02:15:25 +00:00
Commented Feb 24, 2022 at 2:15

| Show 4 more comments

Stack Exchange Network

Database index - Basic understanding

1 Answer 1

Your Answer

Hot Network Questions

Database index - Basic understanding

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions