Questions tagged [indexing]
71 questions
1
vote
3
answers
296
views
How to implement a DB index to allow intersection queries on a database which does not support compound keys? [closed]
I am creating a DB that indexes JSON documents on top of a key-value storage engine (LMDB or something similar).
When a new JSON needs to be indexed, I will create an entry for each field (AKA, JSON key), for ...
0
votes
1
answer
226
views
Hash Indexes vs LSM trees with SSTables
I was reading Designing Data-Intensive Applications, and I am confused about the usage of LSM trees with SSTables.
The author talks about Hash Indexes and log files (written as segments which are ...
2
votes
2
answers
194
views
Do we have 2 logical query processings, one with indexes and one without indexes?
In the book "Inside Microsoft® SQL Server® 2008: T-SQL Programming" the behaviour of a SQL query is explained. The following picture is taken from the book. I have some questions about the ...
1
vote
2
answers
155
views
Secondary indexes vs. using Elasticsearch
When does it make sense to put data in Elasticsearch vs. creating secondary indexes on the primary datastore?
Elasticsearch with another primary store
Pros:
Primary datastore can be optimised for read ...
2
votes
3
answers
1k
views
How to keep track of Indices
I've discovered my biggest issue with practicing interview questions and writing software more generally is keeping track of indices in Python, maybe partly because my first two languages were the 1-...
-2
votes
1
answer
212
views
Database index - Basic understanding
Trying to clarify my knowledge of databases and indexes. I'd like to know how exactly they work. So I have a few questions:
When indexing a table over a column or a set of columns, a new table is created ...
-1
votes
1
answer
270
views
A data structure / algorithm to combine search tree and hash table?
I have two-dimensional data where one dimension is ordered and the other is categorical, for example, country and city_age:
country   age    city
Italy     2773   Rome
Germany   784    Berlin
USA       397    New York
...
0
votes
1
answer
119
views
Why does SQL Server prefer to order this result set on a nonclustered index rather than the clustered index?
It's common knowledge that the order of records from a simple one-table query is not guaranteed to be in the order of the primary key/clustered index. Adding a simple ORDER BY is no problem of course, ...
-1
votes
2
answers
148
views
1D coordinate to 2D coordinates without defining a stride
I'm in a situation that basically boils down to storing values based on two IDs. The IDs are sparse, drawn from different ID pools, and pretty much unpredictable, so the naive approach is to just store the ...
2
votes
1
answer
818
views
Alternatives for index arrays
I work on a C++ project where I am not really happy with the data structures. The question isn't specific to C++; I think I would face a similar issue in, say, Java or Python.
There are data ...
1
vote
1
answer
4k
views
Is it OK to use Redis SCAN extensively?
The Redis docs state that the KEYS command should not be used in production, since it blocks other processes while executing; it is better to use SCAN to iterate over all keys with some batch size.
...
-1
votes
5
answers
2k
views
Why is converting 0-indexed code to 1-indexed code non-trivial?
The disadvantages of 1-indexing are well-known. However, our hand is sometimes forced by our choice of language and we have to convert algorithms that were intended for a 0-indexed language to being 1-...
1
vote
1
answer
655
views
Custom File System Index/Cache - How to save index
I've got an extremely oniony (deep) folder structure which contains approximately 1,000,000 text-based files on a network share. Using Windows Search is extremely slow and unreliable. I've created some text ...
1
vote
2
answers
319
views
Dealing with complex uniqueness in MongoDB
I am creating a booking system that will allow users to make reservations for whole days. When a user wants to initially make a reservation, they select the day(s) and then will have 10 minutes to ...
-3
votes
3
answers
1k
views
If data is stored in RAM, do we still need an index into the data?
Is the purpose of indexing data structures to address the limitations of disks?
If data is stored in RAM, do we still need an index into the data? Thanks.
Question comes from Designing Data-Intensive ...
0
votes
2
answers
271
views
Are all permutations of a string a good way to index full text search?
I am writing an application to be used as a local disc document store with functionality similar to Firebase or MongoDB. The gist of how it works is a column hash table.
For example:
Say I have a user ...
2
votes
0
answers
37
views
How to build a HAMT with multi-index keys
By that I mean, forming a Hash Array Mapped Trie with 2 or more indexed fields, such as a User model by email and location name, or email + username + last logged in + isActive. Basically any ...
1
vote
0
answers
733
views
Performance impact of JPARepository save() on a large database table with index
We have a few tables with a large amount of data and with indexes on those tables to help in faster retrieval.
We are also using Spring Data JPA JpaRepository for adding data to those tables using the ...
2
votes
1
answer
988
views
Indexes on a SQL Server fact table
If I have a SQL Server fact table with four dimensions (OrderDate, Customer, Product, Region), my understanding is that it's best to create a non-clustered index per foreign key (dim key column in the ...
0
votes
1
answer
984
views
Limitations of Tries in Comparison to B-Trees for a Database
I am wondering about how range queries work, and the standard solution is to use B+trees. However, I am a fan of tries as a general data structure and would like to know if they (or variations of them)...
-2
votes
1
answer
103
views
Which data structure should I use for implementing a code indexer
I need to write a basic code indexer, which needs to be fast. Should I use an embedded SQLite database for this, or should I rather rely on a custom data structure, or even flat files as used by ctags?
...
5
votes
3
answers
4k
views
Is "Array[1]" the first element or second element in the array?
Following the reading of the question Why are zero-based arrays the norm?, I wonder about the terms to use for referring to specific array elements, in the perspective of linguistic reading of ...
3
votes
1
answer
3k
views
Why is Elasticsearch popular? [closed]
Elasticsearch is basically about indexing of data.
In the database world,
Multiple indexes can be created on a MongoDB collection
Collection in MongoDB can be schema-less.
In MongoDB, BSON encoding of ...
0
votes
3
answers
253
views
Strategy to update search index after fixing index generation
Describing the situation
I'm working on an application (based on the Spring Framework) using a search index (Lucene, if that matters) to make content of that application searchable. Documents are ...
1
vote
1
answer
829
views
What is an indexer?
The C# docs have a page on indexers, which appears to use "indexer" to refer to the construct required to enable instances of a class to be accessed via square bracket notation.
Indexers allow ...
0
votes
0
answers
485
views
Managing an index of file metadata on a network share
What is a good way of setting up a "shared index" of file metadata, when there can be no shared process such as a database server?
I'll explain the scenario: A server contains M (say 10000) large ...
3
votes
0
answers
231
views
What is a good way to structure a timespan-oriented index?
I'm finishing off work on a complex rule engine for hotel rates with real-time queries. There are a lot of conditions addressing intersections of periods for options, restrictions and policies. All of ...
7
votes
4
answers
2k
views
Can someone explain the technicalities of MapReduce in layman's terms?
When people talk about MapReduce you think about Google and Hadoop. But what is MapReduce itself? How does it work? I came across this blog post that tries to explain just MapReduce without Hadoop, ...
5
votes
4
answers
4k
views
Approach for parsing and indexing very large files
I have been tasked with developing a web-based (i.e., runs in a browser) viewer for a proprietary log file.
I have no control over the format of the logs, I just consume them. The log file contains ...
1
vote
2
answers
734
views
Use (numeric) IDs over names as unique key?
I have a set of data (assume they are objects) with unique immutable names, like this:
class Datum {
final string name
// other fields
}
Considering that:
I don't need to support rename. (The ...
11
votes
2
answers
12k
views
Is indexing foreign keys a good practice?
Looking at DB tables created by a different developer, I have noticed that whenever a table had a foreign_key_id field/column, it was always an INDEX/KEY.
I am not sure if it was manually created, or ...
2
votes
1
answer
135
views
Indexing Algorithm That Avoids Duplicate Lookups On An Uncontrollable Dataset
How to index a massive, randomly selected, uncontrollable, constantly changing dataset?
Imagine you want to index all of the snow particles in a giant snowglobe that is constantly being shaken. ...
1
vote
2
answers
601
views
Creating a web application with full text search on dynamic data
Even after thorough requirements engineering we end up with users wanting to attach 'notes' to their otherwise well-structured data records, in other words: arbitrary key-value pairs. Their primary ...
8
votes
4
answers
4k
views
What is the origin of counting from zero in programming languages?
This is a question which I have wondered (and been asked) about for a long time.
In (most? all?) programming languages, an index begins at zero for an array, string, etc. I recognize it became ...
0
votes
1
answer
226
views
Best Azure Solution for Complex Search Index
I need to perform quick searches against a combination of tags while including date ranges:
Example:
Users
who have requested notifications
who did not respond to a notification sent at least 3 days ...
3
votes
1
answer
314
views
Efficient Value-Lookup in List of k-Tuples
I've encountered a problem in a personal project that I think could be solved by a particular data structure but I'm not sure what. The problem is as follows:
Given a set of k-tuples, provide an ...
-1
votes
2
answers
965
views
Does SQL use hash tables for indexes? [closed]
Are there other ways of indexing?
Which are the most used?
Does SQL have a standard for indexes? Does it use hash tables?
1
vote
3
answers
3k
views
Bits - Least-Significant/Lowest is 0th or 1st; zero or one indexed
Question
Is there a rough consensus if the bitmask 0x01 is properly said to have the "zeroth" bit set, or the "first" bit set?
If there isn't rough consensus that there's a generally right answer, ...
-1
votes
1
answer
100
views
Is there a generally accepted definition of "Secondary Index" independent of DB product? [closed]
Is the definition for "Secondary Index" anything more specific than just "Any index that is not the primary index"?
EDIT: Here is some research I have done:
Search Google. I evaluated the first 20 ...
1
vote
1
answer
141
views
Search by Location and Keywords [closed]
I have a data set with ~5M entries/rows (~2GB). Every entry contains a location field (lat/lon coordinate pair) and n keyword fields (keyword-1, keyword-2, ... keyword-n). The keyword fields can all ...
1
vote
0
answers
218
views
Saving and retrieving multiple cached json/txt data files
I am working on some numerical programming and need to generate the results for a model given a variety of input parameters. Since the model takes a while to run, I was planning to save the data to ...
25
votes
8
answers
6k
views
Should my sequential collection start at index 0 or index 1?
I am creating an object model for a device that has multiple channels. The nouns used between the client and me are Channel and ChannelSet. ("Set" isn't semantically accurate, because it's ordered ...
-1
votes
1
answer
158
views
node deep file indexer module cannot go deep [closed]
I'm trying to make a file indexer with Node.js. The program is supposed to index files and folders in an array and also check inside folders and add all subfolders and files.
I wrote:
fs.readdir(...
3
votes
2
answers
227
views
Efficient range search for pair of numbers
Assume we have a large list of pairs:
struct pair { double x; double y; };
std::vector<pair> pairs;
What is the most effective way to find all pairs where (x1 < x < x2) AND (y1 < y < y2)?
O(n) is ...
2
votes
1
answer
2k
views
Finding the closest n points to any arbitrary point in two dimensions (r-tree, quadtree, spatial index)
I have a distribution of two-dimensional point objects. How is it possible to find the N nearest points to any given point without iterating over the entire collection of points (and only ...
12
votes
5
answers
2k
views
How come the computer doesn't have to read the entire table when the column is indexed? [duplicate]
Let's say a table with two columns has 100 quadrillion records, and I want to find a record that has column #2 equal to something.
If column #2 is indexed it returns the result immediately, but if it's ...
1
vote
1
answer
112
views
Difficulty in deciding correct data structure
Two objects are interacting (Object Alpha, Object Beta)
Each contains a point (x,y) which will be used to make comparisons, among other things.
Object Alpha's point (x,y) attribute is dynamic and ...
62
votes
10
answers
7k
views
Is it premature optimization to add database indices?
A colleague of mine suggested today that we go through all of the queries in our application and add indices accordingly.
I feel this is premature optimisation because our application is not even ...
0
votes
1
answer
913
views
Full Text Indexing Strategy for MS Excel Documents
Background
As part of a broader application that allows users to search thousands of MS Office documents on a private network, I need to index and make searchable Microsoft Excel files.
My basic ...
12
votes
5
answers
3k
views
Quadtree with duplicates
I'm implementing a quadtree. For those who don't know this data structure, I am including the following small description:
A Quadtree is a data structure and is in the Euclidean plane
what an ...