Questions tagged [indexing]

1 vote
3 answers
296 views

I am creating a DB that indexes JSON documents on top of a key-value storage engine (LMDB or something similar). When a new JSON needs to be indexed, I will create an entry for each field (AKA, JSON key), for ...
Samuel E.'s user avatar
  • 233
0 votes
1 answer
226 views

I was reading Designing Data-Intensive Applications, and I am confused about the usage of LSM trees with SSTables. The author talks about hash indexes and log files (written as segments which are ...
Ufder's user avatar
  • 254
2 votes
2 answers
194 views

In the book "Inside Microsoft® SQL Server® 2008: T-SQL Programming" the behaviour of a SQL query is explained. The following picture is taken from the book. I have some questions about the ...
jwa's user avatar
  • 29
1 vote
2 answers
155 views

When does it make sense to put data in Elasticsearch vs. creating secondary indexes on the primary datastore? Elasticsearch with another primary store. Pros: the primary datastore can be optimised for read ...
best wishes's user avatar
2 votes
3 answers
1k views

I've discovered my biggest issue with practicing interview questions and writing software more generally is keeping track of indices in Python, maybe partly because my first two languages were the 1-...
JoeTheShmoe's user avatar
-2 votes
1 answer
212 views

Trying to clarify my knowledge of databases and indexes. I'd like to know how exactly it works. So I have a few questions: When indexing a table over a column or set of columns, a new table is created ...
Maciaz's user avatar
  • 113
-1 votes
1 answer
270 views

I have two-dimensional data where one dimension is ordered and the other is categorical, for example, country and city_age: country age city Italy 2773 Rome Germany 784 Berlin USA 397 New York ...
Dims's user avatar
  • 157
0 votes
1 answer
119 views

It's common knowledge that the order of records from a simple one-table query is not guaranteed to be in the order of the primary key/clustered index. Adding a simple ORDER BY is no problem of course, ...
Jacob Stamm's user avatar
-1 votes
2 answers
148 views

I'm in a situation that basically boils down to storing values based on two IDs. The IDs are sparse, from different ID pools and pretty much unpredictable, so the naive approach is to just store the ...
user81993's user avatar
  • 221
2 votes
1 answer
818 views

I work on a C++ project where I am not really happy with the data structures. The question isn't that specific to C++, I think that I would face a similar issue in say Java or Python. There are data ...
Martin Ueding's user avatar
1 vote
1 answer
4k views

The Redis docs state that the KEYS command should not be used in production, since it blocks other processes while executing; it is better to use SCAN to iterate over all keys with some batch size. ...
ogbofjnr's user avatar
  • 121
-1 votes
5 answers
2k views

The disadvantages of 1-indexing are well-known. However, our hand is sometimes forced by our choice of language and we have to convert algorithms that were intended for a 0-indexed language to being 1-...
J. Mini's user avatar
  • 1,015
1 vote
1 answer
655 views

I've got an extremely oniony (deep) folder structure which contains approximately 1,000,000 text-based files on a network share. Using Windows Search is extremely slow and unreliable. I've created some text ...
GisMofx's user avatar
  • 379
1 vote
2 answers
319 views

I am creating a booking system that will allow users to make reservations for whole days. When a user wants to initially make a reservation, they select the day(s) and then will have 10 minutes to ...
Matthew Weeks's user avatar
-3 votes
3 answers
1k views

Is the purpose of indexing data structures to address the limitations of disks? If data is stored in RAM, do we still need an index into the data? Thanks. Question comes from Design Data Intensive ...
Tim's user avatar
  • 5,565
0 votes
2 answers
271 views

I am writing an application to be used as a local-disk document store with functionality similar to Firebase or MongoDB. The gist of how it works is a column hash table. For example: Say I have a user ...
Sam Orozco's user avatar
2 votes
0 answers
37 views

By that I mean, forming a Hash Array Mapped Trie with 2 or more indexed fields, such as a User model by email and location name, or email + username + last logged in + isActive. Basically any ...
user10869858's user avatar
1 vote
0 answers
733 views

We have a few tables with a large amount of data and with indexes on those tables to help in faster retrieval. We are also using Spring Data JPA JpaRepository for adding data to those tables using the ...
phoenixSid's user avatar
2 votes
1 answer
988 views

If I have a SQL Server fact table with four dimensions (OrderDate, Customer, Product, Region), my understanding is that it's best to create a non-clustered index per foreign key (dim key column in the ...
MAK's user avatar
  • 23
0 votes
1 answer
984 views

I am wondering about how range queries work, and the standard solution is to use B+trees. However, I am a fan of tries as a general data structure and would like to know if they (or variations of them)...
Lance Pollard's user avatar
-2 votes
1 answer
103 views

I need to write a basic code indexer, which needs to be fast. Should I use an embedded SQLite database for this, or should I rather rely on a custom data structure, or even flat files as used by ctags? ...
BigONotation's user avatar
5 votes
3 answers
4k views

Following the reading of the question Why are zero-based arrays the norm?, I wonder about the terms to use for referring to specific array elements, in the perspective of linguistic reading of ...
profaisal's user avatar
3 votes
1 answer
3k views

Elasticsearch is basically about indexing data. In the database world, multiple indexes can be created on a MongoDB collection. A collection in MongoDB can be schema-less. In MongoDB, BSON encoding of ...
user1787812's user avatar
0 votes
3 answers
253 views

Describing the situation: I'm working on an application (based on the Spring Framework) using a search index (Lucene, if that matters) to make content of that application searchable. Documents are ...
lucash's user avatar
  • 288
1 vote
1 answer
829 views

The C# docs have a page on indexers, which appears to use "indexer" to refer to the construct required to enable instances of a class to be accessed via square bracket notation. Indexers allow ...
Ninjakannon's user avatar
0 votes
0 answers
485 views

What is a good way of setting up a "shared index" of file metadata, when there can be no shared process such as a database server? I'll explain the scenario: A server contains M (say 10000) large ...
Anders Forsgren's user avatar
3 votes
0 answers
231 views

I'm finishing off work on a complex rule engine for hotel rates with real-time queries. There are a lot of conditions addressing intersections of periods for options, restrictions and policies. All of ...
Julian's user avatar
  • 264
7 votes
4 answers
2k views

When people talk about MapReduce you think about Google and Hadoop. But what is MapReduce itself? How does it work? I came across this blog post that tries to explain just MapReduce without Hadoop, ...
Eddie Bravo's user avatar
5 votes
4 answers
4k views

I have been tasked with developing a web-based (i.e., runs in the browser) viewer for a proprietary log file. I have no control over the format of the logs, I just consume them. The log file contains ...
Matt's user avatar
  • 255
1 vote
2 answers
734 views

I have a set of data (assume they are objects) with unique immutable names, like this: class Datum { final string name // other fields } Considering that: I don't need to support rename. (The ...
SOFe's user avatar
  • 728
11 votes
2 answers
12k views

Looking at DB tables created by a different developer, I have noticed that whenever a table had a foreign_key_id field/column, it was always an INDEX/KEY. I am not sure if it was manually created, or ...
Dennis's user avatar
  • 8,267
2 votes
1 answer
135 views

How to index a massive, randomly selected, uncontrollable, constantly changing dataset? Imagine you want to index all of the snow particles in a giant snowglobe that is constantly being shaken. ...
user58446's user avatar
  • 327
1 vote
2 answers
601 views

Even after thorough requirements engineering we end up with users wanting to attach 'notes' to their otherwise well-structured data records, in other words: arbitrary key-value pairs. Their primary ...
A.M.'s user avatar
  • 111
8 votes
4 answers
4k views

This is a question which I have wondered (and been asked) about for a long time. In (most? all?) programming languages, an index begins at zero for an array, string, etc. I recognize it became ...
0 votes
1 answer
226 views

I need to perform quick searches against a combination of tags while including date ranges: Example: Users who have requested notifications who did not respond to a notification sent at least 3 days ...
Rick Love's user avatar
  • 101
3 votes
1 answer
314 views

I've encountered a problem in a personal project that I think could be solved by a particular data structure but I'm not sure what. The problem is as follows: Given a set of k-tuples, provide an ...
geofflittle's user avatar
-1 votes
2 answers
965 views

Are there other ways of indexing? Which are the most used? Does SQL have a standard for indexes? Does it use hash tables?
Luis Javier's user avatar
1 vote
3 answers
3k views

Question: Is there a rough consensus on whether the bitmask 0x01 is properly said to have the "zeroth" bit set, or the "first" bit set? If there isn't rough consensus that there's a generally right answer, ...
mtraceur's user avatar
  • 269
-1 votes
1 answer
100 views

Is the definition for "Secondary Index" anything more specific than just "Any index that is not the primary index"? EDIT: Here is some research I have done: Search Google. I evaluated the first 20 ...
Trindaz's user avatar
  • 195
1 vote
1 answer
141 views

I have a data set with ~5M entries/rows (~2GB). Every entry contains a location field (lat/lon coordinate pair) and n keyword fields (keyword-1, keyword-2, ... keyword-n). The keyword fields can all ...
BVN's user avatar
  • 81
1 vote
0 answers
218 views

I am working on some numerical programming and need to generate the results for a model given a variety of input parameters. Since the model takes a while to run, I was planning to save the data to ...
krishnab's user avatar
  • 179
25 votes
8 answers
6k views

I am creating an object model for a device that has multiple channels. The nouns used between the client and me are Channel and ChannelSet. ("Set" isn't semantically accurate, because it's ordered ...
kdbanman's user avatar
  • 1,457
-1 votes
1 answer
158 views

I'm trying to make a file indexer with Node.js. The program is supposed to index files and folders in an array and also check inside folders and add all subfolders and files. I wrote: fs.readdir(...
nikoss's user avatar
  • 109
3 votes
2 answers
227 views

Assume we have a large list of pairs: struct {x: double, y: double} pair; vector<pair> What is the most effective way to find all pairs where (x1 < x < x2) AND (y1 < y < y2)? O(n) is ...
Sergey Alaev's user avatar
2 votes
1 answer
2k views

I have a distribution of two dimensional point objects. How is it possible to find the nearest N number of points to any given point without iterating over the entire collection of points (and only ...
easymoden00b's user avatar
12 votes
5 answers
2k views

Let's say a table with two columns has 100 quadrillion records. And I want to find a record that has column #2 equal something. If column #2 is indexed it returns the result immediately, but if it's ...
user1806244's user avatar
1 vote
1 answer
112 views

Two objects are interacting (Object Alpha, Object Beta). Each contains a point (x,y) which will be used to make comparisons, among other things. Object Alpha's point (x,y) attribute is dynamic and ...
easymoden00b's user avatar
62 votes
10 answers
7k views

A colleague of mine today suggested that we go through all of the queries in our application and to add indices accordingly. I feel this is premature optimisation because our application is not even ...
Marco de Jongh's user avatar
0 votes
1 answer
913 views

Background: As part of a broader application that allows users to search thousands of MS Office documents on a private network, I need to index and make searchable Microsoft Excel files. My basic ...
Matt Cashatt's user avatar
  • 3,325
12 votes
5 answers
3k views

I'm implementing a quadtree. For those who don't know this data structure, I am including the following small description: A Quadtree is a data structure and is in the Euclidean plane what an ...
Pierre Arlaud's user avatar