1

Let's say we're using Django with a Postgres database.

I want to store a sequence of data like so:

  • Record 1: 1, 2, 3, 4, 5
  • Record 2: 7, 8, 2, 3, 1, 9, 6
  • Record 3: 4, 4, 3, 2

A couple points to note:

  • The sequence will always be one-dimensional.
  • A sequence can have redundant values.
  • A sequence can be of variable length.

So first thing; I want to store this information in the database. There are lots of ways I can accomplish this, so let's look into my querying requirements.

Let's say I have a query sequence 1, 2, 3. Now I want to identify the sequences that match this sequence. A match would meet one of the following cases:

  • Case A: The sequence contains the query 1, 2, 3 in that order.
    • In the example, Record 1 matches this.
  • Case B: The sequence contains the components of the query 1, 2, 3 in any order.
    • In the example, Record 1 and 2 match this.
  • Case C: The sequence contains some of the components of the query 1, 2, 3 in any order.
    • In the example, all records match this.

In a perfect world I'd like the results to be ranked such that:

  • Record 1 comes first (because it matches Case A, our highest-priority match)
  • Record 2 comes second (because it matches Case B, our mid-priority match)

  • Record 3 comes last (because it matches Case C, our low-priority match)

Can anyone recommend a method, library, or concept for storing this data in such as way that queries can be made relatively fast?

3
  • Since you’re using a relational database, you probably want to normalize these records: have a RecordValue table that matches a record ID (which could be a foreign key to a Record table, but you may not need that) and index to a value. Then you can write these queries as selects against that table. They may not be trivial (e.g. you may need to select all record IDs where the count of unique values that are 1, 2, or 3 is 3), but they will be doable. Commented May 11, 2018 at 17:08
  • You thinking something like a table with: sequence_identifier, value, position? With each value in each sequence being stored in its own record. That would probably work; I could query for matching values in sequential order and matching values in non-sequential order. And then case C could be managed by getting a count of matching results for each sequence. Commented May 11, 2018 at 17:11
  • have a look on from difflib import SequenceMatcher may be that will help you somehow. That worked for me in this my situation: stackoverflow.com/q/47682491/8495108 Commented May 11, 2018 at 17:34

1 Answer 1

1

If you are using django and postgres look into the ArrayField. They are queryable.

Sign up to request clarification or add additional context in comments.

3 Comments

Heh, this looks to be exactly what I need, and considerably more succinct than something I would have coded. Gonna give it some time to see what others think, but probably best answer. Thanks!
@Bryant Jackson but ArrayField does not sort values, as you wanted. they just store them.
Yeah, would miss on the partial matches, but I can probably work around that. Will bear in mind, excellent point!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.