
A new commit works locally but not in production, and the error messages are strange.

With uwsgi everything starts as usual, but on the first request I send to the server everything freezes (I mean everything; even the SSH session stops responding). Eventually one of the workers dies and the server starts responding again. With ./manage.py runserver this happens:

$ ./manage.py runserver
Performing system checks...

Killed

That's it. The only change in the commit is one line at the beginning of views.py:

import my_module

Again, this works locally but not in production. I can't give you too much detail about my_module, but it's basically a very long list inside a class with a method that does a binary search.

My suspicion is that it has to do with how big the file is: my_module.py is 62MB.
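A simplified sketch of its shape (class name and data are placeholders, not the real module):

# my_module.py (simplified; the real list literal is what makes the file 62MB)
class BigList(object):
    # sorted by the first element of each tuple
    DATA = [
        (100, 'value for 100'),
        (205, 'value for 205'),
        # ...millions more literal rows...
    ]

    def search(self, key):
        # plain binary search over the sorted DATA list
        lo, hi = 0, len(self.DATA)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.DATA[mid][0] < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self.DATA) and self.DATA[lo][0] == key:
            return self.DATA[lo][1]
        return None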

How can I find out what's happening?

EDIT: Tried ./manage.py runserver and then dmesg, twice:

[  372.911491] python[1692]: segfault at 24 ip 0000000000558077 sp 00007f6624b70880 error 6 in python2.7[400000+2bc000]
[  414.833167] python[1729]: segfault at 24 ip 0000000000558077 sp 00007f9f17cbc880 error 6 in python2.7[400000+2bc000]
[  414.837098] Core dump to |/usr/share/apport/apport 1726 11 0 1726 pipe failed
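To take Django out of the picture entirely, I'm also thinking of importing the module in a bare interpreter and checking peak memory, something like this (sketch):

import resource

import my_module  # the 62MB module

# ru_maxrss is reported in kilobytes on Linux
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print 'Peak RSS after import: %.1f MB' % (peak_kb / 1024.0)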
  • Why are you running runserver in production? Commented Feb 2, 2017 at 23:24
  • @LegoStormtroopr I'm not, I'm using uwsgi + nginx. I just used runserver because I want to make it closer to dev, since I know that commit works perfectly in dev. Commented Feb 2, 2017 at 23:32
  • Try running dmesg and see what the last line says; it may say something about running out of memory? Commented Feb 3, 2017 at 0:04
  • @davidejones updated question with an edit Commented Feb 3, 2017 at 1:19
  • The most likely contender (in my experience) is the OOM (out of memory) killer. Are you sure you have enough memory available? Commented Feb 3, 2017 at 1:21

1 Answer


Here's an example of how to use SQLite to solve your problem. The searches are indexed, so they're about as fast as you can get:

import os
import random
import sqlite3
import contextlib
from datetime import datetime


@contextlib.contextmanager
def get_connection():
    # Commit on successful exit and always close the connection.
    connection = sqlite3.connect('database.db')
    try:
        yield connection
        connection.commit()
    finally:
        connection.close()


@contextlib.contextmanager
def get_cursor():
    # Yield a cursor from a fresh connection and close it when done.
    with get_connection() as connection:
        cursor = connection.cursor()
        try:
            yield cursor
        finally:
            cursor.close()


def initial_create():
    # read your giant array from a file here

    with get_cursor() as cursor:
        cursor.executescript('''
        BEGIN;
        CREATE TABLE big_list (
            a int,
            b int,
            value text
        );

        CREATE INDEX big_list__a_b_idx ON big_list (a, b);
        COMMIT;
        ''')

        # Insert 1,000 batches of 1,000 random rows (1,000,000 total).
        for i in range(1000):
            a = random.randint(0, 1000000)
            inserts = []

            for j in range(1000):
                b = random.randint(0, 1000000)

                inserts.append((
                    a,
                    b,
                    'some string (%d,%d)' % (a, b),
                ))

            cursor.executemany('INSERT INTO big_list VALUES (?,?,?)', inserts)


def test():
    with get_cursor() as cursor:
        print 'Total rows: %d' % (
            cursor.execute('SELECT COUNT(*) FROM big_list').fetchone())

        print 'Exact searches:'
        for i in range(10):
            start = datetime.now()

            # %d interpolation is safe here only because the values are ints;
            # prefer '?' placeholders for anything user-supplied.
            result = cursor.execute('''
            SELECT *
            FROM big_list
            WHERE a = %d
            AND b = %d
            ''' % (i * 1000, i * 10000))
            print 'Got results: %r in' % result.fetchall(),
            print datetime.now() - start

        print 'Ranged searches:'
        for i in range(10):
            start = datetime.now()

            result = cursor.execute('''
            SELECT *
            FROM big_list
            WHERE a BETWEEN %d AND %d
            AND b BETWEEN %d AND %d
            ''' % (i * 1000, (i + 1) * 1000, i * 1000, (i + 1) * 1000))
            print 'Got results: %r in' % result.fetchall(),
            print datetime.now() - start


if __name__ == '__main__':
    if not os.path.isfile('database.db'):
        print 'Creating database, this should only be needed once'
        initial_create()

    test()

Example output:

Total rows: 1000000
Exact searches:
Got results: [] in 0:00:00.000113
Got results: [] in 0:00:00.000055
Got results: [] in 0:00:00.000044
Got results: [] in 0:00:00.000044
Got results: [] in 0:00:00.000043
Got results: [] in 0:00:00.000045
Got results: [] in 0:00:00.000043
Got results: [] in 0:00:00.000041
Got results: [] in 0:00:00.000044
Got results: [] in 0:00:00.000041
Ranged searches:
Got results: [(604, 31, u'some string (604,31)'), (604, 386, u'some string (604,386)')] in 0:00:00.000889
Got results: [(1142, 1856, u'some string (1142,1856)')] in 0:00:00.000538
Got results: [] in 0:00:00.000056
Got results: [(3802, 3983, u'some string (3802,3983)')] in 0:00:00.000482
Got results: [] in 0:00:00.000165
Got results: [] in 0:00:00.000047
Got results: [] in 0:00:00.000164
Got results: [(7446, 7938, u'some string (7446,7938)'), (7947, 7381, u'some string (7947,7381)')] in 0:00:00.000867
Got results: [(8003, 8174, u'some string (8003,8174)')] in 0:00:00.000501

7 Comments

Hi Wolph, thanks for this. These aren't exactly the queries I was describing though. Forget about the exact searches now. Your ranged searches are WHERE a BETWEEN %d AND %d AND b BETWEEN %d AND %d. What I described would be something like WHERE %d BETWEEN a AND b. Does this syntax do what's expected? If so I'll try it in MySQL (which I'm using) and then mark your answer correct.
Hi again. I just tried it, and it works. I've also re-read my question and noticed that I actually didn't describe any of this well (I was thinking about our discussion in the comments). Should I open a new question asking what I asked in the comments, so you can post this answer there, replacing WHERE a BETWEEN %d AND %d AND b BETWEEN %d AND %d with WHERE %d BETWEEN a AND b?
I've just tested my query and it's not good enough. With a and b being 32-bit ints and 700k rows, each query like WHERE %d BETWEEN a AND b takes half a second. It's not obvious how indexes on a and b would even be used for that query, right? I suspect the DB is using the index on a to find a few thousand results and then scanning row by row for matches on b...
To me this is reinforced by the fact that if I pick a very small or very big int the queries are super fast, but if I pick one in the middle of the 32-bit range they take about half a second.
BTW Wolph, what I ended up doing was to follow your initial suggestion. I put it all in one file, forced the line size to be constant by padding the end of each line with spaces, and did a binary search. If I remember correctly, my tests at the time showed it would take 1.5 seconds to do 10k searches.
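(For reference, a sketch of that fixed-width-record binary search; the file name, record width, and key layout are made up, and keys must be zero-padded so string order matches numeric order:)

import os

RECORD_SIZE = 64  # every line space-padded to 63 chars plus '\n'

def file_binary_search(path, key):
    # Records are sorted by key; with a fixed width, record i starts at
    # byte i * RECORD_SIZE, so we can seek directly instead of scanning.
    n_records = os.path.getsize(path) // RECORD_SIZE
    with open(path, 'rb') as f:
        lo, hi = 0, n_records
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD_SIZE)
            record_key = f.read(RECORD_SIZE).rstrip().split(',')[0]
            if record_key < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < n_records:
            f.seek(lo * RECORD_SIZE)
            record = f.read(RECORD_SIZE).rstrip()
            if record.split(',')[0] == key:
                return record
    return None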
