2

so I run my mysql scripts through a python wrapper, the important parts are here though. My goal through this was to see which was faster, to make two separate select DISTINCT() calls and find the ones the ones in list a that are not in list b. I also did this using subqueries.

I found that the subqueries were much faster (as hoped) however, the results it yielded were wrong.

starttime1 = time.time()
LoLs.cursor.execute("SELECT DISTINCT(Id) FROM participants")
drop = cursor.fetchall()
drop = [x[0] for x in drop]
LoLs.cursor.execute("SELECT DISTINCT(Id) FROM history")
exist = cursor.fetchall()
exist = [x[0] for x in exist]
drop1 = list(set(drop)-set(exist))
endtime1 = time.time()
print len(drop1)

print endtime1 - starttime1

this results in:

545594
205.518273115

as expected.

However:

starttime2 = time.time()
cursor.execute("SELECT DISTINCT(Id) FROM participants where Id NOT IN (SELECT DISTINCT(Id) FROM history)")
drop2 = cursor.fetchall()
drop2 = [x[0] for x in drop2]
drop2 = list(set(drop2))
endtime2 = time.time()
print len(drop2)

print endtime2 - starttime2

results in:

427333
185.753734112

It's much faster, awesome, but it results in only 427333 entries instead of 545594.

as a note i did a set difference for the two end lists print len(list(set(drop1)-set(drop2)))

results in: 118261

so there are 118261 unique entries in the first list that are not in the second. I even took some of these and checked if they were in the history table manually, and they were, so the subquery should've caught them.

NOTE March 6th: changed:

 cursor.execute("SELECT DISTINCT(Id) FROM participants where summonerId NOT IN (SELECT DISTINCT(Id) FROM history)")

to

 cursor.execute("SELECT DISTINCT(Id) FROM participants where Id NOT IN (SELECT DISTINCT(Id) FROM history)")

to accurately reflect the real problem.

1 Answer 1

1

NOT IN queries remove nulls. Try instead:

cursor.execute("""SELECT DISTINCT(Id) 
                  FROM participants p
                  WHERE NOT EXISTS (SELECT 1
                                    FROM history h
                                    WHERE h.Id = p.summonerId)""")
Sign up to request clarification or add additional context in comments.

1 Comment

still the same issue because both id's are required non null. so nulls don't exist

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.