
I am trying to figure out a good way to handle blacklists for words via a MySQL database. I have hit a roadblock when it comes to handling the data returned from the database.

cursor.execute('SELECT word FROM blacklist')
blacklist1 = []
for word in cursor.fetchall():
   if word in blacklist1:
      return
   else:
      blacklist1.append(word)

The code above is what I am using to pull the data, and I know it works. However, I need some help converting this:

[('word1',), ('word2',), ('word3',), ('word4',), ('word5',)]

into this:

['word1', 'word2', 'word3', 'word4', 'word5']

My biggest issue is that this needs to scale: the blacklist could hold anywhere from no words to several thousand, and each one has to be checked. I know a for loop would work for checking the blacklisted words against the message, but I will not be able to check the words until they are in a normal list. Any help would be appreciated.

3 Answers


In each iteration of for word in cursor.fetchall(), the variable word is a tuple, or a collection of values. This is documented here.

These correspond to each column returned, i.e. if you had a second column in your select statement ('SELECT word, replacement FROM blacklist') you would get tuples of two elements.
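To see the row shape without a MySQL server, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption for illustration only: sqlite3 follows the same DB-API conventions, so fetchall() returns a list of tuples there as well). The table contents are made up.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL blacklist table (illustration only)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE blacklist (word TEXT, replacement TEXT)")
cursor.executemany(
    "INSERT INTO blacklist VALUES (?, ?)",
    [("word1", "****"), ("word2", "****")],
)

# One selected column -> one-element tuples
cursor.execute("SELECT word FROM blacklist ORDER BY word")
one_column = cursor.fetchall()

# Two selected columns -> two-element tuples
cursor.execute("SELECT word, replacement FROM blacklist ORDER BY word")
two_columns = cursor.fetchall()

print(one_column)   # [('word1',), ('word2',)]
print(two_columns)  # [('word1', '****'), ('word2', '****')]
conn.close()
```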

Use a set, and add the one and only element of the tuple, instead of the tuple itself:

blacklist1 = set()
for word_tuple in cursor.fetchall():
    # each row is a one-element tuple; keep only its single value
    blacklist1.add(word_tuple[0])

Looking at the code more closely, if word in blacklist1: return is likely a logical error: as soon as you see a duplicate, return exits the enclosing function and you stop reading rows from the database. You were probably looking to just skip that duplicate, and you don't actually need that logic anymore because sets discard duplicates automatically.
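A quick way to see the difference is to run both loops over the same rows. This is a sketch with hypothetical sample rows standing in for cursor.fetchall(); the function names are made up, and the early-return version returns the partial list (the original loop returns None) so the lost rows are visible.

```python
# Hypothetical rows standing in for cursor.fetchall(); "spam" appears twice
rows = [("spam",), ("eggs",), ("spam",), ("ham",)]

def load_with_early_return(rows):
    """Mimics the original loop: gives up as soon as a duplicate appears."""
    blacklist = []
    for word_tuple in rows:
        if word_tuple[0] in blacklist:
            return blacklist  # stops here, so later rows are never read
        blacklist.append(word_tuple[0])
    return blacklist

def load_into_set(rows):
    """Adds every row's word to a set; duplicates are dropped automatically."""
    blacklist = set()
    for word_tuple in rows:
        blacklist.add(word_tuple[0])
    return blacklist

print(load_with_early_return(rows))  # ['spam', 'eggs']  ("ham" is never loaded)
print(load_into_set(rows) == {"spam", "eggs", "ham"})  # True
```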


1 Comment

I can see why there would be a logical error, and using a set is much easier. Thank you for pointing this out.

Your list currently contains one-element tuples. If you want to extract the strings, you could try this:

blacklist1 = []
for word_tuple in cursor.fetchall():
    word = word_tuple[0]  # each row is a one-element tuple
    if word not in blacklist1:  # skip duplicates instead of returning
        blacklist1.append(word)
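The same extraction can also be written as a list comprehension. A minimal sketch using the sample rows from the question in place of cursor.fetchall(); the unpacking form (word,) has the side benefit of raising a ValueError if a row ever contains more than one column.

```python
# Sample rows from the question, standing in for cursor.fetchall()
rows = [('word1',), ('word2',), ('word3',), ('word4',), ('word5',)]

# Index the single element of each row
words = [row[0] for row in rows]

# Equivalent: unpack each one-element tuple directly in the loop target
words_unpacked = [word for (word,) in rows]

print(words)  # ['word1', 'word2', 'word3', 'word4', 'word5']
```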

For your use case, you might also benefit from making blacklist1 a set; that way you can check for membership in O(1) time on average:

blacklist1 = set()
for word_tuple in cursor.fetchall():
    # a set ignores duplicates, so no membership check is needed while loading
    blacklist1.add(word_tuple[0])

2 Comments

How would I use it as a set, using the set() function?
In the second code snippet above, we create a new set using set(). You can find more documentation on sets here. As mentioned above, you can check whether a set contains an element in constant time on average, using the in operator just like with a list. You mentioned this blacklist could get pretty long, so I think a set will be beneficial.
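For the original goal of checking a message against the blacklist, here is a minimal sketch. It assumes messages are split on whitespace and compared in lowercase (both simplifying assumptions; real filtering may also need to handle punctuation), and find_blacklisted is a hypothetical helper name. Each token costs one average O(1) set lookup, so the work grows with the message length rather than the blacklist size, and an empty blacklist simply matches nothing.

```python
def find_blacklisted(message, blacklist):
    """Return the words in `message` that appear in `blacklist` (a set of lowercase words)."""
    # Whitespace splitting and lowercasing are simplifying assumptions
    return [token for token in message.lower().split() if token in blacklist]

blacklist = {"word1", "word3"}
print(find_blacklisted("Hello word1 and WORD3", blacklist))  # ['word1', 'word3']
print(find_blacklisted("nothing to see here", blacklist))    # []
```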

First, your actual problem is that each row the cursor returns is a tuple of column values, so cursor.fetchall() behaves like a list of tuples. That said, my advice would be to separate your "business" logic from your data access logic. This might seem trivial, but it will make debugging much easier. The overall approach looks like this:

def get_from_database():
    cursor.execute('SELECT word FROM blacklist')
    return [row[0] for row in cursor.fetchall()]

def get_blacklist():
    words = get_from_database()
    return list(set(words))

In this approach, get_from_database retrieves all the words from MySQL and returns them in the format your program needs. get_blacklist encapsulates this logic and also makes the returned list unique. So now, if there's a bug, you can verify each independently.
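One way to get that independence is to pass the data source in instead of relying on a global cursor. This is a sketch of a variation on the code above, not the answer's exact code: get_from_database takes the cursor as a parameter, get_blacklist takes a hypothetical fetch_words callable so a test can supply fake data, and it returns a set rather than a list because membership checks are the main use.

```python
def get_from_database(cursor):
    """Data access: run the query and flatten the one-column rows into strings."""
    cursor.execute("SELECT word FROM blacklist")
    return [row[0] for row in cursor.fetchall()]

def get_blacklist(fetch_words):
    """Business logic: build a duplicate-free blacklist from any word source."""
    return set(fetch_words())

# In a test, a plain function stands in for the database (no MySQL needed)
def fake_fetch_words():
    return ["word1", "word2", "word1"]

print(get_blacklist(fake_fetch_words) == {"word1", "word2"})  # True
```

In the real program you would call something like get_blacklist(lambda: get_from_database(cursor)).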
