Return to Answer

answers to comments

Source Link

edited Jun 9, 2015 at 20:02

50.1k
3
130
211

4. Replies to comments

My point in §1.3 is that it can be hard learning to use a new interface. Imagine me at the Python prompt, trying to use the WordSearch class for the first time. I try creating an object:
```
 >>> w = WordSearch()
 >>> w
 <__main__.WordSearch object at 0x10442f278>
```

Now what? Did it work? How do I see the result? Well, maybe the display method will display it.

    >>> w.display()
    Traceback (most recent call last):
      File "cr92649.py", line 448, in display
        print('Words: '+', '.join(sorted(self.used_words.keys())))
    AttributeError: 'WordSearch' object has no attribute 'used_words'

Oh, so maybe I need to call generate first:

    >>> w.generate()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: generate() missing 1 required positional argument: 'num_iterations'

OK, I missed an argument. I suppose I could try 10 iterations:

    >>> w.generate(10)
    >>> w.display()
    Words:

Now compare with the experience of using my code in §3 above:

    >>> WordSearch()
    Traceback (most recent call last):
      File "cr92649.py", line 500, in __init__
        raise ValueError("neither word_list nor word_file specified")
    ValueError: neither word_list nor word_file specified

OK, I missed an argument. Let's try word_list:

    >>> WordSearch(word_list=['hello', 'world'])
    Grid:
        ..........
        .......d..
        .......l..
        .......r..
        .o.....o..
        ..l....w..
        ...l......
        ....e.....
        .....h....
        ..........
    Words:
        world at (5,7) up
        hello at (8,5) upleft

Success! The feedback is much more immediate in this version of the code and likely to lead to faster progress in using it.

In §1.4, what I mean is that kwargs should be reserved for the case where you do not know what all the keyword arguments are going to be. In the case where you do know the names of all the keyword arguments, then if you write:
```
 def __init__(self, width=10, height=10, difficulty=5):
```

this is shorter and clearer than:

    def __init__(self, width=10, height=10, **kwargs):
        difficulty = kwargs.pop('difficulty', 5)

and you get better error messages. For example, if you misspell the name of the keyword, then in the first version you get:

    >>> WordSearch(difficult=1)
    TypeError: __init__() got an unexpected keyword argument 'difficult'

but in the second version there's no error message: it appears to work, except that difficulty gets the value 5 when you meant it to be 1.

List lookups are cheap, so I doubt that the list of lists implementation makes much difference to the performance of the code. If you are concerned, then measure both and see what the difference is.

4. Replies to comments

My point in §1.3 is that it can be hard learning to use a new interface. Imagine me at the Python prompt, trying to use the WordSearch class for the first time. I try creating an object:
```
 >>> w = WordSearch()
 >>> w
 <__main__.WordSearch object at 0x10442f278>
```

Now what? Did it work? How do I see the result? Well, maybe the display method will display it.

    >>> w.display()
    Traceback (most recent call last):
      File "cr92649.py", line 448, in display
        print('Words: '+', '.join(sorted(self.used_words.keys())))
    AttributeError: 'WordSearch' object has no attribute 'used_words'

Oh, so maybe I need to call generate first:

    >>> w.generate()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: generate() missing 1 required positional argument: 'num_iterations'

OK, I missed an argument. I suppose I could try 10 iterations:

    >>> w.generate(10)
    >>> w.display()
    Words:

Now compare with the experience of using my code in §3 above:

    >>> WordSearch()
    Traceback (most recent call last):
      File "cr92649.py", line 500, in __init__
        raise ValueError("neither word_list nor word_file specified")
    ValueError: neither word_list nor word_file specified

OK, I missed an argument. Let's try word_list:

    >>> WordSearch(word_list=['hello', 'world'])
    Grid:
        ..........
        .......d..
        .......l..
        .......r..
        .o.....o..
        ..l....w..
        ...l......
        ....e.....
        .....h....
        ..........
    Words:
        world at (5,7) up
        hello at (8,5) upleft

Success! The feedback is much more immediate in this version of the code and likely to lead to faster progress in using it.

In §1.4, what I mean is that kwargs should be reserved for the case where you do not know what all the keyword arguments are going to be. In the case where you do know the names of all the keyword arguments, then if you write:
```
 def __init__(self, width=10, height=10, difficulty=5):
```

this is shorter and clearer than:

    def __init__(self, width=10, height=10, **kwargs):
        difficulty = kwargs.pop('difficulty', 5)

and you get better error messages. For example, if you misspell the name of the keyword, then in the first version you get:

    >>> WordSearch(difficult=1)
    TypeError: __init__() got an unexpected keyword argument 'difficult'

but in the second version there's no error message: it appears to work, except that difficulty gets the value 5 when you meant it to be 1.

List lookups are cheap, so I doubt that the list of lists implementation makes much difference to the performance of the code. If you are concerned, then measure both and see what the difference is.

Source Link

answered Jun 5, 2015 at 12:44

Gareth Rees

50.1k
3
130
211

1. Introduction

This is well-documented code but it is complex and awkward to use. Some work is needed to make it simpler.

The key problem, it seems to me, is that the caller is required to proceed in three steps: (i) create the WordSearch object; (ii) load the list of words by calling the format_input_list method; (iii) call the generate method.

It is not clear to me that anything is gained by this division into steps, so if I were writing it I would redesign the interface so that everything is done in one step. An additional advantage of this is that the user only has to read one docstring.

2. Review

The code could easily be made portable to Python 3 by using the print function instead of the print statement, and using range instead of xrange.
width and height would be better parameter names than x and y.
The default values to the constructor (x=0 and y=0) are unhelpful. Choosing values for parameters requires experience with an API, but you can't get experience with an API until you've chosen values for parameters. Good defaults help beginners get out of this cycle.
There's no need for kwargs in __init__. Specify all keyword arguments explicitly:
```
 def __init__(self, width=10, height=10, difficulty=5):
```

The __repr__ is unhelpful too:

 >>> WordSearch()
 <__main__.WordSearch object at 0x10442f278>

Did this work or not? It is easier to work with objects that are self-describing.

The name format_input_list is misleading: it does not actually format anything.
word_length_min is not actually the minimum word length, but one less than the minimum. Possibly word_length_min < len(i) should be word_length_min <= len(i).
direction_coordinate is way too complicated. There are three kinds of decoding going here: (i) location identifiers to location coordinates (ii) direction numbers to direction names; (iii) direction names to directions. The fact that you need debug_grid is a sign that you've over-complicated things.

I would avoid (i) by representing the grid as a list of lists instead of a single list; and I would avoid (ii) by working directly with direction names instead of direction numbers. Using this approach you could cut out direction_coordinate completely.

To implement (iii), the code builds a decoding table every time direction_coordinate is called. But the decoding table is the same every time, so this is a waste. It would be better to make this a global variable, as in my revised code below.

Representing difficulty levels as numbers is hard to understand and needlessly complex (because there have to be encoding and decoding steps). It would be clearer to represent difficulty levels as collections of directions:
```
 EASY = 'down right'.split()
 MEDIUM = 'up down left right'.split()
 HARD = 'up down left right upleft upright downleft downright'.split()
```

Using this approach, you could cut out get_difficulty completely.

The string 'abcdefghijklmnopqrstuvwxyz' is built into Python as string.ascii_lowercase.

3. Revised code

This doesn't implement everything that the original code does (in particular, it doesn't populate the empty squares); the idea here is to give you an idea of how to go about keeping the code simple and short:

import random
import string

LETTERS = set(string.ascii_lowercase)
EMPTY = '.'
DIRECTIONS = dict(up=(-1,0), down=(1,0), left=(0,-1), right=(0,1),
                  upleft=(-1,-1), upright=(-1,1), downleft=(1,-1),
                  downright=(1,1))

class WordSearch:
    """A word search puzzle.

    Arguments to the constructor:

    width       Width of the puzzle (default: 10).
    height      Height of the puzzle (default: 10).
    word_list   List of words to use (default: None).
    word_file   File to load words from (if word_list is None).
    min_len     Minimum length of words (default: 3).
    max_len     Maximum length of words (default: None).
    directions  Iterable of names of allowed directions (default: all eight).
    density     Stop generating when this density is reached (default: .7).

    """
    def __init__(self, width=10, height=10, word_list=None, word_file=None,
                 min_len=3, max_len=None, directions=DIRECTIONS, density=.7):

        # Check arguments and load word list.
        if max_len is None:
            max_len = min(width, height)
        if word_list is None:
            if word_file is None:
                raise ValueError("neither word_list nor word_file specified")
            word_list = []
            with open(word_file) as f:
                for line in f:
                    word = line.strip()
                    if set(word) <= LETTERS and min_len <= len(word) <= max_len:
                        word_list.append(word)
        else:
            # Take a copy so that we can shuffle it without updating
            # the original.
            word_list = word_list[:]
        random.shuffle(word_list)

        # Initially empty grid and list of words.
        self.grid = [[EMPTY] * width for _ in range(height)]
        self.words = []

        # Generate puzzle by adding words from word_list until either
        # the word list is exhausted or the target density is reached.
        filled_cells = 0
        target_cells = width * height * density
        for word in word_list:
            # List of candidate positions as tuples (i, j, d) where
            # (i, j) is the coordinate of the first letter and d is
            # the direction.
            candidates = []
            for d in directions:
                di, dj = DIRECTIONS[d]
                for i in range(max(0, 0 - len(word) * di),
                               min(height, height - len(word) * di)):
                    for j in range(max(0, 0 - len(word) * dj),
                                   min(width, width - len(word) * dj)):
                        for k, letter in enumerate(word):
                            g = self.grid[i + k * di][j + k * dj]
                            if g != letter and g != EMPTY:
                                break
                        else:
                            candidates.append((i, j, d))
            if candidates:
                i, j, d = random.choice(candidates)
                di, dj = DIRECTIONS[d]
                for k, letter in enumerate(word):
                    if self.grid[i + k * di][j + k * dj] == EMPTY:
                        filled_cells += 1
                        self.grid[i + k * di][j + k * dj] = letter
                self.words.append((word, i, j, d))
                if filled_cells >= target_cells:
                    break

    def __repr__(self):
        grid = (''.join(row) for row in self.grid)
        words = ('{} at ({},{}) {}'.format(*w) for w in self.words)
        indent = lambda lines: '\n'.join('    ' + line for line in lines)
        return "Grid:\n{}\nWords:\n{}".format(indent(grid), indent(words))

For example:

>>> WordSearch(word_file='/usr/share/dict/words', density=.9)
Grid:
    ..tsoob.a.
    ejectedus.
    vlns.in.s.
    iaaceclyed
    biohloptne
    rrmipogrtk
    aeppragiir
    tsesortpna
    owarlikegp
    sessilitys
Words:
    warlike at (8,1) right
    assenting at (0,8) down
    vibrato at (2,0) down
    stilt at (0,3) downright
    chip at (3,3) down
    sparked at (9,9) up
    boost at (0,6) left
    sessility at (9,0) right
    serial at (7,1) up
    ejected at (1,0) right
    goes at (5,6) upleft
    rose at (7,5) left
    unclip at (1,7) downleft
    moan at (5,2) up
    clog at (3,3) downright
    tripe at (4,7) down
    ropy at (6,4) upright
    pat at (5,4) downright