3

I am currently struggling with an assignment. The solution would input a txt file and run through counting the number of palindromes and their frequency. I need to use Map reduce to create to do so

For example: the string "bab bab bab cab cac dad" would output:

bab 3
cab 1
dad 1

Here is what I have so far

def palindrome(string):
    palindromes = []
    for word in string.split(" "):
        if (word == word[::-1]):
            palindromes.append(word)
    return palindromes 

string = "abc abd bab tab cab tat yay uaefdfdu"
print map(lambda x: palindrome(x), ["bab abc dab bab bab dad crap pap pap "])

Currently prints

[['bab', 'bab', 'bab', 'dad', 'pap', 'pap', '']]

Here is my attempt so far at the reduce section

def p(lists):
for list in lists:

set_h = set(list) 

return set_h

with the p function I want to create a set of all palindromes found. Then run a count of the palindromes on the list and make a dict out of this

print reduce(p, [['bab', 'bab', 'bab', 'dad', 'pap', 'pap', '']])

Am I on the right track?

3
  • It's "palindrome"... I fixed the spelling and cleaned up the formatting. Commented Sep 26, 2011 at 21:09
  • Should be tagged with homework Commented Sep 26, 2011 at 21:12
  • 1
    FYI, map(lambda x: palindrome(x), ...) is redundant. You can just as easily do map(palindrome, ...) and get the same results. However, you should probably also re-think your palindrome() function to operate on only a single item at a time, and split your input in advance. Also remember that you will need to have the results sorted between the map and reduce steps. Commented Sep 26, 2011 at 22:33

5 Answers 5

3

I think it would be much easier for you if your map() and reduce() input was an actual list of words. To achieve that, .split() the string before passing it to map(). Then map() a word either to itself (if your mapper encounters a palindrome) or None. You can then filter() the results to discard None values, sort it and pass it to reduce(). reduce() would then reduce it to a dict mapping words to their total count.

I will not provide you with a working solution not to take away from the learning factor.

Sign up to request clarification or add additional context in comments.

2 Comments

so basically, the map function will see if any of the words in the list are palindromes. if so then add them to a different list? palindrome = [] def palindromes(word): if (word == word[::-1]): palindrome.append(word) return palindromes map(palindromes, list_of_strings)
The philosophy of map() is that it will only feed the mapping function with one token at a time and the mapping function should only return one mapped value at a time. Same applies to reduce()—it will only pass tokens to reductor one by one. You should not have to maintain your own lists in your mapper / reductor functions. In fact you don't need filter() either because reductor can just check for None and do ignore it.
2

Split your string into a list before you map it. map() is for lists, sets, and dicts, not strings.

word_list = words_str.split(" ")

Avoid using map-filter-reduce unless your assignment dictates it; GVR says so. The proper solution uses Python's list comprehension syntax. In fact, you can do it with a pretty nasty one-liner:

pal_count = {
    x: word_list.count(x)  # reduce-ish
    for x in word_list     # map-ish
    if x == x[::-1]        # filter-ish
    }
for x, y in pal_count.iteritems():
    print x, y             # print the results!

Breaking it down...

  1. Catch this in a dictionary object to print it later: pal_count = {
  2. Define the return objects: x: word_list.count(x) We use key:value syntax to associate the palindrome, x, with its number of occurrences. count() is like a built-in reduce function for lists.
  3. Iterate through our list with a for loop, assigning the current value to 'x': for x in word_list
  4. We only want to return palindromes, so we add a comparison operator to filter out bad values: if x == x[::-1] # cool logic, btw
  5. Hurray! }

By the way, I'm only doing your homework because I never did mine.

The slower, less flexible, less portable, less awesome equivalent uses nested for loops:

pal_count = dict()
for x in word_list:                     # same loop
    if x == x[::-1]                     # is this a palindrome?
        if x in pal_count:              # have we seen before?
            pal_count[x] += 1
        else:                           # this one is new!
            pal_count.setdefault(x, 1)

4 Comments

It's not cool to provide complete working solutions to problems marked as homework or assignments. Not sure why you consider the second version to be slower. Calling .count() on large datasets is not the fastest thing on earth, especially if you call it for every occurence of the same word. For your second example, use pal_count.setdefault(x, 0) followed by pal_count[x] += 1 or pal_count[x] = pal_count.setdefault(x, 0) + 1.
@patrys Sorries... I was just excited that I was actually able to answer something; feel free to flag it. And thanks for the tip at pal_count[x] = pal_count.setdefault(x, 0) + 1 - very cool way to solve a common problem.
if you're using a modern (2.5+) version of Python, you can also use: pal_count = defaultdict(int) and then just pal_count[x] += 1.
Oooh defaultdict! I've ignored that many times while scanning the docs. Now you can't flag my answer because there's so much learning going on in the comments :-p
1

For your reduce function, you should start off with an empty dict and update/populate your counts. Reduce functions require 2 parameters, so one can be your dict, and the other, your palindrome. You can feed in an initial value in reduce like so:

reduce(lambda x, y: x+y, some_list, initial_value_for_x)

Take a look at dict's get for how to set default values, which should help you simplify your reduce function a lot.

Comments

1

It would be very simple if we decompose the problem into small challenges. In our case this could be:

  1. Filter out all the palindromes from the list of words.
  2. Get unique words to find the count.
  3. Map all the unique words to their appropriate count.

Code:

words =  "bab bab bab cab cac dad"
is_palindrome = lambda word : word == word[::-1]
palindromes = filter(is_palindrome,words.split(" "))
get_count = lambda word : (word , palindromes.count(word))
unique = set(palindromes)
dict(map(get_count,unique))
Out[1]: {'bab': 3, 'cac': 1, 'dad': 1}

Here is the short explanation:

#Input:
    words =  "bab bab bab cab cac dad"

#Step 1: Filter out the palindromes.
    is_palindrome = lambda word : word == word[::-1]
    palindromes = filter(is_palindrome,words.split(" "))

#Step 2: Get the unique set of string to find their counts.
    unique = set(palindromes)

#Step 3: Map every unique palindrome to their respective count.
    get_count = lambda word : (word , palindromes.count(word))
    dict(map(get_count,unique))

#Output:
    Out[1]: {'bab': 3, 'cac': 1, 'dad': 1}

NOTE: map in python can accept any sequence, not just list, set or dict. Strings in python are also sequence, hence not satisfied with Cody Hess's statement: map cannot accept strings.

To demonstrate here is the very simple demo:

In [10]: map(echo, "python")
Out[10]: ['p', 'y', 't', 'h', 'o', 'n']

Comments

0

For map/reduce, using Counter object is pretty straight forward.

from collections import Counter 
words = "bab bab bab cab cac dad"
words_list = words.split(" ") 
cont = Counter()
for word in words_list:
    cont[word] += 1
print(cont)
# or if you want dict
print(dict(cont))

https://docs.python.org/3/library/collections.html

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.