Getting next variable in a for loop

Question

I'm very new to Python and I'm sure there is a much easier way to accomplish what I need but here goes.

I'm trying to create a program which performs frequency analysis on a list of letters called inputList, retrieves the two-letter pairs, and adds them to another dictionary. So I need it to populate a second dictionary with all the two-letter pairs.

I have a rough idea of how I can do this, but I am a bit stuck on the syntax to make it work.

for bigram in inputList:
    bigramDict[str(bigram + bigram+1)] =  1

Where bigram+1 is the letter in the next iteration

As an example, if I were to have the text "stackoverflow" in the inputList, I need it to first put the letters "st" as the key and 1 as the value, then "ta" as the key on the second iteration, and so on. The problem I'm having is retrieving the value the variable will have on the next iteration without moving to the next iteration.

I hope I explained myself clearly. Thanks for your help

Community · Accepted Answer · 2017-05-23 11:56:20Z

5

A straightforward way to obtain n-grams for a sequence is slicing:

def ngrams(seq, n=2):
    return [seq[i:i+n] for i in range(len(seq) - n + 1)]

Combine this with collections.Counter and you're ready:

from collections import Counter
print(Counter(ngrams("abbabcbabbabr")))
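
If the input is a list of letters, as in the question, rather than a string, each slice is itself a list, and lists can't be used as dictionary keys. A minimal sketch of one way around that, assuming inputList holds single-character strings; ngram_strings is a hypothetical helper name, not part of the answer above:

```python
from collections import Counter

def ngram_strings(letters, n=2):
    # Join each window of n letters into one string so it is hashable.
    return ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]

inputList = list("stackoverflow")  # assumed shape of the asker's input
bigramDict = Counter(ngram_strings(inputList))
print(bigramDict["st"])  # 1
```

Counter is a dict subclass, so bigramDict can be read like the plain dictionary the question asks for.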

In case you need ngrams() to be lazy:

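from collections import deque
from itertools import islice

def ngrams(it, n=2):
    it = iter(it)
    # Prime the window with the first n items only;
    # deque(it, maxlen=n) would exhaust the whole iterator.
    deq = deque(islice(it, n), maxlen=n)
    if len(deq) == n:  # yield nothing if the input is shorter than n
        yield tuple(deq)
    for p in it:
        deq.append(p)  # maxlen drops the oldest item automatically
        yield tuple(deq)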

(See below for more elegant code for the latter).

edited May 23, 2017 at 11:56

CommunityBot


answered Jun 21, 2012 at 22:02

georg


2 Comments

sarnold Over a year ago

Is string subscripting in Python an O(1) operation or an O(n) operation? This is either incredibly elegant or incredibly slow...

sarnold Over a year ago

... Well, it ran on 14 megabytes of input quickly enough. It must be O(1) and thus this must be elegant. :D

Maria Zverina · Accepted Answer · 2012-06-21 22:16:21Z

3

Use zip to pair the string with a copy of itself offset by 1.

Get bigrams like this:

s = "stackoverflow"
list(zip(s, s[1:]))

Gives:

[('s', 't'), ('t', 'a'), ('a', 'c'), ('c', 'k'), ('k', 'o'), ('o', 'v'), ('v', 'e'), ('e', 'r'), ('r', 'f'), ('f', 'l'), ('l', 'o'), ('o', 'w')]

Trigrams are also easy:

list(zip(s, s[1:], s[2:]))

Gives:

[('s', 't', 'a'), ('t', 'a', 'c'), ('a', 'c', 'k'), ('c', 'k', 'o'), ('k', 'o', 'v'), ('o', 'v', 'e'), ('v', 'e', 'r'), ('e', 'r', 'f'), ('r', 'f', 'l'), ('f', 'l', 'o'), ('l', 'o', 'w')]

You can use the tuples as the keys for your dictionary, or better still, use the Counter or defaultdict objects from collections for doing the counts. Good luck!
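
As a rough sketch of that last suggestion, here are both counting approaches side by side. The variable names bigram_counts and bigram_dict are illustrative only and do not come from the answer:

```python
from collections import Counter, defaultdict

s = "stackoverflow"

# Counter tallies every (letter, next_letter) tuple in one pass.
bigram_counts = Counter(zip(s, s[1:]))

# The same tally with defaultdict, using joined strings as keys instead.
bigram_dict = defaultdict(int)
for a, b in zip(s, s[1:]):
    bigram_dict[a + b] += 1
```

Joining the pair into a string gives keys such as "st", which matches the key format described in the question.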

answered Jun 21, 2012 at 22:16

Maria Zverina


jfs · Accepted Answer · 2012-06-21 22:28:17Z

3

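from collections import Counter
from itertools import islice, tee

try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

def pairs(iterable):
    # tee() gives two independent iterators; the second skips one item.
    a, b = tee(iterable)
    for pair in izip(a, islice(b, 1, None)):
        yield pair

print(Counter(pairs("stackoverflow")))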

Or a simpler version:

def pairs(iterable):
    it = iter(iterable)
    last = next(it)
    for c in it:
        yield last, c
        last = c

A generalized version for arbitrary n:

def ngrams(iterable, n=2):
    return izip(*[islice(it, i, None) for i, it in enumerate(tee(iterable, n))])
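
For readers on Python 3, where itertools.izip no longer exists, the same idea might look like the sketch below. The name ngrams_py3 and the sample string are assumptions made for illustration; for the n=2 case, Python 3.10+ also ships itertools.pairwise.

```python
from collections import Counter
from itertools import islice, tee

def ngrams_py3(iterable, n=2):
    # tee() makes n independent iterators; the i-th one skips i items,
    # so zipping them yields consecutive windows of length n.
    iterators = tee(iterable, n)
    return zip(*(islice(it, i, None) for i, it in enumerate(iterators)))

trigram_counts = Counter(ngrams_py3("abcabcab", 3))
```

Because zip stops at the shortest iterator, an input shorter than n simply produces no n-grams.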

edited Jun 21, 2012 at 22:28

answered Jun 21, 2012 at 22:22

jfs


2 Comments

georg Over a year ago

Nice, but how about arbitrary n-grams? I have a strong feeling there must be an itertools oneliner for that.

jfs Over a year ago

@thg435: I've posted a generalized version

TheZ · Accepted Answer · 2012-06-21 23:17:09Z

1

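Keep the previous letter in a variable. On the first iteration, just store the first letter and do nothing else; on every later iteration, combine the stored letter with the current one, then update the stored letter.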

ADDENDUM: This method, at the very least, doesn't need to waste any more memory than a simple variable to store one letter, no excess tuples or anything.
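
A minimal sketch of this approach, reusing the inputList and bigramDict names from the question. The sample input and the variable name previous are assumptions:

```python
inputList = list("stackoverflow")  # assumed shape of the asker's input
bigramDict = {}

previous = None  # there is no previous letter before the first iteration
for letter in inputList:
    if previous is not None:
        key = previous + letter
        # Start at 0 for an unseen pair, then count this occurrence.
        bigramDict[key] = bigramDict.get(key, 0) + 1
    previous = letter
```

Unlike the assignment in the question, which always stores 1, dict.get with a default lets repeated pairs accumulate a count.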

edited Jun 21, 2012 at 23:17

answered Jun 21, 2012 at 22:01

TheZ
