3

I am making a inverted index using hadoop and python. I want to know how can I include the byte offset of a line/word in python. I need something like this

hello hello.txt@1124

I need the locations for making a full inverted index. Please help.

1 Answer 1

13

Like this?

file.tell()

Return the file’s current position, like stdio's ftell().

http://docs.python.org/library/stdtypes.html#file-objects

Unfortunately tell() does not function since OP is using stdin instead of a file. But it is not hard to build a wrapper around it to give what you need.

class file_with_pos(object):
    def __init__(self, fp):
        self.fp = fp
        self.pos = 0
    def read(self, *args):
        data = self.fp.read(*args)
        self.pos += len(data)
        return data
    def tell(self):
        return self.pos

Then you can use this instead:

fp = file_with_pos(sys.stdin)
Sign up to request clarification or add additional context in comments.

3 Comments

Added wrapper class in the answer.
thank you for you response ... will try it out ... However, at present i have implemented a counter variable to keep a track of position. It is working pretty good as i need only relative location within a file.
@Siddharth: the code suggested by Wai seems to do exactly what your “present” code does. Unless you post your own code and mark it as the answer, please mark Wai's answer as the chosen answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.