3

I want to open a .ppt file using Python on Linux (the way Python opens a .txt file). I know about win32com, but I am working on Linux. What do I need to do?

4
  • .ppt files aren't readable as plain text, so you can't really open them "like a .txt file". If you just want to launch the user's default viewer for PowerPoint files, you can use the xdg-open shell command. Commented Nov 26, 2012 at 5:31
  • I want to get the information or words in the file. Commented Nov 26, 2012 at 5:39
  • Do not ask the same question multiple times; it isn't helpful to anyone. Commented Nov 26, 2012 at 16:21
  • Possible duplicate of "how to read ppt file using python?" Commented Nov 26, 2012 at 16:21

5 Answers

3

python-pptx can open recent PowerPoint files (.pptx) on Linux. Its Getting Started guide even includes an example that extracts all the text from the slides.

Here's the code (from the Getting Started guide):

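from pptx import Presentation

# Placeholder path (not in the original guide); point it at a real .pptx file.
path_to_presentation = "presentation.pptx"

prs = Presentation(path_to_presentation)

# text_runs will be populated with a list of strings,
# one for each text run in the presentation
text_runs = []

for slide in prs.slides:
    for shape in slide.shapes:
        # Skip shapes that cannot hold text (pictures, connectors, ...)
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                text_runs.append(run.text)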

3 Comments

When using prs = Presentation(path_to_presentation), I get the following error: "File is not a zip file". Please help.
@ArunKumar I also get this error, have you found a solution?
I'm not sure whether the code has changed since this answer was written, but in current python-pptx the attribute is text_frame, not textframe.
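
The "File is not a zip file" error mentioned in the comments above is what one would expect if the input is a legacy binary .ppt file: python-pptx reads the ZIP-based .pptx format only. A minimal sketch for checking a file before passing it to Presentation follows. The function name and the returned labels are made up for this illustration, and the check is a heuristic based on the file signature, not a full validation.

```python
import zipfile

# Signature of the OLE2 compound file container used by legacy .ppt/.doc/.xls files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_presentation_format(path):
    """Guess the container format of a presentation file (hypothetical helper).

    Returns "legacy-ole2" for old binary Office files, "ooxml-zip" for
    ZIP packages that carry an Open Packaging content-types part
    (as .pptx files do), and "unknown" for anything else.
    """
    with open(path, "rb") as handle:
        header = handle.read(len(OLE2_MAGIC))
    if header == OLE2_MAGIC:
        return "legacy-ole2"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            if "[Content_Types].xml" in archive.namelist():
                return "ooxml-zip"
    return "unknown"
```

If this returns "legacy-ole2", the file has to be converted to .pptx first, or read with one of the other tools suggested in this thread (catppt, Tika, OpenOffice).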
1

If you are on Linux, which office software are you referring to? OpenOffice (headless) can be driven from Python on Linux. Here is a nice example: https://github.com/jledoux/FRIEDA

3 Comments

Exactly, I want to get the content of the file; I need the words in it. Can I get them the way open('a.txt') gets the content of a.txt? Thanks.
It's not so simple. PPT files are difficult to dissect. The text is contained in individual text elements on each slide. The best way may be to view the file in outline form and extract text via copy and paste. Or you can try the odf modules discussed below.
I can use catdoc (catppt) with subprocess to open it, and it works.
1

Use odf.opendocument.OpenDocumentPresentation from the odfpy project. This assumes you are only concerned with recent-format files that are compatible with the OpenDocument standard.

If you have access to OpenOffice, you can use its Python API to read the file.
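
As a simpler alternative to the Python API mentioned above (this is a different technique, not what the answer describes), an installed OpenOffice/LibreOffice can convert a legacy .ppt to .pptx from the command line, after which python-pptx can read it. The sketch below assumes a soffice binary on PATH that accepts --headless, --convert-to and --outdir; the function names are invented for illustration.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, binary="soffice"):
    """Build the argument list for a headless .ppt -> .pptx conversion.

    Assumes the office suite's launcher is called `soffice` and is on PATH.
    """
    return [
        binary,
        "--headless",
        "--convert-to", "pptx",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_ppt_to_pptx(source, out_dir, binary="soffice"):
    """Run the conversion and return the path where the .pptx should appear."""
    subprocess.run(build_convert_command(source, out_dir, binary), check=True)
    return Path(out_dir) / (Path(source).stem + ".pptx")
```

Passing the arguments as a list avoids going through a shell, so file names with spaces need no quoting.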


0

Use catdoc/catppt with subprocess to read .doc and .ppt files.
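
A minimal sketch of that approach, assuming the catppt tool (shipped with the catdoc package) is installed and on PATH. The helper name extract_text and its tool parameter are made up for this example.

```python
import subprocess


def extract_text(path, tool="catppt"):
    """Run an external text extractor on `path` and return its standard output.

    `tool` defaults to catppt; pass "catdoc" for legacy .doc files.
    Raises subprocess.CalledProcessError if the tool exits with an error,
    and FileNotFoundError if the tool is not installed.
    """
    result = subprocess.run(
        [tool, path],          # argument list, so no shell quoting is needed
        capture_output=True,   # collect stdout/stderr instead of printing them
        text=True,             # decode bytes to str using the locale encoding
        check=True,            # fail loudly on a non-zero exit status
    )
    return result.stdout
```

Typical use would be text = extract_text("slides.ppt"), which hands the words back as a string in much the same way open('a.txt').read() does for a text file.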


-1

You could try Apache Tika; I use it on a Mac like this:

For MacOS Homebrew users: install Apache Tika (brew install tika)

The command-line interface works like this:

tika --text something.ppt > something.txt

And to use it inside a Python script:

import os
os.system("tika --text temp.ppt > temp.txt")

This works for me, and it is the only solution I have found so far.

