
I'm trying to write a PowerShell script that counts the pages in Word documents. The counts will go into a database so we can assess the average number of pages per document. We have thousands of these Word documents, which we get from providers. My challenge is that the automated solution already runs on a system without Office apps, so a COM object implementation is not possible. I've been looking at PSWriteWord as an option, but its command for getting page settings only seems to return margins, size and orientation, not a page count. Does anyone have a suggestion on how to tackle this? I'm adding the code I tried with PSWriteWord and the results in case someone sees a flaw in my method.

Import-Module PSWriteWord
$WordDocument = Get-WordDocument -FilePath 'C:\Users\user\Desktop\testdocument.docx'
Get-WordPageSettings -WordDocument $WordDocument  

[Screenshot of the Get-WordPageSettings output]

  • By "no COM", you mean no Office Automation, but you're still on Windows aren't you? Commented Apr 4, 2022 at 14:43
  • Correct, still on Windows but no Office installed on that system so I won't be able to do something like "New-Object -ComObject word.application" Commented Apr 4, 2022 at 14:47
  • Each .docx file is a ZIP archive. Inside it is a folder "docProps" containing an XML file, app.xml, which has a "Pages" element that gives the number of pages of that document. Commented Apr 4, 2022 at 14:49
  • The number of pages in a document to some extent relies on the attached printer. That's because Word uses data from the attached printer to optimize the page layout. This sometimes results in the page count varying from one computer to the next for the same document. Word's option to 'scale content for A4 or 8.5x11" paper sizes' further complicates matters. Anything that relies on XML properties, for example, is therefore unreliable. Commented Apr 4, 2022 at 22:44
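
To make the app.xml suggestion above concrete, here is a minimal sketch that reads the stored page count straight from the .docx archive, with no Office and no extra DLL. The function name Get-DocxStoredPageCount is made up for illustration. It only returns whatever value the last saving application wrote into docProps/app.xml, so the caveat about layout-dependent page counts still applies. It will not work for legacy .doc files, which are not ZIP archives.

```powershell
# Loads the ZipFile/ZipArchive types (needed on Windows PowerShell 5.1)
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: returns the Pages value stored in docProps/app.xml,
# or $null when the entry or the element is missing.
function Get-DocxStoredPageCount {
    param(
        [Parameter(Mandatory)][string]$Path
    )

    # .NET resolves relative paths against the process directory, so pass a full path
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        $entry = $zip.GetEntry('docProps/app.xml')
        if ($null -eq $entry) { return $null }

        $reader = New-Object System.IO.StreamReader($entry.Open())
        try { [xml]$xml = $reader.ReadToEnd() }
        finally { $reader.Dispose() }

        # <Properties><Pages>n</Pages>...</Properties>
        $pages = $xml.Properties.Pages
        if ([string]::IsNullOrEmpty($pages)) { return $null }
        return [int]$pages
    }
    finally {
        $zip.Dispose()
    }
}
```

Calling Get-DocxStoredPageCount -Path 'C:\Users\user\Desktop\testdocument.docx' would return the stored count as an integer, or nothing if the producing application never wrote one.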

2 Answers


What you can do is use the Shell.Application object and the System.Document.PageCount standard Windows property, something like this:

$application = New-Object -ComObject "Shell.Application"
$folder = $application.Namespace("c:\myFolder1\myFolder2")
$docfile = $folder.ParseName("myDoc.docx")
Write-Host $docfile.ExtendedProperty("System.Document.PageCount")

This is the equivalent of what you'll see in the Shell's Properties dialog, Details tab.

If Word is not installed at all, this method will only be able to read old Word files (.doc format).

For .docx files (Open XML format), if Word is not installed, you can use Microsoft's Open XML SDK. Just download the package from NuGet, extract it, and copy DocumentFormat.OpenXml.dll (from the net46 folder, for example) somewhere on your disk. This is the only extra file you'll need.

Once you've done that, this other script will dump the page count:

# Use the full path to the DLL if it is not in the process's current directory
[System.Reflection.Assembly]::LoadFrom("DocumentFormat.OpenXml.dll") | Out-Null
# $false opens the document read-only
$doc = [DocumentFormat.OpenXml.Packaging.WordprocessingDocument]::Open("c:\myFolder1\myFolder2\myDoc.docx", $false)
Write-Host $doc.ExtendedFilePropertiesPart.Properties.Pages.Text
$doc.Dispose()

9 Comments

I'm getting a blank result on that, if I call the $docfile object props I don't see one for "PageCount". Here's what the object looks like:
Application : System.__ComObject
Parent : System.__ComObject
Name : testdocument.docx
Path : C:\Users\user\Desktop\testdocument.docx
GetLink :
GetFolder :
IsLink : False
IsFolder : False
IsFileSystem : True
IsBrowsable : False
ModifyDate : 4/5/2021 12:18:36 PM
Size : 23901
Type : Office Open XML Document
There's no PageCount property as such, it's a Windows "extended" property with "System.Document.PageCount" being its canonical name. Is the file on an NTFS drive? What does the Shell Properties dialog say, like this one: i.imgur.com/pB29gZ3.png
The file is on an NTFS drive. When I look at the Details tab of the file's Properties window on a computer with Word installed, it looks like the one in your screenshot. When I look at it on one without Word, the properties are limited to "Name, Type, Folder Path, Size, Date Created, Date Modified, Attributes, Owner and Computer".
Yes this code works for .doc (old format) files w/o Word but it needs Word installed on the PC for newer Office Xml docx (.zip) files.
Thanks Simon, maybe I should update my question to specify both formats, doc and docx, since we have some legacy files and the majority are Office XML. Not sure if there's a workaround for this without Office installed. Any suggestions?

I just want to give a little back since I got a lot out of this, in case someone comes to this page wondering how to get the page count of DOC and DOCX files without Word installed. Thanks to Simon Mourier for pointing me in the right direction; I have taken his answers and turned them into a function. As he said, you'll need to download the file DocumentFormat.OpenXml.dll, which is contained in the lib\net46 folder within the documentformat.openxml.2.16.0.nupkg file that's available here: https://www.nuget.org/packages/DocumentFormat.OpenXml/ The function takes a PSObject or a string with the path to the file. I'm only using it to return the page count, but you can return many other properties by editing the return line within the function, or maybe by adding the property name as a parameter.

function Get-PageCount {
    param (
        # Path string or FileInfo object for the document
        $docuName
    )
    if ($docuName -is [string]) {
        $docuName = Get-Item $docuName
    }
    if ($docuName.Extension -eq '.doc') {
        # Legacy .doc files: read the Windows extended property through the Shell
        $application = New-Object -ComObject "Shell.Application"
        $folder = $application.Namespace($docuName.DirectoryName)
        $docfile = $folder.ParseName($docuName.Name)
        return $docfile.ExtendedProperty("System.Document.PageCount")
    }
    else {
        # Replace $appFolder with the location where DocumentFormat.OpenXml.dll was stored
        [System.Reflection.Assembly]::LoadFrom((Join-Path $appFolder 'DocumentFormat.OpenXml.dll')) | Out-Null
        $doc = [DocumentFormat.OpenXml.Packaging.WordprocessingDocument]::Open($docuName.FullName, $false)
        try {
            return $doc.ExtendedFilePropertiesPart.Properties.Pages.Text
        }
        finally {
            # Release the file handle even if the property lookup fails
            $doc.Dispose()
        }
    }
}
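
Since the question was about thousands of files, here is a rough sketch of running a per-file counter over a folder tree and writing the results to a CSV file that can be loaded into a database. Export-PageCounts, its parameters and the CSV column names are all hypothetical. The -Counter script block defaults to calling the Get-PageCount function above, so that function (and $appFolder) would need to be defined in the session first.

```powershell
# Hypothetical batch wrapper: collects one row per .doc/.docx file under $Folder.
function Export-PageCounts {
    param(
        [Parameter(Mandatory)][string]$Folder,
        [Parameter(Mandatory)][string]$CsvPath,
        # How to count pages for one file; assumed to default to Get-PageCount
        [scriptblock]$Counter = { param($file) Get-PageCount $file }
    )

    Get-ChildItem -LiteralPath $Folder -File -Recurse |
        Where-Object { $_.Extension -in '.doc', '.docx' } |
        ForEach-Object {
            $file = $_
            $pages = $null
            try {
                $pages = & $Counter $file
            }
            catch {
                # Keep going when a single document cannot be read
                Write-Warning "Skipping $($file.FullName): $_"
            }
            [pscustomobject]@{ Path = $file.FullName; Pages = $pages }
        } |
        Export-Csv -LiteralPath $CsvPath -NoTypeInformation
}
```

Files that fail still get a row with an empty Pages value, which makes it easier to spot documents that need a second look.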
