I use XHTMLConverter to convert .docx to html, to make preview of the document. Is there any way to convert only few pages from original document? I'll be grateful for any help.
1 Answer
You have to parse the complete .docx file. It is not possible to read just parts of it. Otherwise if you want to know how to select a specific page number, im afraid to tell you(at least I believe) that word does not store page numbers therefore there is no function in the libary to accsess a specified page..
(I've read this at another forum, it actually might be false information).
PS: the Excel POI contains a .getSheetAt()method (this might helps you for your research)
But there are also other ways to accsess your pages. For instance you could read the lines of your docx document and search for the pagenumbers(might crash if your text contains those numbers though). Another way would be to search for the header of the site which would be more accurate:
HeaderStories headerStore = new HeaderStories( doc);
String header = headerStore.getHeader(pageNumber);
this should give you the header of the specified page. Same with footer:
HeaderStories headerStore = new HeaderStories( doc);
String footer = headerStore.getFooter(pageNumber);
If this dosen't work. I am not really into that API....
here a little Example for a very sloppy solution:
import java.io.*;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public class ReadDocFile
{
public static void main(String[] args)
{
File file = null;
WordExtractor extractor = null;
try
{
file = new File("c:\\New.doc");
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
HWPFDocument document = new HWPFDocument(fis);
extractor = new WordExtractor(document);
String[] fileData = extractor.getParagraphText();
for (int i = 0; i < fileData.length; i++)
{
if (fileData[i].equals("headerPageOne")){
int firstLineOfPageOne = i;
}
if (fileData[i]).equals("headerPageTwo"){
int lastLineOfPageOne = i
}
}
}
catch (Exception exep)
{
exep.printStackTrace();
}
}
}
If you go with this i would recommend you to create a String[] with your headers and refractor the for-loop to a seperate getPages() Method. Therefore your loop would look like:
List<String> = new ArrayList<String>(Arrays.asList("header1","header2","header3","header4"));
for (int i = 0; i < fileData.length; i++)
{
//well there should be a loop for "x" too
if (fileData[i].equals(headerArray[x])){
int firstLineOfPageOne = i;
}
if (fileData[i]).equals(headerArray[x+1]){
int lastLineOfPageOne = i
}
}
You could create an Object(int pageStart, int PageStop), wich would be the product of your method.
I hope it helped you :)