I have a local PDF form with a fixed template that never changes. I've identified it as a dynamic XFA (XML) form, since no key sets were returned. I'm trying to use iText to fill in the form with data from a .txt file. As I understand it, I need to get the data from the text file into a properly structured .xml file, which iText can then use to fill in the original PDF.

The form has the following layout as an example:

[Image: example form layout]

The sample code I'm using in Eclipse compiles/runs successfully but it requires the data in the file data.xml in order to populate the empty form with field data and output the filled-in version. The thing is, for my actual project I don't have a data.xml file to use in order to populate the form properly. The raw field data is in a .txt file with each line containing data for a different field in the PDF.

EXAMPLE: Referencing the image above, my .txt file looks like this for the fields up to and including the field labelled "FOUR":

  • John
  • 15
  • Black
  • Honda
  • Toyota
  • Ford
  • BMW

I'm confused about 2 things:

1. How do I extract the original PDF's xml structure so that I know the format to adhere to when populating it with data from the .txt file?

2. How do I get the values from the text file and insert them into the .xml structure properly?

The following code works but requires data.xml in order to fill in "incomplete.pdf". It uses the code xfa.fillXfaForm(new FileInputStream(XML)); to input the data, but I'm stuck on how to identify the structure for "XML" and how to fill it in in the first place.

Any help is appreciated, thank you very much.

Code:

package sandbox;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;


import java.io.PrintStream;
import java.util.Set;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.XfaForm;


public class FillXFA {

    public static final String SRC = "C:/Workspace/PDF/incomplete.pdf";
    public static final String XML = "C:/Workspace/PDF/data.xml";
    public static final String DEST = "C:/Workspace/PDF/completed.pdf";

    public static void main(String[] args) throws IOException, DocumentException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new FillXFA().manipulatePdf(SRC, DEST);
    }

    // Dumps the complete XFA XML stream of the PDF at src into the file at dest.
    public void readXfa(String src, String dest)
        throws IOException, ParserConfigurationException, SAXException,
            TransformerFactoryConfigurationError, TransformerException {
        PdfReader reader = new PdfReader(src);
        // try-with-resources makes sure the output file is flushed and closed
        try (FileOutputStream os = new FileOutputStream(dest)) {
            XfaForm xfa = new XfaForm(reader);
            Document doc = xfa.getDomDocument();
            Transformer tf = TransformerFactory.newInstance().newTransformer();
            tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            tf.setOutputProperty(OutputKeys.INDENT, "yes");
            tf.transform(new DOMSource(doc), new StreamResult(os));
        } finally {
            reader.close();
        }
    }

    public void manipulatePdf(String src, String dest)
        throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader,
                new FileOutputStream(dest));
        AcroFields form = stamper.getAcroFields();
        XfaForm xfa = form.getXfa();
        xfa.fillXfaForm(new FileInputStream(XML));
        stamper.close();
        reader.close();
    }
}
  • Write code that takes your text file and makes it an XML file like data.xml. Commented Oct 12, 2015 at 22:26
  • I'm somewhat hesitant about this approach because I've previously manipulated a .xml file, and it imported fine into my PDF, yet when a certain company tried to upload it to their system it wouldn't work. That leads me to believe I somehow "broke" the XML structure, perhaps with an out-of-place character. I'm hoping iText allows me to fully adhere to the original XML structure while still filling the data fields properly within the PDF. Any feedback or sources for how to continue would be fantastic, thank you. Commented Oct 13, 2015 at 12:30
  • As you can see from the command, it takes one argument. The xml. It is up to you to ensure it is valid XML Commented Oct 13, 2015 at 16:38
  • Thank you Kevin. My question is then, how do I ensure it is valid XML? As my previous comment stated, I manually manipulated an XML file and successfully imported it into the PDF form. Yet when this certain company uploaded it to their system, it was deemed broken. Is there a way for me to verify if it is "broken" or wrong xml structure despite it successfully importing manually into a PDF? Perhaps with a line of code or an online service? Thank you Commented Oct 13, 2015 at 17:07

1 Answer

In XFA, the link between form fields and form data is made using a concept called data binding. Fields can have an XPath-like expression to select their value from the XML data structure. This implies that the XML data needs to be suitably structured to work for a specific XFA form, but this structure is not necessarily unique.

A simple example: Suppose you have an XFA form with just 1 text field. This text field has a data binding to any XML element with tag name "Name". In this case your data.xml can simply be:

<Name>Hurmle</Name>

But this, and an infinite number of different XML structures, will also work:

<StackOverflow>
    <accounts>
        <account>
            <Name>Hurmle</Name>
        </account>
    </accounts>
</StackOverflow>
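
As a rough analogy only (this is plain JAXP, not the XFA binding algorithm itself), the sketch below looks up an element purely by tag name and finds the same value in both structures above. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NameLookupDemo {

    // Returns the text of the first element called tagName,
    // wherever it sits in the tree (the root element included).
    static String firstValue(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tagName).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String flat = "<Name>Hurmle</Name>";
        String nested = "<StackOverflow><accounts><account>"
                + "<Name>Hurmle</Name>"
                + "</account></accounts></StackOverflow>";
        System.out.println(firstValue(flat, "Name"));   // prints "Hurmle"
        System.out.println(firstValue(nested, "Name")); // prints "Hurmle"
    }
}
```

How a real form resolves its fields depends on the binding expressions in its template, so treat this as an intuition aid rather than a prediction of what a given form will accept.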

The readXfa method in your code sample will work to extract the complete XML stream from the XFA form. It consists of different parts. The most relevant are:

  • template: Describes the logical form structure, including all the fields and their data binding.
  • xfa:datasets: Holds information about the data. Consists of 2 parts.
    • dataDescription: A schema for the form data, optional. The data description grammar is defined in the XFA specification.
    • xfa:data: The form data.

One way to determine which XML structure will work is to look at the data binding of all the fields (cf. the template part). This tells you where each field expects to get its data. For a non-trivial form, this can be complex and/or a lot of work.

If available in the XFA form, you can use the dataDescription. It will give you the structure for the data and information like minimum and maximum occurrence for elements.

Finally, you can look at the data that's already in the form (cf. xfa:data). Keep in mind that this XML structure is not necessarily complete: empty elements can be omitted. For example, if a form has 2 fields, the values could be specified as:

<SomeRoot>
    <Field1>Value1</Field1>
    <Field2></Field2>
</SomeRoot>

But also:

<SomeRoot>
    <Field1>Value1</Field1>
</SomeRoot>

The first case will be easier for you to figure out the needed structure. If xfa:data is missing or incomplete, you can try to fill out all the form fields manually with an XFA capable PDF viewer. When saving, the viewer will populate xfa:data, according to the data description and the data binding.
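
Once xfa:data is populated, its structure can be listed with standard JAXP. The following is a minimal sketch, assuming the dump produced by readXfa uses the usual XFA data namespace `http://www.xfa.org/schema/xfa-data/1.0/`. The SAMPLE string is a made-up stand-in for such a dump, and `form1`, `Name` and `Age` are hypothetical element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XfaDataLocator {

    // Assumption: the form uses the standard XFA data namespace.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Made-up stand-in for the XML written by readXfa; element names are hypothetical.
    static final String SAMPLE =
          "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\">"
        + "<xfa:datasets xmlns:xfa=\"" + XFA_DATA_NS + "\">"
        + "<xfa:data><form1><Name>John</Name><Age/></form1></xfa:data>"
        + "</xfa:datasets></xdp:xdp>";

    // Returns the first xfa:data element, or null if the dump has none.
    static Element findData(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList hits = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
        return hits.getLength() == 0 ? null : (Element) hits.item(0);
    }

    // Collects the path of every element below parent, in document order.
    static void collectPaths(Element parent, String prefix, List<String> out) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String path = prefix + "/" + child.getNodeName();
                out.add(path);
                collectPaths((Element) child, path, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Element data = findData(SAMPLE);
        List<String> paths = new ArrayList<>();
        if (data != null) {
            collectPaths(data, "", paths);
        }
        System.out.println(paths);
    }
}
```

The listed paths give the element names and nesting that a data.xml for the same form would need to reproduce.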

For reference: XFA specification

2 Comments

Thank you rhens for the detailed feedback. It sounds like one strategy is to fill in the XFA form completely by hand, then export it to .xml and read it in a text editor to understand the data description and binding. What I'm still confused about is how to get the values from my .txt file into the .xml file without "breaking" the predefined data structure. This happened recently: a company related to my work couldn't upload the PDF because I had manually altered the XML and perhaps made a slight syntax error. I'm hoping iText allows me to fully adhere to the predefined XML data structure?
Making sure that the XML is valid is not related to XFA or iText. You can use standard XML tools to manipulate the XML structure. Have a look at JAXP for Java XML processing. DOM is probably easiest to understand. Node's replaceChild() and setNodeValue() can be used to add content to the XML document.
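
A minimal sketch of that DOM approach, using createElement() and setTextContent() rather than replaceChild() and setNodeValue(): it builds the XML from a list of values (one per line of the text file) and lets the serializer handle escaping, so a stray `&` or `<` in the data cannot break the markup. The root and field names are hypothetical and would have to match whatever structure the form actually expects. isWellFormed only checks XML syntax, not whether the form accepts the structure.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TxtToXml {

    // Builds <rootName><field>value</field>...</rootName>, pairing names and values by position.
    static String buildXml(String rootName, List<String> fieldNames, List<String> values)
            throws Exception {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException("need exactly one value per field");
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (int i = 0; i < fieldNames.size(); i++) {
            Element field = doc.createElement(fieldNames.get(i));
            field.setTextContent(values.get(i)); // special characters are escaped on output
            root.appendChild(field);
        }
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // True if the string parses as XML; this checks syntax only.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // In practice the values could come from
        // java.nio.file.Files.readAllLines(java.nio.file.Paths.get("data.txt")).
        List<String> names = Arrays.asList("Name", "Age", "Colour");   // hypothetical field names
        List<String> values = Arrays.asList("John", "15", "Black & White");
        String xml = buildXml("SomeRoot", names, values);
        System.out.println(xml);
        System.out.println(isWellFormed(xml));
    }
}
```

The resulting XML could then be written to a file and passed to xfa.fillXfaForm as in the question's manipulatePdf method.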
