
I've been struggling these days to get SAS to read an .xfdf file, which is an export of the comments (annotations) in a PDF made with Adobe Acrobat Professional. If you have never worked with an .xfdf file, don't worry: it is basically Adobe's XML-based format.

I can't use SAS XML Mapper, for two reasons: first, I can't use it at my workplace (where I also develop personal projects like this one); second, I'd like to write a procedure that can be repeated every time without redoing the mapping.

Usually comments are stored in the xfdf in this format:

><freetext rect="300.165985,66.879105,380.165985,86.879105" creationdate="D:-001-1-1-1-1-1-00'30'" name="a7311cdb-77b3-4a48-8eff-62364f94213d" color="#FFBF00" flags="print" date="D:20150730153125+01'00'" page="0"
><contents-richtext
><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:8.0.0" xfa:spec="2.0.2" style="font-size:11.0pt;text-align:left;color:#FF0000;font-weight:normal;font-style:normal;font-family:Arial,sans-serif;font-stretch:normal"
><p
>THE_COMMENT_TO_EXPORT_IS_THIS_STRING</p
></body
></contents-richtext
></freetext

And I gather that data with this portion of the XML map:

<COLUMN name='var1'>
<PATH syntax='XPath'>/xfdf/annots/freetext/contents-richtext/body/p</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>60</LENGTH>
</COLUMN>

Sometimes comments are stored in another way:

><freetext rect="331.041992,230.949005,553.198975,250.949005" creationdate="D:-001-1-1-1-1-1-00'30'" name="4f112387-dec6-42f1-ad8c-a1fecf9d8e04" color="#66CCFF" flags="print" date="D:20150730153213+01'00'" page="0"
><contents-richtext
><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:8.0.0" xfa:spec="2.0.2" style="font-size:11.0pt;text-align:left;color:#FF0000;font-weight:normal;font-style:normal;font-family:Arial,sans-serif;font-stretch:normal"
><p dir="ltr"
><span style="font-family:Arial"
>THE_COMMENT_TO_EXPORT_IS_THIS_STRING</span
></p
></body
></contents-richtext
></freetext

No problem here either; I can gather this comment with this portion of the XML map:

<COLUMN name='var2'>
<PATH syntax='XPath'>/xfdf/annots/freetext/contents-richtext/body/p/span</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>60</LENGTH>
</COLUMN>

But here comes the problem: sometimes the data is stored in this strange format, with two span tags:

><freetext rect="9.623672,760.177979,210.281006,783.448975" creationdate="D:00000000000000Z" name="4f037e18-9143-4ec1-a6ae-249fa2215528" width="2" color="#66CCFF" flags="print" date="D:20150731152640+01'00'" page="53"
><contents-richtext
><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:8.0.0" xfa:spec="2.0.2" style="font-size:14.0pt;text-align:left;color:#000000;font-weight:normal;font-style:normal;font-family:Arial,sans-serif;font-stretch:normal"
><p dir="ltr"
><span style="font-family:Arial"
>THIS_IS_THE_FIRST_PART </span
><span style="font-family:Arial"
>THIS_IS_THE_SECOND_PART</span
></p
></body
></contents-richtext
></freetext

The second map picks up only the second string (here: THIS_IS_THE_SECOND_PART). Can someone please help? How do I write an appropriate map to gather both pieces of information with SAS?

PS: I'm pretty sure that SAS XML Mapper can't solve this issue either: I found someone on the web with the same problem who was using a map created by that tool.

PS2: The path type is XPath 1.0. I gave string-join a try and got this error:

ERROR: invalid character in Xpath expression
ERROR: Xpath construct string-join(/xfdf/annots/freetext/contents-richtext/body/p/span, '')
for column var2 is an invalid, unrecognized, or unsupported form

EDIT: Added the HTML tag, since <P> and <SPAN> are HTML elements.

  • Can you use string-join in SAS's implementation of XPATH? Like in this question? Commented Aug 5, 2015 at 20:32
  • Many thanks @Joe, but I don't think so, or at least I don't know how to implement it. In the question you kindly linked, he had to concatenate 2 objects with different names (ELEMENT4 and ELEMENT5). Like him, I have to concatenate 2 strings inside the same object, but my strings sit in different objects that have the same name (SPAN and SPAN). Any idea? Thanks again. Commented Aug 6, 2015 at 7:13

1 Answer

I'm answering my own question: I found a fairly good solution, but if anyone has a better version, please post it.

I found out that SAS XML maps support only XPath 1.0, not XPath 2.0. In XPath 1.0 this step can be done in a single expression, provided you know the number of repeated elements in advance, using concat(/xxx/xxx[1], ' ', /xxx/xxx[2]).

Sadly this function does not work in a SAS XML map; if you try it you get ERROR: invalid character in Xpath expression.

But I don't need a perfect format, since I can post-process the data I retrieve. So in the map I created one variable for each possible position of the repeated element, like this:

<COLUMN name='vars1'>
<PATH syntax='XPath'>/xfdf/annots/freetext/contents-richtext/body/p/span[1]</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>60</LENGTH>
</COLUMN>

<COLUMN name='vars2'>
<PATH syntax='XPath'>/xfdf/annots/freetext/contents-richtext/body/p/span[2]</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>60</LENGTH>
</COLUMN>

<COLUMN name='vars3'>
<PATH syntax='XPath'>/xfdf/annots/freetext/contents-richtext/body/p/span[3]</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>60</LENGTH>
</COLUMN>

I wrote 6 of these blocks, even though I only ever encountered 2 repeated spans, to make the code as general as possible. Then I concatenated those string variables in a data step.
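
The concatenation step could look roughly like the sketch below. It is only an illustration under some assumptions: the table read through the map is taken to be available as work.annots, and the output dataset work.comments and the variable comment are hypothetical names; only vars1 to vars6 come from the map above. CATX strips leading and trailing blanks from each piece, skips the empty ones, and puts a single space between the rest.

```sas
/* Sketch only: work.annots, work.comments and comment are assumed names. */
data work.comments;
    set work.annots;

    /* 6 pieces of up to 60 characters plus 5 separating blanks fit in 400 */
    length comment $400;

    /* Join the non-empty spans in order, separated by one blank */
    comment = catx(' ', of vars1-vars6);

    /* Keep only the assembled comment */
    drop vars1-vars6;
run;
```

If the pieces should be joined without any separator, CATS(of vars1-vars6) could be used in place of CATX.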
