1

I'm trying to edit XML files with a batch or Python script.

This is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<task name="analyse">
   <taskInfo taskId="21a09311-ade3-4e9a-af21-d13be8b7ba45" runAt="2015-05-20 13:48:50" runTime="5 minutes, 53 seconds">
      <project name="13955 - HMI Volvo Truck PA15" number="e20d51c0-71dc-4572-8f9b-4c150bf35222" />
      <language lcid="1031" name="German (Germany)" />
      <tm name="ENG-DEU_en-GB_de-DE.sdltm" />
      <settings reportInternalFuzzyLeverage="yes" reportLockedSegments="no" reportCrossFileRepetitions="yes" minimumMatchScore="70" searchMode="bestWins" missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" />
   </taskInfo>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150520_102527.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="55" characters="755" placeables="3" tags="0" />
         <!-- Replace the value words="55" with "0" -->
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="20" characters="0" placeables="0" tags="0" />
         <!-- Cut the value words="20" and replace it with "0" -->
         <repeated segments="17" words="34" characters="293" placeables="2" tags="0" />
         <!-- Add the cut value 20 to the current value 34, so the new value is words="54" -->
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150523_254796.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="67" characters="755" placeables="3" tags="0" />
         <!-- Replace the value words="67" with "0" -->
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="35" characters="0" placeables="0" tags="0" />
         <!-- Cut the value words="35" and replace it with "0" -->
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         <!-- Add the cut value 35 to the current value 54, so the new value is words="89" -->
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <batchTotal>
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="139" characters="755" placeables="3" tags="0" />
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="0" words="0" characters="0" placeables="0" tags="0" />
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </batchTotal>
</task>

General notes:

  • <task> is the root element (closing tag </task>)
  • the important part is to modify a few elements inside each <file> ... </file> section
  • there can be any number of <file> ... </file> sections

What I need:

For each <file> element, I would like to:

  • In <inContextExact>, set the value of the words attribute to 0

    <inContextExact ... words="55" ... /> => <inContextExact ... words="0" ... />

  • In <crossFileRepeated>, set the value of the words attribute to 0

    <crossFileRepeated ... words="20" ... /> => <crossFileRepeated ... words="0" ... />

  • In <total>, set the value of the words attribute to a number calculated by my own logic

    <total ... words="1462" ... /> => <total ... words="??" ... />

I would really appreciate an example of processing XML files with batch or Python.

  • Have you tried searching the web? I'm sure you'll find lots of examples. Commented May 29, 2015 at 10:50
  • Yeah some but I guess someone here could help me. Commented May 29, 2015 at 11:05
  • Write a program which does what you want. If you fail, come back and tell us about what you did and what does not work. Commented May 29, 2015 at 11:16
  • @DanielElmnas, I saw you tagged this question python. Is a Python solution acceptable? Commented May 29, 2015 at 11:34
  • Well sure if you can. Commented May 29, 2015 at 11:42

2 Answers

1

Let's use Python!

This is easy to do in Python, and since you said a Python solution is OK, check the script below.

Here's how you can iterate over a directory containing XML files, process each one as requested, and save the changes back to the file.

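from xml.etree import ElementTree
import os


def edit_xml_file(data):
    # Parse the raw bytes of one report into an element tree
    e = ElementTree.fromstring(data)

    # Only the <file> sections are edited; <batchTotal> is left untouched
    for file_element in e.findall('file'):

        analyse_element = file_element.find('analyse')

        # Remember the old value, then reset words to 0
        in_context_exact_element = analyse_element.find('inContextExact')
        in_context_exact_words = int(in_context_exact_element.get('words'))
        in_context_exact_element.set('words', '0')

        # Same for <crossFileRepeated>
        cross_file_repeated_element = analyse_element.find('crossFileRepeated')
        cross_file_repeated_words = int(cross_file_repeated_element.get('words'))
        cross_file_repeated_element.set('words', '0')

        # Example calculation only: substitute your own logic for the new total
        total_element = analyse_element.find('total')
        total_element.set('words', str(in_context_exact_words + cross_file_repeated_words))

    # On Python 3, tostring() returns bytes by default, matching the binary file mode below
    xmlstr = ElementTree.tostring(e)
    return xmlstr


def main():

    # Directory containing the .xml files to process
    source_directory = 'xmlfiles'

    for filename in os.listdir(source_directory):

        # Skip anything that is not an .xml file
        if not filename.endswith('.xml'):
            continue

        xml_file_path = os.path.join(source_directory, filename)
        # Open for reading and writing in binary mode, then overwrite in place
        with open(xml_file_path, 'r+b') as f:
            data = f.read()
            fixed_data = edit_xml_file(data)
            f.seek(0)
            f.write(fixed_data)
            f.truncate()  # drop leftover bytes if the new content is shorter


if __name__ == '__main__':
    main()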

In this solution, I've used the built-in ElementTree module (xml.etree.ElementTree).
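
The inline notes in the question describe a slightly different rule from the one coded above: the words value cut from <crossFileRepeated> is added to <repeated>. Here is a minimal sketch of that variant, assuming this reading of the notes is right. The SAMPLE string and the move_cross_file_words helper are made up for illustration, and the XML declaration is written explicitly with xml_declaration=True.

```python
import io
from xml.etree import ElementTree

# Hypothetical, trimmed-down sample modelled on the question's file
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<task name="analyse">
   <file name="a.sdlxliff">
      <analyse>
         <inContextExact segments="60" words="55" />
         <crossFileRepeated segments="2" words="20" />
         <repeated segments="17" words="34" />
      </analyse>
   </file>
   <batchTotal>
      <analyse>
         <inContextExact segments="60" words="139" />
      </analyse>
   </batchTotal>
</task>
"""


def move_cross_file_words(root):
    """Apply the question's inline notes to every <file> element, in place."""
    for analyse in root.findall('file/analyse'):
        in_context = analyse.find('inContextExact')
        cross = analyse.find('crossFileRepeated')
        repeated = analyse.find('repeated')
        if in_context is None or cross is None or repeated is None:
            continue  # skip <file> sections missing an expected element
        moved = int(cross.get('words', '0'))
        in_context.set('words', '0')
        cross.set('words', '0')
        # Add the value cut from <crossFileRepeated> to <repeated>
        repeated.set('words', str(int(repeated.get('words', '0')) + moved))


tree = ElementTree.ElementTree(ElementTree.fromstring(SAMPLE.encode('utf-8')))
move_cross_file_words(tree.getroot())

# Serialize to an in-memory buffer, asking for the XML declaration explicitly
buffer = io.BytesIO()
tree.write(buffer, encoding='UTF-8', xml_declaration=True)
result = buffer.getvalue().decode('utf-8')
print(result)
```

For real files, ElementTree.parse(path) followed by tree.write(path, encoding='UTF-8', xml_declaration=True) should work the same way.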

9 Comments

Haha awesome, how do I use the python script? xD
The script says it cannot find any XML files. I installed Python, ran cmd and navigated to C:/python34/test.py, and I have XML files in the same location.
I get C:\Python34>lol.py File "C:\Python34\lol.py", line 28 source_directory ="C:\test" ^ SyntaxError: invalid syntax C:\Python34> I also tried "C:\test*.xls" and "C:\test\"
I need at least +20 rep to be able to talk in the room.
I still can't talk to you? Hmm strange.. I'll try to relogon
0

Necessary tools

Here are the tools you will need to create a script in Excel VBA or VBScript:

Looping text files in a directory: link

Reading text files: link

Writing text files: link

Replacing using RegExp: link

Example Regex to get you going:

<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
->
<exact segments="114" words="0" characters="1687" placeables="14" tags="3" />

Use this regex: (words="[0-9]+?") or, even better, words="([0-9]+?)", which captures only the number.

Below is an example of processing a single row:

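'Early binding requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim re As RegExp
Set re = New RegExp
re.Pattern = "words=""[0-9]+"""
newTextRow = re.Replace(textRow, "words=""0""") 'Replace the words value with 0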
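
For comparison, the same single-row replacement can be sketched in Python (not VBA) with the standard re module. The names WORDS_PATTERN and zero_words are made up for this example. The pattern captures the surrounding words=" and closing quote so that only the digits are swapped out.

```python
import re

# Capture the attribute name and both quotes so the replacement can keep them
WORDS_PATTERN = re.compile(r'(words=")[0-9]+(")')


def zero_words(row):
    """Return the row with its words="N" value replaced by words="0"."""
    # \g<1> and \g<2> re-insert the captured prefix and closing quote
    return WORDS_PATTERN.sub(r'\g<1>0\g<2>', row)


text_row = '<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />'
new_text_row = zero_words(text_row)
print(new_text_row)
```

Note that this pattern matches the words attribute of any element, so the caller has to decide which rows to pass in.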

The approach

  1. Loop through your XML files using the Dir function

  2. Read the contents of the file using the link above on how to read text files in VBA

  3. Loop through all rows and use RegExp to replace the necessary words attribute values

  4. Save the output back to the XML file using the link above on how to write text files in VBA
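
The four steps above can also be sketched in Python rather than VBA. This is only a rough outline under a few assumptions: each element sits on its own row as in the question's file, the files are UTF-8, and only rows inside a <file> section should change. The names process_text, process_directory and TARGET_PREFIXES are made up for illustration.

```python
import os
import re

WORDS_PATTERN = re.compile(r'(words=")[0-9]+(")')
# Rows whose words value should be reset to 0 (assumed from the question)
TARGET_PREFIXES = ('<inContextExact ', '<crossFileRepeated ')


def process_text(text):
    """Step 3: walk the rows and zero the target words values inside <file>."""
    inside_file = False
    out = []
    for row in text.splitlines(keepends=True):
        stripped = row.strip()
        if stripped.startswith('<file '):
            inside_file = True
        elif stripped.startswith('</file>'):
            inside_file = False
        elif inside_file and stripped.startswith(TARGET_PREFIXES):
            row = WORDS_PATTERN.sub(r'\g<1>0\g<2>', row)
        out.append(row)
    return ''.join(out)


def process_directory(source_directory):
    """Steps 1, 2 and 4: loop over the files, read each one, write it back."""
    for filename in os.listdir(source_directory):
        if not filename.endswith('.xml'):
            continue
        path = os.path.join(source_directory, filename)
        # newline='' keeps the original line endings untouched
        with open(path, 'r', encoding='utf-8', newline='') as f:
            text = f.read()
        with open(path, 'w', encoding='utf-8', newline='') as f:
            f.write(process_text(text))
```

A row-based regex like this breaks as soon as an element spans several rows, which is one reason a real XML parser is usually the safer choice.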

5 Comments

Hmm, I don't really get what to do with it?
Just create a VBA/VBscript and follow the approach using the resources I posted above
Well, I actually need more help... Have you even opened the links you googled for me?
Of course I opened them. 2 of them are my posts. More help...You mean basically write the script for you ;>?
I'm trying now, but it's going to take years, haha. But yeah, I do need your pro help, lol. Thank you so far.
