My code is supposed to "guess" the path(s) that lead to the relevant text nodes in my XML file. "Relevant" here means text nodes nested within the recurring product/person/something tag, not text nodes that appear outside of it.
This code:
require "nokogiri"

@doc, items = Nokogiri.XML(@file), []
path = []
# traverse yields children before their parent, so names are collected deepest-first
@doc.traverse do |node|
  next unless node.element?
  # a "path element" is an element that has at least one element child
  is_path_element = node.element_children.any?
  path.push(node.name) if is_path_element && !path.include?(node.name)
end
# reverse so the path runs from the root down
final_path = "/" + path.reverse.join("/")
works for simple XML files, for example:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Some XML file title</title>
    <description>Some XML file description</description>
    <item>
      <title>Some product title</title>
      <brand>Some product brand</brand>
    </item>
    <item>
      <title>Some product title</title>
      <brand>Some product brand</brand>
    </item>
  </channel>
</rss>
puts final_path # => "/rss/channel/item"
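But how should I approach this when the structure gets more complicated? In the file below, titles and brands also contain element children, so my code adds them to the path as well and the result is no longer a valid path: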
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Some XML file title</title>
    <description>Some XML file description</description>
    <item>
      <titles>
        <title>Some product title</title>
      </titles>
      <brands>
        <brand>Some product brand</brand>
      </brands>
    </item>
    <item>
      <titles>
        <title>Some product title</title>
      </titles>
      <brands>
        <brand>Some product brand</brand>
      </brands>
    </item>
  </channel>
</rss>