When I convert a Nokogiri object to XML and then to JSON, most of the content disappears.

Code getting the data and converting:

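require 'nokogiri'
require 'open-uri'
# Hash.from_xml is provided by Active Support (loaded automatically in Rails)

def get_data
  # URI.open comes from open-uri; Kernel#open no longer accepts URLs in Ruby 3+
  doc = Nokogiri::HTML(URI.open("<url>", "User-Agent" => "Ruby/#{RUBY_VERSION}"))

  # Get interesting block of HTML (a NodeSet of every element matching .entry)
  blurb = doc.css('.entry')

  # Convert Nokogiri object to XML
  xmlBlurb = blurb.to_xml

  # Convert to JSON
  jsonBlurb = Hash.from_xml(xmlBlurb).to_json

  return jsonBlurb
end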

Somewhere between xmlBlurb and jsonBlurb, I go from 10+ lines of XML to a single JSON object, { attr: content }, with only one attribute.

I know there are several questions on SO about converting XML to JSON, but none of the ones I read addresses this specific issue.

Does anyone know what can cause the loss of data?

  • Please edit your question to include the input XML and the JSON output you're expecting. Also, what is Hash.from_xml? It's not a standard Ruby method, nor does it come from Nokogiri. Commented May 24, 2016 at 20:03
  • Your title is misleading. Converting XML to JSON doesn't lose content; your code is losing content. Read "minimal reproducible example". Your code doesn't demonstrate the problem, and you're missing the minimal sample input required to help you. Commented May 24, 2016 at 23:08

1 Answer

Hash.from_xml is a class method that Rails (Active Support) adds to Ruby's core Hash class. It is documented as troublesome: under various conditions it loses attributes during the conversion from XML to Hash.

"convert XML to ruby hash with attributes" provides some suggestions.
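
One way to keep attributes is to walk the XML tree yourself instead of relying on Hash.from_xml. The following is only a minimal sketch of that idea, not the approach from the linked question: element_to_hash is a hypothetical helper, the "@" and "#text" key names are arbitrary conventions chosen for illustration, and the sample XML is made up. It uses REXML so that it runs without Rails or Nokogiri; the same traversal could be written against Nokogiri nodes.

```ruby
require 'json'
require 'rexml/document'

# Hypothetical helper (not part of Rails or Nokogiri): recursively converts an
# REXML element into a Hash that keeps attributes, text, and child elements.
def element_to_hash(element)
  hash = {}

  # Keep every attribute, prefixed with "@" so it cannot collide with a child tag name.
  element.attributes.each { |name, value| hash["@#{name}"] = value }

  # Keep non-blank text that sits directly inside this element.
  text = element.texts.map(&:value).join.strip
  hash['#text'] = text unless text.empty?

  # Group child elements by tag name into arrays so repeated siblings
  # do not overwrite each other.
  element.elements.each do |child|
    (hash[child.name] ||= []) << element_to_hash(child)
  end

  hash
end

# Made-up sample input standing in for the .entry block.
xml = '<div class="entry"><p id="a">first</p><p id="b">second</p></div>'
root = REXML::Document.new(xml).root
result = { root.name => element_to_hash(root) }

puts JSON.pretty_generate(result)
```

In the resulting structure, attributes appear under "@"-prefixed keys and element text under "#text", so nothing is dropped when an element carries both. Child elements are always wrapped in arrays, which is more verbose but keeps the shape predictable for consumers of the JSON.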
