
I want to extract a JSON string from an HTML document without using a third-party framework. I'm building an iOS framework and don't want it to depend on any third-party framework.

Example url: http://www.nicovideo.jp/watch/sm33786214

In that HTML, there is an element with a data-api-data attribute whose value is the string I need (shown here as JSON_String_I_want_to extract).

I need to extract that string and convert it to a JSON object.

With the third-party framework Kanna, it looks like this:



    if let doc = Kanna.HTML(html: html, encoding: String.Encoding.utf8) {
        if let descNode = doc.css("#js-initial-watch-data[data-api-data]").first {
            let dataApiData = descNode["data-api-data"]
            if let data = dataApiData?.data(using: .utf8) {
                if let json = try? JSON(data: data, options: JSONSerialization.ReadingOptions.mutableContainers) {
                    // use json here
                }
            }
        }
    }

I searched the web for similar questions but couldn't apply the answers to my case. (I have to admit I don't fully understand regular expressions.)



    if let html = String(data:data, encoding:.utf8) {
        let pattern = "data-api-data=\"(.*?)\".*?>"
        let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
        let matches = regex.matches(in: html, options: [], range: NSMakeRange(0, html.count))
        var results: [String] = []
        matches.forEach { (match) -> () in
            results.append( (html as NSString).substring(with: match.rangeAt(1)) )
        }
        if let stringJSON = results.first {
            let d = stringJSON.data(using: String.Encoding.utf8)
            if let json = try? JSONSerialization.jsonObject(with: d!, options: []) as? Any {
                // it does not get here...
            }
        }
    }

Does anyone know how to extract this value from the HTML and convert it to JSON?

Thank you.

  • You could use a WKWebView for this: it will parse the HTML, and you can then ask it for the contents of the HTML tag that holds the JSON. It is a rather heavy solution, though. Commented Sep 2, 2018 at 13:53

1 Answer


Your pattern is not the problem. The issue is that attribute values of HTML elements may use character entities, for example &quot; in place of a double quote.

You need to replace those entities with the actual characters before parsing the string as JSON.

    if let html = String(data:data, encoding: .utf8) {
        let pattern = "data-api-data=\"([^\"]*)\""
        let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
        // Use html.utf16.count, NOT html.count: NSRange counts UTF-16 code units
        let matches = regex.matches(in: html, range: NSRange(0..<html.utf16.count))
        var results: [String] = []
        matches.forEach { match in
            let propValue = html[Range(match.range(at: 1), in: html)!]
                // You need to replace character entities with the actual characters
                .replacingOccurrences(of: "&quot;", with: "\"")
                .replacingOccurrences(of: "&apos;", with: "'")
                .replacingOccurrences(of: "&gt;", with: ">")
                .replacingOccurrences(of: "&lt;", with: "<")
                .replacingOccurrences(of: "&amp;", with: "&")
            results.append(propValue)
        }
        if let stringJSON = results.first {
            let dataJSON = stringJSON.data(using: .utf8)!
            do {
                let json = try JSONSerialization.jsonObject(with: dataJSON)
                print(json)
            } catch {
                print(error) // You should not ignore errors silently...
            }
        } else {
            print("NO result")
        }
    }
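
If you need this in more than one place, the same steps could be wrapped in a small helper. This is only a sketch: the function name firstAttributeValue(named:in:) and the sample HTML string are made up for illustration and do not come from the page in the question. It decodes only the five predefined entities, so numeric references such as &#39; would need extra handling.

```swift
import Foundation

/// Hypothetical helper (not from the original answer): returns the decoded
/// value of the first `attribute="..."` occurrence in `html`, or nil if absent.
func firstAttributeValue(named attribute: String, in html: String) -> String? {
    // Escape the attribute name so it is matched literally inside the pattern.
    let name = NSRegularExpression.escapedPattern(for: attribute)
    guard let regex = try? NSRegularExpression(pattern: name + "=\"([^\"]*)\"",
                                               options: .caseInsensitive) else {
        return nil
    }
    // NSRange(_:in:) measures the string in UTF-16 code units,
    // which is what NSRegularExpression expects.
    let fullRange = NSRange(html.startIndex..., in: html)
    guard let match = regex.firstMatch(in: html, options: [], range: fullRange),
          let valueRange = Range(match.range(at: 1), in: html) else {
        return nil
    }
    // Decode the five predefined entities. "&amp;" is handled last so that
    // "&amp;quot;" becomes the literal text "&quot;" instead of a quote.
    return String(html[valueRange])
        .replacingOccurrences(of: "&quot;", with: "\"")
        .replacingOccurrences(of: "&apos;", with: "'")
        .replacingOccurrences(of: "&gt;", with: ">")
        .replacingOccurrences(of: "&lt;", with: "<")
        .replacingOccurrences(of: "&amp;", with: "&")
}

// Invented sample markup, only for demonstration.
let sample = "<div id=\"js-initial-watch-data\" data-api-data=\"{&quot;id&quot;:&quot;sm33786214&quot;,&quot;tags&quot;:[&quot;a&amp;b&quot;]}\" hidden></div>"

if let value = firstAttributeValue(named: "data-api-data", in: sample),
   let data = value.data(using: .utf8) {
    do {
        let json = try JSONSerialization.jsonObject(with: data)
        print(json)
    } catch {
        print(error) // report parse failures instead of hiding them
    }
} else {
    print("NO result")
}
```

Building the range with NSRange(_:in:) avoids the html.count versus html.utf16.count mistake altogether, because the conversion to UTF-16 offsets is done for you.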