
I have HTML represented in a weird, flat form (it is much easier to work with than a regular nested one):

         [{:text "5d" :em true :strong true}
          {:text "xx" :em true}
          {:text "damn" :em true :strong true}
          {:text "c6"}
          {:text "qwe" :em true}
          {:text "asd"}
          {:text "qqq" :em true :strong true}]

I need to convert it to a Hiccup-like one:

           [[:em
             [:strong "5d"]
             "xx"
             [:strong "damn"]]
            "c6"
            [:em "qwe"]
            "asd"
            [:strong [:em "qqq"]]]

The best implementation I came up with is:

(require 'clojure.set) ; p->tags below uses clojure.set/difference

(defn wrap-tags [states nodes]
  (if (seq states)
    (reduce
     (fn [nodes state]
       [(into [state] nodes)])
     nodes states)
    nodes))

(defn p->tags
  ([data]
     (p->tags data #{} [] []))
  ([[node & rest] state waiting result]
     (let [new-state (set (keys (dissoc node :text)))
           closed (clojure.set/difference state new-state)
           waiting (conj (wrap-tags closed waiting) (:text node))
           result (if-not (seq new-state)
                    (into result waiting)
                    result)
           waiting (if-not (seq new-state) [] waiting)]
       (if (seq rest)
         (p->tags rest new-state waiting result)
         (if (seq waiting)
           (into result (wrap-tags new-state waiting))
           result)))))

It's not working properly, though: it doesn't handle the case where :strong appears. It has no idea how many of the "waiting" nodes it should wrap, so it wraps all of them, and I have no idea how to track this. It looks a bit ugly to me as well, but that's less annoying. :) What it currently returns for my example is:

[[:em
  [:strong
   [:strong "5d"]
   "xx"
   "damn"]]
 "c6"
 [:em "qwe"]
 "asd"
 [:em [:strong "qqq"]]]

I would love to hear any ideas how to improve my code.
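The missing piece described above (how many of the "waiting" nodes a tag should wrap) can be measured directly: a tag should wrap exactly the run of consecutive nodes, starting at the current one, that carry it. A minimal sketch; `run-length` is a hypothetical helper name, not part of the code above:

```clojure
(def data
  [{:text "5d" :em true :strong true}
   {:text "xx" :em true}
   {:text "damn" :em true :strong true}
   {:text "c6"}
   {:text "qwe" :em true}
   {:text "asd"}
   {:text "qqq" :em true :strong true}])

;; Hypothetical helper: counts the consecutive nodes, from the front of
;; `nodes`, that have `tag` set; the run stops at the first node without it.
(defn run-length [nodes tag]
  (count (take-while #(get % tag) nodes)))

(run-length data :em)     ;; => 3
(run-length data :strong) ;; => 1
```

So at "5d", :em should wrap three nodes and :strong only one, which makes :em the outer tag there.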

  • It seems you need the reverse of a tree-flattening algorithm, as the data you have is the result of flattening a Hiccup tree. Commented Feb 21, 2014 at 9:08
  • Yeah, something like that; I just couldn't come up with a good way to do it. Commented Feb 21, 2014 at 9:47
  • Why does the first node, {:text "5d" :em true :strong true}, become [:em [:strong ...]] while the last one, {:text "qqq" :em true :strong true}, becomes [:strong [:em ...]]? Commented Feb 21, 2014 at 13:15
  • Because in the first case em is the tag containing all three elements, while in the second case I don't really care about the order. There is no real difference between <strong><em>qqq</em></strong> and <em><strong>qqq</strong></em>. Commented Feb 21, 2014 at 15:23
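To make the first comment concrete: the flat vector is what you get by walking a Hiccup tree and recording, for each string, the set of tags open above it. A minimal sketch of that forward direction, assuming every vector is [tag & children] with no attribute map; `flatten-hiccup` is a hypothetical name:

```clojure
;; Hypothetical helper: walks Hiccup-like nodes depth-first and emits one
;; flat map per string, marking every enclosing tag with true.
;; Assumes each vector is [tag & children] with no attribute map.
(defn flatten-hiccup
  ([nodes] (flatten-hiccup nodes #{}))
  ([nodes open-tags]
   (mapcat (fn [node]
             (if (string? node)
               [(assoc (zipmap open-tags (repeat true)) :text node)]
               (let [[tag & children] node]
                 (flatten-hiccup children (conj open-tags tag)))))
           nodes)))

(vec (flatten-hiccup [[:em [:strong "5d"] "xx" [:strong "damn"]]
                      "c6"
                      [:em "qwe"]
                      "asd"
                      [:strong [:em "qqq"]]]))
;; => the seven maps from the question (printed key order may differ)
```

Because [:em [:strong "qqq"]] and [:strong [:em "qqq"]] flatten to the same map, the reverse mapping is not unique, which is why the ordering question in the comments comes up at all.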

2 Answers


If I understand the layout of your data correctly, you want to partition the sequence by whether or not the elements contain :em, and wrap each run of elements that do inside a single [:em ...] node. Clojure's partition-by can do this:

(def elements [{:text "5d" :em true :strong true}
               {:text "xx" :em true}
               {:text "damn" :em true :strong true}
               {:text "c6"}
               {:text "qwe" :em true}
               {:text "asd"}
               {:text "qqq" :em true :strong true}])

(vec (partition-by #(:em %1) elements))
;; =>
[({:text "5d", :strong true, :em true}
  {:text "xx", :em true}
  {:text "damn", :strong true, :em true})
 ({:text "c6"})
 ({:text "qwe", :em true})
 ({:text "asd"})
 ({:text "qqq", :strong true, :em true})]

You could then process this with a reduce to create the hiccup like structure:

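(defn group->tag [acc group]
  (cond
    (nil? group)
    acc

    ;; a run of :em nodes becomes a single [:em ...] vector,
    ;; with :strong nodes wrapped individually inside it
    (:em (first group))
    (conj
     acc
     (vec
      (concat [:em]
              (mapv
               (fn [elt]
                 (if (contains? elt :strong)
                   [:strong (:text elt)]
                   (:text elt)))
               group))))

    ;; nodes without :em are spliced in as bare strings
    ;; (any :strong on them is dropped)
    :otherwise
    (vec (concat acc (mapv :text group)))))

(defn elements->hiccup [elements]
  (reduce
   group->tag
   []
   (partition-by #(:em %1) elements)))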

The above looks like it produces what you asked for:

(elements->hiccup elements)
;; =>
[[:em
  [:strong "5d"]
  "xx"
  [:strong "damn"]]
 "c6"
 [:em "qwe"]
 "asd"
 [:em [:strong "qqq"]]]

1 Comment

That makes some sense, but it could be the other way around: strong wrapping more nodes and em wrapping fewer. In other words, it's more or less free-form HTML, so I need to determine which tag should be the outer wrapper, which one goes inside it, and so on. As the first commenter said, it's basically un-flattening an HTML tree.

Ok, it seems I have won this game:

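(require '[clojure.test :refer [deftest is]])

;; The tags (every key except :text) set on a flat node.
(defn node->tags [node]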
  (set (keys (dissoc node :text))))

;; How many consecutive nodes, from the start of data, carry tag.
(defn tag-reach [data tag]
  (reduce (fn [cnt node]
            (if (tag node)
              (inc cnt)
              (reduced cnt)))
          0 data))

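;; Among the first node's tags not in exclude, returns [tag reach] for the
;; one whose run is longest (the first one seen wins a tie); returns
;; [nil 1] when no such tag is left, i.e. emit a single plain text node.
(defn furthest-tag [data exclude]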
  (let [exclude (into #{:text} exclude)
        tags (filterv #(not (exclude %)) (node->tags (first data)))]
    (if (seq tags)
      (reduce (fn [[tag cnt :as current] rival]
                (let [rival-cnt (tag-reach data rival)]
                  (if (> rival-cnt cnt)
                    [rival rival-cnt]
                    current)))
              [nil 0] tags)
      [nil 1])))

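;; Repeatedly wraps the leading run of nodes in the furthest-reaching tag and
;; recurses into that run; wrapping-tags holds the tags already opened above.
(defn nodes->tree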
  ([nodes]
     (nodes->tree nodes []))
  ([nodes wrapping-tags]
     (loop [nodes nodes
            result []]
       (let [[tag cnt] (furthest-tag nodes wrapping-tags)
             [to-process to-recur] (split-at cnt nodes)
             processed (if tag
                         (nodes->tree to-process (conj wrapping-tags tag))
                         (mapv :text to-process))
             result (into result (if tag
                                   [(into [tag] processed)]
                                   processed))]
         (if (seq to-recur)
           (recur to-recur result)
           result)))))

(deftest test-gen-tree
  (let [data [{:text "5d" :em true :strong true}
              {:text "xx" :em true}
              {:text "qqq" :em true :strong true}
              {:text "c6"}
              {:text "qwe" :em true}
              {:text "asd"}
              {:text "qqq" :em true :strong true}]]
    (is (= (nodes->tree data)
           [[:em
             [:strong "5d"]
             "xx"
             [:strong "qqq"]]
            "c6"
            [:em "qwe"]
            "asd"
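            ;; "qqq" ties :em and :strong; furthest-tag keeps whichever
            ;; comes first in hash-set iteration order
            [:strong [:em "qqq"]]]))))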

It isn't as clear as I would like it to be, but it works. Hurray. :-)
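The same longest-run idea can be written more compactly with take-while and max-key. This is a sketch, not a drop-in replacement: `run-length` and `unflatten` are hypothetical names, and ties are broken deterministically by sorting the candidate tags first, so that max-key (which returns the last of several equal maxima) picks the alphabetically last tag instead of depending on hash-set order.

```clojure
;; Hypothetical helper: counts consecutive nodes, from the front of
;; `nodes`, that carry `tag`.
(defn run-length [nodes tag]
  (count (take-while #(get % tag) nodes)))

;; Hypothetical sketch: rebuilds a Hiccup-like tree from flat
;; {:text ... tag true} maps. `open` holds tags opened by enclosing vectors.
(defn unflatten
  ([nodes] (unflatten nodes #{}))
  ([nodes open]
   (loop [nodes (seq nodes)
          out   []]
     (if-let [node (first nodes)]
       (let [tags (sort (remove open (keys (dissoc node :text))))]
         (if (seq tags)
           ;; open the tag whose run reaches furthest; on a tie, max-key
           ;; returns the last candidate, i.e. the alphabetically last tag
           (let [tag (apply max-key #(run-length nodes %) tags)
                 n   (run-length nodes tag)]
             (recur (drop n nodes)
                    (conj out (into [tag] (unflatten (take n nodes)
                                                     (conj open tag))))))
           ;; no unopened tags left on this node: emit its bare text
           (recur (next nodes) (conj out (:text node)))))
       out))))
```

On the question's data this yields [[:em [:strong "5d"] "xx" [:strong "damn"]] "c6" [:em "qwe"] "asd" [:strong [:em "qqq"]]], and when :strong has the longer run it becomes the outer tag instead.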
