Simple pattern matching in Clojure

Question

I have a string in Clojure and I'd like to name and extract various parts of a match. The standard way to do this is:

(re-seq #"\d{3}-\d{4}" "My phone number is 000-1234")
;; returns ("000-1234")

However I want to be able to name and access just the matched parts.

Here's an example:

(def mystring "Find sqrt of 6 and the square of 2")
(def patterns '(#"sqrt of \d" #"square of \d"))

When I match mystring against my list of patterns, I'd like the result to be something like {:sqrt 6, :root 2}.

Update

I found a third-party package, https://github.com/rufoa/named-re, that supports named groups, but I was hoping for a solution within a core library.

leetwinski · Accepted Answer · 2016-04-19 15:09:36Z

4

You can do it using the named groups of Java's regular expressions. The problem is that there is no API to get all the group names, so you have to extract them from the regexp itself:

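(defn find-named
  "Returns a map of group name (as keyword) to captured string for the
  first match of re in s, or nil if there is no match."
  [re s]
  (let [m (re-matcher re s)
        ;; scrape the group names out of the pattern source: (?<name>...)
        group-names (map second (re-seq #"\(\?<([a-zA-Z][a-zA-Z0-9]*)>" (str re)))]
    (when (.find m)
      (into {} (map (fn [group-name]
                      [(keyword group-name) (.group m ^String group-name)])
                    group-names)))))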

In the REPL:

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of 2")
{:sqrt "6", :root "2"}

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of fff")
nil

Update:

The conversation led me to think that you don't really need named groups here, but rather named patterns:

user> 
(defn get-named [patterns s]
  (into {} (for [[k ptrn] patterns]
             [k (second (re-find ptrn s))])))
#'user/get-named

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of 2")
{:sq "6", :rt "2"}

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of xxx")
{:sq "6", :rt nil}
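The question asked for numbers ({:sqrt 6, :root 2}) rather than strings. A minimal sketch of one way to get there, assuming every pattern has exactly one capturing group that captures only digits; the name get-named-ints is hypothetical and not part of any library:

```clojure
(defn get-named-ints
  "Like get-named, but parses each captured string as a long.
  Keys whose pattern does not match map to nil.
  Assumes group 1 of every pattern captures digits only."
  [patterns s]
  (into {}
        (for [[k ptrn] patterns]
          ;; re-find returns [whole-match group-1] or nil
          (let [[_ digits] (re-find ptrn s)]
            [k (when digits (Long/parseLong digits))]))))

(get-named-ints {:sqrt #"sqrt of (\d+)"
                 :root #"square of (\d+)"}
                "Find sqrt of 6 and the square of 2")
;; => {:sqrt 6, :root 2}
```

Using \d+ instead of \d lets multi-digit numbers through; a key whose pattern does not match still maps to nil, as in get-named.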

edited Apr 19, 2016 at 15:09

answered Apr 19, 2016 at 8:43

leetwinski


4 Comments

M.R. Over a year ago

Nice. Is there something analogous to this that would work symbolically on lists like '(sqrt of 6, square of 2)?

leetwinski Over a year ago

I don't know whether there is a library for this, but in simple cases you can always convert the list to a string and use a regexp: (pr-str '(sqrt of 6, square of 2)) => "(sqrt of 6 square of 2)"

leetwinski Over a year ago

Also, if you know the structure of the list, you can write parsing specific to it. In your case it would be (into {} (for [[k _ v] (partition 3 '(sqrt of 6, square of 2))] [(keyword k) v])). I don't think there is a need for anything universal here.

M.R. Over a year ago

My use case is really the OR: (find-named #"(sqrt of (?<sqrt>\d))|(square of (?<root>\d))" "Find sqrt of 6 and the square of fff") which works well giving {:sqrt "6", :root nil}
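One caveat with the OR pattern: find-named only looks at the first match, so with an alternation at most one branch's group is filled in even when both phrases occur in the string. A hedged sketch of a variant that walks every match and keeps the first non-nil capture per group; find-all-named is a hypothetical name, and the group names are scraped from the pattern source in the same way as in find-named:

```clojure
(defn find-all-named
  "Runs re over s repeatedly and, for each named group, keeps the first
  non-nil capture seen in any match. Unlike find-named, returns a map
  with all-nil values (not nil) when nothing matches."
  [re s]
  (let [m (re-matcher re s)
        group-names (map second
                         (re-seq #"\(\?<([a-zA-Z][a-zA-Z0-9]*)>" (str re)))]
    ;; start with every group mapped to nil, then fill in as matches arrive
    (loop [acc (zipmap (map keyword group-names) (repeat nil))]
      (if (.find m)
        (recur (reduce (fn [acc n]
                         (let [k (keyword n)]
                           (if (some? (acc k))
                             acc
                             (assoc acc k (.group m ^String n)))))
                       acc
                       group-names))
        acc))))

(find-all-named #"(sqrt of (?<sqrt>\d))|(square of (?<root>\d))"
                "Find sqrt of 6 and the square of 2")
;; => {:sqrt "6", :root "2"}
```

With the "square of fff" input from the comment above, the result is still {:sqrt "6", :root nil}.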

leeor · Accepted Answer · 2016-04-19 03:55:52Z

1

You need to put a capturing group around the part you want, e.g.:

(re-seq #"sqrt of (\d)" "Find sqrt of 6")

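Or, if you want only the first captured group, create a matcher and query it:

(def matcher (re-matcher #"sqrt of (\d)" "Find sqrt of 6"))
(re-find matcher)              ; advances to the first match => ["sqrt of 6" "6"]
(second (re-groups matcher))   ; => "6"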

See the docs for re-groups.

As for naming captured groups: I didn't look closely at the library you mentioned in the question, but I would expect the only practical difference to be that each capturing group gets a name instead of being referenced by its numeric left-to-right position (starting from 1) in the regex.
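Since groups are positional, names can also be attached after the fact without any named-group support. A small sketch, assuming the regex contains at least one capturing group and that the keys are listed in the same left-to-right order as the groups; match->map is a hypothetical helper name:

```clojure
(defn match->map
  "Returns a map from the given keys to the capture groups of the first
  match of re in s (in left-to-right group order), or nil if there is
  no match. Assumes re has at least one capturing group."
  [ks re s]
  (when-let [groups (re-find re s)]
    ;; the first element is the whole match; the rest are the groups
    (zipmap ks (rest groups))))

(match->map [:sqrt :root]
            #"sqrt of (\d).*?square of (\d)"
            "Find sqrt of 6 and the square of 2")
;; => {:sqrt "6", :root "2"}
```

The trade-off is that the key order has to be kept in sync with the regex by hand, which is exactly what named groups avoid.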

edited Apr 19, 2016 at 3:55

answered Apr 19, 2016 at 3:40

leeor


6 Comments

M.R. Over a year ago

This doesn't work for me. It gives IllegalStateException No match found java.util.regex.Matcher.group (Matcher.java:536)

leeor Over a year ago

I updated the answer - the main thing is the parens around the \d

M.R. Over a year ago

OK, so now it returns (["sqrt of 6" "6"]), but I have no guarantee that the last element is the one I want. I think we need "named patterns".

leeor Over a year ago

The return from re-groups is a vector. The first element is the whole match. The remaining elements are specific matches.

M.R. Over a year ago

Thanks. So named groups are what I'm after; I added a link to the project. Is there something like that built into the core libraries, something that would work on lists as well as strings?


glts · Accepted Answer · 2016-04-19 16:15:15Z

1

Depending on what you intend to do with the ‘named matches’ you may also find it useful to simply destructure the matches and bind them to symbols.

For a single match:

(if-let [[_ digit letter] (re-find #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => ["1" "x"]

For multiple matches:

(for [[_ digit letter] (re-seq #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => (["1" "x"] ["2" "y"] ["3" "z"])

answered Apr 19, 2016 at 16:15

glts

