1

I have a string in Clojure and I'd like to name and extract various parts of a match. The standard way to do this is:

(re-seq #"\d{3}-\d{4}" "My phone number is 000-1234")
;; returns ("000-1234")

However, I want to be able to name the matched parts and access just those.

Here's an example:

(def mystring "Find sqrt of 6 and the square of 2")
(def patterns '(#"sqrt of \d" #"square of \d"))

When I match on mystring with my list of patterns, I'd like the result to be something like {:sqrt 6, :root 2}.

Update

I found a third-party package, https://github.com/rufoa/named-re, that supports named groups, but I was hoping for a solution within a core library.

3 Answers

4

You can do it using the named groups of Java's regular expressions. The problem is that there is no API to get all the group names (at least before Java 20, which added Pattern.namedGroups), so you have to extract them from the regex itself:

(defn find-named [re s]
  (let [m (re-matcher re s)
        names (map second (re-seq #"\(\?<([\w\d]+)>" (str re)))]
    (when (.find m)
      (into {} (map (fn [name]
                      [(keyword name) (.group m name)])
                    names)))))

In the REPL:

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of 2")
{:sqrt "6", :root "2"}

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of fff")
nil

Update:

The conversation in the comments made me realize that you don't really need named groups here, but rather named patterns:

user> 
(defn get-named [patterns s]
  (into {} (for [[k ptrn] patterns]
             [k (second (re-find ptrn s))])))
#'user/get-named

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of 2")
{:sq "6", :rt "2"}

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of xxx")
{:sq "6", :rt nil}
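The question asks for numbers ({:sqrt 6, :root 2}) rather than strings. A minimal sketch of one way to get there, assuming every pattern has exactly one capturing group that matches only digits; get-named-longs is a hypothetical name, not part of any library:

```clojure
;; Hypothetical variant of get-named that parses the captured digits.
;; Assumes each pattern has exactly one capturing group matching digits only.
(defn get-named-longs
  "Like get-named, but converts each captured string to a long.
  Patterns that do not match map to nil."
  [patterns s]
  (into {}
        (for [[k ptrn] patterns]
          (let [[_ digits] (re-find ptrn s)]   ; digits is nil when there is no match
            [k (some-> digits Long/parseLong)]))))

(get-named-longs {:sqrt #"sqrt of (\d+)"
                  :root #"square of (\d+)"}
                 "Find sqrt of 6 and the square of 2")
;; => {:sqrt 6, :root 2}
```

some-> skips the parse when the pattern did not match, so missing values stay nil instead of throwing.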

4 Comments

Nice. Is there something analogous to this that would work symbolically on lists like '(sqrt of 6, square of 2)?
I don't really know whether there is a library for that, but in simple cases you can always convert the list to a string and use a regex: (pr-str '(sqrt of 6, square of 2)) => "(sqrt of 6 square of 2)"
Also, if you know the structure of the list, you can write specific parsing for it. In your case it would be (into {} (for [[k _ v] (partition 3 '(sqrt of 6, square of 2))] [(keyword k) v])). I don't really think there is a need for something universal here.
My use case is really the OR: (find-named #"(sqrt of (?<sqrt>\d))|(square of (?<root>\d))" "Find sqrt of 6 and the square of fff") which works well giving {:sqrt "6", :root nil}
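A note on the OR pattern: a single .find stops at the first alternative that matches, so the other named group stays nil even when its text appears later in the string. The sketch below keeps calling .find and merges the groups from every match. find-named-all is a hypothetical name, and the name-extraction regex assumes group names follow Java's letter-then-alphanumerics rule:

```clojure
;; Hypothetical helper: collects named groups across ALL matches, not just the first.
(defn find-named-all
  "Merges the named groups of every successive match of re in s.
  Groups that never participate in a match are omitted; later matches
  overwrite earlier ones for the same name."
  [re s]
  (let [m     (re-matcher re s)
        ;; pull the group names out of the regex source
        names (map second (re-seq #"\(\?<([a-zA-Z][a-zA-Z0-9]*)>" (str re)))]
    (loop [acc {}]
      (if (.find m)
        (recur (into acc
                     (for [n names
                           :let [v (.group m ^String n)]
                           :when v]             ; skip groups that did not participate
                       [(keyword n) v])))
        acc))))

(find-named-all #"(sqrt of (?<sqrt>\d))|(square of (?<root>\d))"
                "Find sqrt of 6 and the square of 2")
;; => {:sqrt "6", :root "2"}
```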
1

You need to capture the pattern you want, e.g.:

(re-seq #"sqrt of (\d)" "Find sqrt of 6")

Or if you want the first group match:

(def matcher (re-matcher #"sqrt of (\d)" "Find sqrt of 6"))
(re-find matcher)             ; advances the matcher to the first match
(second (re-groups matcher))  ; => "6"

See the docs for re-groups.

As for naming captured groups: I didn't look too carefully at the library you mentioned in the question, but I would think the only practical difference is that each capturing group is referenced by an assigned name rather than by its numeric left-to-right position in the regex (starting from 1).
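To illustrate that difference with plain Java interop (no third-party library), here is a small sketch that reads the same group once by position and once by name. group-both-ways is a hypothetical helper written only for this example:

```clojure
;; Hypothetical helper: reads one capturing group by index and by name.
(defn group-both-ways
  [s]
  (let [m (re-matcher #"sqrt of (?<sqrt>\d)" s)]
    (when (.find m)
      {:by-number (.group m 1)          ; first group, counted left to right
       :by-name   (.group m "sqrt")}))) ; same group, looked up by its name

(group-both-ways "Find sqrt of 6")
;; => {:by-number "6", :by-name "6"}
```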

6 Comments

This doesn't work for me. It gives IllegalStateException No match found java.util.regex.Matcher.group (Matcher.java:536)
I updated the answer - the main thing is the parens around the \d
OK, so now it returns (["sqrt of 6" "6"]), but I have no guarantee that the last element is the one I want. I think we need "named patterns".
The return from re-groups is a vector. The first element is the whole match. The remaining elements are specific matches.
Thanks. So named groups are what I'm after; I added a link to the project. Is there something like that built into the core libraries, something that would work on lists as well as strings?
1

Depending on what you intend to do with the ‘named matches’ you may also find it useful to simply destructure the matches and bind them to symbols.

For a single match:

(if-let [[_ digit letter] (re-find #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => ["1" "x"]

For multiple matches:

(for [[_ digit letter] (re-seq #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => (["1" "x"] ["2" "y"] ["3" "z"])
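If a map of names to values is wanted rather than local bindings, the same destructuring idea can be combined with zipmap. This is only a sketch: match->map and matches->maps are hypothetical names, and the regex is assumed to contain at least one capturing group (otherwise re-find returns a plain string):

```clojure
;; Hypothetical helpers: pair captured groups with caller-supplied keys.
(defn match->map
  "Returns a map of ks to the groups of the first match, or nil if no match."
  [ks re s]
  (when-let [[_ & groups] (re-find re s)]
    (zipmap ks groups)))

(defn matches->maps
  "Returns one map per match, in order of appearance."
  [ks re s]
  (map (fn [[_ & groups]] (zipmap ks groups))
       (re-seq re s)))

(match->map [:digit :letter] #"(\d)([a-z])" "1x 2y 3z")
;; => {:digit "1", :letter "x"}
```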
