I need to write a regexp query in Elasticsearch to filter some data. The field I filter on is a person's name. The data are not always well formatted: sometimes there is no first name, and sometimes the family name is followed by a period, a comma, a comma plus a first name, or a period plus a first name.
For example, using "bouchard" I get the following matches:
"bouchard", "bouchard, m.", "bouchard, j.", "bouchard j.p.", "bouchard. j.p."
I also need to exclude names that merely begin with the same prefix, like "bouchardat".
I tried many regexps and finally found that an exclusion may yield better results:
"query" : { "regexp" : {
"RECORDEDBY" : "bouchard([^a-z].*)"
}}
This doesn't work because it returns "bouchard, m.", "bouchard, j.", "bouchard j.p." but not "bouchard. j.p." and not "bouchard".
I tried some other regexps with + and .*, but they don't work either:
"bouchard([^a-z].*.*)", "bouchard([^a-z]*+.*)"
To make it clear, I want to allow:
bouchard
bouchard, m.
bouchard, j.
bouchard j.p.
bouchard. j.p.
I want to exclude:
bouchardat
Any advice is welcome.
"RECORDEDBY" : "bouchard"will only allowbouchard, and"RECORDEDBY" : "bouchard.+"should allow any values starting withbouchard.bouchard[^a-zA-Z]*