I have the following character vector:

strings <- c("David, FC; Haramey, S; Devan, IA", 
            "Colin, Matthew J.; Haramey, S",
            "Colin, Matthew")

If I want the last initials/given name for each string, I can use the following:

sub(".*, ", "", strings)
[1] "IA"      "S"       "Matthew"

This removes everything up to and including the last ", ", because .* is greedy.

However, I am stuck on how to get the first initials/given name. I know I have to remove everything up to the first ", ", but then I also have to remove everything from the next space or semicolon onward, if there is one.

To be clear, the output I want is:

c("FC", "Matthew", "Matthew")

Any pointers would be great.

With some fiddling I can get the first surnames (each still followed by its comma): gsub( " .*$", "", strings )

1 Answer

You can use

> gsub( "^[^\\s,]+,\\s+([^;.\\s]+).*", "\\1", strings, perl=TRUE)
[1] "FC"      "Matthew" "Matthew"

See the regex demo

Explanation:

  • ^ - start of string
  • [^\\s,]+ - 1 or more characters other than whitespace or ,
  • , - a literal comma
  • \\s+ - 1 or more whitespace
  • ([^;.\\s]+) - Group 1 matching 1 or more characters other than ;, . or whitespace
  • .* - zero or more characters other than a newline
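
If the single pattern is hard to follow, the same idea can be split into the two steps described in the question. This is only a sketch: the names rest and first_given are made up for illustration, and it assumes every string contains ", " after the first surname.

```r
strings <- c("David, FC; Haramey, S; Devan, IA",
             "Colin, Matthew J.; Haramey, S",
             "Colin, Matthew")

# Step 1: drop everything up to and including the first ", ".
# [^,]* cannot cross a comma, so the match stops at the first one.
rest <- sub("^[^,]*, ", "", strings)

# Step 2: drop everything from the first space, semicolon or period onward.
# Strings without any of these characters are left unchanged.
first_given <- sub("[ ;.].*$", "", rest)

first_given
# [1] "FC"      "Matthew" "Matthew"
```

The one-line version above does both steps at once by capturing the wanted part in Group 1 and replacing the whole string with it.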

If you want to use a POSIX-like expression, replace \\s inside the character classes (inside [...]) with [:blank:] (or [:space:]):

gsub( "^[^[:blank:],]+,\\s+([^;.[:blank:]]+).*", "\\1", strings)
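
One caveat with the replacement approach: when the pattern does not match at all (for example, a string with no comma), sub and gsub return the input unchanged. If you would rather get NA in that case, a possible alternative is to extract Group 1 with regexec and regmatches. The helper name extract_first_given and the extra input "Madonna" are hypothetical, used only to illustrate this.

```r
# Hypothetical helper: returns the first initials/given name, or NA when
# the string does not have the "Surname, Given" shape.
extract_first_given <- function(x) {
  pattern <- "^[^[:blank:],]+,[[:blank:]]+([^;.[:blank:]]+)"
  matches <- regmatches(x, regexec(pattern, x))
  # Each element is c(full match, group 1), or character(0) if no match.
  vapply(matches,
         function(g) if (length(g) >= 2) g[2] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

strings <- c("David, FC; Haramey, S; Devan, IA",
             "Colin, Matthew J.; Haramey, S",
             "Colin, Matthew")

extract_first_given(c(strings, "Madonna"))
# [1] "FC"      "Matthew" "Matthew" NA
```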

2 Comments

Thank you for the demo as well as the answer, so I can try to grok this.
+1 for commenting each part of the regex. That way it becomes less of a black-magic to us, the uninitiated ;)
