4

I want to split a string on a regular expression, but preserve the matches.

I have tried splitting the string on a regex, but that throws away the matches. I have also tried using this, but I am not very good at translating code between languages, let alone from C#.

re := regexp.MustCompile(`\d`)
array := re.Split("ab1cd2ef3", -1)

I need the value of array to be ["ab", "1", "cd", "2", "ef", "3"], but the value of array is ["ab", "cd", "ef", ""]: the matched digits are discarded. No errors.

4 Answers

2

The kind of regex support used in the link you pointed to is NOT available in Go's regexp package. You can read the related discussion.

What you want to achieve (as per the sample given) can be done with a regex that matches either a single digit or a run of non-digits.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "ab1cd2ef3"
    r := regexp.MustCompile(`(\d|[^\d]+)`)
    fmt.Println(r.FindAllStringSubmatch(str, -1))
}

Playground: https://play.golang.org/p/L-ElvkDky53

Output:

[[ab ab] [1 1] [cd cd] [2 2] [ef ef] [3 3]]

2 Comments

Can I do the same for multiple separators? It seems like I can't. ( \(\)|[^ \(\)]+)
It is just a regex that matches tokens. As long as you can describe each kind of token with a regex, you can combine several of them with |. If you can provide a sample string, it will be easier to understand what you need.
1

You can use a bufio.Scanner:

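package main

import (
   "bufio"
   "fmt"
   "strings"
)

// digit is a bufio.SplitFunc that yields each ASCII digit as its own
// token and each run of non-digit bytes as a token.
func digit(data []byte, eof bool) (int, []byte, error) {
   for i, b := range data {
      if '0' <= b && b <= '9' {
         if i > 0 {
            // Emit the text before the digit first.
            return i, data[:i], nil
         }
         // The digit is at the front: emit it alone.
         return 1, data[:1], nil
      }
   }
   if eof && len(data) > 0 {
      // No digit left: emit the trailing text instead of dropping it.
      return len(data), data, nil
   }
   // No complete token yet: ask the Scanner for more data.
   return 0, nil, nil
}

func main() {
   s := bufio.NewScanner(strings.NewReader("ab1cd2ef3"))
   s.Split(digit)
   for s.Scan() {
      fmt.Println(s.Text())
   }
}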

https://golang.org/pkg/bufio#Scanner.Split


0

I don't think this is possible with the current regexp package, but Split can easily be extended to behave this way.

This should work for your case:

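// Split works like (*regexp.Regexp).Split but also keeps each match
// in the result. Empty pieces are skipped. Requires "regexp".
// For n > 0 the loop stops once at least n-1 parts are collected and
// the unsplit remainder is appended; because one step can add two
// parts, the result may hold up to n+1 elements.
func Split(re *regexp.Regexp, s string, n int) []string {
    if n == 0 {
        return nil
    }

    matches := re.FindAllStringIndex(s, n)
    parts := make([]string, 0, 2*len(matches)+1)

    beg := 0
    for _, match := range matches {
        if n > 0 && len(parts) >= n-1 {
            break
        }

        // Text between the previous match and this one.
        if match[0] > beg {
            parts = append(parts, s[beg:match[0]])
        }
        // The match itself.
        if match[1] > match[0] {
            parts = append(parts, s[match[0]:match[1]])
        }
        beg = match[1]
    }

    // Text after the last processed match.
    if beg < len(s) {
        parts = append(parts, s[beg:])
    }

    return parts
}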


0

A crude solution: insert a separator around each match, then split on that separator. This assumes the separator never occurs in the input.

package main

import (
    "fmt"
    "regexp"
    "strings"
)

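func main() {
    re := regexp.MustCompile(`\d+`)
    input := "ab1cd2ef3"
    sep := "|" // must not occur in input

    // Byte offsets [start, end) of every match.
    indexes := re.FindAllStringIndex(input, -1)
    fmt.Println(indexes)

    // Wrap each match in separators. move tracks how many bytes the
    // earlier insertions have shifted the remaining offsets.
    move := 0
    for _, v := range indexes {
        p1 := v[0] + move
        p2 := v[1] + move
        input = input[:p1] + sep + input[p1:p2] + sep + input[p2:]
        move += 2 * len(sep)
    }

    // Split on the separator and drop the empty strings produced when
    // a match sits at the start or end of the input.
    var result []string
    for _, part := range strings.Split(input, sep) {
        if part != "" {
            result = append(result, part)
        }
    }

    fmt.Println(result)
}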
