I have a tensor of strings (tf.string), and I want to split these strings by a regular expression and do some preprocessing.

For example, I have this function:

import re

def py_split(x):
    x = x.lower()
    x = re.split(r"(http:\/\/)|(https:\/\/)|(\W)", x)
    return x

I need to use it in a tensorflow-transform graph so that it can later be used with TF Serving.
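
For reference, here is a minimal sketch of what this function actually returns in plain Python. Because the pattern has three capturing groups, re.split keeps the matched delimiters and also emits None for every group that did not take part in a match. The helper name py_split_clean is only illustrative and is not part of the original function.

```python
import re

# Same pattern as in the question; the escaped slashes are not needed in Python.
PATTERN = r"(http://)|(https://)|(\W)"

def py_split(x):
    # Lowercase, then split; capturing groups make re.split keep the delimiters.
    return re.split(PATTERN, x.lower())

def py_split_clean(x):
    # Hypothetical helper: drop the None and empty-string entries.
    return [tok for tok in py_split(x) if tok]

print(py_split("Hello, world"))
print(py_split_clean("Hello, world"))
```

The first call prints the raw result, including the None and empty entries; the second prints only the tokens and delimiters, which is the behaviour a TensorFlow version would have to reproduce.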

But TensorFlow does not let me work with tf.string values the way I work with normal Python strings.

How can I solve this without writing a new TF op in C++?

P.S. I use TensorFlow 1.13

1 Answer

This is slightly tricky because TensorFlow (at least to my knowledge) doesn't have a regex split function.

If there is a character that you can be sure your input strings will never contain, you could use a slightly messy workaround built on tf.strings.regex_replace() and tf.strings.split(): first use regex_replace to replace every match with that special character, then use split to split on it.

For example, if we could be sure our input strings would never contain the character |, we could proceed as follows:

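import tensorflow as tf

def split(x):
  x = tf.strings.regex_replace(x, r"(http:\/\/)|(https:\/\/)|(\W)", "|")
  # In TF 1.x, tf.strings.split expects a 1-D tensor, so wrap the scalar first.
  return tf.strings.split(tf.expand_dims(x, 0), '|').values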

so that split("http://www.bbc.co.uk"), say, gives us:

[b'', b'www', b'bbc', b'co', b'uk']
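
If the delimiters need to be kept, as re.split does with capturing groups, one possible variation is to wrap each match in the special character instead of replacing it. The sketch below assumes that the RE2 rewrite string accepts \0 for the whole match, that | never occurs in the input, and that x is a scalar string tensor. The names SENTINEL, PATTERN and split_keep_delimiters are made up for illustration, and lowercasing is left out.

```python
import tensorflow as tf

# Assumption: this character never appears in the input strings.
SENTINEL = "|"
PATTERN = r"(http://)|(https://)|(\W)"

def split_keep_delimiters(x):
    """Split a scalar tf.string tensor on PATTERN, keeping the delimiters."""
    # Surround every match with the sentinel; \0 stands for the whole match.
    marked = tf.strings.regex_replace(x, PATTERN, SENTINEL + r"\0" + SENTINEL)
    # Split on the sentinel; .values flattens the sparse/ragged result.
    tokens = tf.strings.split(tf.expand_dims(marked, 0), sep=SENTINEL).values
    # Adjacent or leading matches leave empty strings behind; drop them.
    return tf.boolean_mask(tokens, tf.not_equal(tokens, ""))
```

Under these assumptions, split_keep_delimiters(tf.constant("Hello, world")) should evaluate to [b'Hello', b',', b' ', b'world']. Unlike re.split, this version produces no None or empty entries, so it matches the Python function only after those are filtered out.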

1 Comment

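Thank you very much, but the result of this operation isn't the same, because re.split with capturing groups does not remove the delimiters. For example, for the string "Hello, world" the delimiters "," and " " are kept: ["Hello", ",", " ", "world"] (ignoring the empty and None entries).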
