Say I have a string my_string in Python and that I tokenize it according to some_pattern:
match = re.search(some_pattern, my_string)
string_1 = match.group(1)
string_2 = match.group(2)
...
Are string_1 and string_2 ("deep") copies of the substrings of my_string, or references to the same location in memory? In other words, does each group allocate memory for a full copy of its characters?
Please note that I am not asking about the immutability of the strings. If my_string is very long, I would like to know what memory hit I take by tokenizing it.
I don't need to know exactly how much memory is re-used, but it would certainly be useful to know whether tokenizing a string ends up duplicating memory.
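For what it's worth, here is a rough way I tried to probe this myself (a sketch, assuming `sys.getsizeof` reports the per-object allocation of a str, which it does in CPython; the pattern and string here are just placeholders):

```python
import re
import sys

# Build a long string with an obvious "token" structure.
my_string = "a" * 1_000_000 + ",tail"

match = re.search(r"(a+),(\w+)", my_string)
string_1 = match.group(1)

# If group() returned some kind of lightweight view into my_string,
# getsizeof(string_1) would be small and independent of the token length.
# A size at least as large as the substring suggests a full character copy.
print(sys.getsizeof(string_1), len(string_1))
```

On my machine the reported size grows with the length of the matched group, but I am not sure whether that is guaranteed behaviour or an implementation detail.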