I have some data that I read as a string from a file, formatted as shown below. I would like to create a vector (stored as a list in Python) that gives the difference in the x, y, z directions between [x2, y2, z2] and [x1, y1, z1] for each line of the string shown below.

I should be fine calculating the difference vector once I have the desired [x2, y2, z2] and [x1, y1, z1] extracted as lists of integers. What I need help with is creating these [x2, y2, z2] and [x1, y1, z1] lists from the data below.

data = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255] 
x1=34 y1=12 z1=15 col1=[255, 255, 255] x2=35 y2=12 z2=15 col2=[255, 255, 255] 
x1=22 y1=33 z1=24 col1=[255, 255, 255] x2=23 y2=33 z2=24 col2=[255, 255, 255] 
x1=16 y1=45 z1=58 col1=[255, 255, 255] x2=17 y2=45 z2=58 col2=[255, 255, 255] 
x1=27 y1=66 z1=21 col1=[255, 255, 255] x2=28 y2=66 z2=21 col2=[255, 255, 255]
"""

Just to clarify, I only need to figure out how to extract the [x2, y2, z2] and [x1, y1, z1] lists for a single line. I can work out how to loop over the lines and calculate the difference vector for each one on my own. It's just extracting the relevant data from each line and reformatting it into a usable format that has stumped me.

I suspect that regular expressions are a potential avenue for extracting this information. I have looked at the documentation at https://docs.python.org/2/library/re.html and feel completely baffled by it. I just want an easy-to-understand way to do it.

  • Can I suggest that you add the tag regex to this post? Commented Nov 25, 2014 at 6:44
  • Actually, I don't seem to be able to add any extra tags... I don't see any "add more tags" button or anything of that nature. Is it because I am new to StackOverflow and don't have that privilege yet? Commented Nov 25, 2014 at 6:48
  • I just added the extra tag for you. For future reference though, just click on "edit" and it will allow you to edit your tags (along with the rest of the post). Commented Nov 25, 2014 at 6:51

2 Answers

For a single line, assuming that all lines have the same format, you can do:

import re

a_line = "x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]" 
x1,y1,z1,x2,y2,z2 = list(map(int, re.findall(r'=(\d+)', a_line)))

To process multiple lines from your data:

for a_line in data.split("\n"):    
    if a_line:
        x1,y1,z1,x2,y2,z2 = list(map(int, re.findall(r'=(\d+)', a_line)))
        print(x1,y1,z1,x2,y2,z2)

Gives:

45 74 55 46 74 55
34 12 15 35 12 15
22 33 24 23 33 24
16 45 58 17 45 58
27 66 21 28 66 21
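
To get the two lists the question asks for, the six unpacked values can be split in half. The following is only a minimal sketch, assuming every line carries exactly six coordinate values in the order x1, y1, z1, x2, y2, z2. The helper name `parse_points` is made up for illustration and is not part of the original answer.

```python
import re

def parse_points(a_line):
    """Return ([x1, y1, z1], [x2, y2, z2]) for one line of the data.

    Hypothetical helper: assumes six '=<digits>' values per line.
    """
    # '=(\d+)' only matches digits that directly follow '=', so the
    # colour lists (where '=' is followed by '[') are skipped.
    values = [int(v) for v in re.findall(r'=(\d+)', a_line)]
    if len(values) != 6:
        raise ValueError("expected 6 coordinate values, got %d" % len(values))
    return values[:3], values[3:]

a_line = "x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]"
p1, p2 = parse_points(a_line)

# Element-wise difference between the second and first point
diff = [b - a for a, b in zip(p1, p2)]
print(p1, p2, diff)  # [45, 74, 55] [46, 74, 55] [1, 0, 0]
```

The length check is there because the pattern relies on the layout of the line rather than on the key names, so a malformed line would otherwise fail with a less helpful unpacking error.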

4 Comments

Hey, thanks for this, this is really cool. I could certainly make use of this myself in the future. I'm a little confused as to how this manages to extract just the coordinates and not the color values.
This works for me. I can reformat the x1, x2, x3 etc. into lists.
Ohhh, I get it now... it's because of the "=", so all of that other stuff I put in my solution was redundant.
@RonRon The = works through sequence assignment/unpacking, which is available in Python.

I know exactly where you are coming from. I didn't understand regular expressions until just yesterday; they always confused the hell out of me. But once you understand them, you realise how powerful they are. Here is one possible solution to your problem. I will also give a little intuition behind what the regular expression is doing, which will hopefully reduce the confusion.

In the code below I am assuming you are dealing with one line at a time and that the data is always formatted the same way.

import re

# Example of just one line of the data
line = """x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255] """

# Extract the relevant x1, y1, z1 values, stored as a list of strings
p1 = re.findall(r"[x-z][1]=([\d]*)", line)

# Extract the relevant x2, y2, z2 values, stored as a list of strings
p2 = re.findall(r"[x-z][2]=([\d]*)", line)

# Convert the elements in each list from strings to integers
p1 = [int(x) for x in p1]
p2 = [int(x) for x in p2]

# Calculate difference vector (I'm assuming this is what you're trying to do)
diff = [p2[i] - p1[i] for i in range(len(p2))]

A brief explanation of what the symbols in the regular expression are doing:

# EXPLANATION OF THE REGEX.
# Finds segments of the string that:
#     [x-z]    start with a letter x, y, or z
#     [1]      followed by the number 1
#     =        followed by the equals sign
#
#     But don't return any of that section of the string; only use that
#     information to then extract the following values that we do actually want
#
#     (        Return the part of the string that has the following pattern,
#              given that it was preceded by the previous pattern
#
#     [\d]     match a single numeric digit
#     *        repeat zero or more times, i.e. keep proceeding forward
#              while the current character is a digit
#     )        end of the pattern, now we can return the substring.
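
To see what the parentheses contribute, the same pattern can be run with and without them. This is a small sketch rather than part of the original answer; the `sample` string is invented to show one caveat of `*`, which also accepts zero digits.

```python
import re

line = "x1=45 y1=74 z1=55 col1=[255, 255, 255] x2=46 y2=74 z2=55 col2=[255, 255, 255]"

# Without parentheses, findall returns each whole match.
whole = re.findall(r"[x-z][1]=[\d]*", line)
# With parentheses, findall returns only the captured group.
captured = re.findall(r"[x-z][1]=([\d]*)", line)

print(whole)     # ['x1=45', 'y1=74', 'z1=55']
print(captured)  # ['45', '74', '55']

# Invented sample where one value is missing: '*' allows an empty match,
# while '+' requires at least one digit.
sample = "x1= y1=7"
with_star = re.findall(r"[x-z]1=(\d*)", sample)
with_plus = re.findall(r"[x-z]1=(\d+)", sample)

print(with_star)  # ['', '7']
print(with_plus)  # ['7']
```

If empty values could ever appear in the data, `+` may be the safer choice, since `int('')` raises a `ValueError`.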

4 Comments

This formats the values nicely into the vector (list) format I had wanted. Thanks. Also, thanks for the explanation of the regular expressions... it makes a little more sense now... But I would still have no idea how to use it on my own.
This how-to document is slightly easier to understand than the official documentation: docs.python.org/2/howto/regex.html#regex-howto
Also, this webpage is handy for testing regular expressions: regex101.com
Cool, thanks for that... I will look at them when I get the chance.
