I have a pandas dataframe that looks like this:

            X[m]      Y[m]      Z[m]  ...      beta  newx  newy
0       1.439485  0.087100  0.029771  ...  0.063807  1439    87
1       1.439485  0.089729  0.029121  ...  0.065871  1439    89
2       1.439485  0.091992  0.030059  ...  0.067653  1439    91
3       1.439485  0.082073  0.030721  ...  0.059883  1439    82
4       1.439485  0.084095  0.028952  ...  0.061458  1439    84
5       1.439485  0.085937  0.028019  ...  0.062897  1439    85

There are hundreds of thousands of such rows, and I have multiple dataframes like this. X and Y are coordinates on a plane (Z is not important) that is moved 45 degrees by the middle to the right. I need to put all points back in their original place, -45 degrees from their current location. The columns newx and newy currently hold the coordinates before the change; I want to overwrite these two columns with the new coordinates. Since I know the coordinates of the middle point, the point itself, the middle-to-point angle (alpha) and the middle-to-fixed-point angle (beta), I can use the approach presented on Mathematics SO. I have translated that code to Python like this:

for i in range(len(df)):
    if df.iloc[i].alpha == math.pi/2 or df.iloc[i].alpha == 3*math.pi/2:
        df.newx[i] = mid
        df.newy[i] = int(math.tan(df.iloc[i].beta*(df.iloc[i].x-mid)+mid))
    elif df.iloc[i].beta == math.pi/2 or df.iloc[i].beta == 3*math.pi/2:
        #df.newx[i] = df.iloc[i].x -- this is already set
        df.newy[i] = int(math.tan(df.iloc[i].alpha*(mid-df.iloc[i].x)+mid))
    else:
        m0 = math.tan(df.iloc[i].alpha)
        m1 = math.tan(df.iloc[i].beta)
        x = ((m0 * df.iloc[i].x - m1 * mid) - (df.iloc[i].y - mid)) / (m0 - m1)
        df.newx[i] = int(x)
        df.newy[i] = int(m0 * (x - df.iloc[i].x) + df.iloc[i].y)

Although this does what I need and moves each point to the correct position, it is extremely slow, and I have too many files to process this way. I know that there are much faster methods, such as vectorization, apply and list comprehensions. However, I can't figure out how to use them with this function.

Here are the first 10 rows as a dictionary:

{'X[m]': {0: 1.439484727008419, 1: 1.439484727008419, 2: 1.439484727008419, 3: 1.439484727008419, 4: 1.439484727008419, 5: 1.439484727008419, 6: 1.439484727008419, 7: 1.439484727008419, 8: 1.439484727008419, 9: 1.439484727008419}, 'Y[m]': {0: 0.08709958190841899, 1: 0.08972904270841897, 2: 0.091991981408419, 3: 0.08207325440841898, 4: 0.08409548540841899, 5: 0.08593746080841899, 6: 0.09416210370841899, 7: 0.08874029660841898, 8: 0.09168940400841899, 9: 0.09434491760841898}, 'Z[m]': {0: 0.029770726299999998, 1: 0.0291213803, 2: 0.030058834700000002, 3: 0.0307212565, 4: 0.028951926200000002, 5: 0.0280194897, 6: 0.030717188500000003, 7: 0.026446931099999998, 8: 0.0269318204, 9: 0.0273838975}, 'Velocity[ms^-1]': {0: ['-1.67570162e+00', '-2.59946979e-15', '-2.54510192e-15'], 1: ['-1.63915336e+00', '-2.54277343e-15', '-2.48959140e-15'], 2: ['-1.69191790e+00', '-2.62462561e-15', '-2.56973173e-15'], 3: ['-1.72920227e+00', '-2.68246377e-15', '-2.62636012e-15'], 4: ['-1.62961555e+00', '-2.52797767e-15', '-2.47510523e-15'], 5: ['-1.57713342e+00', '-2.44656340e-15', '-2.39539372e-15'], 6: ['-1.72897375e+00', '-2.68210929e-15', '-2.62601305e-15'], 7: ['-1.48862195e+00', '-2.30925809e-15', '-2.26096006e-15'], 8: ['-1.51591396e+00', '-2.35159534e-15', '-2.30241195e-15'], 9: ['-1.54135919e+00', '-2.39106792e-15', '-2.34105888e-15']}, 'L': {0: 0.9582306809661671, 1: 0.9564957485824027, 2: 0.9550059224371557, 3: 0.9615583774318917, 4: 0.9602177760259737, 5: 0.9589987519260235, 6: 0.9535800607266656, 7: 0.9571476500665267, 8: 0.9552049510914844, 9: 0.953460072490227}, 'x': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439}, 'y': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94}, 'alpha': {0: -0.7215912027987663, 1: -0.719527331916007, 2: -0.7177451479100487, 3: -0.7255156166536015, 4: -0.7239399868865558, 5: -0.7225009735356016, 6: -0.7160308360594005, 7: -0.7203042790640757, 8: -0.7179837655204843, 9: -0.7158861861473951}, 
'beta': {0: 0.06380696059868196, 1: 0.06587083148144124, 2: 0.06765301548739955, 3: 0.05988254674384674, 4: 0.06145817651089247, 5: 0.06289718986184667, 6: 0.06936732733804774, 7: 0.0650938843333726, 8: 0.06741439787696402, 9: 0.0695119772500532}, 'newx': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439}, 'newy': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94}}
  • Checking for equality against a float smells... Commented Nov 12, 2021 at 21:22
  • @JoshuaVoskamp I will deal with that problem later. I am aware that it may not turn out as I wish, but right now I have to make it run in a reasonable time. Commented Nov 12, 2021 at 21:25
  • can you provide a small proof-of-concept input df and expected output to test against, perhaps df.head(10).to_dict()? Commented Nov 12, 2021 at 21:26
  • @JoshuaVoskamp I have edited the question. Commented Nov 12, 2021 at 21:29
  • Can you explain the problem more thoroughly? I do not understand "X and Y are coordinates on plane that is moved 45 degrees by the middle to the right." What is "middle"? What does moving a plane mean? (I would understand "rotate", "translate" or "scale".) Can you state the question using a transformation matrix? Commented Nov 12, 2021 at 21:38
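For reference, the transformation asked about in the last comment could be sketched as follows, assuming that "moved 45 degrees by the middle" means a rotation about the middle point. The question does not confirm this, and the function name `rotate_about`, its parameters and the middle point (0, 0) are hypothetical. The sign convention is the usual mathematical one (positive angles are counter-clockwise with the y axis pointing up); with image-style coordinates the sign may need to be flipped.

```python
import numpy as np

def rotate_about(x, y, cx, cy, angle_rad):
    """Rotate points counter-clockwise by angle_rad around (cx, cy).

    `rotate_about` and its parameter names are hypothetical, not from the question.
    """
    dx = np.asarray(x, dtype=float) - cx
    dy = np.asarray(y, dtype=float) - cy
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    # Standard 2D rotation matrix applied to the offsets from the centre
    return cx + c * dx - s * dy, cy + s * dx + c * dy

# Undo an assumed +45 degree rotation by rotating -45 degrees about an assumed middle (0, 0)
back_x, back_y = rotate_about([1.0, 0.0], [1.0, 2.0], 0.0, 0.0, -np.pi / 4)
```

Because the function works on whole arrays, it can be called once per dataframe on the coordinate columns instead of once per row.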

2 Answers

I suspect that the way mid is used in your code may be causing you problems. Is mid numeric? Are the x- and y-coordinates of your middle point the same value?


@OP, please confirm that I have translated your variable names correctly from your linked source:

linked name    your name
a0             beta
a1             alpha
(x0, y0)       (df.x, df.y)
(x1, y1)       (mid, mid)

Update: this answer shares some ideas with @mitoRibo's answer, but I re-translated from OP's linked source and suspect OP made some transcription errors, which are noted in the code comments. Both of us used the strategy of selectively calculating newx/newy using masking, where the masks are equivalent to the if/elif/else conditions provided.

#setup
import pandas as pd
import numpy as np
import math

df = pd.DataFrame({'X[m]': {0: 1.439484727008419, 1: 1.439484727008419, 2: 1.439484727008419, 3: 1.439484727008419, 4: 1.439484727008419, 5: 1.439484727008419, 6: 1.439484727008419, 7: 1.439484727008419, 8: 1.439484727008419, 9: 1.439484727008419}, 'Y[m]': {0: 0.08709958190841899, 1: 0.08972904270841897, 2: 0.091991981408419, 3: 0.08207325440841898, 4: 0.08409548540841899, 5: 0.08593746080841899, 6: 0.09416210370841899, 7: 0.08874029660841898, 8: 0.09168940400841899, 9: 0.09434491760841898}, 'Z[m]': {0: 0.029770726299999998, 1: 0.0291213803, 2: 0.030058834700000002, 3: 0.0307212565, 4: 0.028951926200000002, 5: 0.0280194897, 6: 0.030717188500000003, 7: 0.026446931099999998, 8: 0.0269318204, 9: 0.0273838975}, 'Velocity[ms^-1]': {0: ['-1.67570162e+00', '-2.59946979e-15', '-2.54510192e-15'], 1: ['-1.63915336e+00', '-2.54277343e-15', '-2.48959140e-15'], 2: ['-1.69191790e+00', '-2.62462561e-15', '-2.56973173e-15'], 3: ['-1.72920227e+00', '-2.68246377e-15', '-2.62636012e-15'], 4: ['-1.62961555e+00', '-2.52797767e-15', '-2.47510523e-15'], 5: ['-1.57713342e+00', '-2.44656340e-15', '-2.39539372e-15'], 6: ['-1.72897375e+00', '-2.68210929e-15', '-2.62601305e-15'], 7: ['-1.48862195e+00', '-2.30925809e-15', '-2.26096006e-15'], 8: ['-1.51591396e+00', '-2.35159534e-15', '-2.30241195e-15'], 9: ['-1.54135919e+00', '-2.39106792e-15', '-2.34105888e-15']}, 'L': {0: 0.9582306809661671, 1: 0.9564957485824027, 2: 0.9550059224371557, 3: 0.9615583774318917, 4: 0.9602177760259737, 5: 0.9589987519260235, 6: 0.9535800607266656, 7: 0.9571476500665267, 8: 0.9552049510914844, 9: 0.953460072490227}, 'x': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439}, 'y': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94}, 'alpha': {0: -0.7215912027987663, 1: -0.719527331916007, 2: -0.7177451479100487, 3: -0.7255156166536015, 4: -0.7239399868865558, 5: -0.7225009735356016, 6: -0.7160308360594005, 7: -0.7203042790640757, 8: -0.7179837655204843, 9: 
-0.7158861861473951}, 'beta': {0: 0.06380696059868196, 1: 0.06587083148144124, 2: 0.06765301548739955, 3: 0.05988254674384674, 4: 0.06145817651089247, 5: 0.06289718986184667, 6: 0.06936732733804774, 7: 0.0650938843333726, 8: 0.06741439787696402, 9: 0.0695119772500532}})

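# `mid` is not defined in the question; 0 is only a placeholder
# for the coordinate of the middle point (assumed equal in x and y)
mid = 0

# make the new columns
df['newx'] = np.nan
df['newy'] = np.nan
# if any of the values are np.nan when we're done, something went wrong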

# Do the float `between` comparison but cleverly
EPSILON = 1e-6
# windows = ((pi/2 - EPSILON, pi/2 + EPSILON), (3pi/2 - EPSILON, 3pi/2 + EPSILON))
# `between(left, right)` needs left <= right, so the lower bound comes first
windows = tuple(tuple(d*math.pi/2 + s*EPSILON for s in (-1, 1)) for d in (1, 3))
# challenge: make this more DRY (don't repeat yourself)
alpha_cond = sum([df.alpha.between(*w) for w in windows]).astype(bool)
beta_cond  = sum([ df.beta.between(*w) for w in windows]).astype(bool)\
                 & ~alpha_cond
neither = (~alpha_cond & ~beta_cond)

# Handle `alpha near pi/2 or 3pi/2`:
c1 = df.loc[alpha_cond]
df.loc[alpha_cond,'newx'] = mid
                                         # changed `tan` parenthesis
                                         # |    changed `df.x - mid` to `mid - df.x`
                                         # |    |             changed to `df.y` from `mid`
df.loc[alpha_cond,'newy'] = (np.tan(c1.beta) * (mid - c1.x) + c1.y).astype(int)

# Handle `beta near pi/2 or 3pi/2`:
c2 = df.loc[beta_cond]
df.loc[beta_cond,'newx'] = c2.x
                                         # changed `tan` parenthesis
                                         # |    changed `mid - df.x` to `df.x - mid`
df.loc[beta_cond,'newy'] = (np.tan(c2.alpha) * (c2.x - mid) + mid).astype(int)

# Handle the remainder:
c3 = df.loc[neither]
m0 = np.tan(c3.alpha)
m1 = np.tan(c3.beta)
t = ((m0 * c3.x - m1 * mid) - (c3.y - mid)) / (m0 - m1)

df.loc[neither,'newx'] = t.astype(int)
df.loc[neither,'newy'] = (m0 * (t - c3.x) + c3.y).astype(int)
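As one possible take on the "make this more DRY" challenge in the code above, here is a minimal sketch that builds the masks with `np.isclose` and picks the result per row with `np.where`. It assumes the geometry that OP describes in the comments (a0 is alpha): alpha is the angle of the line through (x, y), beta is the angle of the line through (mid, mid), and the result is their intersection. The function name `intersect_vectorized` and the `atol` parameter are hypothetical.

```python
import numpy as np

def intersect_vectorized(x, y, alpha, beta, mid, atol=1e-6):
    """Intersect the line through (x, y) at angle alpha with the line
    through (mid, mid) at angle beta, for whole arrays at once."""
    x, y, alpha, beta = (np.asarray(v, dtype=float) for v in (x, y, alpha, beta))

    def is_vertical(angle):
        # True where the angle is within `atol` of pi/2 or 3*pi/2
        return (np.isclose(angle, np.pi / 2, rtol=0, atol=atol)
                | np.isclose(angle, 3 * np.pi / 2, rtol=0, atol=atol))

    a_vert = is_vertical(alpha)
    b_vert = is_vertical(beta) & ~a_vert   # mirrors the `elif`
    m0, m1 = np.tan(alpha), np.tan(beta)

    # General case; parallel lines (m0 == m1) produce inf/nan here
    with np.errstate(divide="ignore", invalid="ignore"):
        t = (m0 * x - m1 * mid - (y - mid)) / (m0 - m1)

    newx = np.where(a_vert, x, np.where(b_vert, mid, t))
    newy = np.where(a_vert, m1 * (x - mid) + mid,
                    np.where(b_vert, m0 * (mid - x) + y, m0 * (t - x) + y))
    return newx, newy

# Three made-up rows: general case, alpha vertical, beta vertical
demo_x, demo_y = intersect_vectorized(
    [3, 3, 3], [5, 5, 5],
    [np.pi / 4, np.pi / 2, np.pi / 4],
    [0.0, np.pi / 4, np.pi / 2],
    mid=0,
)
```

With the question's frame this might be called as `intersect_vectorized(df.x, df.y, df.alpha, df.beta, mid)`. The results are left as floats so that any inf or NaN from parallel lines stays visible before casting to int.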

4 Comments

Looks great, and I bet it's really fast. One suggestion: df.beta.between(dn_min, dn_max) saves some typing.
@Ruli I checked source against your linked SO answer and made some changes; noted in comments. Short version: I think you may have made some transcription errors. Would you confirm my translation table?
alpha and beta are reversed; a0 is alpha. I am going to work on it shortly and will see where I have made a mistake :)
The code works now. I also had some unrelated logical errors in my code, which caused incorrect angles to be computed. Now that I have fixed those, your code works (keeping in mind you have swapped alpha and beta) and is significantly faster than mine, which was the aim of the question.
Same approach as @Joshua Voskamp, but I still wanted to share

import pandas as pd
import numpy as np
import math

df = pd.DataFrame({'X[m]': {0: 1.439484727008419, 1: 1.439484727008419, 2: 1.439484727008419, 3: 1.439484727008419, 4: 1.439484727008419, 5: 1.439484727008419, 6: 1.439484727008419, 7: 1.439484727008419, 8: 1.439484727008419, 9: 1.439484727008419}, 'Y[m]': {0: 0.08709958190841899, 1: 0.08972904270841897, 2: 0.091991981408419, 3: 0.08207325440841898, 4: 0.08409548540841899, 5: 0.08593746080841899, 6: 0.09416210370841899, 7: 0.08874029660841898, 8: 0.09168940400841899, 9: 0.09434491760841898}, 'Z[m]': {0: 0.029770726299999998, 1: 0.0291213803, 2: 0.030058834700000002, 3: 0.0307212565, 4: 0.028951926200000002, 5: 0.0280194897, 6: 0.030717188500000003, 7: 0.026446931099999998, 8: 0.0269318204, 9: 0.0273838975}, 'Velocity[ms^-1]': {0: ['-1.67570162e+00', '-2.59946979e-15', '-2.54510192e-15'], 1: ['-1.63915336e+00', '-2.54277343e-15', '-2.48959140e-15'], 2: ['-1.69191790e+00', '-2.62462561e-15', '-2.56973173e-15'], 3: ['-1.72920227e+00', '-2.68246377e-15', '-2.62636012e-15'], 4: ['-1.62961555e+00', '-2.52797767e-15', '-2.47510523e-15'], 5: ['-1.57713342e+00', '-2.44656340e-15', '-2.39539372e-15'], 6: ['-1.72897375e+00', '-2.68210929e-15', '-2.62601305e-15'], 7: ['-1.48862195e+00', '-2.30925809e-15', '-2.26096006e-15'], 8: ['-1.51591396e+00', '-2.35159534e-15', '-2.30241195e-15'], 9: ['-1.54135919e+00', '-2.39106792e-15', '-2.34105888e-15']}, 'L': {0: 0.9582306809661671, 1: 0.9564957485824027, 2: 0.9550059224371557, 3: 0.9615583774318917, 4: 0.9602177760259737, 5: 0.9589987519260235, 6: 0.9535800607266656, 7: 0.9571476500665267, 8: 0.9552049510914844, 9: 0.953460072490227}, 'x': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439}, 'y': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94}, 'alpha': {0: -0.7215912027987663, 1: -0.719527331916007, 2: -0.7177451479100487, 3: -0.7255156166536015, 4: -0.7239399868865558, 5: -0.7225009735356016, 6: -0.7160308360594005, 7: -0.7203042790640757, 8: -0.7179837655204843, 9: 
-0.7158861861473951}, 'beta': {0: 0.06380696059868196, 1: 0.06587083148144124, 2: 0.06765301548739955, 3: 0.05988254674384674, 4: 0.06145817651089247, 5: 0.06289718986184667, 6: 0.06936732733804774, 7: 0.0650938843333726, 8: 0.06741439787696402, 9: 0.0695119772500532}, 'newx': {0: 1439, 1: 1439, 2: 1439, 3: 1439, 4: 1439, 5: 1439, 6: 1439, 7: 1439, 8: 1439, 9: 1439}, 'newy': {0: 87, 1: 89, 2: 91, 3: 82, 4: 84, 5: 85, 6: 94, 7: 88, 8: 91, 9: 94}})

mid = 0 #not sure what mid value should be

near_threshold = 0.001

alpha_near_half_pi = df.alpha.sub(math.pi/2).abs().le(near_threshold)
alpha_near_three_half_pi = df.alpha.sub(3*math.pi/2).abs().le(near_threshold)
beta_near_half_pi = df.beta.sub(math.pi/2).abs().le(near_threshold)
beta_near_three_half_pi = df.beta.sub(3*math.pi/2).abs().le(near_threshold)

cond1 = alpha_near_half_pi | alpha_near_three_half_pi
cond2 = beta_near_half_pi | beta_near_three_half_pi
cond2 = cond2 & (~cond1) #if cond1 is true, we don't want to do cond2
cond3 = ~(cond1 | cond2) #if neither cond1 nor cond2, then we are in cond3

#Process cond1 rows
c1 = df.loc[cond1]
df.loc[cond1,'newx'] = mid
df.loc[cond1,'newy'] = np.tan(c1.beta*(c1.x-mid)+mid)

#Process cond2 rows
c2 = df.loc[cond2]
df.loc[cond2,'newy'] = np.tan(c2.alpha*(mid-c2.x)+mid)

#Process cond3 rows
c3 = df.loc[cond3]
m0 = np.tan(c3.alpha)
m1 = np.tan(c3.beta)

# OP confirmed in the comments that the second term is `c3.y - mid`
# (this answer originally had `c3.y - c3.y`, which is always 0)
x = ((m0 * c3.x - m1 * mid) - (c3.y - mid)) / (m0 - m1)
df.loc[cond3,'newx'] = x.astype(int)
df.loc[cond3,'newy'] = (m0 * (x - c3.x) + c3.y).astype(int)

df
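The second term of the general-case formula, which the code comment above singles out, can be checked numerically. The sketch below uses made-up random data and a hypothetical `mid = 500.0`, and assumes the general case means intersecting the line through (x, y) with slope tan(alpha) and the line through (mid, mid) with slope tan(beta). It tests whether the computed point lies on the second line, once with `y - mid` and once with the always-zero `y - y`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
mid = 500.0                              # hypothetical middle coordinate
x = rng.uniform(0, 1000, n)              # made-up point coordinates
y = rng.uniform(0, 1000, n)
alpha = rng.uniform(-1.2, -0.2, n)       # kept away from +-pi/2
beta = rng.uniform(0.2, 1.2, n)

m0, m1 = np.tan(alpha), np.tan(beta)

def on_mid_line(px, py):
    # Does (px, py) satisfy the line through (mid, mid) with slope m1?
    return np.allclose(py - mid, m1 * (px - mid))

# Variant with `y - mid`
good_x = ((m0 * x - m1 * mid) - (y - mid)) / (m0 - m1)
good_y = m0 * (good_x - x) + y

# Variant with `y - y`, i.e. the term is always 0
bad_x = ((m0 * x - m1 * mid) - (y - y)) / (m0 - m1)
bad_y = m0 * (bad_x - x) + y

good_ok = on_mid_line(good_x, good_y)
bad_ok = on_mid_line(bad_x, bad_y)
print(good_ok, bad_ok)
```

Only the `y - mid` variant lands on both lines; with `y - y` the point misses the line through (mid, mid) by exactly `y - mid`.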

3 Comments

lol you found the same possible mistake I was unsure about
glad I'm not crazy haha
Yes, it should be -mid. I'm checking out both solutions, thanks to both of you. I would like to accept both answers, but I can only accept one :)
