
I have the following dataframe:

>>> name   ID     geom                                                geometry_error
0  Lily   1234  POLYGON ((5.351418786 7.471461148, 5.352018786...     overlap
1  Pil    3248  POLYGON ((7.351657486 9.341445548, 1.346718786...     overlap
2  Poli   9734  -                                                     -
0  Lily   1234  POLYGON ((5.351265486 2.471876538, 6.33355018786...   overlap

I want to edit the geometry_error column with a condition: if the geom value is '-', the geometry_error value should be "no geometry", e.g.:

>>> name   ID     geom                                                geometry_error
0  Lily   1234  POLYGON ((5.351418786 7.471461148, 5.352018786...     overlap
1  Pil    3248  POLYGON ((7.351657486 9.341445548, 1.346718786...     overlap
2  Poli   9734  -                                                     no geometry
0  Lily   1234  POLYGON ((5.351265486 2.471876538, 6.33355018786...   overlap

I have tried to do it with this:

def gg(row):
    if row['geom'] == '-':
        val = 'no geometry generated'   
    return val

df['geometry errors'] = df.apply(gg, axis=1)

>>>UnboundLocalError: local variable 'val' referenced before assignment

I don't understand why I get this error, because I have used the variable name val in a different function in the same script. Why do I get it now? And is there maybe a better way to do this?

  • Your val is never initialized: your if condition is never satisfied, so val never gets assigned. Commented Sep 24, 2020 at 12:48
  • @yashshah I'm not sure I understand you. Commented Sep 24, 2020 at 12:50
  • Your code never goes inside the if block, so val is not initialized at all. Add a default val= Commented Sep 24, 2020 at 12:53
  • But it has '-' as a string in the geometry column. Commented Sep 24, 2020 at 12:55
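The comments' point can be reproduced without pandas. df.apply(gg, axis=1) calls the function on every row, so the first row whose geom is a polygon already fails the test, even though other rows do contain '-'. In the minimal sketch below (the function names label and label_with_default are hypothetical), val is bound only inside the if, so any input that fails the test reaches return val with nothing assigned and Python raises UnboundLocalError. Using the name val in other functions does not help, because each function call has its own local namespace.

```python
def label(geom):
    # 'val' is bound only when the condition holds
    if geom == '-':
        val = 'no geometry generated'
    return val  # raises UnboundLocalError for any other input


def label_with_default(geom, current):
    # Bind 'val' on every path before returning it
    val = current
    if geom == '-':
        val = 'no geometry generated'
    return val


try:
    label('POLYGON ((0 0, 1 0, 1 1, 0 0))')
except UnboundLocalError as exc:
    print('raised', type(exc).__name__)  # raised UnboundLocalError

print(label_with_default('-', 'overlap'))  # no geometry generated
print(label_with_default('POLYGON ((0 0, 1 0, 1 1, 0 0))', 'overlap'))  # overlap
```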

3 Answers


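Use np.where; it's nice and simple. np.where does the test for you: wherever df['geom'] == '-' it takes 'no geometry generated', and everywhere else it keeps the current geometry_error value.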

Code:

import numpy as np

# ...

df['geometry_error'] = np.where(df['geom'] == '-', 
                                'no geometry generated', 
                                df['geometry_error'])

Output:

   name    ID                                               geom  \
0  Lily  1234   POLYGON ((5.351418786 7.471461148, 5.352018786))   
1   Pil  3248   POLYGON ((7.351657486 9.341445548, 1.346718786))   
2  Poli  9734                                                  -   
3  Lily  1234  POLYGON ((5.351265486 2.471876538, 6.333550187...   

          geometry_error  
0                overlap  
1                overlap  
2  no geometry generated  
3                overlap
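A self-contained version of the np.where answer, useful for checking it on a small frame. The data is a stand-in for the question's frame: the polygon strings are made-up placeholders, not the real geometries.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame; geometries are placeholder strings
df = pd.DataFrame({
    'name': ['Lily', 'Pil', 'Poli', 'Lily'],
    'ID': [1234, 3248, 9734, 1234],
    'geom': ['POLYGON ((0 0, 1 0, 1 1, 0 0))', 'POLYGON ((0 0, 2 0, 2 2, 0 0))',
             '-', 'POLYGON ((0 0, 3 0, 3 3, 0 0))'],
    'geometry_error': ['overlap', 'overlap', '-', 'overlap'],
})

# np.where(condition, value_if_true, value_if_false) works element-wise
df['geometry_error'] = np.where(df['geom'] == '-',
                                'no geometry generated',
                                df['geometry_error'])

print(df['geometry_error'].tolist())
# ['overlap', 'overlap', 'no geometry generated', 'overlap']
```

Because the condition and both branches are aligned row by row, no Python-level loop or helper function is needed.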
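# Select rows and column in one .loc call; the chained form df[mask]['geometry_error'] = ... writes into a temporary copy and leaves df unchanged
df.loc[df['geom'] == '-', 'geometry_error'] = 'no geometry generated'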
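The chained form df[df['geom'] == '-']['geometry_error'] = ... looks like it should work, but it does not modify df. Indexing with a boolean mask builds a new DataFrame, and the assignment lands in that temporary object. pandas emits a warning for this pattern, which the sketch below silences only so that the output stays readable. The small frame is a made-up example.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'geom': ['POLYGON ((0 0, 1 0, 1 1, 0 0))', '-'],
    'geometry_error': ['overlap', '-'],
})
mask = df['geom'] == '-'

# Chained indexing: df[mask] is a new object, so df itself is not updated
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about this pattern
    df[mask]['geometry_error'] = 'no geometry generated'
print(df['geometry_error'].tolist())  # ['overlap', '-']

# A single .loc call selects rows and column together and writes into df
df.loc[mask, 'geometry_error'] = 'no geometry generated'
print(df['geometry_error'].tolist())  # ['overlap', 'no geometry generated']
```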

1 Comment

Can you explain what that statement does and how it answers the question?

A couple of approaches:

  1. Replace all null cases of geometry_error with 'no geometry' (this only affects missing values such as NaN or None, not the string '-'):
df['geometry_error'] = df['geometry_error'].fillna('no geometry')
  2. Find all rows where geom == '-' and set their geometry_error to 'no geometry':
df.loc[df['geom'] == '-', 'geometry_error'] = 'no geometry'
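Approach 1 only helps when the column really holds missing values. In the question's data the placeholder is the string '-', so fillna has nothing to fill. The sketch below shows this. It also shows one way to keep approach 1: first convert '-' to NaN with Series.replace. That step is an addition not in the answer, and it assumes '-' in geometry_error always means "no geometry".

```python
import numpy as np
import pandas as pd

# The question's column stores '-' as a placeholder, not a missing value
errors = pd.Series(['overlap', 'overlap', '-', 'overlap'])

# fillna only replaces missing values (NaN/None), so nothing changes here
print(errors.fillna('no geometry').tolist())
# ['overlap', 'overlap', '-', 'overlap']

# Converting the placeholder to NaN first gives fillna something to fill
print(errors.replace('-', np.nan).fillna('no geometry').tolist())
# ['overlap', 'overlap', 'no geometry', 'overlap']
```

Approach 2 avoids the issue entirely, because it tests geom directly instead of relying on geometry_error being null.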

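Your function fails because val is only assigned when row['geom'] == '-'. apply calls gg on every row, and for the first row whose geom is a polygon, return val refers to a name that was never bound, hence the UnboundLocalError. Moving return inside the if avoids the error, but those rows then get None, so return the existing value for them instead:

def gg(row):
    if row['geom'] == '-':
        return 'no geometry generated'
    # Rows that have a geometry keep their current error value
    return row['geometry_error']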
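One way to see why the return value matters for rows that fail the test is to compare an indent-only version of gg with one that has a fallback. When a function falls off the end it returns None. With apply, every polygon row then loses its geometry_error. The function names and the small frame below are hypothetical, chosen only for this comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'geom': ['POLYGON ((0 0, 1 0, 1 1, 0 0))', '-', 'POLYGON ((0 0, 2 0, 2 2, 0 0))'],
    'geometry_error': ['overlap', '-', 'overlap'],
})


def gg_indent_only(row):
    # return moved inside the if: other rows fall off the end and yield None
    if row['geom'] == '-':
        val = 'no geometry generated'
        return val


def gg_with_fallback(row):
    if row['geom'] == '-':
        return 'no geometry generated'
    # Every other row keeps its existing error value
    return row['geometry_error']


print(df.apply(gg_indent_only, axis=1).tolist())  # polygon rows come back as None
print(df.apply(gg_with_fallback, axis=1).tolist())
# ['overlap', 'no geometry generated', 'overlap']
```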

1 Comment

I don't know why it doesn't work. Maybe it's because of something I thought was minor: the 'geometry errors' column is not null, it contains '-'. I have edited my original post but still don't know why it doesn't work.
