Simplify haskell function

Question

I'm really struggling with Haskell atm.

It took me almost 6 hours to write a function that does what I want. Unfortunately I'm not satisfied with the look of it.

Could someone please give me any hints how to rewrite it?

get_connected_area :: Eq generic_type => [[generic_type]] -> (Int, Int) -> [(Int,Int)] -> generic_type -> [(Int,Int)]
get_connected_area habitat point area nullValue
  | elem point area = area
  | not ((fst point) >= 0) = area
  | not ((snd point) >= 0) = area
  | not ((fst point) < (length habitat)) = area
  | not ((snd point) < (length (habitat!!0))) = area
  | (((habitat!!(fst point))!!(snd point))) == nullValue = area
  | otherwise = 
    let new_area = point : area
    in 
    get_connected_area habitat (fst point+1, snd point) (
        get_connected_area habitat (fst point-1, snd point) (
            get_connected_area habitat (fst point, snd point+1) (
                get_connected_area habitat (fst point, snd point-1) new_area nullValue
                ) nullValue
            ) nullValue
    ) nullValue

The function get's a [[generic_type]] (representing a landscape-map) and searches the fully connected area around a point that isn't equal to the given nullValue.

Eg.:

If the function gets called like this:

get_connected_area [[0,1,0],[1,1,1],[0,1,0],[1,0,0]] (1,1) [] 0

That literally means

0 1 0
1 1 1
0 1 0
1 0 0

Represents a map (like google maps). Start from the point (coordinates) (1,1) I want to get all coordinates of the elements that form a connected area with the given point.

The result therefore should be:

0 1 0
1 1 1
0 1 0
1 0 0

And the corresponting return value (list of coordinates of bold 1s):

[(2,1),(0,1),(1,2),(1,0),(1,1)]

Start by explaining in words what this function is supposed to do. — Code-Apprentice
– Code-Apprentice, Commented Sep 3, 2017 at 16:39
@Code-Apprentice I hope now it's a little bit more obvious what i want to achieve — infotoni91
– infotoni91, Commented Sep 3, 2017 at 16:49
p.s. Since you have working code, your question is more appropriate at Code Review. You might find more tips there since this is exactly the type of question they expect. — Code-Apprentice
– Code-Apprentice, Commented Sep 3, 2017 at 17:16
I just saw another simplification. See the last sentence in my most recent answer. — Code-Apprentice
– Code-Apprentice, Commented Sep 4, 2017 at 5:12

Code-Apprentice · Accepted Answer · 2017-09-04 05:12:29Z

8

One small change is that you can use pattern matching for the variable point. This means you can use (x, y) instead of point in the function declaration:

get_connected_area habitat (x, y) area nullValue = ...

Now everywhere you have fst point, just put x, and everywhere you have snd point, put y.

Another modification is to use more variables for subexpressions. This can help with the nested recursive calls. For example, make a variable for the inner-most nested call:

....
where foo = get_connected_area habitat (x, y-1) new_area nullValue

Now just put foo instead of the call. This technique can now be repeated for the "new" inner-most call. (Note that you should pick a more descriptive name than foo. Maybe down?)

Note that not (x >= y) is the same as x < y. Use this to simplify all of the conditions. Since these conditions test if a point is inside a bounding rectangle, most of this logic can be factored to a function isIn :: (Int, Int) -> (Int, Int) -> (Int, Int) -> Bool which will make get_connected_area more readable.

edited Sep 4, 2017 at 5:12

answered Sep 3, 2017 at 16:53

Code-Apprentice

84k26 gold badges162 silver badges289 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

infotoni91 Over a year ago

Thanks! Would be nice to avoid the 4 nested functions :/

Code-Apprentice Over a year ago

@infotoni91 I was looking at that, too. The only thing that comes to mind is breaking them out into a let or where clause. This doesn't avoid the "nesting" but might make it easier to read.

David Young Over a year ago

@infotoni91 It's maybe a bit debatable whether this is clearer, but you could replace the nested calls with foldr like this:

foldr (\curr_point curr_area -> get_connected_area  habitat curr_point curr_area nullValue) new_area [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

. Also, if you change the nullValue argument so that it comes before the last two arguments, you can write (making nullValue the first argument): foldr (get_connected_area nullValue habitat) new_area [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

jberryman · Accepted Answer · 2017-09-03 22:15:09Z

This would be my first quick pass through the function, and sort of the minimum that might pass a code review (just in terms of style):

getConnectedArea :: Eq a => [[a]] -> a -> (Int, Int) -> [(Int,Int)] -> [(Int,Int)]
getConnectedArea habitat nullValue = go where
  go point@(x,y) area
      | elem point area = area
      | x < 0 = area
      | y < 0 = area
      | x >= length habitat = area
      | y >= length (habitat!!0) = area
      | ((habitat!!x)!!y) == nullValue = area
      | otherwise = 
          foldr go (point : area) 
            [ (x+1, y), (x-1, y), (x, y+1), (x, y-1) ]

We bind habitat and nullValue once at the top level (clarifying what the recursive work is doing), remove indirection in the predicates, use camel-case (underdashes obscure where function application is happening), replace generic_type with a (using a noisy variable here actually has the opposite effect from the one you intended; I end up trying to figure out what special semantics you're trying to call out when the interesting thing is that the type doesn't matter (so long as it can be compared for equality)).

At this point there are lots of things we can do:

pretend we're writing real code and worry about asymptotics of treating lists as arrays (!!, and length) and sets (elem), and use proper array and set data structures instead
move your bounds checking (and possible null value checking) into a new lookup function (the goal being to have only a single ... = area clause if possible
consider improvements to the algorithm: can we avoid recursively checking the cell we just came from algorithmically? can we avoid passing area entirely (making our search nicely lazy/"productive")?

luqui · Accepted Answer · 2017-09-04 22:54:31Z

2

Here is my take:

import qualified Data.Set as Set

type Point = (Int, Int)

getConnectedArea :: (Point -> Bool) -> Point -> Set.Set Point
getConnectedArea habitat = \p -> worker p Set.empty  
          -- \p is to the right of = to keep it out of the scope of the where clause
    where
    worker p seen
      | p `Set.member` seen = seen
      | habitat p = foldr worker (Set.insert p seen) (neighbors p)
      | otherwise = seen

    neighbors (x,y) = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]

What I've done

foldr over the neighbors, as some commenters suggested.
Since the order of points doesn't matter, I use a Set instead of a list, so it's semantically a better fit and faster to boot.
Named some helpful intermediate abstractions such as Point and neighbors.
A better data structure for the habitat would also be good, since lists are linear time to access, maybe a 2D Data.Array—but as far as this function cares, all you need is an indexing function Point -> Bool (out of bounds and null value are treated the same), so I've replaced the data structure parameter with the indexing function itself (this is a common transformation in FP).

We can see that it would also be possible to abstract away the neighbors function and then we would arrive at a very general graph traversal method

traverseGraph :: (Ord a) => (a -> [a]) -> a -> Set.Set a

in terms of which you could write getConnectedArea. I recommend doing this for educational purposes—left as an exercise.

EDIT

Here's an example of how to call the function in terms of (almost) your old function:

import Control.Monad ((<=<))

-- A couple helpers for indexing lists.

index :: Int -> [a] -> Maybe a
index _ [] = Nothing
index 0 (x:_) = x
index n (_:xs) = index (n-1) xs

index2 :: (Int,Int) -> [[a]] -> Maybe a
index2 (x,y) = index x <=< index y
-- index2 uses Maybe's monadic structure, and I think it's quite pretty. 
-- But if you're not ready for that, you might prefer
index2' (x,y) xss
  | Just xs <- index y xss = index x xs
  | otherwise = Nothing

getConnectedArea' :: (Eq a) => [[a]] -> Point -> a -> [a]
getConnectedArea' habitat point nullValue = Set.toList $ getConnectedArea nonnull point
    where
    nonnull :: Point -> Bool
    nonnull p = case index2 p habitat of
                  Nothing -> False
                  Just x -> x /= nullValue

edited Sep 4, 2017 at 22:54

answered Sep 4, 2017 at 4:16

luqui

60.7k12 gold badges153 silver badges210 bronze badges

4 Comments

infotoni91 Over a year ago

Thanks, although I don't even know how to call your implementation. Any hint on the worker function? I'm feeling like 10 years ago when I started programming. Haskell and FP are completely new to me...sorry.

luqui Over a year ago

@infotoni91 Ok I gave you an example. And yes that is what it is like, I went through the same thing. You'll get the hang of it in due time.

dfeuer Over a year ago

I'm always a bit wary of Set performance. It might be interesting to benchmark Set (Int,Int) against IntMap IntSet, but of course that would require a large and realistic test case.

dfeuer Over a year ago

One more thing: for finite xs, foldr c n xs = foldl (flip c) n (reverse xs). This suggests that perhaps you should use foldl', swap the order of arguments to worker, and if necessary hand-reverse the list in the definition of neighbors.

Redu · Accepted Answer · 2017-09-05 19:58:54Z

OK i will try to simplify your code. However there are already good answers and that's why i will tackle this with a slightly more conceptual approach.

I believe you could chose better data types. For instance Data.Matrix seems to provide an ideal data type in the place of your [[generic_type]] type. Also for coordinates i wouldn't chose a tuple type since tuple type is there to pack different types. It's functor and monad instances are not very helpful when it is chosen as a coordinate system. Yet since it seems Data.Matrix is just happy with tuples as coordinates i will keep them.

OK your rephrased code is as follows;

import Data.Matrix

gca :: Matrix Int -> (Int, Int) -> Int -> [(Int,Int)]
gca fld crd nil = let nbs = [id, subtract 1, (+1)] >>= \f -> [id, subtract 1, (+1)]
                                                    >>= \g -> return (f,g)
                                                     >>= \(f,g) -> return ((f . fst) crd, (g . snd) crd)
                  in filter (\(x,y) -> fld ! (x,y) /= nil) nbs

*Main> gca (fromLists [[0,1,0],[1,1,1],[0,1,0],[1,0,0]]) (2,2) 0
[(2,2),(2,1),(2,3),(1,2),(3,2)]

The first thing to note is, the Matrix data type is index 1 based. So we have our center point at (2,2).

The second is... we have a three element list of functions defined as [id, subtract 1, (+1)]. The contained functions are all Num a => a -> a type and i need them to define the surrounding pixels of the given coordinate including the given coordinate. So we have a line just like if we did;

[1,2,3] >>= \x -> [1,2,3] >>= \y -> return [x,y] would result [[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]] which, in our case, would yield a 2 combinations of all functions in the place of the numbers 1,2 and 3.

Which then we apply to our given coordinate one by one with a cascading instruction

>>= \[f,g] -> return ((f . fst) crd, (g . snd) crd)

which yields all neighboring coordinates.

Then its nothing more than filtering the neighboring filters by checking if they are not equal to the nil value within out matrix.

Collectives™ on Stack Overflow

Simplify haskell function

4 Answers 4

3 Comments

Comments

4 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

Comments

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related