As Robin Zigmond says, the concept of tail recursion doesn't apply in Haskell the way it does in strict languages. In a language with non-lazy (strict) semantics, the way to achieve tail recursion is to move the expression that causes the stack to grow into an accumulating argument, like so:
h :: Int -> Int
h x = if x > 20 then 50 else x*x + h (x+1)

g :: Int -> Int
g z = g' z 50
  where
    g' x y
      | x > 20    = y
      | otherwise = g' (x+1) (x*x + y)
Here the outermost expression in the body of g' is the recursive call itself, so in a strict language no stack frame would have to be kept alive while the x*x + ... part is computed: it is evaluated before the call is made. Under Haskell's lazy evaluation, however, this evaluates quite differently.
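To see why, note that g is essentially a left fold with a lazy accumulator. Here is a minimal sketch of my own (gLazy and gStrict are names I made up, not anything from the question), comparing the Prelude's lazy foldl with the strict foldl' from Data.List:

import Data.List (foldl')

-- With the lazy foldl the accumulator stays a thunk: the chain
-- ((50 + 1*1) + 2*2) + ... is built up on the heap and only collapsed when
-- the final result is demanded, so the "tail call" does not run in constant space.
gLazy :: Int -> Int
gLazy z = foldl (\acc x -> acc + x*x) 50 [z..20]

-- foldl' forces the accumulator at every step, which is the effect the bang
-- patterns further down give to g'.
gStrict :: Int -> Int
gStrict z = foldl' (\acc x -> acc + x*x) 50 [z..20]

Loaded into GHCi, gLazy 1 and gStrict 1 both evaluate to 2920, the same as h 1; the difference is only in how much gets allocated along the way.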
Comparing your h and this g in a micro-benchmark,
module Main where

import Criterion
import Criterion.Main

-- h and g as defined above; g2 (further down) is added to the list in the same way.
main :: IO ()
main = defaultMain
  [ bgroup "tail-recursion"
      [ bench "h" $ nf h 1
      , bench "g" $ nf g 1
      ]
  ]
you actually get worse performance from this g':
benchmarking tail-recursion/h
time 826.7 ns (819.1 ns .. 834.7 ns)
0.993 R² (0.988 R² .. 0.997 R²)
mean 911.1 ns (866.4 ns .. 971.9 ns)
std dev 197.7 ns (149.3 ns .. 241.3 ns)
benchmarking tail-recursion/g
time 1.742 μs (1.730 μs .. 1.752 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.742 μs (1.729 μs .. 1.758 μs)
std dev 47.44 ns (34.69 ns .. 66.29 ns)
You can get some of that performance back by making the arguments of g' strict,
{-# LANGUAGE BangPatterns #-}

g2 :: Int -> Int
g2 z = g' z 50
  where
    g' !x !y
      | x > 20    = y
      | otherwise = g' (x+1) (x*x + y)
but it both looks and performs worse than the original h:
benchmarking tail-recursion/g2
time 1.340 μs (1.333 μs .. 1.349 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.344 μs (1.336 μs .. 1.355 μs)
std dev 33.40 ns (24.71 ns .. 48.94 ns)
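Incidentally, the same strictness can be spelled without the extension: seq forces the accumulator before the recursive call, which is essentially what the bang pattern on y buys here (the guard already forces x). A sketch of mine, not benchmarked above:

g3 :: Int -> Int
g3 z = g' z 50
  where
    -- Forcing y before recursing keeps the accumulator evaluated instead of
    -- letting a chain of thunks pile up.
    g' x y = y `seq` if x > 20 then y else g' (x+1) (x*x + y)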
Edit: As K. A. Buhr pointed out, I forgot to pass the -O2 flag to GHC; compiling with ghc -O2 gives the following micro-benchmark results:
h time: 54.27 ns (48.05 ns .. 61.24 ns)
g time: 24.50 ns (21.15 ns .. 27.35 ns)
g2 time: 25.47 ns (22.19 ns .. 29.06 ns)
at which point the accumulating-argument version does perform better, and the BangPatterns version performs only about as well as it, but both still look worse than the original h.
So one moral about optimizing code in general: don't do it prematurely. And one moral about optimizing Haskell code in particular: you won't necessarily know whether an optimization helps until you measure, and usually the most abstract solution, the one that simply relies on library functions, performs well.
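As a sketch of what that last point could look like here (my own rewrite, not something benchmarked above): h is just a sum of squares plus a constant, and written with sum GHC can usually fuse the intermediate list away under -O2.

-- A more library-shaped version of h; hSum is a name of my own, not from the
-- question or the benchmarks above.
hSum :: Int -> Int
hSum x = 50 + sum [i*i | i <- [x..20]]

hSum 1 gives the same 2920 as h 1, and there is no explicit recursion left to get wrong.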