I'm looking at the amount of carbon in seven forest pools. For dead trees left on the landscape across many locations and over several harvest retention (logging) treatments, there is an extreme value that I happen to know is real.
The data are somewhat zero-inflated (10 of 190 observations are zero) and strongly right-skewed.
Min. = 0.0000
1st Qu. = 0.1733
Median = 0.6664
Mean = 7.0793
3rd Qu. = 3.2283
Max. = 468.9519
Histogram of data without outlier:

A massive coastal old-growth snag results in one plot having 469 Mg C ha−1 in the dead trees, when the next most C-rich measurement is 83 Mg C ha−1. This is a real tree in my actual plot, but it completely skews the estimates of my GLMMs away from meaningful inference about the rest of the data. It's random that this tree wound up in this particular treatment, as plots were randomly assigned treatments. It is not random that such a tree is at this location, because it is our most southern and most humid research forest.
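To put a rough number on that influence, here is a small Python sketch that uses only the summary values reported above. It assumes the reported mean was computed over all 190 observations, including the extreme plot; the variable names are my own and are not part of the analysis.

```python
# Values copied from the summary statistics above.
# Assumption: the mean of 7.0793 includes the extreme plot and all 190 observations.
n_obs = 190
mean_all = 7.0793      # Mg C/ha, mean over all plots
max_value = 468.9519   # Mg C/ha, the plot with the old-growth snag

# Recover the summed carbon, then recompute the mean with the extreme plot dropped.
total = mean_all * n_obs
mean_without_max = (total - max_value) / (n_obs - 1)

# Fraction of the summed dead-tree carbon that comes from that single plot.
share_of_total = max_value / total

print(f"mean without the extreme plot: {mean_without_max:.2f} Mg C/ha")
print(f"share of summed carbon from that plot: {share_of_total:.1%}")
```

By this back-of-the-envelope calculation, one plot out of 190 carries roughly a third of the summed carbon in this pool, and dropping it lowers the mean from about 7.1 to about 4.6 Mg C ha−1.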
How do you handle a totally real but seriously destructive outlier?
