# Species distribution modelling: Data analysis in ecology

Species distribution models (or SDMs) are used to explore how the occurrence of a species is related to its environment, and how the species might respond to changes in that environment. This can help to find new locations where a rare species might occur, or to understand the potential threats to a species from urban encroachment, climate change, or other causes.

For example, consider a model of the distribution of the Sydney Red Gum (Angophora costata) in the Sydney basin as a function of climate variables and fire history. It was fitted as a Poisson point process model, which describes the "intensity" $\lambda$ of A. costata as a function of environmental variables $(X_1, X_2, \ldots, X_p)$:

$\log(\lambda) = \beta_0 + \beta_{11} X_1 + \beta_{12} X_1^2 + \beta_{21} X_2 + \beta_{22} X_2^2 + \cdots + \beta_{p1} X_p + \beta_{p2} X_p^2.$
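To make the formula concrete, here is a minimal Python sketch that evaluates $\log(\lambda)$ at a single location. The function name `log_intensity`, the coefficient values, and the covariate values are all made up for illustration; they are not the fitted A. costata model.

```python
import math


def log_intensity(beta0, beta, x):
    """Return log(lambda) at one location.

    beta0 : intercept
    beta  : list of (linear, quadratic) coefficient pairs, one per variable
    x     : list of environmental values (X_1, ..., X_p) at the location
    """
    if len(beta) != len(x):
        raise ValueError("need one (linear, quadratic) pair per variable")
    return beta0 + sum(b1 * xj + b2 * xj ** 2 for (b1, b2), xj in zip(beta, x))


# Hypothetical coefficients for two environmental variables.
beta0 = -1.0
beta = [(0.8, -0.2), (0.5, -0.1)]

# Hypothetical (standardised) covariate values at one location.
x = [2.0, 1.0]

# The intensity itself is recovered by exponentiating.
intensity = math.exp(log_intensity(beta0, beta, x))
```

When a quadratic coefficient $\beta_{j2}$ is negative, the response to that variable is hump-shaped on the log scale, peaking at $X_j = -\beta_{j1} / (2\beta_{j2})$, which is one reason squared terms are included.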

We fit this model by taking a set of locations where A. costata is found and comparing it to a map of environmental variables, to see which environments the species tends to be associated with. The parameters that describe this relationship, namely $\boldsymbol { \beta } =( \beta_0,\beta_{11}, \cdots, \beta_{p2} )$, are estimated via a common statistical procedure known as maximum likelihood, meaning that we need to find the vector $\boldsymbol{ \beta }$ that maximises the following function:

$l( \boldsymbol { \beta } ) = \sum_i \log( \lambda_i) - \int \lambda \mathrm{d} y.$

This function involves an integral that has no closed-form solution in the general case, so we estimate it by numerical integration (for example, with the trapezoidal rule or Simpson's rule). This can get tricky in practice: in the A. costata example we needed a total of 86,800 function evaluations to get a good estimate of the integral. That sounds hard, but a modern computer can do it almost instantly. As you can see, fitting a species distribution model involves an interesting mix of different mathematical tools, and a bit of computing!
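The log-likelihood above can be sketched in a few lines of Python. This is a toy one-dimensional version under stated assumptions, not the method used for A. costata: the "map" is an interval $[a, b]$, the single environmental variable is taken to be the location itself, the presence locations are invented, and the integral is approximated with the composite trapezoidal rule. The names `trapezoid` and `log_likelihood` are hypothetical.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b], n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return h * total


def log_likelihood(beta, points, a, b, n=1000):
    """Approximate l(beta) = sum_i log(lambda_i) - integral of lambda over [a, b].

    Toy assumption: one environmental variable equal to the location, X(y) = y,
    so log(lambda(y)) = b0 + b1 * y + b2 * y**2.
    """
    b0, b1, b2 = beta

    def log_lam(y):
        return b0 + b1 * y + b2 * y ** 2

    sum_term = sum(log_lam(y) for y in points)
    integral = trapezoid(lambda y: math.exp(log_lam(y)), a, b, n)
    return sum_term - integral


# Invented presence locations on the interval [0, 1].
points = [0.2, 0.4, 0.5, 0.7]

# Evaluate the log-likelihood at two candidate parameter vectors.
ll_a = log_likelihood((1.0, 0.0, 0.0), points, 0.0, 1.0)
ll_b = log_likelihood((math.log(4.0), 0.0, 0.0), points, 0.0, 1.0)
```

Maximum likelihood then amounts to searching over $\boldsymbol{\beta}$ for the largest value of this function. Here the second candidate scores higher, as expected: with a constant intensity, the best value is the number of points divided by the length of the interval, which is 4. Each evaluation of `log_likelihood` costs `n + 1` evaluations of $\lambda$, which is why the number of function evaluations grows quickly on a fine two-dimensional map.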

Methodology for constructing species distribution models is an exciting interdisciplinary research front, with contributions from statisticians, computer scientists and ecologists. The Eco-Stats group, together with collaborators at the Australian Museum and elsewhere around Australia, is developing methods that can predict species distributions at a finer spatial resolution and with greater functionality than was previously possible.