Title: | Data for Comparing Species Distribution Modeling Methods |
---|---|
Description: | Easy access to species distribution data for 6 regions in the world, for a total of 226 anonymised species. These data are described and made available by Elith et al (2020) <doi:10.17161/bi.v15i2.13384> to compare species distribution modelling methods. |
Authors: | Robert J. Hijmans [aut] , Roozbeh Valavi [cre, aut], Jane Elith [aut] |
Maintainer: | Roozbeh Valavi <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0-1 |
Built: | 2024-11-01 05:40:22 UTC |
Source: | https://github.com/cran/disdat |
This package allows for easy use of a collection of datasets that can be used to compare species distribution models. There are data for 6 regions in the world, for a total of 226 anonymised species including birds, vascular plants, reptiles and bats. Each data set has presence-only (and optionally background) training data to build models, and presence/absence data to evaluate models.
The data were compiled and used by a species distribution modeling working group sponsored by the National Center for Ecological Analysis and Synthesis (NCEAS), at UC Santa Barbara, USA. Full details of the dataset are provided in the first publication listed below, from the NCEAS data group.
Metadata for the data are also supplied on the Open Science Framework (OSF) site, where rasters (gridded data) of all environmental variables are available for download.
Package by Robert J. Hijmans, Roozbeh Valavi, and Jane Elith. Data collation and processing by the NCEAS data group (see the first reference below, and the package manual for specific datasets).
The main reference for these data is:
Elith, J., Graham, C.H., Valavi, R., Abegg, M., Bruce, C., Ferrier, S., Ford, A., Guisan, A., Hijmans, R.J., Huettmann, F., Lohmann, L.G., Loiselle, B.A., Moritz, C., Overton, J.McC., Peterson, A.T., Phillips, S., Richardson, K., Williams, S., Wiser, S.K., Wohlgemuth, T. & Zimmermann, N.E., (2020). Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69-80.
Other papers using these data include:
Dudík, M. & Phillips, S. J. (2009). Generative and Discriminative Learning with Unknown Labeling Bias. in Advances in Neural Information Processing Systems 21 (eds. Koller, D., Schuurmans, D., Bengio, Y. & Bottou, L.) 401-408. Curran Associates, Inc.
Dudík, M., Phillips, S. J. & Schapire, R. E. (2006). Correcting sample selection bias in maximum entropy density estimation. in Advances in Neural Information Processing Systems 18 (eds. Weiss, Y., Schölkopf, B. & Platt, J. C.) 323-330 (MIT Press).
Elith, J. & Leathwick, J. R. (2007). Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines. Diversity and Distributions 13, 165-175.
Elith, J., Graham, C.H., Anderson, R.P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.McC., Peterson, A.T., Phillips, S.J., Richardson, K.S., Scachetti-Pereira, R., Schapire, R.E., Soberón, J., Williams, S., Wisz, M.S., Zimmermann, N.E. (2006). Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29, 129–151.
Graham, C.H., Elith, J., Hijmans, R.J., Guisan, A., Peterson, A.T., Loiselle, B.A. (2008). The influence of spatial errors in species occurrence data used in distribution models. Journal of Applied Ecology 45, 239–247.
Guisan, A., Graham, C. H., Elith, J., Huettmann, F. & NCEAS Species Distribution Modelling Group (2007). Sensitivity of predictive species distribution models to change in grain size: insights from a multi-models experiment across five continents. Diversity and Distributions 13, 332-340.
Guisan, A., Zimmermann, N. E., Elith, J., Graham, C. H., Phillips, S. J., & Peterson, A. T. (2007). What matters for predicting the occurrences of trees: techniques, data, or species' characteristics? Ecological Monographs 77, 615-630.
Hijmans, R. J. (2012). Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology 93, 679-688.
Phillips, S. J. & Dudík, M. (2008). Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31, 161-175.
Phillips, S. J. & Elith, J. (2010). POC-plots: calibrating species distribution models with presence-only data. Ecology 91, 2476-2484.
Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S. (2009). Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19, 181–197.
Phillips, S. J., Anderson, R. P., Dudík, M., Schapire, R. E. & Blair, M. E. (2017). Opening the black box: an open-source release of Maxent. Ecography 40, 887-893.
Valavi, R., Elith, J., Lahoz‐Monfort, J.J. & Guillera‐Arroita, G. (2023). Flexible species distribution modelling methods perform well on spatially separated testing data. Global Ecology and Biogeography, geb.13639.
Valavi, R., Elith, J., Lahoz‐Monfort, J.J. & Guillera‐Arroita, G. (2021). Modelling species presence‐only data with random forests. Ecography, 44, 1731–1742.
Valavi, R., Guillera‐Arroita, G., Lahoz‐Monfort, J.J. & Elith, J. (2022). Predictive performance of presence‐only species distribution models: a benchmark study with reproducible code. Ecological Monographs, 92(1). 10.1002/ecm.1486
Wisz, M.S., Hijmans, R.J., Li, J., Peterson, A.T., Graham, C.H., Guisan, A., & NCEAS Species Distribution Modelling Group (2008). Effects of sample size on the performance of species distribution models. Diversity and Distributions 14, 763–773.
Species occurrence data for 40 species (20 vascular plants, 20 birds) in the Australian Wet Tropics (AWT) and associated environmental data. Full details of the dataset are provided in the reference below. There are four data sets with training (po and bg) and test (pa, env) data:
po (training data) includes site names, species names, coordinates, occurrence ("1" for all, since all are presence records), group (plant or bird), and site values for 13 environmental variables (below).
bg (training data) has 10000 sites selected at random across the study region. It is structured identically to po, with "0" for occurrence (not implying absence, but denoting a background record in a way suited to most modelling methods) and NA for group.
env (testing data) includes group, site names, coordinates, and site values for 13 environmental variables (below). These are for sites from different surveys for plants (102 sites) and birds (340 sites), and can be returned as separate datasets by disEnv, or in one long format dataset by disData. These data are suited to making predictions.
pa (testing data) includes group, site names, coordinates, and presence-absence records, one column per species (in the wide format returned by disPa). They can also be returned in long format using disData. The sites are identical to the sites in env. These data are suited to evaluating the predictions made with env.
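A minimal sketch of how the po and bg tables described above might be combined into a training set for a single species. The helper name `make_training` is hypothetical, and the column names `spid` and `occ` are assumptions about the table layout; the toy data frames only stand in for the real tables.

```r
# Build a presence/background training table for one species.
# `po` and `bg` are data.frames shaped like the output of disPo() and disBg();
# the column names "spid" and "occ" are assumptions about that layout.
make_training <- function(po, bg, species, predictors) {
  pres <- po[po$spid == species, c("occ", predictors), drop = FALSE]
  back <- bg[, c("occ", predictors), drop = FALSE]
  rbind(pres, back)
}

# Toy data standing in for the real tables
po <- data.frame(spid = c("sp1", "sp1", "sp2"), occ = 1,
                 bc01 = c(20.1, 21.3, 18.7), bc12 = c(1500, 1720, 2100))
bg <- data.frame(spid = NA, occ = 0,
                 bc01 = c(19.0, 22.5, 17.2, 20.8), bc12 = c(1300, 900, 2500, 1600))
train <- make_training(po, bg, "sp1", c("bc01", "bc12"))
table(train$occ)

# With the package installed, the same helper can be applied to real data
if (requireNamespace("disdat", quietly = TRUE)) {
  awt_po <- disdat::disPo("AWT")
  awt_bg <- disdat::disBg("AWT")
  sp <- awt_po$spid[1]  # first species identifier (assumed column name)
  awt_train <- make_training(awt_po, awt_bg, sp, disdat::disPredictors("AWT"))
}
```

The resulting table has occurrence coded 1 for presences and 0 for background records, which is the shape most presence-background modelling functions expect.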
Raster (gridded) data for all environmental variables are available - see the reference below for details.
The coordinate reference system of the x and y coordinates is UTM, zone 55, spheroid GRS 1980, datum GDA94 (EPSG:28355).
The vignette provided with this package provides an example of how to fit and evaluate a model with these data.
Environmental variables:
Code | Description | Units | Type |
bc01 | Annual mean temperature | degrees C | Continuous |
bc04 | Temperature seasonality | dimensionless | Continuous |
bc05 | Max. temperature of warmest period | degrees C | Continuous |
bc06 | Min. temperature of coldest period | degrees C | Continuous |
bc12 | Annual precipitation | mm | Continuous |
bc15 | Precipitation seasonality | dimensionless | Continuous |
bc17 | Precipitation of driest quarter | mm | Continuous |
bc20 | Annual mean radiation | MJ/m2/day | Continuous |
bc31 | Moisture index seasonality | dimensionless | Continuous |
bc33 | Mean moisture index of lowest quarter (MI) | dimensionless | Continuous |
slope | Slope | percent | Continuous |
topo | Topographic position | 0 is a gully and 100 a ridge, 50 mid-slope | Continuous |
tri | Terrain ruggedness index | Sum of variation in a 1 km moving window | Continuous |
Environmental predictors prepared by Karen Richardson, Caroline Bruce and Catherine Graham. Species data supplied by Andrew Ford, Stephen Williams and Karen Richardson.
See the reference below for further details on source, accuracy, cleaning, and particular characteristics of these datasets.
Elith, J., Graham, C.H., Valavi, R., Abegg, M., Bruce, C., Ferrier, S., Ford, A., Guisan, A., Hijmans, R.J., Huettmann, F., Lohmann, L.G., Loiselle, B.A., Moritz, C., Overton, J.McC., Peterson, A.T., Phillips, S., Richardson, K., Williams, S., Wiser, S.K., Wohlgemuth, T. & Zimmermann, N.E., (2020). Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69-80.
awt_po <- disPo("AWT")
awt_bg <- disBg("AWT")
awt_pa_plant <- disPa("AWT", "plant")
awt_env_plant <- disEnv("AWT", "plant")
awt_pa_bird <- disPa("AWT", "bird")
awt_env_bird <- disEnv("AWT", "bird")
# Or all in one list
awt <- disData("AWT")
sapply(awt, head)
disCRS("AWT")
Species occurrence data for 20 bird species from Ontario, a province in Canada (CAN), and associated environmental data. Full details of the dataset are provided in the reference below. There are four data sets with training (po and bg) and test (pa, env) data:
po (training data) includes site names, species names, coordinates, occurrence ("1" for all, since all are presence records), group (bird), and site values for 11 environmental variables (below).
bg (training data) has 10000 sites selected at random across the study region. It is structured identically to po, with "0" for occurrence (not implying absence, but denoting a background record in a way suited to most modelling methods) and NA for group.
env (testing data) includes group, site names, coordinates, and site values for 11 environmental variables (below), at 14571 sites. These data are suited to making predictions.
pa (testing data) includes group, site names, coordinates, and presence-absence records, one column per species. The sites are identical to the sites in env. These data are suited to evaluating the predictions made with env.
Raster (gridded) data for all environmental variables are available - see the reference below for details.
The coordinate reference system of the x and y coordinates is unprojected: latitude and longitude are geographical coordinates using an unknown datum based upon the Clarke 1866 ellipsoid (EPSG:4008).
The vignette provided with this package provides an example of how to fit and evaluate a model with these data.
Environmental variables:
Code | Description | Units | Type |
alt | Digital elevation | m | Continuous |
asp2 | Aspect | ranges from -1 to 1 (sin transformation) | Continuous |
ontprec | Annual Precipitation | mm | Continuous |
ontprec4 | April precipitation | mm | Continuous |
ontprecsd | Precipitation Seasonality | dimensionless | Continuous |
ontslp | Slope | degrees | Continuous |
onttemp | Annual mean temperature | degrees C * 10 | Continuous |
onttempsd | Temperature standard deviation | dimensionless | Continuous |
onttmin4 | April minimum temperature | degrees C * 10 | Continuous |
ontveg | Vegetation, from Ontario Land Cover Database (OLC) vegetation map, derived from a mosaic of Landsat images. | 5 classes: 1 = open forest & related; 2 = closed forest; 3 = open water, 4 = agriculture, 5 = human settlement | Categorical |
watdist | Distance from Hudson Bay | m | Continuous |
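Several variables in the table above are stored as degrees C * 10. A minimal sketch of converting such columns back to degrees C before plotting or reporting; the helper name `rescale_tenths` is hypothetical and the toy values are invented for illustration.

```r
# Divide the named columns by 10, leaving all other columns untouched
rescale_tenths <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(v) v / 10)
  df
}

# Toy table with two temperature columns stored as degrees C * 10
env <- data.frame(siteid = 1:3,
                  onttemp = c(25, -13, 48),
                  onttmin4 = c(-52, -80, -10),
                  alt = c(210, 340, 95))
env_c <- rescale_tenths(env, c("onttemp", "onttmin4"))
env_c
```

Rescaling is a presentation choice; most modelling methods are unaffected by a constant multiplier, provided training and testing data are treated the same way.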
Environmental predictors prepared by Falk Huettmann, Jane Elith and Catherine Graham. Species data: PO from the Ontario Nest Records database, Royal Ontario Museum (ROM) and supplied by M. Peck to Falk Huettmann; PA from Breeding Bird Atlas for Ontario, provided by M. Cadman to Falk Huettmann.
See the reference below for further details on source, accuracy, cleaning, and particular characteristics of these datasets.
Elith, J., Graham, C.H., Valavi, R., Abegg, M., Bruce, C., Ferrier, S., Ford, A., Guisan, A., Hijmans, R.J., Huettmann, F., Lohmann, L.G., Loiselle, B.A., Moritz, C., Overton, J.McC., Peterson, A.T., Phillips, S., Richardson, K., Williams, S., Wiser, S.K., Wohlgemuth, T. & Zimmermann, N.E., (2020). Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69-80.
can_po <- disPo("CAN")
can_bg <- disBg("CAN")
can_pa <- disPa("CAN")
can_env <- disEnv("CAN")
# Or all in one list
x <- disData("CAN")
sapply(x, head)
disCRS("CAN")
Get the coordinate reference system for the data of a region.
disCRS(region, format="proj4")
region | character. One of "AWT", "CAN", "NSW", "NZ", "SA", "SWI" |
format | character. Either "proj4" or "EPSG" |
character vector
disCRS("AWT")
disCRS("NSW")
disPo returns the presence-only (po) data for a region.
disBg returns the background (bg) data for a region.
disPa returns the presence-absence (pa) data for a region and group.
disEnv returns the environmental (env) data for sites matching those in the pa data, for a region and group.
disData returns a list with all data for a region.
disBorder returns a polygon for one of the regions.
disData(region)
disPo(region)
disBg(region)
disPa(region, group)
disEnv(region, group)
disBorder(region, pkg="sf")
region | character. One of "AWT", "CAN", "NSW", "NZ", "SA", "SWI" |
group | character. If |
pkg | character. Either "sf" or "terra" to get polygons as defined by that package |
disData returns a list with env, pa, bg and po data in that order. For regions with more than one group, the testing data (env and pa) will come from different surveys, and the model testing should be targeted to the relevant group. The first column of the env and pa data.frames is "group", which can be used to extract the correct data.
data.frame (disPo, disBg, disPa and disEnv) or list with four data.frames (disData)
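As a rough illustration of using the "group" column to pull out the testing data for one group from the list returned by disData, the following sketch defines a hypothetical helper `get_group`. The toy list mimics only the env and pa elements; its column names other than "group" are invented.

```r
# Keep only the env and pa rows that belong to one biological group
get_group <- function(x, grp) {
  list(env = x$env[x$env$group == grp, , drop = FALSE],
       pa  = x$pa[x$pa$group == grp, , drop = FALSE])
}

# Toy list standing in for the output of disData()
x <- list(
  env = data.frame(group = c("plant", "plant", "bird"), siteid = 1:3,
                   bc01 = c(21.0, 19.5, 23.2)),
  pa  = data.frame(group = c("plant", "plant", "bird"), siteid = 1:3,
                   sp_a = c(1, 0, NA))
)
birds <- get_group(x, "bird")
birds$pa

# With the package installed, the same helper can be applied to real data
if (requireNamespace("disdat", quietly = TRUE)) {
  awt <- disdat::disData("AWT")
  plants <- get_group(awt, "plant")
}
```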
awt_po <- disPo("AWT")
awt_bg <- disBg("AWT")
awt_pa_plants <- disPa("AWT", "plant")
awt_env_plants <- disEnv("AWT", "plant")
x <- disData("NSW")
names(x)
sapply(x, head)
z <- disBorder("NSW")
plot(z)
A helper function for automatically generating maps for the species data in PDF format.
disMapBook(region, output_pdf, verbose = TRUE)
region | A character vector. The name of the region(s) for which to generate plots. |
output_pdf | Path of the output PDF file to be saved. |
verbose | Logical. Controls the amount of screen reporting. |
disMapBook(c("AWT", "NSW"), "~/Desktop/sp_mapbook.pdf")
Get the names of the predictor variables for a region.
disPredictors(region)
region | character. One of "AWT", "CAN", "NSW", "NZ", "SA", "SWI" |
character vector
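One possible use of the returned character vector is to build a model formula programmatically. The sketch below uses `reformulate()` from the stats package; the helper name `make_formula` and the response name "occ" are assumptions made for illustration.

```r
# Turn a vector of predictor names into a formula such as occ ~ a + b + c
make_formula <- function(predictors, response = "occ") {
  reformulate(predictors, response = response)
}

f <- make_formula(c("cti", "mi", "rainann"))
f

# With the package installed, the predictor names can come from disPredictors()
if (requireNamespace("disdat", quietly = TRUE)) {
  nsw_formula <- make_formula(disdat::disPredictors("NSW"))
}
```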
disPredictors("NSW")
Species occurrence data for 54 species from 8 biological groups in New South Wales (NSW, a state in Australia) and associated environmental data. Full details of the dataset are provided in the reference below. There are four data sets with training (po and bg) and test (pa, env) data:
po (training data) includes site names, species names, coordinates, occurrence ("1" for all, since all are presence records), group [ba = bats (7 species); db = diurnal birds (8 species); nb = nocturnal birds (2 species); ot = open-forest trees (8 species); ou = open-forest understorey plants (8 species); rt = rainforest trees (7 species); ru = rainforest understorey plants (6 species); sr = small reptiles (8 species)], and site values for 13 environmental variables (below).
bg (training data) has 10000 sites selected at random across the study region. It is structured identically to po, with "0" for occurrence (not implying absence, but denoting a background record in a way suited to most modelling methods) and NA for group.
env (testing data) includes group, site names, coordinates, and site values for 13 environmental variables (below). These are for sites from different surveys for each biological group (from 570 to 2075 sites per group), and can be returned as separate datasets by disEnv, or in one long format dataset by disData. This set of files is suited to making predictions.
pa (testing data) includes group, site names, coordinates, and presence-absence records, one column per species (in the wide format returned by disPa). They can also be returned in long format using disData. The sites are identical to the sites in env. These data are suited to evaluating the predictions made with env.
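To make the evaluation step concrete, here is a minimal sketch of computing the area under the ROC curve (AUC) from presence-absence observations and model predictions, using the rank-sum (Mann-Whitney) formulation in base R. The helper name `auc_rank` is hypothetical and the toy vectors are invented; this is one common evaluation statistic, not necessarily the one used in the package vignette.

```r
# AUC via the rank-sum formulation: the proportion of presence/absence
# pairs in which the presence site received the higher prediction
# (ties count as one half because rank() assigns average ranks).
auc_rank <- function(obs, pred) {
  n1 <- sum(obs == 1)
  n0 <- sum(obs == 0)
  r <- rank(pred)
  (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: observed presence-absence at five sites and predicted suitability
obs  <- c(1, 1, 0, 0, 0)
pred <- c(0.9, 0.4, 0.5, 0.2, 0.1)
auc_rank(obs, pred)
```

In practice `obs` would be one species column of the pa table and `pred` the model predictions made to the matching rows of env.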
Raster (gridded) data for all environmental variables are available - see the reference below for details.
The reference system of the x and y coordinates is unprojected. Latitude and longitude are in geographical coordinates using the WGS84 datum (EPSG:4326).
The vignette provided with this package provides an example of how to fit and evaluate a model with these data.
Environmental variables:
Code | Description | Units | Type |
cti | "compound topographic index" - a quantification of the position of a site in the local landscape. It is often referred to as the steady state wetness index and it is defined as: CTI = ln ( As / tanB ) where 'As' is the specific catchment area expressed as m2 per unit width orthogonal to the flow direction and 'B' is the slope angle | | Continuous |
disturb | disturbance (clearing, logging etc) index. | 1 = light, 2 = moderate, 3 = heavy | Continuous |
mi | moisture index. Index of site wetness derived from a water balance algorithm using rainfall, evaporation, radiation and soil depth as inputs | Between 0 (dry) and 100 (wet) | Continuous |
rainann | mean annual rainfall | mm | Continuous |
raindq | mean rainfall of the driest quarter | mm | Continuous |
rugged | ruggedness. Coefficient of variation of grid cells within 1km of cell of interest | percent | Continuous |
soildepth | mean soil depth predicted from a model relating sampled soil depths to climate, geology and topography | m * 1000 | Continuous |
soilfert | soil fertility ordinal class, derived from soil maps and modeling of geochemical data | 1 (low) to 5 (high) | Continuous |
solrad | annual mean solar radiation (terrain adjusted) | MJm^-2day^-1 * 10 | Continuous |
tempann | annual mean temperature | degrees C * 10 | Continuous |
tempmin | minimum temperature of the coldest month | degrees C * 10 | Continuous |
topo | topographic position. Mean difference in elevation between grid cell of interest and all cells within 1km radius (-ve values are gullies, +ve are ridges) | m | Continuous |
vegsys | broad vegetation type | 1 = Rainforest, 2 = Moist open forest, 3 = Dry open forest, 4 = Woodland, 5 = Coastal sclerophyll complex, 6 = Plateau sclerophyll complex, 7 = Disturbed remnant, 8 = Exotic (pine) plantation, 9 = Cleared | Categorical |
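Categorical variables such as vegsys are coded as numbers, so it may help to convert them to factors before model fitting. The sketch below fixes the factor levels explicitly so that training and testing data share the same levels; the helper name `to_factor` and the toy values are hypothetical.

```r
# Convert one column to a factor with a fixed set of levels
to_factor <- function(df, col, levels) {
  df[[col]] <- factor(df[[col]], levels = levels)
  df
}

# Toy training and testing tables; vegsys has nine classes (1 to 9)
train <- data.frame(occ = c(1, 0, 0), vegsys = c(1, 3, 9))
test  <- data.frame(vegsys = c(2, 3))
train <- to_factor(train, "vegsys", 1:9)
test  <- to_factor(test, "vegsys", 1:9)
levels(train$vegsys)
```

Setting the levels up front avoids prediction errors that can arise when a class appears in the testing data but not in the training data.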
All data were compiled and provided by Simon Ferrier and colleagues.
Elith, J., Graham, C.H., Valavi, R., Abegg, M., Bruce, C., Ferrier, S., Ford, A., Guisan, A., Hijmans, R.J., Huettmann, F., Lohmann, L.G., Loiselle, B.A., Moritz, C., Overton, J.McC., Peterson, A.T., Phillips, S., Richardson, K., Williams, S., Wiser, S.K., Wohlgemuth, T. & Zimmermann, N.E., (2020). Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69-80.
nsw_po <- disPo("NSW")
nsw_bg <- disBg("NSW")
nsw_pa_bat <- disPa("NSW", "ba")
nsw_env_bat <- disEnv("NSW", "ba")
nsw_pa_reptile <- disPa("NSW", "sr")
nsw_env_reptile <- disEnv("NSW", "sr")
# Or all in one list
nsw <- disData("NSW")
sapply(nsw, head)
disCRS("NSW")
Species occurrence data for 52 vascular plant species - mostly trees and shrubs from indigenous forests - in New Zealand (NZ), and associated environmental data. Full details of the dataset are provided in the reference below. There are four data sets with training (po and bg) and test (pa, env) data:
po (training data) includes site names, species names, coordinates, occurrence ("1" for all, since all are presence records), group (plant), and site values for 13 environmental variables (below).
bg (training data) has 10000 sites selected at random across the study region. It is structured identically to po, with "0" for occurrence (not implying absence, but denoting a background record in a way suited to most modelling methods) and NA for group.
env (testing data) includes group, site names, coordinates, and site values for 13 environmental variables (below), at 19120 sites. These data are suited to making predictions.
pa (testing data) includes group, site names, coordinates, and presence-absence records, one column per species. The sites are identical to the sites in env. These data are suited to evaluating the predictions made with env.
Raster (gridded) data for all environmental variables are available - see the reference below for details.
The coordinate reference system of the x and y coordinates is New Zealand Map Grid (NZMG), Datum: NZGD49 (New Zealand Geodetic Datum 1949), Ellipsoid: International 1924 (EPSG:27200).
The vignette provided with this package provides an example of how to fit and evaluate a model with these data.
Environmental variables:
Code | Description | Units | Type |
age | 3 classes (0 to 2): <2000, 2000-postglacial (app. 30,000), and pre-glacial | number (category) | Categorical |
deficit | Mean October vapor pressure deficit at 0900 hours | kPa | Continuous |
dem | Elevation | meters | Continuous |
hillshade | Hill shading (as surrogate for slope and aspect) | index of brightness | Continuous |
mas | Mean annual solar radiation | Mj/m2/day | Continuous |
mat | Mean annual temperature | degrees C * 10 | Continuous |
r2pet | Average monthly ratio of rainfall and potential evapotranspiration (ratio) | none | Continuous |
rain | annual precipitation | mm | Continuous |
slope | Slope | degrees | Continuous |
sseas | Solar radiation seasonality | dimensionless | Continuous |
toxicats | Toxic Cations in classes: 0=low, 1=intermediate, 2=high | number (category) | Categorical |
tseas | Temperature seasonality | degrees C | Continuous |
vpd | Mean October vapor pressure deficit at 9 AM | kPa | Continuous |
Environmental predictors provided by Jake Overton. Species data supplied by Jake Overton and Susan Wiser, from Allan Herbarium and National Vegetation Survey databank.
See the reference below for further details on source, accuracy, cleaning, and particular characteristics of these datasets.
Elith, J., Graham, C.H., Valavi, R., Abegg, M., Bruce, C., Ferrier, S., Ford, A., Guisan, A., Hijmans, R.J., Huettmann, F., Lohmann, L.G., Loiselle, B.A., Moritz, C., Overton, J.McC., Peterson, A.T., Phillips, S., Richardson, K., Williams, S., Wiser, S.K., Wohlgemuth, T. & Zimmermann, N.E., (2020). Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69-80.
nz_po <- disPo("NZ")
nz_bg <- disBg("NZ")
nz_pa <- disPa("NZ")
nz_env <- disEnv("NZ")
x <- disData("NZ")
sapply(x, head)
disCRS("NZ")
Species occurrence data for 30 vascular plant species (all from the Bignoniaceae family) from Continental Brazil, Ecuador, Colombia, Bolivia, and Peru, South America (SA), and associated environmental data. Full details of the dataset are provided in the reference below. There are four data sets with training (po and bg) and test (pa, env) data:
po (training data) includes site names, species names, coordinates, occurrence ("1" for all, since all are presence records), group (plant), and site values for 11 environmental variables (below).
bg (training data) has 10000 sites selected at random across the study region. It is structured identically to po, with "0" for occurrence (not implying absence, but denoting a background record in a way suited to most modelling methods) and NA for group.
env (testing data) includes group, site names, coordinates, and site values for 11 environmental variables (below), at 152 sites. These data are suited to making predictions.
pa (testing data) includes group, site names, coordinates, and presence-absence records, one column per species. The sites are identical to the sites in env. These data are suited to evaluating the predictions made with env.
Raster (gridded) data for all environmental variables are available - see the reference below for details.
The coordinate reference system of the x and y coordinates is longitude, latitude, with the WGS84 datum (EPSG:4326).
The vignette provided with this package provides an example of how to fit and evaluate a model with these data.
Environmental variables (extracted from WorldClim):
Code | Description | Units | Type |
sabio1 | Annual mean temperature | degrees C * 10 | Continuous |
sabio2 | Mean Diurnal Range (Mean of monthly (max temp - min temp)) | degrees C * 10 | Continuous |
sabio4 | Temperature Seasonality (standard deviation *100) | dimensionless | Continuous |
sabio5 | Max Temperature of Warmest Month | degrees C * 10 | Continuous |
sabio6 | Min Temperature of Coldest Month | degrees C * 10 | Continuous |
sabio7 | Temperature Annual Range | degrees C * 10 | Continuous |
sabio8 | Mean Temperature of Wettest Quarter | degrees C * 10 | Continuous |
sabio12 | Annual Precipitation | mm | Continuous |
sabio15 | Precipitation Seasonality (Coefficient of Variation) | dimensionless | Continuous |
sabio17 | Precipitation of Driest Quarter | mm | Continuous |
sabio18 | Precipitation of Warmest Quarter | mm | Continuous |
Environmental data prepared by Bette Loiselle, Lucia Lohmann and Catherine Graham. Species supplied by Bette Loiselle and Lucia Lohmann. PO data from the Missouri Botanical Gardens database and Lucia Lohmann; PA data collected by Al Gentry.
See the reference below for further details on source, accuracy, cleaning, and particular characteristics of these datasets.
Elith, J., Graham, C.H., Valavi, R., Abegg, M., Bruce, C., Ferrier, S., Ford, A., Guisan, A., Hijmans, R.J., Huettmann, F., Lohmann, L.G., Loiselle, B.A., Moritz, C., Overton, J.McC., Peterson, A.T., Phillips, S., Richardson, K., Williams, S., Wiser, S.K., Wohlgemuth, T. & Zimmermann, N.E., (2020). Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69-80.
sa_po <- disPo("SA")
sa_bg <- disBg("SA")
sa_pa <- disPa("SA")
sa_env <- disEnv("SA")
x <- disData("SA")
sapply(x, head)
disCRS("SA")
Species occurrence data for 30 tree species in Switzerland (SWI, a country in Europe) and associated environmental data. Full details of the dataset are provided in the reference below. There are four data sets with training (po and bg) and test (pa, env) data:
po (training data) includes site names, species names, coordinates, occurrence ("1" for all, since all are presence records), group (tree), and site values for 13 environmental variables (below).
bg (training data) has 10000 sites selected at random across the study region. It is structured identically to po, with "0" for occurrence (not implying absence, but denoting a background record in a way suited to most modelling methods) and NA for group.
env (testing data) includes group, site names, coordinates, and site values for 13 environmental variables (below), at 10103 sites. These data are suited to making predictions.
pa (testing data) includes group, site names, coordinates, and presence-absence records, one column per species. The sites are identical to the sites in env. These data are suited to evaluating the predictions made with env.
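Because evaluation relies on the pa and env rows referring to the same sites, a quick alignment check can be useful before comparing predictions with observations. The following is a sketch only: the helper names and the identifier column name "siteid" are assumptions, and the toy tables are invented.

```r
# Check that pa and env rows refer to the same sites in the same order.
# The identifier column name "siteid" is an assumption.
sites_aligned <- function(pa, env, id = "siteid") {
  nrow(pa) == nrow(env) && all(pa[[id]] == env[[id]])
}

# Reorder env to follow pa when the rows are not already aligned
align_env <- function(pa, env, id = "siteid") {
  env[match(pa[[id]], env[[id]]), , drop = FALSE]
}

# Toy tables in which the same three sites appear in a different order
pa  <- data.frame(siteid = c(3, 1, 2), swi01 = c(1, 0, 0))
env <- data.frame(siteid = c(1, 2, 3), slope = c(120, 45, 300))
sites_aligned(pa, env)
env2 <- align_env(pa, env)
sites_aligned(pa, env2)
```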
Raster (gridded) data for all environmental variables are available - see the reference below for details.
The coordinate reference system of the x and y coordinates is Transverse, spheroid Bessel (EPSG:21781). Note that all SWI data have a constant shift applied.
The vignette provided with this package provides an example of how to fit and evaluate a model with these data.
Environmental variables:
Code | Description | Units | Type |
bcc | Broadleaved continuous cover (based on Landsat images) | percentage | Continuous |
calc | Bedrock is strictly calcareous | 1 (yes) or 0 (no) | Categorical |
ccc | Coniferous continuous cover (based on Landsat images) | percentage | Continuous |
ddeg | Growing degree-days above a threshold of 0 degrees C | degrees C * days | Continuous |
nutri | Soil nutrients index between 0-45 | D mval/cm2 | Continuous |
pdsum | Number of days with rainfall higher than 1 mm | ndays | Continuous |
precyy | Average yearly precipitation sum | mm | Continuous |
sfro | Summer Frost Frequency | days | Continuous |
slope | Slope | degrees x 10 | Continuous |
sradyy | Potential yearly global radiation (daily average) | (kJ/m2)/day | Continuous |
swb | Site water balance | mm | Continuous |
tavecc | Average temperature of the coldest month | degrees C | Continuous |
topo | Topographic position | dimensionless | Continuous |
Environmental predictors supplied by Niklaus E. Zimmermann. Species data supplied by Niklaus E. Zimmermann, Thomas Wohlgemuth and Meinrad Abegg.
See the reference below for further details on source, accuracy, cleaning, and particular characteristics of these datasets.
Elith, J., Graham, C.H., Valavi, R., Abegg, M., Bruce, C., Ferrier, S., Ford, A., Guisan, A., Hijmans, R.J., Huettmann, F., Lohmann, L.G., Loiselle, B.A., Moritz, C., Overton, J.McC., Peterson, A.T., Phillips, S., Richardson, K., Williams, S., Wiser, S.K., Wohlgemuth, T. & Zimmermann, N.E., (2020). Presence-only and presence-absence data for comparing species distribution modeling methods. Biodiversity Informatics 15:69-80.
swi_po <- disPo("SWI")
swi_bg <- disBg("SWI")
swi_pa <- disPa("SWI")
swi_env <- disEnv("SWI")
x <- disData("SWI")
sapply(x, head)
disCRS("SWI")