Journal: Scientific data
How were cities distributed globally in the past? How many people lived in these cities? How did cities influence their local and regional environments? In order to understand the current era of urbanization, we must understand long-term historical urbanization trends and patterns. However, to date there is no comprehensive record of spatially explicit, historic, city-level population data at the global scale. Here, we developed the first spatially explicit dataset of urban settlements from 3700 BC to AD 2000, by digitizing, transcribing, and geocoding historical, archaeological, and census-based urban population data previously published in tabular form by Chandler and Modelski. The dataset creation process also required data cleaning and harmonization procedures to make the data internally consistent. Additionally, we created a reliability ranking for each geocoded location to assess the geographic uncertainty of each data point. The dataset provides the first spatially explicit archive of the location and size of urban populations over the last 6,000 years and can contribute to an improved understanding of contemporary and historical urbanization trends.
Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.
Reproducible climate reconstructions of the Common Era (1 CE to present) are key to placing industrial-era warming into the context of natural climatic variability. Here we present a community-sourced database of temperature-sensitive proxy records from the PAGES2k initiative. The database gathers 692 records from 648 locations, including all continental regions and major ocean basins. The records are from trees, ice, sediment, corals, speleothems, documentary evidence, and other archives. They range in length from 50 to 2000 years, with a median of 547 years, while temporal resolution ranges from biweekly to centennial. Nearly half of the proxy time series are significantly correlated with HadCRUT4.2 surface temperature over the period 1850-2014. Global temperature composites show a remarkable degree of coherence between high- and low-resolution archives, with broadly similar patterns across archive types, terrestrial versus marine locations, and screening criteria. The database is suited to investigations of global and regional temperature variability over the Common Era, and is shared in the Linked Paleo Data (LiPD) format, including serializations in Matlab, R and Python.
Interactions between species, particularly where one is likely to be a pathogen of the other, as well as the geographical distribution of species, have been systematically extracted from various web-based, free-access sources, and assembled with the accompanying evidence into a single database. The database attempts to answer questions such as what are all the pathogens of a host, and what are all the hosts of a pathogen, what are all the countries where a pathogen was found, and what are all the pathogens found in a country. Two datasets were extracted from the database, focussing on species interactions and species distribution, based on evidence published between 1950-2012. The quality of their evidence was checked and verified against well-known, alternative, datasets of pathogens infecting humans, domestic animals and wild mammals. The presented datasets provide a valuable resource for researchers of infectious diseases of humans and animals, including zoonoses.
Wilderness areas, defined as areas free of industrial scale activities and other human pressures which result in significant biophysical disturbance, are important for biodiversity conservation and sustaining the key ecological processes underpinning planetary life-support systems. Despite their importance, wilderness areas are being rapidly eroded in extent and fragmented. Here we present the most up-to-date temporally inter-comparable maps of global terrestrial wilderness areas, which are essential for monitoring changes in their extent, and for proactively planning conservation interventions to ensure their preservation. Using maps of human pressure on the natural environment for 1993 and 2009, we identified wilderness as all ‘pressure free’ lands with a contiguous area >10,000 km2. These places are likely operating in a natural state and represent the most intact habitats globally. We then created a regionally representative map of wilderness following the well-established ‘Last of the Wild’ methodology; which identifies the 10% area with the lowest human pressure within each of Earth’s 60 biogeographic realms, and identifies the ten largest contiguous areas, along with all contiguous areas >10,000 km2.
Harmonised, representative data on the state of biological invasions remain inadequate at country and global scales, particularly for taxa that affect biodiversity and ecosystems. Information is not readily available in a form suitable for policy and reporting. The Global Register of Introduced and Invasive Species (GRIIS) provides the first country-wise checklists of introduced (naturalised) and invasive species. GRIIS was conceived to provide a sustainable platform for information delivery to support national governments. We outline the rationale and methods underpinning GRIIS, to facilitate transparent, repeatable analysis and reporting. Twenty country checklists are presented as exemplars; GRIIS Checklists for close to all countries globally will be submitted through the same process shortly. Over 11000 species records are currently in the 20 country exemplars alone, with environmental impact evidence for just over 20% of these. GRIIS provides significant support for countries to identify and prioritise invasive alien species, and establishes national and global baselines. In future this will enable a global system for sustainable monitoring of trends in biological invasions that affect the environment.
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders-representing academia, industry, funding agencies, and scholarly publishers-have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
Current measures of health and disease are often insensitive, episodic, and subjective. Further, these measures generally are not designed to provide meaningful feedback to individuals. The impact of high-resolution activity data collected from mobile phones is only beginning to be explored. Here we present data from mPower, a clinical observational study about Parkinson disease conducted purely through an iPhone app interface. The study interrogated aspects of this movement disorder through surveys and frequent sensor-based recordings from participants with and without Parkinson disease. Benefitting from large enrollment and repeated measurements on many individuals, these data may help establish baseline variability of real-world activity measurement collected via mobile phones, and ultimately may lead to quantification of the ebbs-and-flows of Parkinson symptoms. App source code for these data collection modules are available through an open source license for use in studies of other conditions. We hope that releasing data contributed by engaged research participants will seed a new community of analysts working collaboratively on understanding mobile health data to advance human health.
Our ability to predict species responses to environmental changes relies on accurate records of animal movement patterns. Continental-scale acoustic telemetry networks are increasingly being established worldwide, producing large volumes of information-rich geospatial data. During the last decade, the Integrated Marine Observing System’s Animal Tracking Facility (IMOS ATF) established a permanent array of acoustic receivers around Australia. Simultaneously, IMOS developed a centralised national database to foster collaborative research across the user community and quantify individual behaviour across a broad range of taxa. Here we present the database and quality control procedures developed to collate 49.6 million valid detections from 1891 receiving stations. This dataset consists of detections for 3,777 tags deployed on 117 marine species, with distances travelled ranging from a few to thousands of kilometres. Connectivity between regions was only made possible by the joint contribution of IMOS infrastructure and researcher-funded receivers. This dataset constitutes a valuable resource facilitating meta-analysis of animal movement, distributions, and habitat use, and is important for relating species distribution shifts with environmental covariates.
Recent years have seen substantial growth in openly available satellite and other geospatial data layers, which represent a range of metrics relevant to global human population mapping at fine spatial scales. The specifications of such data differ widely and therefore the harmonisation of data layers is a prerequisite to constructing detailed and contemporary spatial datasets which accurately describe population distributions. Such datasets are vital to measure impacts of population growth, monitor change, and plan interventions. To this end the WorldPop Project has produced an open access archive of 3 and 30 arc-second resolution gridded data. Four tiled raster datasets form the basis of the archive: (i) Viewfinder Panoramas topography clipped to Global ADMinistrative area (GADM) coastlines; (ii) a matching ISO 3166 country identification grid; (iii) country area; (iv) and slope layer. Further layers include transport networks, landcover, nightlights, precipitation, travel time to major cities, and waterways. Datasets and production methodology are here described. The archive can be downloaded both from the WorldPop Dataverse Repository and the WorldPop Project website.