Journal: Scientific Data
How were cities distributed globally in the past? How many people lived in these cities? How did cities influence their local and regional environments? In order to understand the current era of urbanization, we must understand long-term historical urbanization trends and patterns. However, to date there is no comprehensive record of spatially explicit, historic, city-level population data at the global scale. Here, we developed the first spatially explicit dataset of urban settlements from 3700 BC to AD 2000, by digitizing, transcribing, and geocoding historical, archaeological, and census-based urban population data previously published in tabular form by Chandler and Modelski. The dataset creation process also required data cleaning and harmonization procedures to make the data internally consistent. Additionally, we created a reliability ranking for each geocoded location to assess the geographic uncertainty of each data point. The dataset provides the first spatially explicit archive of the location and size of urban populations over the last 6,000 years and can contribute to an improved understanding of contemporary and historical urbanization trends.
Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Because of their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contamination and to obtain high-quality genomic information, we developed an ultra-low input library sequencing protocol that enables genome sequencing of a single individual of the tardigrade Hypsibius dujardini. Here, we describe our sequencing data and the ultra-low input library preparation methodology in detail.
The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature together with corresponding quantum-chemical calculations performed over a range of conformers, each computed with a variety of density functionals and basis sets. We anticipate that this dataset will be useful both for relating electronic structure calculations to experimental observations through the generation of calibration schemes and for creating new semi-empirical methods and benchmarking current and future model chemistries for organic electronic applications.
Reproducible climate reconstructions of the Common Era (1 CE to present) are key to placing industrial-era warming into the context of natural climatic variability. Here we present a community-sourced database of temperature-sensitive proxy records from the PAGES2k initiative. The database gathers 692 records from 648 locations, including all continental regions and major ocean basins. The records are from trees, ice, sediment, corals, speleothems, documentary evidence, and other archives. They range in length from 50 to 2000 years, with a median of 547 years, while temporal resolution ranges from biweekly to centennial. Nearly half of the proxy time series are significantly correlated with HadCRUT4.2 surface temperature over the period 1850 to 2014. Global temperature composites show a remarkable degree of coherence between high- and low-resolution archives, with broadly similar patterns across archive types, terrestrial versus marine locations, and screening criteria. The database is suited to investigations of global and regional temperature variability over the Common Era, and is shared in the Linked Paleo Data (LiPD) format, including serializations in MATLAB, R and Python.
Interactions between species, particularly where one is likely to be a pathogen of the other, as well as the geographical distribution of species, have been systematically extracted from various free-access web sources and assembled, together with the accompanying evidence, into a single database. The database is designed to answer questions such as: what are all the pathogens of a host, what are all the hosts of a pathogen, in which countries has a pathogen been found, and which pathogens have been found in a country? Two datasets, focussing on species interactions and species distribution, were extracted from the database, based on evidence published between 1950 and 2012. The quality of their evidence was checked and verified against well-known alternative datasets of pathogens infecting humans, domestic animals and wild mammals. The presented datasets provide a valuable resource for researchers of infectious diseases of humans and animals, including zoonoses.
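The four query directions above (host to pathogens, pathogen to hosts, pathogen to countries, country to pathogens) suggest keeping inverted indexes in both directions, so each question is a single lookup. The sketch below is a hypothetical illustration only: it does not describe the database's actual schema, and the class name, method names and placeholder records are invented for the example.

```python
from collections import defaultdict


class InteractionIndex:
    """Toy in-memory index over pathogen-host and pathogen-country records.

    Hypothetical illustration: names are not part of the published database.
    """

    def __init__(self):
        self._hosts_of = defaultdict(set)      # pathogen -> hosts
        self._pathogens_of = defaultdict(set)  # host -> pathogens
        self._countries_of = defaultdict(set)  # pathogen -> countries
        self._pathogens_in = defaultdict(set)  # country -> pathogens
        # Evidence is stored per record under a tagged key, so a host name and
        # a country name that happen to be equal can never collide.
        self._evidence = defaultdict(list)

    def add_interaction(self, pathogen, host, evidence):
        self._hosts_of[pathogen].add(host)
        self._pathogens_of[host].add(pathogen)
        self._evidence[("host", pathogen, host)].append(evidence)

    def add_occurrence(self, pathogen, country, evidence):
        self._countries_of[pathogen].add(country)
        self._pathogens_in[country].add(pathogen)
        self._evidence[("country", pathogen, country)].append(evidence)

    # Query methods use .get so that lookups never insert empty entries.
    def pathogens_of_host(self, host):
        return sorted(self._pathogens_of.get(host, set()))

    def hosts_of_pathogen(self, pathogen):
        return sorted(self._hosts_of.get(pathogen, set()))

    def countries_of_pathogen(self, pathogen):
        return sorted(self._countries_of.get(pathogen, set()))

    def pathogens_in_country(self, country):
        return sorted(self._pathogens_in.get(country, set()))

    def evidence_for_interaction(self, pathogen, host):
        return list(self._evidence.get(("host", pathogen, host), []))
```

Keeping the evidence alongside each record, rather than only the species pairs, mirrors the abstract's point that interactions are assembled "with the accompanying evidence", so every answer can be traced back to its sources.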
Wilderness areas, defined as areas free of industrial-scale activities and other human pressures that result in significant biophysical disturbance, are important for biodiversity conservation and for sustaining the key ecological processes underpinning planetary life-support systems. Despite their importance, wilderness areas are rapidly being eroded in extent and fragmented. Here we present the most up-to-date, temporally inter-comparable maps of global terrestrial wilderness areas, which are essential for monitoring changes in their extent and for proactively planning conservation interventions to ensure their preservation. Using maps of human pressure on the natural environment for 1993 and 2009, we identified wilderness as all ‘pressure free’ lands with a contiguous area >10,000 km². These places are likely operating in a natural state and represent the most intact habitats globally. We then created a regionally representative map of wilderness following the well-established ‘Last of the Wild’ methodology, which identifies the 10% of area with the lowest human pressure within each of Earth’s 60 biogeographic realms, and then retains the ten largest contiguous areas, along with all contiguous areas >10,000 km².
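Both mapping steps described above can be sketched on a small raster. This is a minimal sketch, not the authors' implementation: it assumes an equal-area grid of human-pressure values, treats ‘pressure free’ as a value of exactly 0, uses 4-connectivity to define contiguity, and the function names are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return 4-connected components of True cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited True cell.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def wilderness(pressure, cell_area, min_area):
    """Contiguous 'pressure free' areas (assumed: pressure == 0) larger than min_area."""
    mask = [[p == 0 for p in row] for row in pressure]
    return [comp for comp in connected_components(mask) if len(comp) * cell_area > min_area]


def last_of_the_wild(pressure, realm, cell_area, min_area, fraction=0.10, top_n=10):
    """Per realm: lowest-pressure fraction of area, then top_n largest plus all > min_area."""
    rows, cols = len(pressure), len(pressure[0])
    result = {}
    for realm_id in sorted({v for row in realm for v in row}):
        cells = [(r, c) for r in range(rows) for c in range(cols) if realm[r][c] == realm_id]
        # Equal-area cells are assumed, so "10% of area" is 10% of the realm's cells.
        cells.sort(key=lambda rc: pressure[rc[0]][rc[1]])
        n_keep = max(1, round(fraction * len(cells)))
        mask = [[False] * cols for _ in range(rows)]
        for r, c in cells[:n_keep]:
            mask[r][c] = True
        # Sorted largest first, so the first top_n entries are the largest areas.
        components = sorted(connected_components(mask), key=len, reverse=True)
        result[realm_id] = [
            comp for i, comp in enumerate(components)
            if i < top_n or len(comp) * cell_area > min_area
        ]
    return result
```

With `cell_area` expressed in km², passing `min_area=10_000` corresponds to the >10,000 km² threshold. Ties in pressure at the 10% cutoff are broken here simply by grid order; a real analysis on projected rasters would need an explicit rule for such ties and for unequal cell areas.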
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and scholarly publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them and some exemplar implementations in the community.
Harmonised, representative data on the state of biological invasions remain inadequate at country and global scales, particularly for taxa that affect biodiversity and ecosystems. Information is not readily available in a form suitable for policy and reporting. The Global Register of Introduced and Invasive Species (GRIIS) provides the first country-wise checklists of introduced (naturalised) and invasive species. GRIIS was conceived to provide a sustainable platform for information delivery to support national governments. We outline the rationale and methods underpinning GRIIS, to facilitate transparent, repeatable analysis and reporting. Twenty country checklists are presented as exemplars; GRIIS Checklists for nearly all countries globally will shortly be submitted through the same process. Over 11,000 species records are currently in the 20 country exemplars alone, with environmental impact evidence for just over 20% of these. GRIIS provides significant support for countries to identify and prioritise invasive alien species, and establishes national and global baselines. In future this will enable a global system for sustainable monitoring of trends in biological invasions that affect the environment.
How climate affects species distributions is a longstanding question receiving renewed interest owing to the need to predict the impacts of global warming on biodiversity. Is climate change forcing species to live near their critical thermal limits? Are these limits likely to change through natural selection? These and other important questions can be addressed with models relating geographical distributions of species to climate data, but inferences made with these models are highly contingent on non-climatic factors such as biotic interactions. Improved understanding of climate change effects on species will require extensive analysis of thermal physiological traits, but such data are both scarce and scattered. To overcome current limitations, we created the GlobTherm database. The database contains experimentally derived thermal tolerance data, currently covering over 2,000 species of terrestrial, freshwater, intertidal and marine multicellular algae, plants, fungi, and animals. The GlobTherm database will be maintained and curated by iDiv with the aim of continuing to expand it and enabling further investigations of the effects of climate on the distribution of life on Earth.
Current measures of health and disease are often insensitive, episodic, and subjective. Further, these measures generally are not designed to provide meaningful feedback to individuals. The impact of high-resolution activity data collected from mobile phones is only beginning to be explored. Here we present data from mPower, a clinical observational study of Parkinson disease conducted purely through an iPhone app interface. The study interrogated aspects of this movement disorder through surveys and frequent sensor-based recordings from participants with and without Parkinson disease. Benefitting from large enrollment and repeated measurements on many individuals, these data may help establish the baseline variability of real-world activity measurements collected via mobile phones, and ultimately may lead to quantification of the ebbs and flows of Parkinson symptoms. App source code for these data collection modules is available under an open source license for use in studies of other conditions. We hope that releasing data contributed by engaged research participants will seed a new community of analysts working collaboratively on understanding mobile health data to advance human health.