Journal: Scientific Data


How were cities distributed globally in the past? How many people lived in these cities? How did cities influence their local and regional environments? In order to understand the current era of urbanization, we must understand long-term historical urbanization trends and patterns. However, to date there is no comprehensive record of spatially explicit, historic, city-level population data at the global scale. Here, we developed the first spatially explicit dataset of urban settlements from 3700 BC to AD 2000, by digitizing, transcribing, and geocoding historical, archaeological, and census-based urban population data previously published in tabular form by Chandler and Modelski. The dataset creation process also required data cleaning and harmonization procedures to make the data internally consistent. Additionally, we created a reliability ranking for each geocoded location to assess the geographic uncertainty of each data point. The dataset provides the first spatially explicit archive of the location and size of urban populations over the last 6,000 years and can contribute to an improved understanding of contemporary and historical urbanization trends.

Concepts: Statistics, Chronology, Demography, Geographic information system, City, Urban area, Urbanization, Anno Domini


Reproducible climate reconstructions of the Common Era (1 CE to present) are key to placing industrial-era warming into the context of natural climatic variability. Here we present a community-sourced database of temperature-sensitive proxy records from the PAGES2k initiative. The database gathers 692 records from 648 locations, including all continental regions and major ocean basins. The records are from trees, ice, sediment, corals, speleothems, documentary evidence, and other archives. They range in length from 50 to 2000 years, with a median of 547 years, while temporal resolution ranges from biweekly to centennial. Nearly half of the proxy time series are significantly correlated with HadCRUT4.2 surface temperature over the period 1850-2014. Global temperature composites show a remarkable degree of coherence between high- and low-resolution archives, with broadly similar patterns across archive types, terrestrial versus marine locations, and screening criteria. The database is suited to investigations of global and regional temperature variability over the Common Era, and is shared in the Linked Paleo Data (LiPD) format, including serializations in Matlab, R and Python.

Concepts: Ice, Climate, Weather, Climate change, Degrees of freedom, Ocean, Global warming, Latitude


Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and hence are rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer-logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occurred during each match for an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player and characteristics. The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provides an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance, both at individual and at collective level, and the determinants of success and failure.


An extensive new multi-proxy database of paleo-temperature time series (Temperature 12k) enables a more robust analysis of global mean surface temperature (GMST) and associated uncertainties than was previously available. We applied five different statistical methods to reconstruct the GMST of the past 12,000 years (Holocene). Each method used different approaches to averaging the globally distributed time series and to characterizing various sources of uncertainty, including proxy temperature, chronology and methodological choices. The results were aggregated to generate a multi-method ensemble of plausible GMST and latitudinal-zone temperature reconstructions with a realistic range of uncertainties. The warmest 200-year-long interval took place around 6500 years ago when GMST was 0.7 °C (0.3, 1.8) warmer than the 19th Century (median, 5th, 95th percentiles). Following the Holocene global thermal maximum, GMST cooled at an average rate of -0.08 °C per 1000 years (-0.24, -0.05). The multi-method ensembles and the code used to generate them highlight the utility of the Temperature 12k database, and they are now available for future use by studies aimed at understanding Holocene evolution of the Earth system.
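
The "median, 5th, 95th percentiles" summary quoted above can be illustrated with a minimal sketch. It assumes that the multi-method ensemble is formed by simply pooling equally sized sets of members from each method, which the abstract does not state. The method names and all values below are synthetic placeholders, not data from the Temperature 12k study.

```python
import random
from statistics import median, quantiles


def summarize(ensemble):
    """Return (median, 5th percentile, 95th percentile) of ensemble members."""
    # quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(ensemble, n=20)
    return median(ensemble), cuts[0], cuts[-1]


# Synthetic stand-in: five hypothetical methods, 100 members each.
# The numbers are random draws for illustration only.
rng = random.Random(0)
methods = {
    f"method_{i}": [rng.gauss(0.0, 1.0) for _ in range(100)]
    for i in range(5)
}

# Assumption: pool all members with equal weight across methods.
pooled = [value for members in methods.values() for value in members]

med, p05, p95 = summarize(pooled)
print(f"median={med:.2f}, 5th={p05:.2f}, 95th={p95:.2f}")
```

Reporting a median with a percentile range, rather than a mean with a standard deviation, keeps the summary meaningful when the pooled ensemble is skewed, as the asymmetric interval (0.3, 1.8) around 0.7 °C suggests it is.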


Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.

Concepts: DNA, Genetics, Molecular biology, Organism, Animal, Tardigrade, Trehalose, Cryptobiosis


High-resolution, easily accessible paleoclimate data are essential for environmental, evolutionary, and ecological studies. The availability of bioclimatic layers derived from climatic simulations representing conditions of the Late Pleistocene and Holocene has revolutionized the study of species responses to Late Quaternary climate change. Yet, integrative studies of the impacts of climate change in the Early Pleistocene and Pliocene - periods in which recent speciation events are known to concentrate - have been hindered by the limited availability of downloadable, user-friendly climatic descriptors. Here we present PaleoClim, a free database of downscaled paleoclimate outputs at 2.5-minute resolution (~5 km at equator) that includes surface temperature and precipitation estimates from snapshot-style climate model simulations using HadCM3, a version of the UK Met Office Hadley Centre General Circulation Model. As of now, the database contains climatic data for three key time periods spanning from 3.3 to 0.787 million years ago: the Marine Isotope Stage 19 (MIS19) in the Pleistocene (~787 ka), the mid-Pliocene Warm Period (~3.264-3.025 Ma), and MIS M2 in the Late Pliocene (~3.3 Ma).
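
The "~5 km at equator" figure quoted for a 2.5-minute grid can be checked with a short calculation. This is a sketch under a spherical-Earth approximation using the WGS84 equatorial radius; the function name `cell_width_km` is illustrative and is not part of PaleoClim.

```python
import math

# WGS84 equatorial radius in km; treating Earth as a sphere is an assumption.
EARTH_EQUATORIAL_RADIUS_KM = 6378.137


def cell_width_km(arc_minutes: float, latitude_deg: float = 0.0) -> float:
    """East-west width of a grid cell of the given angular size at a latitude."""
    degrees = arc_minutes / 60.0
    km_per_degree = 2.0 * math.pi * EARTH_EQUATORIAL_RADIUS_KM / 360.0
    # Meridians converge toward the poles, so width scales with cos(latitude).
    return degrees * km_per_degree * math.cos(math.radians(latitude_deg))


print(f"{cell_width_km(2.5):.2f} km at the equator")
print(f"{cell_width_km(2.5, 60.0):.2f} km at 60 degrees latitude")
```

At the equator this gives roughly 4.6 km, consistent with the rounded value in the abstract; the east-west width halves at 60 degrees latitude.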


The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use both in relating electronic structure calculations to experimental observations through the generation of calibration schemes, and in creating new semi-empirical methods and benchmarking current and future model chemistries for organic electronic applications.

Concepts: Electron, Chemistry, Computational chemistry, Density functional theory, Quantum chemistry, Molecular orbital, Standard Model, Theoretical chemistry


This paper presents the first global map of food systems sustainability based on a rigorous protocol. The choice of the metric dimensions, as well as the individual indicators included in the metric, were initially identified from a thorough review of the existing literature. A rigorous inclusion/exclusion protocol was then used to refine the list and shorten it to a sub-set of 27 indicators. An aggregate sustainability score was then computed based on those 27 indicators organized into four dimensions: environment, social, food security & nutrition and economic. The paper shows how the availability of data (or lack thereof) results in an unavoidable trade-off between number of indicators and number of countries, and highlights how optimization can be used to present the most robust metric possible given the existence of these trade-offs in the data space. The process results in the computation of a global sustainability map covering 97 countries and 20 indicators. The sustainability scores obtained for each country are made available over the entire range of indicators.


A comprehensive database of paleoclimate records is needed to place recent warming into the longer-term context of natural climate variability. We present a global compilation of quality-controlled, published, temperature-sensitive proxy records extending back 12,000 years through the Holocene. Data were compiled from 679 sites where time series cover at least 4000 years, are resolved at sub-millennial scale (median spacing of 400 years or finer) and have at least one age control point every 3000 years, with cut-off values slackened in data-sparse regions. The data derive from lake sediment (51%), marine sediment (31%), peat (11%), glacier ice (3%), and other natural archives. The database contains 1319 records, including 157 from the Southern Hemisphere. The multi-proxy database comprises paleotemperature time series based on ecological assemblages, as well as biophysical and geochemical indicators that reflect mean annual or seasonal temperatures, as encoded in the database. This database can be used to reconstruct the spatiotemporal evolution of Holocene temperature at global to regional scales, and is publicly available in Linked Paleo Data (LiPD) format.
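
The three inclusion thresholds listed above (record length, median sample spacing, age-control density) can be expressed as a simple filter. This is a sketch, not the authors' screening code. The function name and arguments are hypothetical, and reading "at least one age control point every 3000 years" as "no gap longer than 3000 years between consecutive control points or record ends" is an assumption. The relaxed cut-offs for data-sparse regions are not modeled.

```python
from statistics import median


def passes_screening(sample_ages, control_ages,
                     min_span=4000, max_median_spacing=400,
                     max_control_gap=3000):
    """Check one record against the three thresholds (ages in years BP)."""
    ages = sorted(sample_ages)
    if len(ages) < 2:
        return False

    # 1. The time series must cover at least `min_span` years.
    start, end = ages[0], ages[-1]
    if end - start < min_span:
        return False

    # 2. Median spacing between samples must be `max_median_spacing` or finer.
    spacing = [later - earlier for earlier, later in zip(ages, ages[1:])]
    if median(spacing) > max_median_spacing:
        return False

    # 3. Assumed reading of the age-control rule: no stretch of the record
    #    longer than `max_control_gap` years without a control point.
    inside = sorted(a for a in control_ages if start <= a <= end)
    bounds = [start] + inside + [end]
    largest_gap = max(b - a for a, b in zip(bounds, bounds[1:]))
    return largest_gap <= max_control_gap


# Illustrative record: one sample every 200 years over 8000 years.
record_ages = list(range(0, 8001, 200))
print(passes_screening(record_ages, [500, 3000, 5500, 7800]))  # True
print(passes_screening(record_ages, [500, 7800]))  # False: 7300-year gap
```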


Wilderness areas, defined as areas free of industrial-scale activities and other human pressures which result in significant biophysical disturbance, are important for biodiversity conservation and sustaining the key ecological processes underpinning planetary life-support systems. Despite their importance, wilderness areas are being rapidly eroded in extent and fragmented. Here we present the most up-to-date temporally inter-comparable maps of global terrestrial wilderness areas, which are essential for monitoring changes in their extent, and for proactively planning conservation interventions to ensure their preservation. Using maps of human pressure on the natural environment for 1993 and 2009, we identified wilderness as all ‘pressure free’ lands with a contiguous area >10,000 km². These places are likely operating in a natural state and represent the most intact habitats globally. We then created a regionally representative map of wilderness following the well-established ‘Last of the Wild’ methodology, which identifies the 10% area with the lowest human pressure within each of Earth’s 60 biogeographic realms, and identifies the ten largest contiguous areas, along with all contiguous areas >10,000 km².

Concepts: Biodiversity, Conservation biology, Ecology, Geography, Natural environment, Nature, Conservation movement, Wilderness
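
The wilderness criterion described above (contiguous 'pressure free' land larger than 10,000 km²) amounts to labeling connected patches on a raster and filtering them by area. The sketch below assumes equal-area grid cells and 4-neighbour connectivity, neither of which is stated in the abstract. The toy grid and function names are illustrative only.

```python
from collections import deque


def contiguous_areas(pressure_free, cell_area_km2):
    """Return the area (km2) of each 4-connected patch of pressure-free cells."""
    rows, cols = len(pressure_free), len(pressure_free[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not pressure_free[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited pressure-free cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = 0
            while queue:
                y, x = queue.popleft()
                cells += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and pressure_free[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(cells * cell_area_km2)
    return areas


def wilderness_patches(pressure_free, cell_area_km2, min_area_km2=10_000):
    """Keep only patches strictly larger than the area threshold."""
    return [a for a in contiguous_areas(pressure_free, cell_area_km2)
            if a > min_area_km2]


# Toy grid: 1 = pressure free, 0 = under human pressure.
grid = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]
# Assumed cell size of 2,500 km2, chosen so the threshold separates patches.
print(contiguous_areas(grid, 2500))    # [15000, 5000, 2500]
print(wilderness_patches(grid, 2500))  # [15000]
```

Real human-pressure rasters are far larger and are usually handled with array libraries; the plain flood fill here only illustrates the logic.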