Non-random Temporal Patterns of Citizen Science Biodiversity Recording

Category: Biodiversity · Size: 3.0 MB · Format: ZIP · License: CC-BY-4.0 · Zenodo record · Data sheet on the CSDH

Data and code to analyse when citizens record biodiversity, identifying non-random temporal patterns in sampling effort and their associated drivers.

The data is mounted read-only at /srv/data/temporal-patterns-biodiversity/. Save anything you produce in your personal folder (~/).

What's in the dataset

In [1]:
from pathlib import Path

DATA = Path('/srv/data/temporal-patterns-biodiversity')

for f in sorted(DATA.rglob('*')):
    if f.is_file():
        print(f"{f.relative_to(DATA)}  ({f.stat().st_size/1e6:,.1f} MB)")
data&code.zip  (3.0 MB)

Explore the ZIP

The dataset comes compressed. We list its contents without extracting; if it contains CSVs, pandas can read them straight from inside the ZIP. Remember: /srv/data is read-only — if you need to extract, do it into your folder (~/).
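If extraction does become necessary, a minimal sketch along these lines would unpack the archive into the home folder rather than the read-only mount. The helper name `extract_zip` and the destination folder `~/temporal-patterns-biodiversity` are illustrative choices, not part of the dataset or the Hub.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest):
    """Extract every member of zip_path into dest and return dest as a Path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest


# Archive location as listed by the first cell of this notebook.
zip_path = Path('/srv/data/temporal-patterns-biodiversity') / 'data&code.zip'

# Hypothetical destination inside the personal folder (writable).
target = Path.home() / 'temporal-patterns-biodiversity'

if zip_path.exists():
    out = extract_zip(zip_path, target)
    print('Extracted to:', out)
```

Reading the CSV straight from the ZIP, as the next cell does, avoids this step entirely; extraction is mainly useful for running the bundled R scripts.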

In [2]:
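import zipfile
import pandas as pd

# Pick the first ZIP archive found under DATA (here there is only one).
zips = sorted(DATA.rglob('*.zip'))
z = zipfile.ZipFile(zips[0])
print('Using:', zips[0].name)

# List the archive members without extracting anything.
names = z.namelist()
print(f'{len(names)} files inside; first 20:')
for n in names[:20]:
    print('  ', n)

# Read the first CSV directly from inside the ZIP (at most 100,000 rows).
csv_inside = [n for n in names if n.lower().endswith('.csv')]
if csv_inside:
    df = pd.read_csv(z.open(csv_inside[0]), nrows=100_000, low_memory=False)
    display(df.head())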
Using: data&code.zip
7 files inside; first 20:
   data&code/
   data&code/All_TAXA_10_replicates_with_holidays.csv
   data&code/descriptor.txt
   data&code/scp1.R
   data&code/scp2.R
   data&code/scp3.R
   data&code/scp4.R
         ID  ID_subset   latitude  longitude               Species   cellLong   cellLat   date  day  month  ...  Dependent  Weekday  Month  Temperature_K  Temperature  Precipitation      Wind  Snow   Country  Holiday
0   6455115          6  37.040009  -7.973232           Pinus pinea  -7.997792  37.07940  42736    1      1  ...          0   Sunday    Jan     284.230652    11.080652           0.02  3.166559   0.0  Portugal        1
1   8753915          9  37.038830  -7.781120           Pinus pinea  -7.797847  37.07940  42736    1      1  ...          0   Sunday    Jan     284.325470    11.175470           0.02  3.077367   0.0  Portugal        1
2  27823919          8  39.935980  -8.188933         Quercus suber  -8.197736  39.97779  42737    2      1  ...          0   Monday    Jan     280.926483     7.776483           9.36  6.392357   0.0  Portugal        0
3   6022549          1  38.170922  -7.034370  Quercus rotundifolia  -6.998069  38.17879  42737    2      1  ...          0   Monday    Jan     284.136963    10.986963           0.08  4.979019   0.0  Portugal        0
4  20103566          4  40.023598  -7.232988  Quercus rotundifolia  -7.198014  39.97779  42737    2      1  ...          0   Monday    Jan     280.862793     7.712793           4.40  4.119776   0.0  Portugal        0

5 rows × 21 columns
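
The date column in this preview holds integers such as 42736 rather than calendar dates. A plausible reading, which is an assumption here and worth checking against descriptor.txt, is that these are spreadsheet-style serial day numbers counted from 1899-12-30. Under that assumption 42736 decodes to 2017-01-01, a Sunday, which agrees with the Weekday column in the preview. The sketch below uses a hypothetical helper, `add_calendar_date`, and a small sample copied by hand from the preview rows so that it runs on its own.

```python
import pandas as pd


def add_calendar_date(frame, serial_col='date'):
    """Return a copy of frame with a 'calendar_date' column.

    ASSUMPTION: serial_col counts days from 1899-12-30 (spreadsheet-style
    serial dates). This is not confirmed by the dataset documentation.
    """
    out = frame.copy()
    out['calendar_date'] = pd.to_datetime(
        out[serial_col], unit='D', origin='1899-12-30'
    )
    return out


# Sample values copied from the five preview rows above.
sample = pd.DataFrame({
    'date': [42736, 42736, 42737, 42737, 42737],
    'Weekday': ['Sunday', 'Sunday', 'Monday', 'Monday', 'Monday'],
})

decoded = add_calendar_date(sample)

# Sanity check: the decoded weekday should match the dataset's own column.
decoded['weekday_check'] = decoded['calendar_date'].dt.day_name()
print(decoded)
print('Weekdays agree:', (decoded['weekday_check'] == decoded['Weekday']).all())
```

With the real data, `add_calendar_date(df)` would be applied to the frame loaded in the previous cell. If the weekday check fails on the full table, the serial-date assumption does not hold and the column needs a different interpretation.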

Your turn

This is just the starting point. Some ideas:

  • Check the dataset challenge on its CSDH data sheet.
  • Work on a copy: right-click the file → Duplicate (or Save Notebook As…). Your changes only live in your Hub space — they're never pushed to GitHub.
  • Edited this notebook and want the original back? Use the Restore cell below (or the restore.ipynb notebook).
  • Questions and results: on the platform forum.

Attribution: data from Non-random Temporal Patterns of Citizen Science Biodiversity Recording, license CC-BY-4.0. Notebook from the Citizen Science Data Hub (CSDH) — Fundación Ibercivis.

In [3]:
# ⚠️ RESTORE: this DISCARDS YOUR CHANGES to this notebook and resets it to the original.
# 1. Uncomment the line below (remove the #)   2. Run this cell
# 3. Then: menu File → Reload Notebook from Disk

# !git -C ~/citizen-science-data fetch -q origin && git -C ~/citizen-science-data checkout origin/main -- temporal-patterns-biodiversity.ipynb && echo "Restored. Now: File → Reload Notebook from Disk"