About the Citizen Science Data Hub

The Citizen Science Data Hub (CSDH) is an open platform by Fundación Ibercivis that brings together citizen science datasets and a ready-to-use environment to work on them. It is the data side of the project; the training modules — the Citizen Science Data Academy — live on the ECS Academy (Moodle).

Where each thing lives

Piece	Where	What it is
Public gallery	data.ibercivis.es	Static catalogue of the datasets: data sheet, license, download and challenge per dataset. No account, no cookies.
Work environment	jupyterhub.ibercivis.es	JupyterHub. Sign in with GitHub and get your own workspace with the data mounted read-only. Per-user limits: 4 GB RAM, 2 CPUs.
Example notebooks	github.com/Ibercivis/citizen-science-data	One notebook per dataset. The "Work on this dataset" button clones them into your Hub workspace via nbgitpuller.
Datasets (files)	Zenodo · `/srv/data` on the server	Downloads link to Zenodo (citable DOIs). The same files are mounted read-only inside the Hub.
Forum & questions	GitHub Discussions	Where challenges, questions and results are discussed.
Training	ECS Academy (Moodle)	The Data Analysis course series. The Hub links to it; it does not reproduce it.

Working with large datasets

Each Hub session has 4 GB of RAM and 2 CPUs, and the datasets are mounted read-only in `/srv/data`. This is a single shared copy, so opening a dataset never duplicates it into your own space. The catch: some datasets are far bigger than 4 GB (the largest single file is ~13 GB). Opening a file costs almost no memory; loading it whole into memory is what crashes the kernel. You can still analyse a 10–20 GB file on a 4 GB session by reading only what you need at any one time:

Read only the columns you need: `pd.read_csv(f, usecols=[…])` or `pd.read_parquet(f, columns=[…])`. Parquet is columnar, so reading 3 of 50 columns barely touches the rest.
Process in chunks: `for chunk in pd.read_csv(f, chunksize=1_000_000): …` walks the whole file while keeping only a slice in memory at a time.
Query on disk with DuckDB (pre-installed): run SQL straight against a CSV or Parquet file and only the result comes into memory, e.g. `duckdb.sql("SELECT … FROM '/srv/data/…/file.parquet' GROUP BY …").df()`. This handles files much larger than RAM.
For ZIP datasets, read entries from inside the archive instead of extracting the whole thing to your shared home folder.
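
The first two techniques above combine naturally. The following is a minimal sketch, not taken from the example notebooks: it computes a per-group mean from a CSV while holding only one chunk and two columns in memory. The helper name `mean_by_group`, the column names `species`, `value` and `note`, and the tiny demo file are all hypothetical; on the Hub you would point `path` at a real file under `/srv/data` and keep a large `chunksize`.

```python
import os
import tempfile

import pandas as pd


def mean_by_group(path, group_col, value_col, chunksize=1_000_000):
    """Mean of value_col per group_col, reading the CSV chunk by chunk."""
    sums = None
    counts = None
    # usecols drops every other column at parse time;
    # chunksize makes read_csv yield DataFrames of at most that many rows.
    for chunk in pd.read_csv(path, usecols=[group_col, value_col], chunksize=chunksize):
        grouped = chunk.groupby(group_col)[value_col]
        part_sum = grouped.sum()
        part_count = grouped.count()
        # Accumulate partial results; fill_value=0 covers groups missing from a chunk.
        sums = part_sum if sums is None else sums.add(part_sum, fill_value=0)
        counts = part_count if counts is None else counts.add(part_count, fill_value=0)
    if sums is None:
        # The file had a header but no data rows.
        return pd.Series(dtype="float64")
    return sums / counts


# Demo on a tiny throwaway file (hypothetical data); chunksize=2 forces several chunks.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    pd.DataFrame(
        {
            "species": ["a", "b", "a", "b", "a"],
            "value": [1, 2, 3, 4, 8],
            "note": ["unused"] * 5,  # never loaded, thanks to usecols
        }
    ).to_csv(demo_path, index=False)
    result = mean_by_group(demo_path, "species", "value", chunksize=2)

print(result)
```

Sums and counts are accumulated separately because averaging the per-chunk means would give the wrong answer whenever chunks contain different numbers of rows per group.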

Every example notebook for a large dataset opens with a reminder of these techniques, and the CSV and Parquet ones include ready-to-use snippets. Gallery downloads point to Zenodo, but because the same files are already mounted in the Hub, you never need to pull tens of gigabytes onto your own machine to get started.
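
For ZIP datasets, reading an entry in place might look like the sketch below. It uses only Python's standard library and streams one CSV entry straight from the archive, so nothing is extracted to disk. The helper name `count_rows_in_zip`, the entry name `observations.csv` and the demo archive are hypothetical, and UTF-8 encoding is an assumption about the data.

```python
import csv
import io
import os
import tempfile
import zipfile


def count_rows_in_zip(zip_path, member, encoding="utf-8"):
    """Return (header, number of data rows) for one CSV entry inside a ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open() returns a binary stream that decompresses on the fly.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            reader = csv.reader(text)
            header = next(reader, None)
            # Rows are consumed one at a time, so memory use stays flat.
            n_rows = sum(1 for _ in reader)
    return header, n_rows


# Demo on a small throwaway archive (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("observations.csv", "id,species\n1,a\n2,b\n3,a\n")
    with zipfile.ZipFile(demo_zip) as archive:
        entries = archive.namelist()  # list the entries without extracting them
    header, n_rows = count_rows_in_zip(demo_zip, "observations.csv")

print(entries, header, n_rows)
```

The same `archive.open(member)` stream can also be passed to `pd.read_csv`, together with `usecols` or `chunksize`, when you want a DataFrame rather than a row count.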

Who runs it

The platform is operated by Fundación Ibercivis. See the legal notice for provider details, the privacy policy for how personal data is handled, and the terms of use for the work environment.

Funding

Funded by the European Union (Horizon Europe, grant agreement No. 101058509 — ECS project). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the REA. Neither the European Union nor the granting authority can be held responsible for them.