FLOW: Participant Survey Data from Freshwater Citizen Science Project

Category: Water Quality · Size: 407 kB · Format: PDF, XLSX · License: CC-BY-4.0 · Zenodo record · Data sheet on the CSDH

Participant survey data from the FLOW project (2021) on monitoring and protecting freshwater streams, collected to analyse how citizen science fosters knowledge and collective action.

The data is mounted read-only at /srv/data/flow-freshwater/. Save anything you produce in your personal folder (~/).
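Because the mount is read-only, writing next to the data fails with a permission or read-only file-system error. Below is a minimal sketch of one way to route outputs to your personal folder instead. output_path is a hypothetical helper written for this notebook, not part of the Hub.

```python
from pathlib import Path

# Read-only mount described above; anything you create must go elsewhere.
DATA = Path('/srv/data/flow-freshwater')


def output_path(name, base=None):
    """Hypothetical helper: return a writable path for `name` under your personal folder.

    `base` defaults to your home directory (~/). Paths that would land inside the
    read-only data mount are rejected up front instead of failing later on write.
    """
    base = Path.home() if base is None else Path(base)
    target = (base / name).resolve()
    data_root = DATA.resolve()
    if target == data_root or data_root in target.parents:
        raise ValueError(f'{target} is inside the read-only data folder {data_root}')
    target.parent.mkdir(parents=True, exist_ok=True)  # create sub-folders on demand
    return target
```

Once df is loaded further down, you could save a cleaned copy with df.to_csv(output_path('flow/survey_clean.csv'), index=False).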

What's in the dataset

In [1]:
from pathlib import Path

DATA = Path('/srv/data/flow-freshwater')

for f in sorted(DATA.rglob('*')):
    if f.is_file():
        print(f"{f.relative_to(DATA)}  ({f.stat().st_size/1e6:,.1f} MB)")
Data_FLOW_survey.xlsx  (0.1 MB)
Dateset_Explanation.pdf  (0.3 MB)

Load the data

The survey responses come as a single Excel workbook; pandas reads it directly with read_excel (.xlsx files need the openpyxl package).

In [2]:
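import pandas as pd

# The folder holds a single workbook; take the first .xlsx found.
xlsx = sorted(DATA.rglob('*.xlsx'))
print('Using:', xlsx[0].name)
# read_excel loads only the first sheet by default (sheet_name=0).
df = pd.read_excel(xlsx[0])
df.head()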
Using: Data_FLOW_survey.xlsx
Out[2]:
No ID group complete age gender attitude.1 attitude.2 attitude.3 awareness.1 ... plan_behavior2.3 personal_efficacy.1 personal_efficacy.2 personal_efficacy.3 personal_norms.1 personal_norms.2 personal_norms.3 skills.1 skills.2 skills.3
0 1 1A5KDSL7X1 Treatment 2 18.0 male 3.666667 3.500000 NaN 4.50 ... NaN 4.0 4.0 NaN 2.25 2.00 NaN 3.000000 3.500000 NaN
1 2 1FZN47ETFG Treatment 2 16.0 female 5.000000 4.000000 NaN 3.50 ... NaN 4.0 4.0 NaN 4.00 2.75 NaN 1.833333 4.333333 NaN
2 3 1GB6QR7D7W Treatment 2 17.0 female 4.500000 4.500000 NaN 4.25 ... NaN 4.0 5.0 NaN 3.50 3.75 NaN 2.000000 3.666667 NaN
3 4 1GLGSGANPU Treatment 3 47.0 female 5.000000 5.000000 4.833333 4.75 ... 3.5 5.0 5.0 5.0 5.00 5.00 4.75 4.833333 5.000000 5.0
4 5 1M6DC33S2D Treatment 2 19.0 male 4.166667 4.666667 NaN 4.25 ... NaN 3.0 5.0 NaN 3.50 3.25 NaN 1.666667 3.500000 NaN

5 rows × 53 columns
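Most column names follow a construct.round pattern (attitude.1, attitude.2, attitude.3, and so on). If the numeric suffix is the survey round, which is an assumption to check against Dateset_Explanation.pdf, a long table with one row per participant, construct and round is often easier to group and plot. A minimal sketch using DataFrame.melt; to_long is a hypothetical helper name:

```python
import pandas as pd


def to_long(df, id_col='ID', keep=('group', 'complete', 'age', 'gender')):
    """Reshape '<construct>.<round>' columns into one row per participant, construct and round.

    Assumes the numeric suffix (.1, .2, .3) is the survey round. Columns without
    such a suffix (e.g. intrinsic_motivation) are left out of the result.
    """
    wave_cols = [c for c in df.columns
                 if '.' in c and c.rsplit('.', 1)[1].isdigit()]
    long = df.melt(id_vars=[id_col, *keep], value_vars=wave_cols,
                   var_name='item', value_name='score')
    # 'past_behavior1.2' -> construct 'past_behavior1', wave 2 (split on the last dot only).
    parts = long['item'].str.rsplit('.', n=1, expand=True)
    long['construct'] = parts[0]
    long['wave'] = parts[1].astype(int)
    # Drop unanswered items, e.g. the .3 columns for participants without a third round.
    return long.drop(columns='item').dropna(subset=['score']).reset_index(drop=True)
```

For example, to_long(df).groupby(['construct', 'wave'])['score'].mean() gives one mean per construct and round.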

First look

Column types, non-null counts and summary statistics (the summary table shows only the first 20 columns).

In [3]:
df.info()
df.describe(include='all').T.head(20)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 555 entries, 0 to 554
Data columns (total 53 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   No                     555 non-null    int64  
 1   ID                     555 non-null    object 
 2   group                  555 non-null    object 
 3   complete               555 non-null    int64  
 4   age                    546 non-null    float64
 5   gender                 543 non-null    object 
 6   attitude.1             553 non-null    float64
 7   attitude.2             550 non-null    float64
 8   attitude.3             81 non-null     float64
 9   awareness.1            550 non-null    float64
 10  awareness.2            548 non-null    float64
 11  awareness.3            81 non-null     float64
 12  behavioral_control.1   536 non-null    float64
 13  behavioral_control.2   533 non-null    float64
 14  behavioral_control.3   81 non-null     float64
 15  collective_efficacy.1  533 non-null    float64
 16  collective_efficacy.2  527 non-null    float64
 17  collective_efficacy.3  81 non-null     float64
 18  group_id.1             164 non-null    float64
 19  group_id.2             206 non-null    float64
 20  group_id.3             81 non-null     float64
 21  group_norms.1          147 non-null    float64
 22  group_norms.2          190 non-null    float64
 23  intrinsic_motivation   213 non-null    float64
 24  knowledge_score.1      546 non-null    float64
 25  knowledge_score.2      544 non-null    float64
 26  knowledge_score.3      80 non-null     float64
 27  nature_rel.1           555 non-null    float64
 28  nature_rel.2           555 non-null    float64
 29  nature_rel.3           81 non-null     float64
 30  interest.1             547 non-null    float64
 31  interest.2             545 non-null    float64
 32  past_behavior1.1       551 non-null    float64
 33  past_behavior1.2       548 non-null    float64
 34  past_behavior1.3       81 non-null     float64
 35  past_behavior2.1       542 non-null    float64
 36  past_behavior2.2       544 non-null    float64
 37  past_behavior2.3       81 non-null     float64
 38  plan_behavior1.1       550 non-null    float64
 39  plan_behavior1.2       551 non-null    float64
 40  plan_behavior1.3       81 non-null     float64
 41  plan_behavior2.1       537 non-null    float64
 42  plan_behavior2.2       539 non-null    float64
 43  plan_behavior2.3       81 non-null     float64
 44  personal_efficacy.1    537 non-null    float64
 45  personal_efficacy.2    529 non-null    float64
 46  personal_efficacy.3    81 non-null     float64
 47  personal_norms.1       548 non-null    float64
 48  personal_norms.2       545 non-null    float64
 49  personal_norms.3       81 non-null     float64
 50  skills.1               547 non-null    float64
 51  skills.2               545 non-null    float64
 52  skills.3               81 non-null     float64
dtypes: float64(48), int64(2), object(3)
memory usage: 229.9+ KB
Out[3]:
                       count  unique         top  freq       mean         std   min    25%       50%       75%    max
No                     555.0     NaN         NaN   NaN      278.0  160.358972   1.0  139.5     278.0     416.5  555.0
ID                       555     555  1A5KDSL7X1     1        NaN         NaN   NaN    NaN       NaN       NaN    NaN
group                    555       3    Control2   250        NaN         NaN   NaN    NaN       NaN       NaN    NaN
complete               555.0     NaN         NaN   NaN   2.145946    0.353371   2.0    2.0       2.0       2.0    3.0
age                    546.0     NaN         NaN   NaN  42.175824   19.516971  13.0   25.0      41.0      59.0   84.0
gender                   543       2      female   281        NaN         NaN   NaN    NaN       NaN       NaN    NaN
attitude.1             553.0     NaN         NaN   NaN   4.230199    0.636721   1.0    4.0  4.333333  4.666667    5.0
attitude.2             550.0     NaN         NaN   NaN   4.254242    0.665762   1.5    4.0  4.333333  4.833333    5.0
attitude.3              81.0     NaN         NaN   NaN   4.469136    0.500077   3.0    4.0  4.666667  4.833333    5.0
awareness.1            550.0     NaN         NaN   NaN   4.276364    0.614444   2.0    4.0       4.5      4.75    5.0
awareness.2            548.0     NaN         NaN   NaN   4.412865     0.60181  1.75    4.0       4.5       5.0    5.0
awareness.3             81.0     NaN         NaN   NaN   4.657407    0.329404   3.5    4.5      4.75       5.0    5.0
behavioral_control.1   536.0     NaN         NaN   NaN     3.6875     0.87745   1.0    3.0       3.5       4.5    5.0
behavioral_control.2   533.0     NaN         NaN   NaN   3.704503     0.82911   1.0    3.0       3.5       4.5    5.0
behavioral_control.3    81.0     NaN         NaN   NaN   3.679012    0.841534   1.5    3.0       3.5       4.5    5.0
collective_efficacy.1  533.0     NaN         NaN   NaN   3.953096    0.965405   1.0    3.0       4.0       5.0    5.0
collective_efficacy.2  527.0     NaN         NaN   NaN    3.99241    0.941209   1.0    3.0       4.0       5.0    5.0
collective_efficacy.3   81.0     NaN         NaN   NaN   4.148148    0.807947   2.0    4.0       4.0       5.0    5.0
group_id.1             164.0     NaN         NaN   NaN   3.896341    0.950421   1.0    3.0       4.0       5.0    5.0
group_id.2             206.0     NaN         NaN   NaN   4.082524     0.84852   1.0    4.0       4.0       5.0    5.0
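The info output shows 81 non-null values in almost every .3 column (80 for knowledge_score.3). Since complete only takes the values 2 and 3 (min 2.0, max 3.0), its mean encodes how many rows have complete == 3: (2.145946 − 2) × 555 ≈ 81. That hints, without proving it, that the third-round answers come from exactly the participants with complete == 3. The sketch below tabulates coverage per value of complete so you can check; wave3_coverage is a hypothetical helper name.

```python
import pandas as pd


def wave3_coverage(df):
    """Fraction of non-missing answers in every '.3' column, per value of 'complete'."""
    wave3 = [c for c in df.columns if c.endswith('.3')]
    # notna() gives booleans; their group mean is the share of answered cells.
    return df[wave3].notna().groupby(df['complete']).mean()
```

If the hypothesis holds, wave3_coverage(df).T should show values near 0 under complete = 2 and near 1 under complete = 3.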

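A first chart

Histogram of the first numeric column. In this file that column is No, a running row number from 1 to 555, so its histogram is roughly uniform and says nothing about the participants: swap in the variable you care about (for example age).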

In [4]:
import matplotlib.pyplot as plt

num = df.select_dtypes('number')
if num.shape[1]:
    col = num.columns[0]
    num[col].plot.hist(bins=50, figsize=(8, 4), title=col)
    plt.tight_layout()
else:
    print('No direct numeric columns: explore df on your own.')
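Because No is only a row counter, a more telling first chart compares a real variable across the three values of group (the summary lists Control2, with 250 rows, as the most frequent; the preview shows Treatment as another). A sketch overlaying one histogram per group; hist_by_group is a hypothetical helper and age is just a default choice:

```python
import matplotlib.pyplot as plt
import pandas as pd


def hist_by_group(df, col='age', group_col='group', bins=20):
    """Overlay one semi-transparent histogram of `col` per value of `group_col`."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for name, values in df.groupby(group_col)[col]:
        ax.hist(values.dropna(), bins=bins, alpha=0.5, label=str(name))
    ax.set(xlabel=col, ylabel='participants', title=f'{col} by {group_col}')
    ax.legend()
    fig.tight_layout()
    return fig, ax
```

Call it as hist_by_group(df) or, for a survey scale, hist_by_group(df, 'knowledge_score.1'). Separate bins per group keep the sketch short; pass a shared bin array if you want the bars to line up exactly.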

Your turn

This is just the starting point. Some ideas:

  • Check the dataset challenge on its CSDH data sheet.
  • Work on a copy: right-click the file → Duplicate (or Save Notebook As…). Your changes live only in your Hub space; they are never pushed to GitHub.
  • Edited this notebook and want the original back? Use the Restore cell below (or the restore.ipynb notebook).
  • Post questions and results on the platform forum.
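As one concrete starting point for the question in the project description (whether citizen science fosters knowledge), here is a sketch of a paired change score per group. It assumes .1 and .2 are two successive survey rounds of the same scale; confirm what the rounds and each group mean in Dateset_Explanation.pdf first. change_by_group is a hypothetical helper, and it reports descriptive statistics only, not a significance test.

```python
import pandas as pd


def change_by_group(df, construct, first=1, second=2, group_col='group'):
    """Within-person change in `construct` between two survey rounds, summarised per group.

    Only participants who answered both rounds are kept (a paired comparison).
    """
    a, b = f'{construct}.{first}', f'{construct}.{second}'
    paired = df.dropna(subset=[a, b])
    delta = paired[b] - paired[a]  # positive = higher score in the later round
    return delta.groupby(paired[group_col]).agg(['count', 'mean', 'std'])
```

For example, change_by_group(df, 'knowledge_score') or change_by_group(df, 'collective_efficacy').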

Attribution: data from FLOW: Participant Survey Data from Freshwater Citizen Science Project, license CC-BY-4.0. Notebook from the Citizen Science Data Hub (CSDH) — Fundación Ibercivis.

In [5]:
# ⚠️ RESTORE: this DISCARDS YOUR CHANGES to this notebook and resets it to the original.
# 1. Uncomment the line below (remove the #)   2. Run this cell
# 3. Then: menu File → Reload Notebook from Disk

# !git -C ~/citizen-science-data fetch -q origin && git -C ~/citizen-science-data checkout origin/main -- flow-freshwater.ipynb && echo "Restored. Now: File → Reload Notebook from Disk"