Downloading Datasets

Download NEMAR datasets using git-annex for efficient large file handling.

Quick Download

# Download dataset (includes all data files)
nemar dataset download nm000104

This clones the dataset and downloads all data files from S3.

Download Options

# Download to specific directory
nemar dataset download nm000104 -o ./datasets/

# Clone metadata only (skip large data files)
nemar dataset download nm000104 --no-data

# Parallel downloads for large datasets
nemar dataset download nm000104 -j 8

Resume an Interrupted Download

If a download is interrupted, rerun with --resume instead of deleting the partial clone:

nemar dataset download nm000104 --resume

--resume validates that the existing directory is a git-annex clone of the same dataset. It refuses to proceed when the working tree is dirty or when the local DatasetVersion has fallen behind the remote (use --update in that case). It then re-runs git annex get so that only missing files are pulled.
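
On an unreliable connection it can help to wrap the resume step in a retry loop. The following is a minimal sketch, assuming the initial download has already created the clone; the helper name `retry_cmd` and its arguments are hypothetical and not part of the nemar CLI.

```shell
# retry_cmd MAX DELAY command [args...]
# Run the command until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
retry_cmd() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example (not run here): resume up to 5 times, 30 seconds apart.
# retry_cmd 5 30 nemar dataset download nm000104 --resume
```

Because --resume only pulls missing files, each retry continues from where the previous attempt stopped rather than starting over.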

Update to a Newer Version

When upstream publishes a new version, pull only the diff:

nemar dataset download nm000104 --update           # pulls just the changed files
nemar dataset download nm000104 --update --prune   # also drops orphaned annex objects

--update reads the local and remote DatasetVersion, fast-forwards to the remote HEAD, and runs git annex get only on the annex keys that changed between the two manifests. For a 5 GB dataset with a 20 MB metadata bump, this typically transfers about 20 MB instead of the whole dataset. Non-fast-forward merges, which arise when you have local commits, are refused; push those commits first with nemar dataset update (the PR workflow).
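
The idea of fetching only changed annex keys can be illustrated with a small manifest comparison. This is a sketch under stated assumptions, not how nemar implements --update: the manifest format (one `<key> <path>` pair per line, paths without whitespace) and the helper name `changed_paths` are hypothetical.

```shell
# changed_paths OLD NEW
# Print every path in NEW whose annex key is absent from OLD or differs from it.
# Assumed (hypothetical) manifest format: "<key> <path>" per line.
changed_paths() {
  awk '
    FILENAME == ARGV[1] { old[$2] = $1; next }       # first file: key per path
    !($2 in old) || old[$2] != $1 { print $2 }       # second file: new or changed
  ' "$1" "$2"
}

# Example (not run here): fetch only what changed.
# changed_paths old.manifest new.manifest | xargs git annex get
```

Unchanged files keep the same key in both manifests, so they never appear in the output and are never transferred again.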

BIDS Entity Filters

Pull only the parts of the dataset you need. The clone retains the full git-annex tree (so the result is still a structurally valid BIDS dataset), but only matching files have content locally. You can run git annex get <path> later to pull more.

# Specific subjects only (auto-prefix; "01" == "sub-01")
nemar dataset download nm000104 --subjects sub-01,02

# A single task across all subjects
nemar dataset download nm000104 --tasks rest

# Subjects, tasks, and datatypes intersected
nemar dataset download nm000104 \
  --subjects 01,02 --tasks rest --datatypes eeg

# Runs (unpadded 1-9 match both run-1 and run-01)
nemar dataset download nm000104 --runs 1,2

# Sessions
nemar dataset download nm000104 --sessions ses-pre,post

# Raw glob pass-through
nemar dataset download nm000104 --include 'sub-01/eeg/*.edf,*.json'
nemar dataset download nm000104 --exclude 'derivatives/**,sourcedata/**'

Flag	Comma-list values	Maps to
`--subjects`	`sub-01,02`	`sub-01/*`, `sub-02/*`
`--sessions`	`ses-pre,post`	`*/ses-pre/*`, `*/ses-post/*`
`--tasks`	`rest,nback`	`*/*_task-rest_*`, `*/*_task-nback_*`
`--runs`	`1,2`	`*/*_run-1_*`, `*/*_run-01_*`, ...
`--datatypes`	`eeg,emg`	`*/eeg/*`, `*/emg/*`
`--include`	raw glob list	`--include` pass-through
`--exclude`	raw glob list	`--exclude` pass-through

Filters compose with --update (only changed files inside the filter scope are pulled). They cannot be combined with --no-data, since filters imply data download.
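
The auto-prefix rule and the run matching described above can be sketched in plain shell. This only illustrates the normalization as documented here; the helper names `with_prefix`, `expand_list`, and `run_variants` are hypothetical and do not reflect the CLI's actual code.

```shell
# with_prefix PREFIX VALUE: add "PREFIX-" unless VALUE already starts with it.
with_prefix() {
  case $2 in
    "$1"-*) printf '%s\n' "$2" ;;
    *)      printf '%s\n' "$1-$2" ;;
  esac
}

# expand_list PREFIX LIST: normalize each item of a comma-separated list.
expand_list() {
  old_ifs=$IFS
  IFS=,
  for item in $2; do
    with_prefix "$1" "$item"
  done
  IFS=$old_ifs
}

# run_variants N: an unpadded single digit matches both run-N and run-0N.
run_variants() {
  case $1 in
    [1-9]) printf 'run-%s\nrun-0%s\n' "$1" "$1" ;;
    *)     printf 'run-%s\n' "$1" ;;
  esac
}
```

With these helpers, `expand_list sub 'sub-01,02'` prints `sub-01` and `sub-02` on separate lines, and `run_variants 1` prints `run-1` and `run-01`.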

Clone vs Download

For large datasets, you may want to clone first and get files selectively:

# Clone metadata only
nemar dataset clone nm000104

# Get specific files later
cd nm000104
nemar dataset get sub-01/

# Get specific modality
nemar dataset get sub-01/eeg/
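
To fetch one datatype for several subjects after a metadata-only clone, the paths can be generated in a loop. This is a rough sketch that assumes a dataset without a session level (`sub-XX/<datatype>/`); the helper name `subject_paths` is hypothetical.

```shell
# subject_paths DATATYPE SUBJECT...
# Print one "sub-XX/DATATYPE/" path per subject label.
subject_paths() {
  datatype=$1
  shift
  for subject in "$@"; do
    printf 'sub-%s/%s/\n' "$subject" "$datatype"
  done
}

# Example (not run here), from inside the cloned dataset directory:
# subject_paths eeg 01 02 03 | while IFS= read -r path; do
#   nemar dataset get "$path"
# done
```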

How It Works

NEMAR uses git-annex for efficient data management:

- Metadata stored in Git (GitHub)
- Large files stored in S3 (retrieved on demand)
- Versioning tracked automatically

This means:

- Quick initial clone (just metadata)
- Download only the files you need
- Automatic deduplication
- Version history preserved

Working with Downloaded Data

Check What's Available

# See what files exist but aren't downloaded
git annex find --not --in here

# See what's downloaded
git annex find --in here
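
Before pulling more files, it may be useful to estimate how much data is still missing. The sketch below sums byte counts read from standard input; it relies on the `${bytesize}` variable of git annex find --format, and the helper name `sum_bytes` is hypothetical. Files whose size is not recorded in the annex key would be counted as zero.

```shell
# sum_bytes: read one byte count per line on stdin and print the total
# in bytes and in GiB.
sum_bytes() {
  awk '{ total += $1 }
       END { printf "%.0f bytes (%.2f GiB)\n", total, total / (1024 * 1024 * 1024) }'
}

# Example (not run here): total size of files not yet downloaded.
# git annex find --not --in here --format='${bytesize}\n' | sum_bytes
```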

Free Space

Drop files you no longer need locally:

# Drop specific files (keeps remote copies)
nemar dataset drop sub-01/eeg/sub-01_task-rest_eeg.edf

# Drop all local copies
nemar dataset drop

Troubleshooting

"Permission denied" Error

Ensure you're logged in:

nemar auth status --refresh

Slow Download

Data files are downloaded from S3, so large datasets can take a while. Check your connection and try increasing parallelism, for example with -j 8.

"Content not available" Error

The file may have been removed or moved. Try pulling the latest changes:

git pull
nemar dataset get <file>