Uploading Datasets¶
This guide walks you through uploading a BIDS dataset to NEMAR.
Prerequisites¶
Before uploading:
- [ ] Dataset is in valid BIDS format
- [ ] Logged in with `nemar auth login`
- [ ] git-annex installed
- [ ] GitHub CLI (`gh`) installed and authenticated
- [ ] Sandbox training completed (`nemar sandbox`)
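A quick way to confirm the tools in the checklist are on your PATH (the tool names mirror the list above; adjust to your setup):

```shell
# Check that required command-line tools are installed
for tool in git-annex gh; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING - install before uploading"
  fi
done
```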
DataLad Users
NEMAR requires `main` as the default branch. DataLad's adjusted branch naming (e.g. `adjusted/master(unlocked)`) will cause CI and metadata pipelines to fail.
The CLI will detect non-`main` branches during upload and offer to rename automatically. You can also rename manually:
```shell
git branch -m adjusted/master\(unlocked\) main
```
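If you want to try the rename safely first, the following sketch reproduces it in a throwaway repository (the `/tmp/branch-demo` path and branch name are illustrative only):

```shell
# Illustrative only: simulate DataLad's adjusted branch in a scratch repo
rm -rf /tmp/branch-demo && mkdir -p /tmp/branch-demo && cd /tmp/branch-demo
git init -q
git checkout -q -b "adjusted/master(unlocked)"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"
# Quoting the branch name avoids backslash-escaping the parentheses
git branch -m "adjusted/master(unlocked)" main
git branch --show-current   # prints: main
```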
Step 1: Validate Your Dataset¶
Always validate before uploading:
```shell
nemar dataset validate ./my-dataset
```
Validation Must Pass
Datasets with validation errors cannot be uploaded. Fix all errors before proceeding.
Common Validation Issues¶
| Issue | Solution |
|---|---|
| Missing `dataset_description.json` | Create the required BIDS metadata file |
| Invalid JSON | Check for syntax errors in JSON files |
| Missing required fields | Add `Name` and `BIDSVersion` to `dataset_description.json` |
| Invalid modality data | Ensure data files match BIDS naming conventions |
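The first three issues in the table can be addressed by creating a minimal `dataset_description.json` with the required fields and checking that it parses. The field values below are placeholders; substitute your own study name and BIDS version:

```shell
# Minimal dataset_description.json with the required Name and BIDSVersion
# fields (values here are illustrative placeholders)
cat > dataset_description.json <<'EOF'
{
  "Name": "My EEG Study",
  "BIDSVersion": "1.8.0",
  "License": "CC-BY-4.0"
}
EOF
# Catch JSON syntax errors before the BIDS validator does
python3 -m json.tool dataset_description.json
```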
Step 2: Upload¶
```shell
nemar dataset upload ./my-dataset
```
Options¶
| Option | Description |
|---|---|
| `--name`, `-n` | Dataset name (defaults to BIDS Name field, then directory name) |
| `--description` | Brief description |
| `--skip-validation` | Skip BIDS validation (not recommended) |
| `--skip-orcid` | Skip co-author ORCID collection |
| `--dry-run` | Show what would be uploaded without doing it |
| `--restart` | Clear upload progress and re-upload all files |
| `-j`, `--jobs` | Number of parallel upload jobs (default: 4) |
| `-y`, `--yes` | Skip confirmation and proceed |
Step 3: What Happens¶
The upload process:
- Prerequisite Check - Verifies required tools (`git-annex`, `gh`, `aws`) are installed, with platform-specific install guidance if missing
- Auth and Prerequisites - Verifies login, GitHub authentication (HTTPS preferred, SSH as fallback)
- BIDS Validation - Runs the official BIDS validator (unless skipped)
- File Manifest - Collects files and co-author ORCIDs
- License Enforcement - Detects license from `dataset_description.json` or LICENSE file; prompts to select one if missing. Validates the license allows research redistribution (see License Requirements below)
- Data Provenance - For derived datasets, collects source dataset DOIs and checks license compatibility
- Confirmation - Shows upload plan for review
- Dataset Registration - Creates dataset record and private GitHub repo
- GitHub Invitation - Accepts collaborator invitation to the repo
- git-annex Init - Initializes git-annex and configures S3 remote
- Data Upload - Uploads large files to S3 (uses AWS CLI fast-path when available)
- Metadata and Push - Writes metadata, commits, and pushes to GitHub
- CI Deployment - Deploys GitHub Actions workflows for validation
License Requirements¶
Every dataset uploaded to NEMAR must have a license that allows research redistribution. The CLI will:
- Auto-detect your license from `dataset_description.json` (the `License` field) or a LICENSE file
- Prompt you to choose if no license is found, offering recommended open data licenses
- Validate that the chosen license allows research use
- Create a LICENSE file if one does not exist
Recommended licenses (most permissive to most restrictive):
| License | Description |
|---|---|
| CC0-1.0 | Public domain dedication (no restrictions) |
| PDDL-1.0 | Public Domain Dedication and License |
| CC-BY-4.0 | Attribution only |
| CC-BY-SA-4.0 | Attribution + ShareAlike |
| CC-BY-NC-4.0 | Attribution + NonCommercial |
| CC-BY-NC-SA-4.0 | Attribution + NonCommercial + ShareAlike |
| ODC-By-1.0 | Open Data Commons Attribution |
| ODbL-1.0 | Open Database License |
You can also enter any valid SPDX license identifier manually if it allows research use.
License for Derived Data
If your dataset is derived from another dataset, the CLI will ask about data provenance and check that your chosen license is compatible with the source dataset's license. For example, a dataset derived from CC-BY-SA-4.0 source data must also use CC-BY-SA-4.0.
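The ShareAlike rule above can be sketched as a small check. This is an illustration of the idea only, not the CLI's actual compatibility logic, and `check_compat` is a hypothetical helper name:

```shell
# Hypothetical sketch: ShareAlike (-SA-) source licenses must propagate
# unchanged to derived datasets; other licenses pass through in this sketch
check_compat() {
  local src="$1" derived="$2"
  case "$src" in
    *-SA-*) [ "$derived" = "$src" ] && echo compatible || echo incompatible ;;
    *)      echo compatible ;;
  esac
}

check_compat CC-BY-SA-4.0 CC-BY-SA-4.0   # prints: compatible
check_compat CC-BY-SA-4.0 CC-BY-4.0      # prints: incompatible
```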
GitHub Authentication¶
The CLI uses HTTPS-first authentication for GitHub operations:
- CI/CD: Uses `GH_TOKEN` environment variable if set
- Local (preferred): Uses `gh auth token` from GitHub CLI, so run `gh auth login` first
- Fallback: SSH key authentication (`nemar auth setup-ssh`)
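The fallback order above can be sketched as follows. This is an illustration of the described precedence, not the CLI's internals, and `resolve_auth` is a hypothetical helper name:

```shell
# Hypothetical sketch of the HTTPS-first resolution order
resolve_auth() {
  if [ -n "$GH_TOKEN" ]; then
    echo "env:GH_TOKEN"        # CI/CD
  elif gh auth token >/dev/null 2>&1; then
    echo "gh-cli"              # local, preferred
  else
    echo "ssh-fallback"        # nemar auth setup-ssh
  fi
}

GH_TOKEN=example-token resolve_auth   # prints: env:GH_TOKEN
```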
Step 4: Making Updates¶
After initial upload, push changes using the CLI:
```shell
cd nm000104  # Your dataset directory

# Make changes, then save and push
nemar dataset save -m "Add subjects 101-110"
nemar dataset push
```
Or create a formal update PR:
```shell
nemar dataset update ./nm000104
```
Troubleshooting¶
Upload Fails with Authentication Error¶
```shell
# Check login status
nemar auth status --refresh

# Re-login if needed
nemar auth login
```
git-annex Errors¶
```shell
# Ensure git-annex is configured
git annex version

# Re-initialize if needed
git annex init
```
Upload Interrupted or Timed Out¶
The upload tracks progress automatically. Re-run the same command to resume:
```shell
# Resume from where it left off
nemar dataset upload ./my-dataset

# Or start fresh if resume fails
nemar dataset upload ./my-dataset --restart
```