What is a Data Dictionary?
A data dictionary is a document that describes all variables in your dataset. It serves as essential documentation for:
- Yourself - Remember what variables mean months later
- Collaborators - Help others understand your data
- Reproducibility - Enable others to reuse your data correctly
- Compliance - Meet requirements for data sharing mandates
Using the Shiny App
The easiest way to create a data dictionary is through the psychds Shiny app. Once the app is running, navigate to “Update Data Dictionary” in the sidebar.
Step-by-Step Process
- Load your dataset - Select a Psych-DS dataset folder
- Edit each variable - Add descriptions, types, and constraints
- Define categorical values - Document allowed values and their meanings
- Set missing value codes - Define how missing data is coded
- Generate the dictionary - Save into metadata or export as HTML
See the Shiny App Guide for detailed instructions.
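The information collected in the steps above can be pictured as one record per variable. The sketch below is illustrative only: the key names (`description`, `type`, `levels`, `missing_codes`), the example variable, and the helper `recode_missing` are hypothetical, and this is not the format that psychds writes to its metadata.

```python
# Hypothetical in-memory representation of one data dictionary entry.
# Key names are assumptions for illustration, not the psychds format.
entry = {
    "name": "satisfaction",
    "description": "Self-reported job satisfaction on a 5-point Likert scale",
    "type": "integer",
    "levels": {
        1: "Strongly Disagree",
        2: "Disagree",
        3: "Neutral",
        4: "Agree",
        5: "Strongly Agree",
    },
    "missing_codes": [-99, "NA"],
}


def recode_missing(values, missing_codes):
    """Replace every documented missing-value code with None."""
    return [None if v in missing_codes else v for v in values]


raw = [4, -99, 2, "NA", 5]
clean = recode_missing(raw, entry["missing_codes"])
print(clean)  # [4, None, 2, None, 5]
```

Recording the missing-value codes next to the variable means anyone reusing the data can recode them the same way before analysis.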
Best Practices
Writing Good Descriptions
- ❌ Bad: “age”
- ✅ Good: “Participant age in years at the time of initial assessment”
- ❌ Bad: “rt”
- ✅ Good: “Mean response time in milliseconds across all correct trials in the Stroop task”
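Thin descriptions like the bad examples above can be caught with a simple heuristic. The following is a minimal sketch, not part of psychds: the function name `description_problems` and the four-word threshold are arbitrary assumptions, and passing the check does not guarantee that a description is actually informative.

```python
def description_problems(name, description, min_words=4):
    """Return reasons a description looks too thin (an empty list means OK)."""
    problems = []
    text = description.strip()
    if not text:
        problems.append("description is empty")
    elif text.lower() == name.lower():
        problems.append("description just repeats the variable name")
    elif len(text.split()) < min_words:
        problems.append(f"description has fewer than {min_words} words")
    return problems


print(description_problems("age", "age"))
# ['description just repeats the variable name']
print(description_problems(
    "age", "Participant age in years at the time of initial assessment"
))
# []
```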
Choosing Data Types
| Use | When |
|---|---|
| string | Text, IDs, categorical values |
| integer | Whole numbers (counts, ages) |
| number | Decimals (measurements, proportions) |
| boolean | True/false values |
| date | Dates (ISO 8601 format) |
| datetime | Timestamps |
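As a starting point for choosing a type, the raw values of a column can be inspected programmatically. The sketch below assumes the values arrive as strings (as they would from a CSV file) and uses only the Python standard library. The function `suggest_type` and its rules are hypothetical, and they are not how the psychds app assigns types.

```python
import re
from datetime import date, datetime

_INT = re.compile(r"[+-]?\d+")
_NUM = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def _parses(parser, value):
    """Return True if parser(value) succeeds without a ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def suggest_type(raw_values):
    """Suggest one of the type labels from the table for a column of strings."""
    values = [v.strip() for v in raw_values if v.strip()]  # ignore blank cells
    if not values:
        return "string"
    if all(v.lower() in {"true", "false"} for v in values):
        return "boolean"
    if all(_INT.fullmatch(v) for v in values):
        return "integer"
    if all(_NUM.fullmatch(v) for v in values):
        return "number"
    if all(_parses(date.fromisoformat, v) for v in values):
        return "date"
    if all(_parses(datetime.fromisoformat, v) for v in values):
        return "datetime"
    return "string"


print(suggest_type(["23", "41", "35"]))      # integer
print(suggest_type(["0.25", "1"]))           # number
print(suggest_type(["2024-01-05"]))          # date
print(suggest_type(["P01", "P02"]))          # string
```

Treat the result as a suggestion only. A column of numeric participant IDs would be reported as `integer`, for example, even though the table recommends `string` for IDs.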
Documenting Categorical Variables
Always define what each category means, even if it seems obvious:
| Value | Label | Description |
|---|---|---|
| 1 | Strongly Disagree | Likert rating indicating strong disagreement |
| 2 | Disagree | Likert rating indicating disagreement |
| 3 | Neutral | Likert rating indicating neutral response |
| 4 | Agree | Likert rating indicating agreement |
| 5 | Strongly Agree | Likert rating indicating strong agreement |
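Once every category is documented, the same mapping can be used to spot values in the data that the dictionary does not explain. This is a minimal sketch under stated assumptions: `LIKERT_LEVELS`, `undocumented_values`, the sample responses, and the `-99` missing code are all hypothetical and chosen for illustration.

```python
# Documented categories, mirroring the Likert table above.
LIKERT_LEVELS = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly Agree",
}


def undocumented_values(values, levels, missing_codes=()):
    """Return the sorted distinct values that have no documented meaning."""
    known = set(levels) | set(missing_codes)
    return sorted({v for v in values if v not in known})


responses = [1, 5, 3, 6, -99, 0, 6]  # made-up data; -99 is an assumed missing code
print(undocumented_values(responses, LIKERT_LEVELS, missing_codes=[-99]))
# [0, 6]
```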
Integration with Psych-DS
The data dictionary complements the dataset_description.json metadata. While dataset_description.json contains machine-readable metadata, the data dictionary provides human-readable documentation.
Both should be included in your Psych-DS dataset:
my_dataset/
├── dataset_description.json   # Machine-readable
├── data_dictionary.html       # Human-readable
├── README.md
└── data/
    └── study-exp1_data.csv
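A quick way to confirm that both documents are present before sharing a dataset is to check the folder for them. The sketch below assumes the file names shown in the layout above. The helper `missing_documentation` is hypothetical and does not validate the dataset against the Psych-DS specification.

```python
from pathlib import Path

# File names taken from the example layout; adjust if yours differ.
EXPECTED = ["dataset_description.json", "data_dictionary.html"]


def missing_documentation(dataset_dir):
    """Return the expected documentation files that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


# An empty list means both files were found in the dataset root.
print(missing_documentation("my_dataset"))
```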