The first large-scale egocentric dataset for maintenance and repair actions
The MAD dataset is now publicly available for research purposes.
Our paper has been accepted at [Conference Name].
Evaluation server and baselines are now available.
The Maintenance Actions Dataset (MAD) is a large-scale egocentric video dataset capturing real-world maintenance and repair activities. Recorded from a first-person perspective using head-mounted cameras, MAD provides a unique window into procedural tasks performed by technicians in various maintenance scenarios.
Unlike existing egocentric datasets that focus on daily activities or cooking, MAD specifically targets the industrial maintenance domain, featuring complex tool usage, multi-step procedures, and fine-grained hand-object interactions.
First-person perspective capturing the technician's viewpoint during maintenance tasks.
Rich annotations of tool usage and hand-object interactions in maintenance scenarios.
Multi-step procedures with temporal ordering and hierarchical action structure.
Real maintenance scenarios including assembly, repair, and inspection tasks.
Dense frame-level annotations with action start/end times and verb-noun labels.
Full HD recordings at high frame rate for detailed action analysis.
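The dense segment annotations described above can be consumed with a few lines of code. The CSV schema below (`video_id`, `start_sec`, `end_sec`, `verb`, `noun`) is illustrative only; consult the MAD-annotations repository for the actual file layout.

```python
import csv
import io

# Hypothetical annotation schema -- the released MAD files may differ.
# Each row: video_id, start_sec, end_sec, verb, noun
SAMPLE = """video_id,start_sec,end_sec,verb,noun
v001,12.4,15.9,tighten,bolt
v001,16.2,21.0,remove,panel
"""

def load_segments(fp):
    """Parse dense action segments into (video_id, start, end, 'verb noun') tuples."""
    reader = csv.DictReader(fp)
    segments = []
    for row in reader:
        segments.append((
            row["video_id"],
            float(row["start_sec"]),
            float(row["end_sec"]),
            f'{row["verb"]} {row["noun"]}',
        ))
    return segments

segments = load_segments(io.StringIO(SAMPLE))
```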
Action Distribution Chart

Verb-Noun Distribution

Annotation Pipeline

Browse sample images from the MAD dataset with annotations. Use filters to explore specific actions, objects, or scenarios.
The dataset is available for academic research purposes. Please read and agree to the terms of use before downloading.
All annotations (Train/Val/Test splits) are available at our GitHub repository:
MAD-annotations

If you use this dataset, please cite our paper:
@article{author2024mad,
  title={MAD: Maintenance Actions Dataset for Egocentric Video Understanding},
  author={Author Name and Co-Author Name},
  journal={Conference/Journal Name},
  year={2024}
}
We provide several benchmark tasks for evaluating methods on MAD:
Task: Assign a (verb, noun) label to a trimmed video segment.
Metrics: Top-1/5 accuracy for verb, noun, and action.
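Top-1/5 accuracy for the recognition task can be computed with a short helper. This is a generic sketch, not the official evaluation script; `scores` is assumed to hold one row of class scores per trimmed segment (the same helper applies to verb, noun, and combined action labels).

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k top-scoring classes."""
    correct = 0
    for row, y in zip(scores, labels):
        # Indices of the k highest-scoring classes, best first.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += y in topk
    return correct / len(labels)

# Toy verb-head scores for two trimmed segments over 3 verb classes.
verb_scores = [[0.1, 0.7, 0.2], [0.5, 0.2, 0.3]]
verb_labels = [1, 2]
```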
Task: Detect action start and end times in untrimmed videos.
Metrics: Mean Average Precision (mAP) at various temporal IoU thresholds.
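Detection mAP scores one class at a time by greedily matching score-ranked predictions to ground-truth segments at a temporal IoU threshold. The sketch below follows the common interpolation-free AP recipe and is not the official evaluation code.

```python
def tiou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, thresh):
    """AP for one class: preds = [(score, start, end)], gts = [(start, end)]."""
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = set()
    tp, precisions = 0, []
    for i, (_, s, e) in enumerate(preds, 1):
        # Greedily match each prediction to the best unmatched ground truth.
        best, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j not in matched and tiou((s, e), g) > best:
                best, best_j = tiou((s, e), g), j
        if best >= thresh and best_j is not None:
            matched.add(best_j)
            tp += 1
            precisions.append(tp / i)  # precision at each recall step
    return sum(precisions) / len(gts) if gts else 0.0

# Toy example: two ground-truth segments, two ranked predictions.
gt = [(0.0, 10.0), (20.0, 30.0)]
pred = [(0.9, 0.0, 10.0), (0.8, 21.0, 30.0)]
```

mAP at a given threshold is then the mean of `average_precision` over all action classes.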
Task: Predict the next action before it starts.
Metrics: Top-5 recall at various anticipation times.
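Top-5 recall for anticipation is often reported class-mean style, so rare next actions count as much as frequent ones. A minimal sketch, assuming one row of next-action scores per anticipation example; this may differ from the official MAD evaluation.

```python
def topk_recall(scores, labels, k=5):
    """Class-mean top-k recall: per-class rate of the true label appearing
    among the k highest-scoring classes, averaged over classes."""
    hits, counts = {}, {}
    for row, y in zip(scores, labels):
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        counts[y] = counts.get(y, 0) + 1
        hits[y] = hits.get(y, 0) + (y in topk)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)

# Toy next-action scores over 2 classes at one fixed anticipation time.
ant_scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
ant_labels = [0, 1, 1]
```

The metric is evaluated separately at each anticipation time (e.g. 1 s before the action starts).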
We are researchers from the Department of Computer Science at Ben-Gurion University of the Negev.
This research was supported by: