Ego-Deliver is a new large-scale egocentric video benchmark recorded by takeaway riders during their daily work. To the best of our knowledge, Ego-Deliver is the first attempt to understand activities in the takeaway delivery process, and it is one of the largest egocentric video action datasets to date. We believe it is pivotal to future research in this area.
We collected 5,360 videos with a total duration of 425 hours. The cameras recorded continuous video at a fixed resolution of 720p (1280 × 720) and 25 FPS. Ego-Deliver provides more than 139,000 multi-track annotations covering 45 different attributes, with an average of 27.15 annotations per video. Of the 45 categories, 28 have more than 3,000 instances each, which ensures sufficient training data.
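To give a feel for the scale, the short Python sketch below derives two rough figures from the numbers quoted above: the approximate total frame count and the approximate average video length. These are estimates only, since the 425-hour total is a rounded value; the variable names are ours and are not part of the dataset.

```python
# Back-of-the-envelope figures derived from the reported statistics.
NUM_VIDEOS = 5360
TOTAL_HOURS = 425   # total duration as reported (a rounded value)
FPS = 25            # recording frame rate as reported

total_seconds = TOTAL_HOURS * 3600
approx_total_frames = total_seconds * FPS
avg_minutes_per_video = TOTAL_HOURS * 60 / NUM_VIDEOS

print(f"approx. total frames: {approx_total_frames:,}")
print(f"approx. average video length: {avg_minutes_per_video:.2f} min")
```

Under these assumptions the dataset holds on the order of 38 million frames, and a typical video is a little under five minutes long.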
We chose ANVIL as the annotation tool. To fully describe the delivery process, we annotated the collected videos from different perspectives and divided the annotations into 5 tracks. On each track, both the start and end points of each action are labeled.
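One possible in-memory representation of such multi-track annotations is sketched below in Python. It is a minimal illustration, not the format of the released files: the `Annotation` class, the `group_by_track` helper, and the track and label names ("action", "location", "ride", "park", "street") are all hypothetical, and we assume start and end points are expressed in seconds.

```python
from collections import defaultdict
from dataclasses import dataclass

FPS = 25  # frame rate reported for the recordings


@dataclass(frozen=True)
class Annotation:
    """One labeled segment on one track (hypothetical layout)."""
    track: str    # hypothetical track name
    label: str    # hypothetical action label
    start: float  # start point in seconds (assumed unit)
    end: float    # end point in seconds (assumed unit)

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")

    @property
    def duration(self) -> float:
        return self.end - self.start

    def frame_range(self, fps: int = FPS) -> tuple:
        # Convert the time span to (first_frame, last_frame) indices.
        return round(self.start * fps), round(self.end * fps)


def group_by_track(annotations):
    """Group annotations by track and sort each track by start time."""
    tracks = defaultdict(list)
    for ann in annotations:
        tracks[ann.track].append(ann)
    for items in tracks.values():
        items.sort(key=lambda a: a.start)
    return dict(tracks)


# Made-up example segments, for illustration only.
example = [
    Annotation("action", "ride", 0.0, 12.4),
    Annotation("location", "street", 0.0, 30.0),
    Annotation("action", "park", 12.4, 15.0),
]
tracks = group_by_track(example)
```

Keeping each track as its own time-sorted list mirrors the idea that tracks describe the same video from different perspectives and may overlap in time.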
We will provide the original .anvil files and the SQLite database file of Ego-Deliver.
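Once the SQLite file is available, it could be queried with Python's standard `sqlite3` module. The sketch below is only an illustration under stated assumptions: the actual schema is not documented here, so the `annotations` table, its columns, and the sample rows are hypothetical, and an in-memory database stands in for the real file.

```python
import sqlite3

# In-memory stand-in for the released database file.
conn = sqlite3.connect(":memory:")

# Hypothetical schema; the real table and column names may differ.
conn.execute(
    "CREATE TABLE annotations ("
    " video_id TEXT, track TEXT, label TEXT,"
    " start_sec REAL, end_sec REAL)"
)

# Made-up rows, for illustration only.
rows = [
    ("v001", "action", "ride", 0.0, 12.4),
    ("v001", "action", "park", 12.4, 15.0),
    ("v002", "action", "ride", 3.0, 20.0),
    ("v002", "location", "street", 0.0, 30.0),
]
conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?)", rows)


def labels_with_min_instances(conn, min_count):
    """Return (label, count) pairs with at least min_count instances."""
    query = (
        "SELECT label, COUNT(*) AS n FROM annotations"
        " GROUP BY label HAVING COUNT(*) >= ?"
        " ORDER BY n DESC, label"
    )
    return conn.execute(query, (min_count,)).fetchall()


frequent = labels_with_min_instances(conn, 2)
print(frequent)
```

With the real database, the same kind of query with a threshold of 3,000 would list the categories that have enough instances for training, provided the schema is adapted accordingly.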
The dataset will be released after the paper is accepted.

The demo of Ego-Deliver can be downloaded from HERE.