Home Action Genome

A large-scale multi-view video dataset of daily activities at home

CVPR2021 Competition

The Home Action Genome competition consists of three challenge tasks.

Sample code of dataloader

You can use the sample scripts below to prepare for the competition.

dataloader (hierarchical_homage_single_sample.zip)
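
For reference, a minimal PyTorch-style dataset sketch is shown below. The directory layout, file names, and annotation fields (annotations.json, frames_file, atomic_actions) are assumptions for illustration only; the scripts in hierarchical_homage_single_sample.zip define the actual interface.

```python
# Minimal dataset sketch (assumed layout; the sample scripts in the
# zip above define the real interface and field names).
import json
from pathlib import Path

import torch
from torch.utils.data import Dataset


class HomageSampleDataset(Dataset):
    """Loads per-clip frame tensors and atomic-action annotations."""

    def __init__(self, root):
        self.root = Path(root)
        # Hypothetical annotation file: one record per video clip.
        with open(self.root / "annotations.json") as f:
            self.records = json.load(f)

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        # Hypothetical field: frames stored as one tensor per view, (T, C, H, W).
        frames = torch.load(self.root / rec["frames_file"])
        # Hypothetical field: [{"start": s, "end": e, "label": l}, ...]
        segments = rec["atomic_actions"]
        return frames, segments
```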

Important dates

Evaluation server opens: end of April 2021

Evaluation server closes: 9 June 2021

Report submission deadline: 14 June 2021

Workshop: 19 June 2021

Challenge #1: Atomic Action Localization

About

In our dataset, we annotated all atomic action segments performed during the activities. For this track, participants will use these atomic action segments: the dataset is carefully annotated with a complete set of temporal action segments for the atomic action localization task. Each sample can contain multiple action segments. The task is to localize these atomic action segments by predicting the start time, end time, and action label of each one. For this track, participants are also allowed to leverage audio information. External datasets for pre-training are allowed, but their use must be clearly documented.

Evaluation Metric

For evaluation, this task uses Second-mAP, which measures the area under the precision-recall curve of the detections for each second. To decide whether a prediction is a true positive, we compute its temporal intersection over union (tIoU) with the ground-truth segments; if the tIoU exceeds a threshold (e.g. tIoU > 0.5), the detection is counted as a true positive. The final score is the mean of the mAP values computed at tIoU thresholds from 0.5 to 0.95 (inclusive) with a step size of 0.05. This metric is similar to the one used in ActivityNet.
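
The provided evaluation code is authoritative; the sketch below only illustrates the computation (temporal IoU, per-class AP, and averaging over tIoU thresholds), with function names and the prediction format chosen for illustration.

```python
import numpy as np


def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thresh):
    """AP for one action class.

    preds: list of (confidence, start, end); gts: list of (start, end).
    """
    preds = sorted(preds, key=lambda p: -p[0])  # most confident first
    matched = [False] * len(gts)
    tp = np.zeros(len(preds))
    for i, (_, s, e) in enumerate(preds):
        ious = [temporal_iou((s, e), g) for g in gts]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thresh and not matched[j]:
            tp[i], matched[j] = 1.0, True
    precision = np.cumsum(tp) / np.arange(1, len(preds) + 1)
    # Step-wise area under the precision-recall curve.
    return float(np.sum(precision * tp) / max(len(gts), 1))


# Final score: mean over classes of AP, averaged over tIoU thresholds
# 0.50, 0.55, ..., 0.95.
TIOU_THRESHOLDS = np.arange(0.5, 1.0, 0.05)
```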

Evaluation Code and Submission Format

Here is the evaluation code for Task 1.

evaluation_code_task1.zip

Challenge #2: Scene-graph Generation

About

We use scene graphs to describe the relationships between a person and the objects used during the execution of an action. In this track, algorithms need to predict per-frame scene graphs, including how they change as the video progresses. For this track, participants are also allowed to leverage audio information. External datasets for pre-training are allowed, but their use must be clearly documented. Since there can be multiple relationships between each human-object pair, there is no graph constraint (i.e., no single-relationship constraint).
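
As an illustration, a per-frame prediction can be thought of as a set of scored (subject, predicate, object) triplets; the field names below are assumptions, and the submission schema is defined by the evaluation code in the next subsection.

```python
# Hypothetical per-frame scene-graph prediction. Because there is no
# graph constraint, the same person-object pair may appear with
# several predicates.
frame_prediction = {
    "frame_id": 42,
    "triplets": [
        # (subject, predicate, object, confidence)
        ("person", "holding", "cup", 0.91),
        ("person", "drinking_from", "cup", 0.77),  # same pair, second predicate
        ("person", "looking_at", "table", 0.54),
    ],
}
```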

Evaluation Metric

For evaluation of scene graph prediction, we use the scene graph classification (SGCls) setting: the task is to predict object categories and the predicate labels between the person and each object. Participants may use the video, other modalities, and the ground-truth bounding boxes as input. The evaluation metric is Recall@K: we compute the fraction of ground-truth relationship triplets that appear among the top K most confident relationship predictions in each tested frame. We will use K = 10 and K = 20.
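
A minimal sketch of per-frame Recall@K, assuming triplets are compared as exact (subject, predicate, object) tuples; the provided evaluation code is authoritative.

```python
def recall_at_k(pred_triplets, gt_triplets, k):
    """Fraction of ground-truth triplets recovered in the top-k predictions.

    pred_triplets: list of (confidence, (subject, predicate, object)).
    gt_triplets: set of (subject, predicate, object).
    """
    top_k = sorted(pred_triplets, key=lambda t: -t[0])[:k]
    predicted = {triplet for _, triplet in top_k}
    hits = sum(1 for g in gt_triplets if g in predicted)
    return hits / max(len(gt_triplets), 1)


# The reported score averages this per-frame recall over all tested
# frames, for K = 10 and K = 20.
```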

Evaluation Code and Submission Format

Here is the evaluation code for Task 2.

evaluation_code_task2.zip

Challenge #3: Privacy-Concerned Activity Recognition

About

Privacy-sensitive recognition methods are very important for practical applications. This task is video-level activity recognition, except that the input videos are blurred, as if recorded with a multi-pinhole camera. When training models, participants can use the unblurred images, but the test data contains only blurred images. External datasets for pre-training are allowed, but their use must be clearly documented.

Evaluation Metric

This task is video understanding without clear video frames. Participants predict k activity labels for each video, and each candidate label is checked for consistency with the ground-truth activity label to compute the score. We will use k = 1 and k = 5 and evaluate recognition methods by the average of these two scores.
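
A minimal sketch of this scoring, assuming one ground-truth activity label per video and prediction lists ranked by confidence; the provided evaluation code is authoritative.

```python
def top_k_accuracy(predictions, ground_truth, k):
    """predictions: {video_id: [labels ranked by confidence]};
    ground_truth: {video_id: label}."""
    correct = sum(
        1 for vid, label in ground_truth.items()
        if label in predictions.get(vid, [])[:k]
    )
    return correct / len(ground_truth)


def challenge3_score(predictions, ground_truth):
    # Average of top-1 and top-5 accuracy, as described above.
    return 0.5 * (top_k_accuracy(predictions, ground_truth, 1)
                  + top_k_accuracy(predictions, ground_truth, 5))
```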

Evaluation Code and Submission Format

Here is the evaluation code for Task 3.

evaluation_code_task3.zip