Activity Extended Video (ActEV) challenge

Datasets

ActEV-supported data sets
- Multiview Extended Video with Activities (MEVA) See the License and the README for context.
- VIRAT
- Data access from mevadata.org: Accessing and using MEVA and MEVA Download Instructions
- Data access from git and NIST : See actev-data-repo. Access credentials provided during signup
Kinetics
AVA
Moments-in-Time
ActivityNet
NVIDIA's CityFlow dataset

Framework

The DIVA Framework is a software framework designed to provide an architecture and a set of software modules which will facilitate the development of activity recognition analytics. The Framework is developed as a fully open source project on GitHub. The following links will help you get started with the framework:

DIVA Framework Github Repository This is the main DIVA Framework site, all development of the framework happens here.
DIVA Framework Issue Tracker Submit any bug reports or feature requests for the framework here.
DIVA Framework Main Documentation PageThe source for the framework documentation is maintained in the Github repository using Sphinx. A built version is maintained on ReadTheDocs at this link. A good place to get started in the documentation, after reading the Introduction is the UseCase section which will walk you though a number of typical use cases with the framework.

The DIVA Framework is based on KWIVER, an open source framework designed for building complex computer vision systems. The following links will help you learn more about KWIVER:

KWIVER Github Repository This is the main KWIVER site, all development of the framework happens here.
KWIVER Issue Tracker Submit any bug reports or feature requests for the KWIVER here. If there's any question about whether your issues belongs in the KWIVER or DIVA framework issues tracker, submit to the DIVA tracker and we'll sort it out..
KWIVER Main Documentation PageThe source for the KWIVER documentation is maintained in the Github repository using Sphinx. A built version is maintained on ReadTheDocs at this link. A good place to get started in the documentation, after reading the Introduction are the Arrows and Sprokit sections, both of which are used by the KWIVER framework.

Baseline Algorithms

KITWARE has adapted two "baseline" activity recognition algorithms to work within the DIVA Framework:

R-C3D
ACT

Annotation Tools

For information on data, evaluation code, etc., please email: actev-nist@nist.gov

For ActEV evaluation discussion, please visit our Google Group: https://groups.google.com/a/list.nist.gov/forum/#!forum/trecvid.actev

Activity Examples

An ActEV activity is defined to be “one or more people performing a specified movement or interacting with an object or group of objects”. Activities are annotated by humans using a set of annotation guidelines that specify how to perform the annotation and the criteria to determine if the activity occurred. Each activity is formally defined by five elements:

Activity Name - A mnemonic handle for the activity
Activity Description - Textual description of the activity
Begin time rule definition - The specification of what determines the beginning time of the activity
End time rule definition - The specification of what determines the ending time of the activity
Required object type list - The list of objects systems are expected to identify for the activity. Note: this aspect of an activity not addressed by ActEV-PC.

For example:

Activity Name

Description and Example Chip Videos

person_closes_vehicle_door

Description: A person closing the door to a vehicle.
Start: The event begins 1 s before the door starts to move.
End: The event ends after the door stops moving. People in cars who close the car door from within is a closing event if you can still see the person within the car. If the person is not visible once they are in the car, then the closing should not be annotated as an event.
Objects associated with the activity : Person; and Door or Vehicle

vehicle_turns_left

Description: A vehicle turning left or right is determined from the POV of the driver of the vehicle. The vehicle may not stop for more than 10 s during the turn.
Start: Annotation begins 1 s before vehicle has noticeably changed direction.
End: Annotation ends 1 s after the vehicle is no longer changing direction and linear motion has resumed. Note: This event is determined after a reasonable interpretation of the video.
Objects associated with the activity : Vehicle

person_loads_vehicle

Description: An object moving from person to vehicle.
Start: The event begins 2 s before the cargo to be loaded is extended toward the vehicle (i.e., before a person’s posture changes from one of “carrying” to one of “loading”).
End: The event ends after the cargo is placed into the vehicle and the person-cargo contact is lost. In the event of occlusion, it ends when the loss of contact is visible.
Objects associated with the activity: Person; and Vehicle

Activities for the ActEV evaluations

The ActEV evaluation will use the ActEV 2020 Sequestered Data Leaderboard (SDL) activity names for this and all future ActEV evaluations. The table below provides a list of activities for ActEV 2020 Sequestered Data Leaderboard (SDL), ActEV 2019 Sequestered Data Leaderboard (SDL), and ActEV TRECVID 2019.

ActEV 2020 SDL Activity Name	ActEV 2019 SDL Activity Name	ActEV TRECVID 2019
person_abandons_package person_closes_facility_door person_closes_trunk person_closes_vehicle_door person_embraces_person person_enters_scene_through_structure person_enters_vehicle person_exits_scene_through_structure person_exits_vehicle hand_interacts_with_person person_carries_heavy_object person_interacts_with_laptop person_loads_vehicle person_transfers_object person_opens_facility_door person_opens_trunk person_opens_vehicle_door person_talks_to_person person_picks_up_object person_purchases person_reads_document person_rides_bicycle person_puts_down_object person_sits_down person_stands_up person_talks_on_phone person_texts_on_phone person_steals_object person_unloads_vehicle vehicle_drops_off_person vehicle_picks_up_person vehicle_reverses vehicle_starts vehicle_stops vehicle_turns_left vehicle_turns_right vehicle_makes_u_turn	abandon_package person_closes_facility_door Closing_Trunk person_closes_vehicle_door person_person_embrace person_enters_through_structure person_enters_vehicle person_exits_through_structure person_exits_vehicle hand_interaction Transport_HeavyCarry person_laptop_interaction person_loads_vehicle object_transfer person_opens_facility_door Open_Trunk person_opens_vehicle_door Talking person_picks_up_object person_purchasing person_reading_document Riding person_sets_down_object person_sitting_down person_standing_up specialized_talking_phone specialized_texting_phone theft Unloading vehicle_drops_off_person vehicle_picks_up_person vehicle_reversing vehicle_starting vehicle_stopping vehicle_turning_left vehicle_turning_right vehicle_u_turn	Closing (P, V) or (P) Closing_trunk (P, V) Entering (P, V) or (P) Exiting (P, V) or (P) Loading (P, V) Open_Trunk (P, V) Opening (P, V) or (P) Transport_HeavyCarry (P, V) Unloading (P, V) Vehicle_turning_left (V) Vehicle_turning_right (V) Vehicle_u_turn (V) Interacts (P) Pull (P) Riding (P) Talking (P) activity_carrying (P) specialized_talking_phone (P) specialized_texting_phone (P)

Updates

March 01, 2020: The ActEV 2020 SDL opens with the expanded MEVA Test3 dataset
March 01, 2020: ActEV 2020 SDL is a guest task under ActivityNet CVPR'20 ActivityNet workshop
May 10, 2020 at 12:00 noon EST: Deadline for CLI submissions to be included in ActEV SDL ActivityNet rankings. Top two submissions to be announced on June 1, 2020.
June 14, 2020: CVPR'20 ActivityNet workshop ActEV SDL guest task presentations

Summary

ActEV is a series of evaluations to accelerate the development of robust, multi-camera, automatic activity detection algorithms for forensic and real-time alerting applications. ActEV is an extension of the annual TRECVID Surveillance Event Detection (SED) evaluation where systems will also detect and track objects involved in the activities. Each evaluation will challenge systems with new data, system requirements, and/or new activities. Currently we are running the ActEV 2020 Sequestered Data Leaderboard (SDL) and ActEV TRECVID 2020 evaluations.

Past Evaluations

ActEV began with the Summer 2018 Blind and Leaderboard evaluations for 12 activities. The summer evaluation was followed by the ongoing Fall ActEV Self-Reported Evaluation which ended in Dec 2018 and included 18 activities. The Activities in Extended Videos Prize Challenge (ActEV-PC) ran under CVPR'19 ActivityNet workshop. In 2019 we ran two other evaluations, the ActEV 2019 Sequestered Data Leaderboard (SDL) and the ActEV TRECVID 2020 evaluations.

What is Activity Detection in Extended Videos?

An ActEV activity is defined to be “one or more people performing a specified movement or interacting with an object or group of objects”. Activity detection technologies process extended video streams, such as those from a CCTV camera, and automatically detects all instances of the activity by: (1) identifying the type of activity, (2) producing a confidence score indicating the presence of instance, (3) temporally localizing the instance by indicating the begin and end times, and (4) optionally, detecting and tracking the objects (people, vehicles, objects) involved in the activity.

Click on the tabs above to see video examples, activity examples, and evaluation tasks

What

The ActEV evaluations are being conducted to assess the robustness of automatic activity detection for a multi-camera streaming video environment.

Who

Everyone. Anyone who registers can submit to the evaluation server.

How

Register here and then based on the evaluation participants can either ran their activity detection software on their compute hardware and submit their system output to the ActEV Scoring Server or submited their runnable activity detection software to NIST using the Evaluation Commandline Interface. See the individual evaluation pages and evaluation plans for details.

Data

Each ActEV evaluation uses a new video data set, changes the evaluation tasks, or adds/changes activities. The data will be provided in MPEG-4 and AVI formatted files. See the individual evaluation pages for details.

Evaluation Metrics and Tools

The main scoring metrics will be based on detection, temporal localization, and spatio-temporal localization using evaluation measures that include the probability of missed detection and rate of false alarm. See details in the evaluation plans of each evaluation.

NIST maintains the ActEV Scoring Software on the Scoring software for the Activities in Extended Video (ActEV) evaluation GitHub repo.

News

01Mar

ActEV 20 SDL starts

10May

Deadline to submit for ActEV ActivityNet

ActEV: Video Examples

Below you will find four example videos from our data sets. There are two example views each of indoor and outdoor.

Location	View 1	View 2
Indoor
Outdoor

ActEV Evaluation Tasks

Activity detection has been researched for many years and remains an unsolved computer vision challenge that requires many capabilities beyond the current state of the art. The ActEV series supports several evaluation tasks each escalating the difficulty by requiring more specific information from the system. Presently, there are three evaluation tasks defined: 1) Activity Detection (AD), 2) Activity and Object Detection (AOD), and (3) Activity and Object Detection and Tracking (AODT). Each evaluation task is summarized below. For a full description of the evaluation tasks, read the Evaluation Plan for each specific evaluation.

Activity Detection (AD)

For the Activity Detection task, given a target activity, a system automatically detects and temporally localizes all instances of the activity. For a system-identified activity instance to be evaluated as correct, the type of activity must be correct and the temporal overlap must fall within a minimal requirement.

Activity and Object Detection (AOD)

For the Activity and Object Detection task, given a target activity, a system detects and temporally localizes all instances of the activity and spatially detects/localizes the people and/or objects associated with the target activity. For a system-identified instance to be scored as correct, it must meet the temporal overlap criteria for the AD task and in addition meet the spatial overlap of the identified objects during the activity instance.

Activity Object Detection and Tracking (AODT)

For the Activity Object Detection and Tracking task, given a target activity, a system detects and temporally localizes all instances of the activity, spatio-temporally detects/localizes the people and/or objects associated with the target activity, and properly assigns IDs the objects play in the activity. For a system-identified instance to be scored as correct, it must meet the temporal overlap criteria and spatio-temporal overlap of the objects for the AOD task and correctly assign the IDs to the objects as described in the activity definition.

ActEV: Activities in Extended Video

Datasets

Framework

Baseline Algorithms

Annotation Tools

For example: