Rodent3D
A Multi-view Multi-modal Rodent Dataset

Abstract

Accurate tracking of the 3D pose of animals from video recordings is critical for many behavioral studies, yet there is a dearth of publicly available datasets that the computer vision community could use for model development. Here we introduce the Rodent3D dataset, which records animals exploring their environment and/or interacting with each other with multiple cameras and modalities (RGB, depth, thermal infrared). Rodent3D consists of 200 minutes of multi-modal video recordings from up to three thermal and three RGB-D synchronized cameras (approximately 4 million frames). For the task of optimizing estimates of pose sequences provided by existing pose estimation methods, we provide a baseline model called OptiPose. While deep-learned attention mechanisms have been used for pose estimation in the past, with OptiPose we propose a different approach: representing 3D poses as tokens over which deep-learned context models attend to both spatial and temporal keypoint patterns. Our experiments show that OptiPose is highly robust to noise and occlusion and can be used to optimize pose sequences provided by state-of-the-art models for animal pose estimation.
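The core idea of the pose-as-token representation can be illustrated with a minimal sketch: each 3D pose (keypoints × coordinates) is flattened into one token, and a self-attention step mixes context across the sequence. This is only an illustration of the idea, not the OptiPose architecture itself; it uses raw tokens as queries, keys, and values with no learned projections.

```python
import numpy as np

def poses_to_tokens(poses):
    """Flatten each 3D pose (K keypoints x 3 coords) into one token vector."""
    T, K, _ = poses.shape
    return poses.reshape(T, K * 3)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over pose tokens.
    For simplicity Q = K = V = tokens (no learned projections)."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)            # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ tokens                            # context-mixed tokens

# 30 frames of a 10-keypoint pose, as in Rodent3D
poses = np.random.default_rng(0).normal(size=(30, 10, 3))
tokens = poses_to_tokens(poses)                        # (30, 30)
refined = self_attention(tokens)                       # same shape as tokens
```

In a trained model, the attention weights would let every frame borrow evidence from well-observed frames, which is what makes the sequence robust to per-frame noise and occlusion.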

We have partially started uploading the dataset at DropBox.

Open Access Article - IJCV

Camera Configuration

  • The dataset consists of approximately 200 minutes of multi-modal video recorded over multiple sessions.
  • We provide synchronized color, aligned depth, and thermal streams from six cameras. Annotations are generated with the OptiPose framework.
  • The figure shows the camera configurations used in the dataset: blue cameras are thermal and red cameras are RGB-D.
  • The table lists the camera specifications: focal length, sensor pitch, modality, and maximum resolution at the corresponding frame rate.
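The specifications in the table (focal length and sensor pitch) determine each camera's field of view. A minimal sketch of that calculation; the numeric values below are hypothetical placeholders, not values taken from the table:

```python
import math

def horizontal_fov_deg(focal_length_mm, pixel_pitch_um, width_px):
    """Horizontal field of view from focal length, sensor pixel pitch,
    and horizontal resolution (pinhole model)."""
    sensor_width_mm = pixel_pitch_um * 1e-3 * width_px
    return 2 * math.degrees(math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Hypothetical values: 4.0 mm lens, 3.0 um pixel pitch, 1920 px wide sensor
fov = horizontal_fov_deg(4.0, 3.0, 1920)   # roughly 72 degrees
```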

Skeleton

  • The dataset contains 10 keypoints: Snout, RightEar, LeftEar, HeadBase, Sp1, Mid, Sp2, TailBase, TailMid, and TailTip.
  • Sp1 and Sp2 were added retrospectively to previous versions, increasing the keypoint count from 8 to 10.
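For reference, the keypoint list above can be written down directly; note that the skeleton edges below are an assumed head-to-tail chain (plus the ears) for visualization only, and the dataset may define connectivity differently:

```python
# The 10 keypoints as listed on this page, in order.
KEYPOINTS = ["Snout", "RightEar", "LeftEar", "HeadBase", "Sp1",
             "Mid", "Sp2", "TailBase", "TailMid", "TailTip"]

# Hypothetical skeleton edges: a spine chain plus ear attachments.
EDGES = [("Snout", "HeadBase"), ("RightEar", "HeadBase"), ("LeftEar", "HeadBase"),
         ("HeadBase", "Sp1"), ("Sp1", "Mid"), ("Mid", "Sp2"),
         ("Sp2", "TailBase"), ("TailBase", "TailMid"), ("TailMid", "TailTip")]
```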

Calibration

  • The dataset contains synchronized sessions recording a calibration cube built with thermoelectric devices, so the calibration targets are visible across modalities.
  • The DLT coefficients were generated using the EasyWand software.
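The 11-coefficient DLT model maps a 3D world point to 2D image coordinates. A minimal sketch of that projection; the coefficient ordering follows the common DLT convention and is an assumption here, so check it against the provided calibration files:

```python
import numpy as np

def dlt_project(L, xyz):
    """Project a 3D point to 2D pixel coordinates using 11 DLT coefficients L.
    Assumed convention: u = (L1*x + L2*y + L3*z + L4) / (L9*x + L10*y + L11*z + 1),
                        v = (L5*x + L6*y + L7*z + L8) / (same denominator)."""
    x, y, z = xyz
    denom = L[8] * x + L[9] * y + L[10] * z + 1.0
    u = (L[0] * x + L[1] * y + L[2] * z + L[3]) / denom
    v = (L[4] * x + L[5] * y + L[6] * z + L[7]) / denom
    return np.array([u, v])
```

With coefficients for two or more cameras, the same equations can be stacked and solved by least squares to triangulate a 3D keypoint from its 2D detections.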

Session (Reward Dispenser)

  • The rodent was trained to alternate between two reward locations on opposite sides of the arena, each enclosed by walls on three sides.
  • The experiment can be helpful for mapping neural circuits. Electrophysiology or calcium imaging data aligned with the OptiPose tracking data can be used to recognize egocentric boundary cells, grid cells, and place cells.

Session (No Objects)

  • In this configuration, the arena only has one wall, and the rodent is allowed to explore freely.
  • This experiment can be used to find behavioral patterns. Some expected behaviors are rearing, grooming, grazing, sniffing, and freezing.

Session (Objects)

  • In this configuration, the arena only has one wall, and the rodent is allowed to explore freely. However, there are novel objects inserted into the environment.
  • This experiment can be used to find behavioral patterns when new objects are inserted into the arena. With electrophysiology or calcium imaging, one could detect cells whose activity corresponds to behaviors involving those objects.