VideoDataset module

class video_dataset.ImglistToTensor[source]

Bases: torch.nn.modules.module.Module

Converts a list of PIL images in the range [0, 255] to a torch.FloatTensor of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0, 1]. Can be used as the first transform for VideoFrameDataset.

static forward(img_list: List[PIL.Image.Image]) → torch.Tensor[NUM_IMAGES, CHANNELS, HEIGHT, WIDTH][source]

Converts each PIL image in a list to a torch Tensor and stacks them into a single tensor.

Parameters

img_list – list of PIL images.

Returns

tensor of size NUM_IMAGES x CHANNELS x HEIGHT x WIDTH
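The conversion above can be sketched without the library itself. The helper below (a hypothetical stand-in, not the module's actual implementation) reproduces the documented behavior: each HxWxC uint8 PIL image becomes a CxHxW float tensor in [0, 1], and the results are stacked along a new leading dimension.

```python
import numpy as np
import torch
from PIL import Image

def imglist_to_tensor(img_list):
    # Convert each HxWxC uint8 PIL image to a CxHxW float tensor in [0, 1],
    # then stack along a new leading NUM_IMAGES dimension.
    tensors = [
        torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float() / 255.0
        for img in img_list
    ]
    return torch.stack(tensors)

# Five dummy 64x48 red frames -> tensor of shape (5, 3, 48, 64).
frames = [Image.new("RGB", (64, 48), color=(255, 0, 0)) for _ in range(5)]
batch = imglist_to_tensor(frames)
print(batch.shape)  # torch.Size([5, 3, 48, 64])
```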

class video_dataset.VideoFrameDataset(root_path: str, annotationfile_path: str, num_segments: int = 3, frames_per_segment: int = 1, imagefile_template: str = 'img_{:05d}.jpg', transform=None, test_mode: bool = False)[source]

Bases: torch.utils.data.dataset.Dataset

A highly efficient and adaptable dataset class for videos. Instead of loading every frame of a video, this class loads only x RGB frames (sparse temporal sampling), chosen evenly from the start to the end of the video. It returns either a list of x PIL images or, if the ImglistToTensor() transform is used, a single tensor of shape FRAMES x CHANNELS x HEIGHT x WIDTH where FRAMES = x.

More specifically, the frame range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS segments and FRAMES_PER_SEGMENT consecutive frames are taken from each segment.
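The segment-based sampling described above can be sketched as follows. This is a simplified illustration under assumed conventions (segments of equal integer length, center offset in test mode), not the class's exact implementation.

```python
import random

def sample_frame_indices(start_frame, end_frame, num_segments,
                         frames_per_segment, test_mode=False):
    """Sketch of sparse temporal sampling: split [start_frame, end_frame]
    into num_segments equal segments and take frames_per_segment consecutive
    frames from each (center offset in test mode, random offset otherwise)."""
    total = end_frame - start_frame + 1
    segment_length = total // num_segments
    indices = []
    for seg in range(num_segments):
        seg_start = start_frame + seg * segment_length
        max_offset = max(segment_length - frames_per_segment, 0)
        offset = max_offset // 2 if test_mode else random.randint(0, max_offset)
        first = seg_start + offset
        indices.extend(range(first, first + frames_per_segment))
    return indices

# 30 frames, 3 segments, 2 consecutive frames from the center of each segment:
print(sample_frame_indices(1, 30, 3, 2, test_mode=True))  # [5, 6, 15, 16, 25, 26]
```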

Note

A demonstration of using this class can be found in demo.py at https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch.

Note

This dataset broadly corresponds to the frame sampling technique introduced in Temporal Segment Networks at ECCV2016 https://arxiv.org/abs/1608.00859.

Note

This class relies on receiving video data in a structure where, inside a ROOT_DATA folder, each video lies in its own folder and each video folder contains the frames of the video as individual files with a naming convention such as img_001.jpg … img_059.jpg.

For enumeration and annotations, this class expects the path to a .txt file where each video sample has a row with four (or more, in the case of multi-label; see the README on GitHub) space-separated values: VIDEO_FOLDER_PATH START_FRAME END_FRAME LABEL_INDEX. VIDEO_FOLDER_PATH is the path of a video folder excluding the ROOT_DATA prefix. For example, ROOT_DATA might be home/data/datasetxyz/videos/, inside of which a VIDEO_FOLDER_PATH might be jumping/0052/, sample1/, or 00053/.
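A minimal sketch of the expected layout and annotation file, using hypothetical folder names and labels:

```python
import os
import tempfile

# Hypothetical ROOT_DATA layout: one folder per video. (Frame image files
# would live inside each folder; they are omitted here for brevity.)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "jumping", "0052"))
os.makedirs(os.path.join(root, "00053"))

# One row per video sample: VIDEO_FOLDER_PATH START_FRAME END_FRAME LABEL_INDEX,
# with VIDEO_FOLDER_PATH given relative to ROOT_DATA.
rows = [
    "jumping/0052 1 55 0",
    "00053 1 120 1",
]
annotation_path = os.path.join(root, "annotations.txt")
with open(annotation_path, "w") as f:
    f.write("\n".join(rows) + "\n")

# Parsing a row back recovers the four fields.
folder, start, end, label = open(annotation_path).readline().split()
print(folder, int(start), int(end), int(label))  # jumping/0052 1 55 0
```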

Parameters
  • root_path – The root path in which video folders lie. This is ROOT_DATA from the description above.

  • annotationfile_path – The .txt annotation file containing one row per video sample as described above.

  • num_segments – The number of segments the video should be divided into to sample frames from.

  • frames_per_segment – The number of frames that should be loaded per segment. Within each segment’s frame range, either a random start index or the center is chosen, from which frames_per_segment consecutive frames are loaded.

  • imagefile_template – The image filename template that video frame files have inside of their video folders as described above.

  • transform – Transform pipeline that receives a list of PIL images/frames.

  • test_mode – If True, frames are taken from the center of each segment, instead of a random location in each segment.
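Putting the pieces together, the sketch below mimics what a single sample lookup does, using a dummy video folder built on the fly. The folder names, label, and simplified center-of-segment index computation are illustrative assumptions, not the class's exact internals.

```python
import os
import tempfile

from PIL import Image

# Build a tiny dummy video folder (hypothetical names) with 10 frames
# following the default imagefile_template.
root = tempfile.mkdtemp()
video_dir = os.path.join(root, "jumping", "0001")
os.makedirs(video_dir)
imagefile_template = "img_{:05d}.jpg"
for i in range(1, 11):
    Image.new("RGB", (32, 32)).save(os.path.join(video_dir, imagefile_template.format(i)))

annotation = "jumping/0001 1 10 0"  # VIDEO_FOLDER_PATH START_FRAME END_FRAME LABEL_INDEX

# Roughly what one sample lookup does with num_segments=3, frames_per_segment=1
# and test_mode=True: take the center frame of each of the 3 segments.
folder, start, end, label = annotation.split()
start, end, label = int(start), int(end), int(label)
num_segments = 3
segment_length = (end - start + 1) // num_segments
indices = [start + seg * segment_length + segment_length // 2
           for seg in range(num_segments)]
frames = [Image.open(os.path.join(root, folder, imagefile_template.format(i)))
          for i in indices]
print(indices, len(frames), label)  # [2, 5, 8] 3 0
```

In real use, the list of PIL images returned this way is what the transform pipeline (e.g. ImglistToTensor as the first transform) receives.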