VideoDataset module¶
class video_dataset.ImglistToTensor[source]¶
Bases: torch.nn.modules.module.Module
Converts a list of PIL images in the range [0, 255] to a torch.FloatTensor of shape (NUM_IMAGES x CHANNELS x HEIGHT x WIDTH) in the range [0, 1]. Can be used as the first transform for VideoFrameDataset.
static forward(img_list: List[PIL.Image.Image]) → torch.Tensor[NUM_IMAGES, CHANNELS, HEIGHT, WIDTH][source]¶
Converts each PIL image in a list to a torch Tensor and stacks them into a single tensor.
- Parameters
img_list – list of PIL images.
- Returns
tensor of size NUM_IMAGES x CHANNELS x HEIGHT x WIDTH
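Behaviorally, the transform is equivalent to the following sketch (`imglist_to_tensor` is a hypothetical stand-in, not the module itself):

```python
import numpy as np
import torch
from PIL import Image

def imglist_to_tensor(img_list):
    """Stack a list of RGB PIL images (values in [0, 255]) into a float
    tensor of shape (NUM_IMAGES, CHANNELS, HEIGHT, WIDTH) in [0, 1]."""
    tensors = []
    for img in img_list:
        arr = np.asarray(img, dtype=np.float32) / 255.0  # HWC, scaled to [0, 1]
        tensors.append(torch.from_numpy(arr).permute(2, 0, 1))  # HWC -> CHW
    return torch.stack(tensors)  # -> (NUM_IMAGES, CHANNELS, HEIGHT, WIDTH)
```

A batch dimension is added later by the DataLoader, giving (BATCH x NUM_IMAGES x CHANNELS x HEIGHT x WIDTH).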
class video_dataset.VideoFrameDataset(root_path: str, annotationfile_path: str, num_segments: int = 3, frames_per_segment: int = 1, imagefile_template: str = 'img_{:05d}.jpg', transform=None, test_mode: bool = False)[source]¶
Bases: torch.utils.data.dataset.Dataset
A highly efficient and adaptable dataset class for videos. Instead of loading every frame of a video, it loads x RGB frames per video (sparse temporal sampling), chosen evenly from the start to the end of the video, returning a list of x PIL images, or a FRAMES x CHANNELS x HEIGHT x WIDTH tensor with FRAMES=x if the ImglistToTensor() transform is used.

More specifically, the frame range [START_FRAME, END_FRAME] is divided into NUM_SEGMENTS segments, and FRAMES_PER_SEGMENT consecutive frames are taken from each segment.
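The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the library's internal code, and the function name `sample_frame_indices` is hypothetical:

```python
import random

def sample_frame_indices(start_frame, end_frame, num_segments,
                         frames_per_segment, test_mode=False):
    """Sparse temporal sampling: split [start_frame, end_frame] into
    num_segments equal segments and take frames_per_segment consecutive
    frames from each (segment center in test mode, random offset otherwise)."""
    num_frames = end_frame - start_frame + 1
    segment_len = (num_frames - frames_per_segment + 1) // num_segments
    indices = []
    for seg in range(num_segments):
        seg_start = start_frame + seg * segment_len
        slack = max(segment_len - frames_per_segment, 0)
        offset = slack // 2 if test_mode else random.randint(0, slack)
        indices.extend(range(seg_start + offset,
                             seg_start + offset + frames_per_segment))
    return indices
```

With num_segments=3 and frames_per_segment=1, a 60-frame video yields exactly three frames, one near the middle of each third of the video in test mode.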
Note

A demonstration of using this class can be seen in demo.py at https://github.com/RaivoKoot/Video-Dataset-Loading-Pytorch.

Note

This dataset broadly corresponds to the frame sampling technique introduced in Temporal Segment Networks at ECCV 2016: https://arxiv.org/abs/1608.00859.

Note
This class relies on receiving video data in a structure where, inside a ROOT_DATA folder, each video lies in its own folder, and each video folder contains the frames of the video as individual files with a naming convention such as img_001.jpg … img_059.jpg. For enumeration and annotations, this class expects to receive the path to a .txt file where each video sample has a row with four (or more, in the case of multi-label; see the README on GitHub) space-separated values: VIDEO_FOLDER_PATH START_FRAME END_FRAME LABEL_INDEX. VIDEO_FOLDER_PATH is expected to be the path of a video folder excluding the ROOT_DATA prefix. For example, ROOT_DATA might be home\data\datasetxyz\videos\, inside of which a VIDEO_FOLDER_PATH might be jumping\0052\ or sample1\ or 00053\.
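For illustration, with the folder examples above, an annotation file could contain rows like the following (frame ranges and label indices are hypothetical):

```
jumping\0052\ 1 59 0
sample1\ 1 59 1
00053\ 1 59 2
```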
.- Parameters
root_path – The root path in which video folders lie. This is ROOT_DATA from the description above.
annotationfile_path – The .txt annotation file containing one row per video sample as described above.
num_segments – The number of segments the video should be divided into to sample frames from.
frames_per_segment – The number of frames that should be loaded per segment. For each segment’s frame-range, a random start index or the center is chosen, from which frames_per_segment consecutive frames are loaded.
imagefile_template – The image filename template that video frame files have inside of their video folders as described above.
transform – Transform pipeline that receives a list of PIL images/frames.
test_mode – If True, frames are taken from the center of each segment, instead of a random location in each segment.
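Putting the pieces together, usage typically looks like the sketch below. It builds a tiny synthetic dataset on disk in the expected layout (folder names, frame counts, and labels are made up for illustration) and then constructs a VideoFrameDataset over it; the dataset/DataLoader part is guarded because it requires this package and torch to be installed.

```python
import os
import tempfile

from PIL import Image

# Build a minimal synthetic ROOT_DATA: one folder per video, frames named
# with the default template img_{:05d}.jpg (all names here are made up).
root = tempfile.mkdtemp()
for video, n_frames in [('jumping_0001', 12), ('running_0001', 12)]:
    folder = os.path.join(root, video)
    os.makedirs(folder)
    for i in range(1, n_frames + 1):
        Image.new('RGB', (32, 32)).save(
            os.path.join(folder, 'img_{:05d}.jpg'.format(i)))

# Annotation file: VIDEO_FOLDER_PATH START_FRAME END_FRAME LABEL_INDEX
annotations = os.path.join(root, 'annotations.txt')
with open(annotations, 'w') as f:
    f.write('jumping_0001 1 12 0\n')
    f.write('running_0001 1 12 1\n')

try:
    from torch.utils.data import DataLoader
    from video_dataset import VideoFrameDataset, ImglistToTensor

    dataset = VideoFrameDataset(
        root_path=root,
        annotationfile_path=annotations,
        num_segments=3,
        frames_per_segment=1,
        imagefile_template='img_{:05d}.jpg',
        transform=ImglistToTensor(),  # each sample becomes one tensor
        test_mode=False,
    )
    loader = DataLoader(dataset, batch_size=2, shuffle=True)
    # frames should have shape (batch, FRAMES, CHANNELS, HEIGHT, WIDTH),
    # here (2, 3, 3, 32, 32) with num_segments=3 and frames_per_segment=1
    frames, labels = next(iter(loader))
except ImportError:
    pass  # video_dataset / torch not installed
```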