Downloading a Popsign dataset

Popsign Datasets

This applies only to Popsign datasets (labelled popsign_*), not to all datasets. Each dataset has its own hierarchy and hence its own download commands.

While each sign preview page of a Popsign dataset has download buttons, the datasets can also be downloaded programmatically with a simple utility such as wget or curl, using the simple API described here.

This API can be used from any programming language with a library for making HTTP requests and handling HTTP responses.

When downloading a dataset, you need to keep track of the following:

  • Dataset: Which Popsign dataset you are downloading from (for example: popsign_v1_0)
  • Category: The category of the dataset you want. Currently there are only two: game and non-game.
  • Set split: The split of the set that you want. Currently there is a train split, a val (validation) split and a test split.
  • Sign: The actual sign for which you are downloading the video samples.

The videos for a sign in a given dataset split are downloaded as a single tar file. You can extract the tar file with the standard tar extraction command.
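The four parts above map directly onto the download URL. The following is a minimal sketch of that mapping, following the URL pattern used in the scripts on this page; popsign_url is a hypothetical helper name chosen for illustration and is not something the site provides.

```shell
#!/usr/bin/env bash
# Sketch: build the URL of one sign's tar file from the four parts.
# popsign_url is a hypothetical helper name, not part of the Popsign site.
popsign_url() {
  local dataset="$1" category="$2" set_split="$3" sign="$4"
  printf 'https://signdata.cc.gatech.edu/data/%s/%s/%s/%s.tar\n' \
    "$dataset" "$category" "$set_split" "$sign"
}

# Prints the URL for the "after" sign in the game category, train split.
popsign_url "popsign_v1_0" "game" "train" "after"
```

The printed URL can be passed to wget, curl, or any HTTP client.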

Single Tar file Download

To download a specific tar file, for example the one for the sign "after" in the game category and train split of popsign_v1_0, you can run the following command. Note that the command does a few extra things, such as creating an appropriate directory to store the file and extracting the tar file. The URL passed to wget or curl in the example is all that is needed to fetch the file itself.

Using wget:

#!/usr/bin/bash
dataset="popsign_v1_0"
category="game"
set_split="train"
sign="after"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset, Category: $category, Set Split: $set_split, Sign: $sign" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset/$category/$set_split/" \
&& echo "----------Downloading Tar File----------" \
&& wget -O "./$dataset/$category/$set_split/$sign.tar" "https://signdata.cc.gatech.edu/data/$dataset/$category/$set_split/$sign.tar" \
&& echo "----------Extracting Content----------" \
&& tar -xf "./$dataset/$category/$set_split/$sign.tar" -C "./$dataset/$category/$set_split/" \
&& echo "----------Deleting Original Tarfile----------" \
&& rm "./$dataset/$category/$set_split/$sign.tar" \
&& echo "==========Done=========="

Using curl:

#!/usr/bin/bash
dataset="popsign_v1_0"
category="game"
set_split="train"
sign="after"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset, Category: $category, Set Split: $set_split, Sign: $sign" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset/$category/$set_split/" \
&& echo "----------Downloading Tar File----------" \
&& curl -fL -o "./$dataset/$category/$set_split/$sign.tar" "https://signdata.cc.gatech.edu/data/$dataset/$category/$set_split/$sign.tar" \
&& echo "----------Extracting Content----------" \
&& tar -xf "./$dataset/$category/$set_split/$sign.tar" -C "./$dataset/$category/$set_split/" \
&& echo "----------Deleting Original Tarfile----------" \
&& rm "./$dataset/$category/$set_split/$sign.tar" \
&& echo "==========Done=========="
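
The scripts above delete the tar file right after extraction. As a hedged variation, the sketch below first checks that the downloaded file can be listed as a tar archive, and deletes it only once extraction has succeeded. extract_and_clean is a hypothetical helper name, and the check is only a basic readability test, not a checksum verification.

```shell
#!/usr/bin/env bash
# Sketch: extract a tar file only if it is readable, then delete it.
# extract_and_clean is a hypothetical helper name used for illustration.
extract_and_clean() {
  local tarfile="$1" dest="$2"
  # tar -tf lists the archive members; it fails on a file that is not a tar archive.
  if ! tar -tf "$tarfile" > /dev/null 2>&1; then
    echo "Not a readable tar archive: $tarfile" >&2
    return 1
  fi
  # Remove the archive only when directory creation and extraction both succeed.
  mkdir -p "$dest" && tar -xf "$tarfile" -C "$dest" && rm "$tarfile"
}
```

It could replace the extract and delete steps, for example: extract_and_clean "./popsign_v1_0/game/train/after.tar" "./popsign_v1_0/game/train/"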

Batched Download

Beyond single file download, it is possible to do a batched download at any of the following granularity levels:

  • Download an entire dataset.
  • Download an entire category within a dataset.
  • Download an entire split set.

Anything more granular than a split set means downloading the single tar file for a sign. We do not support downloading individual sample videos; instead, we encourage downloading the tar file for a given sign in its dataset, category and split set.

Again, this API can be used from any programming language with a library for making HTTP requests and handling HTTP responses.
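
When only a handful of signs is needed rather than a whole split set, one option is to repeat the single-file download per sign. The sketch below only prints the URLs, following the single-file URL pattern; sign_urls is a hypothetical helper name, and the only sign name used is "after", taken from the single-file example.

```shell
#!/usr/bin/env bash
# Sketch: print the tar URL for each requested sign in one dataset/category/split.
# sign_urls is a hypothetical helper name; it downloads nothing by itself.
sign_urls() {
  local dataset="$1" category="$2" set_split="$3"
  shift 3
  local sign
  for sign in "$@"; do
    printf 'https://signdata.cc.gatech.edu/data/%s/%s/%s/%s.tar\n' \
      "$dataset" "$category" "$set_split" "$sign"
  done
}

# Further sign names can be appended after "after".
sign_urls "popsign_v1_0" "game" "train" "after"
```

The output can be fed to wget, which reads a URL list from standard input with -i -, for example: sign_urls popsign_v1_0 game train after | wget -P "./popsign_v1_0/game/train" -i -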

Preview Data

The collages and the preview videos served with the collages are not accessible through this API. They are served for the website only, and it is not appropriate to use them beyond that purpose. They have been downscaled by a factor of about 24, and their speeds have been increased or decreased so that their durations match. Please consider downloading the original videos in the dataset and resizing or re-synchronising them as per your needs.

Dataset

To download an entire dataset, you just need to truncate the URL used for single file downloads and only specify the dataset. An example is shown below for popsign_v1_0.

#!/usr/bin/bash
dataset="popsign_v1_0"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset" \
&& echo "----------Downloading Files----------" \
&& wget -r -np -nH --cut-dirs=2 -P "./$dataset" "https://signdata.cc.gatech.edu/data/$dataset/" \
&& echo "==========Done=========="
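
The wget flags in this script decide where each file lands on disk: -nH drops the host name directory, the cut-directories option (--cut-dirs=2 in full) drops the first two remote directory components (data and the dataset name), and -P sets the local prefix. The sketch below only imitates that path mapping so it can be checked without downloading anything; local_path is a hypothetical helper name, and the mapping reflects wget's documented behaviour as understood here.

```shell
#!/usr/bin/env bash
# Sketch: where "wget -nH --cut-dirs=N -P PREFIX" would save a given remote path.
# local_path is a hypothetical helper that imitates the mapping; it downloads nothing.
local_path() {
  local prefix="$1" cut="$2" url_path="$3"
  local rest="${url_path#/}"   # drop the leading slash
  local i=0
  while [ "$i" -lt "$cut" ]; do
    rest="${rest#*/}"          # drop one leading directory component
    i=$((i + 1))
  done
  printf '%s/%s\n' "${prefix%/}" "$rest"
}

# Prints ./popsign_v1_0/game/train/after.tar
local_path "./popsign_v1_0" 2 "/data/popsign_v1_0/game/train/after.tar"
```

Because the same prefix and cut count are used at every granularity level, files end up in the same local layout whether a dataset, a category or a split set is downloaded.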

Category

To download an entire category, you just need to truncate the URL used for single file downloads and only specify the dataset and category. An example is shown below for the popsign_v1_0 dataset’s game category.

#!/usr/bin/bash
dataset="popsign_v1_0"
category="game"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset, Category: $category" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset/$category" \
&& echo "----------Downloading Files----------" \
&& wget -r -np -nH --cut-dirs=2 -P "./$dataset" "https://signdata.cc.gatech.edu/data/$dataset/$category/" \
&& echo "==========Done=========="

Set Split

To download an entire split set, you just need to truncate the URL used for single file downloads and specify the dataset, category and split set. An example is shown below for the train split set of the popsign_v1_0 dataset's game category.

#!/usr/bin/bash
dataset="popsign_v1_0"
category="game"
set_split="train"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset, Category: $category, Set Split: $set_split" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset/$category/$set_split" \
&& echo "----------Downloading Files----------" \
&& wget -r -np -nH --cut-dirs=2 -P "./$dataset" "https://signdata.cc.gatech.edu/data/$dataset/$category/$set_split/" \
&& echo "==========Done=========="
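
Unlike the single-file scripts, the batched scripts leave the downloaded tar files unextracted. A minimal sketch for extracting them afterwards is given below, assuming the downloads ended up as .tar files under the dataset directory; extract_all is a hypothetical helper name, and it keeps the tar files in place.

```shell
#!/usr/bin/env bash
# Sketch: extract every .tar file found under a directory, next to where it was saved.
# extract_all is a hypothetical helper name; the tar files themselves are kept.
extract_all() {
  local root="$1"
  # For each archive, run tar with the archive's own directory as the target.
  find "$root" -type f -name '*.tar' \
    -exec sh -c 'tar -xf "$1" -C "$(dirname "$1")"' _ {} \;
}
```

For example, extract_all "./popsign_v1_0" would extract each sign's tar file inside its own category and split directory.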