Downloading the Assistants dataset

This dataset can be downloaded programmatically with a simple utility like wget, using the simple API that we have exposed, described below.

This API can be used from any programming language, with the help of a library for creating HTTP requests and handling HTTP responses.

When downloading a dataset, you need to keep track of the following:

  • Dataset: Which dataset you are downloading from (currently: assistant_phrase_v1_0)
  • Category: The category of the dataset you want. Currently we only have landmarks, which includes the MediaPipe pose and lips landmarks.
  • Set split: The split of the set that you want. Currently we only have a train split.
  • Index files: The parquet files are an alternative to the tar files we serve other datasets with, and their data is spread across multiple files. We provide an index file (usually an index.csv) that describes how to stitch the parquet files back into the full dataset.
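As an illustration, once an index file is on disk you can list the parquet segments it references from the command line. This is only a sketch: it assumes the first CSV column holds the segment filename, which may not match the real index layout, so inspect your index.csv before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: list the parquet segments referenced by an index file.
# ASSUMPTION: the first CSV column holds the segment filename; the real
# index.csv layout may differ, so inspect the file before relying on this.
index="index.csv"

if [ -f "$index" ]; then
  # Skip the header row, then print the first column of each record.
  tail -n +2 "$index" | cut -d, -f1
fi
```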

Batched Download

A single-file download doesn't make sense in this context, since each parquet file contains only a segment of a larger dataset, so we show how to do a batched download at any of the following granularity levels:

  • Download an entire dataset.
  • Download an entire category within a dataset.
  • Download an entire set split.

Anything more granular does not make sense unless you want to download a single index file, which you can do by sending an HTTP request directly to that index file within its dataset, category, and set split.
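For example, a single index file can be fetched with one direct HTTP request. The URL below follows the dataset/category/set-split layout used throughout this page; "index.csv" is the usual index filename noted above, but verify it for your category.

```shell
#!/usr/bin/env bash
# Fetch one index file directly over HTTP. The URL follows the
# dataset/category/set-split layout used by the batched scripts below;
# "index.csv" is the usual index filename, but verify it for your category.
dataset="assistant_phrase_v1_0"
category="landmarks"
set_split="train"
index_file="index.csv"

url="https://signdata.cc.gatech.edu/data/$dataset/$category/$set_split/$index_file"
mkdir -p "./$dataset/$category/$set_split"
echo "Fetching: $url"
wget -q -P "./$dataset/$category/$set_split" "$url" \
  || echo "Download failed; check network access and the index filename."
```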

Again, this API can be used from any programming language, with the help of a library for creating HTTP requests and handling HTTP responses.

Dataset

To download the entire dataset, you just need to truncate the URL used for single-file downloads and specify only the dataset.

#!/usr/bin/env bash
dataset="assistant_phrase_v1_0"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset" \
&& echo "----------Downloading Files----------" \
&& wget -r -np -nH --cut-dirs=2 -P "./$dataset" "https://signdata.cc.gatech.edu/data/$dataset/" \
&& echo "==========Done=========="

Category

To download an entire category, you just need to truncate the URL used for single-file downloads and specify only the dataset and category. An example is shown below for the assistant_phrase_v1_0 dataset’s landmarks category.

#!/usr/bin/env bash
dataset="assistant_phrase_v1_0"
category="landmarks"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset/$category" \
&& echo "----------Downloading Files----------" \
&& wget -r -np -nH --cut-dirs=2 -P "./$dataset" "https://signdata.cc.gatech.edu/data/$dataset/$category/" \
&& echo "==========Done=========="

Set Split

To download an entire set split, you just need to truncate the URL used for single-file downloads and specify the dataset, category, and set split. An example is shown below for the assistant_phrase_v1_0 dataset’s landmarks category and the train set split.

#!/usr/bin/env bash
dataset="assistant_phrase_v1_0"
category="landmarks"
set_split="train"

echo "==========Fetching==========" \
&& echo "Dataset: $dataset" \
&& echo "----------Creating Appropriate Directory----------" \
&& mkdir -p "./$dataset/$category/$set_split" \
&& echo "----------Downloading Files----------" \
&& wget -r -np -nH --cut-dirs=2 -P "./$dataset" "https://signdata.cc.gatech.edu/data/$dataset/$category/$set_split/" \
&& echo "==========Done=========="
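After any of the batched downloads above, a quick sanity check is to count the parquet segments that actually arrived. This sketch assumes the local directory layout that the scripts above create.

```shell
#!/usr/bin/env bash
# Sanity check after a batched download: count the parquet segments on
# disk. The path assumes the directory layout created by the scripts above.
dataset="assistant_phrase_v1_0"

mkdir -p "./$dataset"
count=$(find "./$dataset" -name '*.parquet' | wc -l)
echo "Parquet segments under ./$dataset: $count"
```

Comparing this count against the number of rows in the index file is a cheap way to confirm the download completed.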