Downloading the Assistants dataset
This dataset can be downloaded programmatically with a simple utility such as wget, using the simple API described below.
This API can be used from any programming language, with the help of a library that creates HTTP requests and handles HTTP responses.
When downloading a dataset, you need to keep track of the following:
- Dataset: Which dataset you are downloading from (currently: assistant_phrase_v1_0).
- Category: The category of the dataset you want. Currently we only have landmarks. landmarks include the MediaPipe pose and lips landmarks.
- Set split: The split of the set that you want. Currently we only have a train dataset.
- Index files: The parquet files are an alternative to the tar files that we serve other datasets with, and will have their data spread across multiple files. We have an index file (usually an index.csv) that contains a mapping on how to stitch the parquet files into the big dataset.
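The four items above can be thought of as successive levels of a URL path. The following is a minimal sketch of that idea, not the actual API: the base URL is a placeholder, and the path order dataset/category/split/filename is an assumption inferred from the description of truncating single-file URLs. The function name dataset_url is hypothetical.

```python
from typing import Optional

# Hypothetical base URL: the real host is not given in this document.
BASE_URL = "https://example.org/datasets"


def dataset_url(dataset: str,
                category: Optional[str] = None,
                split: Optional[str] = None,
                filename: Optional[str] = None,
                base_url: str = BASE_URL) -> str:
    """Join the pieces in the assumed order dataset/category/split/filename."""
    parts = [dataset, category, split, filename]
    used = []
    for part in parts:
        if part is None:
            break  # stop at the first missing level
        used.append(part.strip("/"))
    # Reject gaps such as a split given without a category.
    if any(p is not None for p in parts[len(used):]):
        raise ValueError("levels must be given in order: dataset, category, split, filename")
    url = "/".join([base_url.rstrip("/")] + used)
    # Directory-like URLs get a trailing slash; a file URL does not.
    return url if filename is not None else url + "/"


print(dataset_url("assistant_phrase_v1_0", "landmarks", "train", "index.csv"))
print(dataset_url("assistant_phrase_v1_0", "landmarks"))
```

Leaving out the later arguments yields the shorter, directory-style URLs used for batched downloads.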
Batched Download
Downloading a single file does not make sense in this context, because each parquet file contains only a segment of a larger dataset. Instead, we show how to do a batched download at any of the following granularity levels:
- Download an entire dataset.
- Download an entire category within a dataset.
- Download an entire split set.
Anything more granular does not make sense unless you want to download a single index file. To do that, send an HTTP request directly to that index file in its given dataset, category and set split.
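Fetching one index file over HTTP could look roughly like the sketch below, which uses only the Python standard library. The helper name download_file is hypothetical, and the URL in the usage comment is a placeholder, since the real host and path are not given here.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream the response body at `url` into `dest` and return `dest`."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of loading it all into memory
    return dest


# Hypothetical URL; substitute the real host and path before running:
# download_file(
#     "https://example.org/datasets/assistant_phrase_v1_0/landmarks/train/index.csv",
#     Path("data/index.csv"),
# )
```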
Again, this API can be used from any programming language, with the help of a library that creates HTTP requests and handles HTTP responses.
Dataset
To download the entire dataset, you just need to truncate the URL used for single file downloads and only specify the dataset.
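As a rough sketch, a recursive wget call against the truncated URL could be assembled as follows. This assumes the server exposes a browsable listing under that URL that wget can crawl, which is not confirmed here. The host is a placeholder and the helper name wget_recursive_command is hypothetical. The wget options used are standard ones.

```python
from typing import List


def wget_recursive_command(url: str, dest_dir: str) -> List[str]:
    """Build a wget command that fetches everything below `url` into `dest_dir`."""
    if not url.endswith("/"):
        url += "/"  # without the trailing slash, wget treats the last segment as a file
    return [
        "wget",
        "--recursive",            # follow links found under the starting URL
        "--no-parent",            # never ascend above the starting directory
        "--no-host-directories",  # do not create a directory named after the host
        f"--directory-prefix={dest_dir}",
        url,
    ]


# Hypothetical dataset-level URL; the real host is not given in this document.
command = wget_recursive_command("https://example.org/datasets/assistant_phrase_v1_0", "data")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

The same helper applies to a category or split URL, since only the starting directory changes.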
Category
To download an entire category, you just need to truncate the URL used for single file downloads and only specify the dataset and category. An example is shown below for the assistant_phrase_v1_0 dataset’s landmarks category.
Set Split
To download an entire split set, you just need to truncate the URL used for single file downloads and specify the dataset, category and split set. An example is shown below for the assistant_phrase_v1_0 dataset’s landmarks category and the train split set.
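To make the idea of truncating a single-file URL concrete, here is a minimal sketch that strips trailing path segments with the standard library. The single-file URL is hypothetical, including the host and the assumed dataset/category/split/filename layout. The helper name truncate_url is also an illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def truncate_url(file_url: str, drop: int) -> str:
    """Remove the last `drop` path segments from `file_url`, keeping a trailing slash."""
    parts = urlsplit(file_url)
    segments = [s for s in parts.path.split("/") if s]
    if not 0 < drop <= len(segments):
        raise ValueError("drop must be between 1 and the number of path segments")
    kept = segments[:len(segments) - drop]
    path = "/" + "/".join(kept) + "/" if kept else "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Hypothetical single-file URL; the real host and layout may differ.
file_url = "https://example.org/datasets/assistant_phrase_v1_0/landmarks/train/index.csv"
split_url = truncate_url(file_url, 1)     # drops the filename
category_url = truncate_url(file_url, 2)  # drops the filename and the split
dataset_url = truncate_url(file_url, 3)   # drops the filename, split and category
print(split_url, category_url, dataset_url, sep="\n")
```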