Assistant Phrase ASL Version 1.0

License

Assistant Phrase ASL v1.0 is licensed under CC BY 4.0. For more information, please view the full license.

Assistant Phrase ASL v1.0

The Deaf have difficulty using voice-based assistants: they cannot hear results presented orally, some users have little or no voice, and computer-based speech recognition can have difficulty recognizing a Deaf voice. Several publications have established that the Deaf community desires to communicate with these virtual assistants in sign language.

This dataset contains the landmarks (stored in the parquet format) extracted from videos of Deaf individuals signing phrases appropriate for controlling a virtual assistant. The text prompts for the phrases are taken from the public Presto dataset.

train.csv contains the text of each phrase captured, the name of the parquet file that contains the phrase, and the location of the phrase within that file.
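
For example, loading the phrase index and one phrase’s landmarks might look like the sketch below. The "path" and "phrase" column names are assumptions for illustration; check the actual header of train.csv.

  import pandas as pd

  # Load the phrase index: phrase text, parquet filename, and location in the file
  train = pd.read_csv("train.csv")

  # Load the landmark frames for the first phrase (column names are hypothetical)
  row = train.iloc[0]
  landmarks = pd.read_parquet(row["path"])
  print(row["phrase"], landmarks.shape)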

https://signdata.cc.gatech.edu

Data Card Author(s)

  • Thad Starner, Georgia Tech: Owner

Authorship

Publishers

Publishing Organization(s)

  • Georgia Institute of Technology
  • Deaf Professional Arts Network

Industry Type(s)

  • Academic - Tech
  • Not-for-profit - Tech

Contact Detail(s)

  • Publishing POC: Thad Starner
  • Affiliation: Georgia Institute of Technology
  • Contact: thad@gatech.edu

Funding Sources

Institution(s)

  • Deaf Professional Arts Network

Funding or Grant Summary(ies)

DPAN (Deaf Professional Arts Network) is a non-profit. Funding for this project came through non-restrictive gifts to DPAN from both public and private entities.

Additional Notes: Georgia Tech contributed to this project through course work and volunteer efforts.

Dataset Overview

Data Subject(s)

  • Sensitive Data about people (video)
  • Non-Sensitive Data about people

Dataset Snapshot

Category | Data
Size of Dataset | 3.9 GB (landmarks alone)
Total Number of Video Landmark Files | 19,904

Content Description

This dataset was collected from late 2023 to early 2024.

Descriptive Statistics

TBD

Sensitivity of Data

Sensitivity Type(s)

  • User Content
  • User Metadata
  • User Activity Data
  • Identifiable Data (video)
  • S/PII (video)

Field(s) with Sensitive Data

Intentionally Collected Sensitive Data

(S/PII were collected as a part of the dataset creation process.)

Field Name | Description
Participant Video | Video of participant (upper body captured)
Participant Sign | Video of participant performing isolated sign gestures

Security and Privacy Handling

Method: Participants were given a consent form. They were only allowed to record after providing consent for the following:

“The app will collect video and photographic images of Your face, torso, hands, and whatever is in view of the camera(s) along with associated camera metadata (such as color correction, focal length, etc.). … Beyond the video, the following data may be recorded:

  • The details of each Task, such as the type of Task that was done, signing certain words, or performing specific actions as instructed
  • Date and time information associated with the Tasks
  • Self identified gender
  • Self identified age range
  • Self-identified ethnicity
  • Self assessed sign language proficiency
  • Signing style information (such as general location where You learned, type of sign learned, age range when you started learning, signing community You are most closely associated with, etc)

As described earlier, if you consent, we will use photos or video clips where your face can be identified. We may use identifiable photos or video clips of you in written or oral presentations about this work and in publicly available on-line databases.”

Risk Type(s)

  • Direct Risk (video)
  • Residual Risk

Risk(s) and Mitigation(s)

Video: The direct risk is that participants’ visual features (their face and body) could be linked to their full names. To mitigate this risk, we use anonymized user IDs to identify users. Some residual risk remains: participants may still be identified by their faces alone, which is unavoidable with video data. We have participants sign consent forms acknowledging that they are creating a dataset intended for public use.

Dataset Version and Maintenance

Maintenance Status

Regularly Updated - New versions of the dataset have been or will continue to be made available.

Version Details

Current Version: 1.0

Release Date: Sept 2024

Maintenance Plan

Versioning: Major updates will be released as a new version, incremented to the nearest tenth from the previous version. E.g. if the current version is between 1.0 and 1.09, then a major update will be released as version 1.1. Major updates include the addition of new users and/or new signs. Minor updates are covered below.

Updates: If there are missing/extraneous/erroneous videos (error cases described below), any fixes will be released as a new version, incremented by 0.01. E.g. if the current version is 1.0, then any minor updates will be released as 1.01.
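
As an illustrative sketch only (not part of the release tooling), the two increment rules above work out as follows:

  import math

  def next_version(current: str, major: bool) -> str:
      """Hypothetical helper illustrating the versioning policy above."""
      value = float(current)
      if major:
          # Major update: round up to the next tenth, e.g. 1.0 or 1.07 -> 1.1
          return f"{math.floor(value * 10 + 1) / 10:.1f}"
      # Minor update: increment by 0.01, e.g. 1.0 -> 1.01
      return f"{value + 0.01:.2f}"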

Errors: Errors in the dataset include incorrectly labeled videos, missing videos, or extraneous videos. Missing videos include videos that the participant recorded but weren’t included in the final release. Extraneous videos include videos that only have a partial sign or no sign at all, but were included in the final dataset.

Feedback: We will accept feedback via thad@gatech.edu.

Next Planned Update(s)

Version affected: 1.0

Next data update: TBD

Next version: 1.1

Next version update: TBD

Expected Change(s)

Updates to Dataset: TBD

Data Points

Primary Data Modality

  • Video Data

Typical Data Point

A typical data point includes only the full motion associated with the signed phrase, with little to no empty space (i.e. no motion) at the beginning or the end of the video. The full phrase must be completed within roughly 5 seconds and a full view of the signing motion must be included.

Atypical Data Point

An atypical data point may include a lot of empty space (i.e. moments without any motion) at the beginning or end of the video. The full sign may take longer than 5 seconds to complete. The beginning or the end of the sign may be obscured by poor camera framing, though the sign should still be recognizable.
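
As an illustration, a screen for over-long clips could look like the sketch below. The "frame" column name and the 30 fps capture rate are assumptions, not documented properties of the parquet files.

  import pandas as pd

  ASSUMED_FPS = 30  # assumption: the actual capture rate may differ

  def is_overlong(parquet_path: str) -> bool:
      """Flag clips whose signing motion runs longer than roughly 5 seconds."""
      frames = pd.read_parquet(parquet_path)
      n_frames = frames["frame"].nunique()  # hypothetical column name
      return n_frames / ASSUMED_FPS > 5.0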

Motivations & Intentions

Motivations

Purpose(s)

  • Research
  • Production
  • Education

Domain(s) of Application

Educational Technology, Accessibility, Sign Language Recognition, Machine Learning, Computer Vision

Motivating Factor(s)

  • Developing Sign Language Recognition
  • Teaching Sign
  • Developing Educational Technology

Intended Use

Dataset Use(s)

  • Safe for research use

Suitable Use Case(s)

Suitable Use Case: Sign Language Recognition for interacting with virtual assistants

Suitable Use Case: Incorporating into larger datasets to train sign language recognition systems for mobile phone/tablet applications and games

In general, the data can be used for sign language recognition and related downstream applications.

Unsuitable Use Case(s)

Unsuitable Use Case: General Continuous Sign Language Recognition/Interpretation

Unsuitable Use Case: Sign to English Translation

The data is not intended for systems seeking to provide general interpretation of continuous sign language or translation from sign language to English. It is not representative of the complexity of ASL grammar used in general interaction; much more data would need to be collected.

Access, Retention, & Wipeout

Access

Access Type

  • External - Open Access

Retention

Duration

The dataset will be available for at least 5 years, and we currently have no plans to retire it.

Wipeout and Deletion

Policy

We do not have plans to retire the dataset, so there is no deletion policy or procedure.

Provenance

Collection

Method(s) Used

We collected data using our mobile recording app, Record These Hands. The app presented 10 phrases for recording in a single recording session. The entire session was captured on video, but the sign recordings happened during specific time intervals: participants were presented with a phrase to record and then tapped a record button to record themselves signing it. The timestamps corresponding to the recording intervals for each sign were saved in a separate file.

Methodology Detail(s)

Collection Method: Record These Hands App

Platform: Pixel Tablet

Dates of Collection: October 2023 - March 2024

Primary modality of collected data:

  • Video Data

Update Frequency for collected data:

  • Static

Source Description(s)

Participants were recruited by DPAN.

Collection Cadence

Static: Data was collected once from single or multiple sources.

Data Processing

Collection Method: Record These Hands app

Description: We split the individual sign recordings out of the full session recordings using scripts. The resulting split videos were named following this convention: “<participant_id>--<recording_start_time>-.mp4”.

Tools or libraries: Python, FFmpeg
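
A minimal sketch of such a splitting step is shown below; the (start, end) interval format and the helper itself are hypothetical stand-ins for the project’s actual scripts.

  import subprocess

  def split_session(session_mp4, participant_id, intervals):
      """Cut each sign recording interval out of the full session video.

      intervals: (start, end) pairs, in seconds, read from the separate
      timestamp file saved with each session recording.
      """
      for start, end in intervals:
          out = f"{participant_id}--{start}-.mp4"  # naming convention above
          subprocess.run(
              ["ffmpeg", "-i", session_mp4, "-ss", str(start), "-to", str(end),
               "-c", "copy", out],
              check=True,
          )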

Collection Criteria

Data Selection

  • Sign contributors were encouraged to correct any mistakes they made while recording the phrases

Data Inclusion

  • We included any videos that appeared to be of the correct size.

Data Exclusion

  • We excluded any videos that did not appear to be of the correct size.

Human and Other Sensitive Attributes

Sensitive Human Attribute(s)

(only for video)

  • Gender
  • Geography
  • Language
  • Age
  • Culture
  • Experience or Seniority

Intentionality

Intentionally Collected Attributes

Human attributes were labeled or collected as a part of the dataset creation process.

Field Name | Description
Sign Language | Participant’s signing style and dialect
Gender | Participant’s gender
Age Range | Participant’s age range

Additional Notes: By providing sign data, participants provided information about their signing style and preferred dialect.

Unintentionally Collected Attributes

Human attributes were not explicitly collected as a part of the dataset creation process but can be inferred using additional methods.

Field Name | Description
Geography | Participant’s geographic location
Culture | Participant’s ethnic background
Seniority | Participant’s signing proficiency

Additional Notes: We did not intentionally collect the attributes listed above, but they may be (incorrectly) inferred from the videos. For instance, videos may be suggestive of the participant’s age or their signing proficiency. Such inferences may be incorrect since many of these attributes cannot be determined using visual cues alone and may depend on the participant’s self-identification.

Rationale

We intended to collect sign language phrase data; hence, videos of the participants’ signing were collected. The collected attributes (both intentional and unintentional) may be inferred (though not always accurately) from these videos.

Risk(s) and Mitigation(s)

(video data) The direct risk with this type of video data is that a participant’s identity may be revealed. For this reason, we use anonymous identifiers. There is still some residual risk of a participant being identified through their face (in video) alone. Participants have signed a consent form (given in the Sensitivity of Data section) to address this concern.

Annotations & Labeling

Annotation Workforce Type

  • Annotation Target in Data

Annotation Characteristic(s)

Annotation Type | Number
Number of unique annotations | 30,000
Total number of annotations | 30,000
Average annotations per example | 1

Annotation Description(s)

Description: Annotations were automatically generated with the data, since participants were prompted to record specific phrases in each session. These sign labels served as the targets for the phrase recognition problem.

Annotation Distribution(s)

Validation Types

Method(s)

Description(s)

Evaluation Process(es)

Evaluation Result(s)