Assistant Phrase ASL Version 1.0
License
Assistant Phrase ASL v1.0 is licensed under CC BY 4.0. For more information, please view the full license.
Assistant Phrase ASL v1.0
The Deaf have difficulty using voice-based assistants both because they cannot hear results presented orally and because, in some cases, the user may have low/no voice and computer-based speech recognition can have difficulty recognizing a Deaf voice. Several publications have established that the Deaf community desires to communicate with these virtual assistants in sign language.
This dataset contains the landmarks (stored in parquet format) extracted from videos of Deaf individuals signing phrases appropriate for controlling a virtual assistant. The text prompts for the phrases are taken from the public Presto dataset.
train.csv contains the text of each phrase captured, the name of the parquet file that contains the phrase's landmarks, and the location of the phrase within that file.
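As a quick illustration of how the index and landmark files fit together, the sketch below loads train.csv with pandas and pulls the landmark rows for one phrase. The column names used (`phrase`, `file`, `sequence_id`) are assumptions for illustration only, since the exact schema is described above just informally.

```python
import pandas as pd

# Index of captured phrases. The column names used below ("phrase",
# "file", "sequence_id") are illustrative assumptions, not the
# confirmed schema of train.csv.
train = pd.read_csv("train.csv")
row = train.iloc[0]

# Each row names the parquet file holding the phrase's landmarks and
# the location of the phrase within that file.
landmarks = pd.read_parquet(row["file"])
clip = landmarks[landmarks["sequence_id"] == row["sequence_id"]]

print(row["phrase"], clip.shape)
```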
Dataset Link
https://signdata.cc.gatech.edu
Data Card Author(s)
- Thad Starner, Georgia Tech: Owner
Authorship
Publishers
Publishing Organization(s)
- Georgia Institute of Technology
- Deaf Professional Arts Network
Industry Type(s)
- Academic - Tech
- Not-for-profit - Tech
Contact Detail(s)
- Publishing POC: Thad Starner
- Affiliation: Georgia Institute of Technology
- Contact: thad@gatech.edu
Funding Sources
Institution(s)
- Deaf Professional Arts Network
Funding or Grant Summary(ies)
DPAN (Deaf Professional Arts Network) is a non-profit. Funding for this project came through non-restrictive gifts to DPAN from both public and private entities.
Additional Notes: Georgia Tech contributed to this project through course work and volunteer efforts.
Dataset Overview
Data Subject(s)
- Sensitive Data about people (video)
- Non-Sensitive Data about people
Dataset Snapshot
Category | Data |
---|---|
Size of Dataset | 3.9 GB (landmarks alone) |
Total Number of Video Landmark Files | 19,904 |
Content Description
This dataset was collected from late 2023 to early 2024.
Descriptive Statistics
TBD
Sensitivity of Data
Sensitivity Type(s)
- User Content
- User Metadata
- User Activity Data
- Identifiable Data (video)
- S/PII (video)
Field(s) with Sensitive Data
Intentionally Collected Sensitive Data
(S/PII were collected as a part of the dataset creation process.)
Field Name | Description |
---|---|
Participant Video | Video of participant (upper body captured) |
Participant Sign | Video of participant performing isolated sign gestures |
Security and Privacy Handling
Method: Participants were given a consent form. They were only allowed to record after providing consent for the following:
“The app will collect video and photographic images of Your face, torso, hands, and whatever is in view of the camera(s) along with associated camera metadata (such as color correction, focal length, etc.). … Beyond the video, the following data may be recorded:
- The details of each Task, such as the type of Task that was done, signing certain words, or performing specific actions as instructed
- Date and time information associated with the Tasks
- Self identified gender
- Self identified age range
- Self-identified ethnicity
- Self assessed sign language proficiency
- Signing style information (such as general location where You learned, type of sign learned, age range when you started learning, signing community You are most closely associated with, etc)
As described earlier, if you consent, we will use photos or video clips where your face can be identified. We may use identifiable photos or video clips of you in written or oral presentations about this work and in publicly available on-line databases.”
Risk Type(s)
- Direct Risk (video)
- Residual Risk
Risk(s) and Mitigation(s)
Video: The direct risk involves participants’ visual features (their face and body) being linked to their full name. To mitigate this risk, we use anonymized user IDs to identify users. Some residual risk remains: participants may still be identified by their faces alone, a risk that is unavoidable with video data. We have participants sign consent forms acknowledging that they are creating a dataset intended for public use.
Dataset Version and Maintenance
Maintenance Status
Regularly Updated - New versions of the dataset have been or will continue to be made available.
Version Details
Current Version: 1.0
Release Date: Sept 2024
Maintenance Plan
Versioning: Major updates will be released as a new version, incremented by a tenth from the previous version; e.g., if the current version is between 1.0 and 1.09, then a major update will be released as version 1.1. Major updates include the addition of new users and/or new signs. Minor updates are covered below.
Updates: If there are missing/extraneous/erroneous videos (error cases described below), any fixes will be released as a new version, incremented by 0.01; e.g., if the current version is 1.0, then any minor updates will be released as 1.01.
Errors: Errors in the dataset include incorrectly labeled videos, missing videos, or extraneous videos. Missing videos include videos that the participant recorded but weren’t included in the final release. Extraneous videos include videos that only have a partial sign or no sign at all, but were included in the final dataset.
Feedback: We will accept feedback via thad@gatech.edu
Next Planned Update(s)
Version affected: 1.0
Next data update: TBD
Next version: 1.1
Next version update: TBD
Expected Change(s)
Updates to Dataset: TBD
Data Points
Primary Data Modality
- Video Data
Typical Data Point
A typical data point includes only the full motion associated with the signed phrase, with little to no empty space (i.e. no motion) at the beginning or the end of the video. The full phrase must be completed within roughly 5 seconds and a full view of the signing motion must be included.
Atypical Data Point
An atypical data point may include a lot of empty space (i.e. moments without any motion) at the beginning or end of the video. The full sign may take longer than 5 seconds to complete. The beginning or the end of the sign may be obscured by poor camera framing, though the sign should still be recognizable.
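One rough way to separate typical from atypical clips is to check the duration implied by the landmark frames against the roughly 5-second guideline. The sketch below assumes one landmark row per video frame and a 30 fps capture rate; neither is documented in this card, so treat both as placeholders.

```python
import pandas as pd

FPS = 30.0          # assumed capture frame rate; not stated in this card
MAX_SECONDS = 5.0   # rough upper bound for a typical phrase

def is_typical_duration(clip: pd.DataFrame) -> bool:
    """Flag clips whose implied duration fits the typical ~5 s bound.

    Assumes one landmark row per video frame, which is a guess about
    the parquet layout rather than a documented guarantee.
    """
    return len(clip) / FPS <= MAX_SECONDS
```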
Motivations & Intentions
Motivations
Purpose(s)
- Research
- Production
- Education
Domain(s) of Application
- Educational Technology
- Accessibility
- Sign Language Recognition
- Machine Learning
- Computer Vision
Motivating Factor(s)
- Developing Sign Language Recognition
- Teaching Sign
- Developing Educational Technology
Intended Use
Dataset Use(s)
- Safe for research use
Suitable Use Case(s)
Suitable Use Case: Sign Language Recognition for interacting with virtual assistants
Suitable Use Case: Incorporating into larger datasets to train sign language recognition systems for mobile phone/tablet applications and games
In general, the data can be used for sign language recognition and related downstream applications.
Unsuitable Use Case(s)
Unsuitable Use Case: General Continuous Sign Language Recognition/Interpretation
Unsuitable Use Case: Sign to English Translation
The data is not intended for systems seeking to make a general interpreter for continuous sign language recognition or sign language to English translation. It is not representative of the complexity of ASL grammar used in general interaction. Much more data would need to be collected.
Access, Retention, & Wipeout
Access
Access Type
- External - Open Access
Documentation Link(s)
- Dataset Website URL: https://signdata.cc.gatech.edu
Retention
Duration
The dataset will be available for at least 5 years, and we have no plans to retire it.
Wipeout and Deletion
Policy
We do not have plans to retire the dataset, so there is no deletion policy/procedure.
Provenance
Collection
Method(s) Used
We collected data with our mobile recording app, Record These Hands. The app presented 10 phrases for recording in a single recording session. The entire session was captured on video, but the sign recordings happened during specific time intervals: participants were presented with a phrase to record and then tapped a record button to record themselves signing. The timestamps corresponding to the recording intervals for each sign were saved in a separate file.
Methodology Detail(s)
Collection Method: Record These Hands App
Platform: Pixel Tablet
Dates of Collection: October 2023 to March 2024
Primary modality of collected data:
- Video Data
Update Frequency for collected data:
- Static
Source Description(s)
Participants were recruited by DPAN.
Collection Cadence
Static: Data was collected once from single or multiple sources.
Data Processing
Collection Method: Record These Hands app
Description: We split the individual phrase recordings out of the full session recordings using scripts. The resulting split videos were named following this convention: “<participant_id>-
Tools or libraries: Python, FFMPEG
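A minimal sketch of this splitting step, assuming the timestamp file yields (start, end) pairs in seconds for each phrase; the output filename below is a placeholder, since the full naming convention is truncated above.

```python
import subprocess

def split_session(session_video: str, intervals, participant_id: str) -> None:
    """Cut one clip per recording interval out of a session video.

    `intervals` is assumed to be an iterable of (start, end) times in
    seconds read from the saved timestamp file; the output filename is
    a placeholder, not the dataset's actual naming convention.
    """
    for i, (start, end) in enumerate(intervals):
        out = f"{participant_id}-{i:04d}.mp4"
        # Input-side -ss seeking with stream copy is fast; cuts land on
        # keyframes, which is acceptable for a sketch like this.
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(start), "-i", session_video,
             "-t", str(end - start), "-c", "copy", out],
            check=True,
        )
```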
Collection Criteria
Data Selection
- Sign contributors were encouraged to correct any mistakes they made while recording the phrases
Data Inclusion
- We included any videos that appeared to be the correct size.
Data Exclusion
- We excluded any videos that did not appear to be the correct size.
Human and Other Sensitive Attributes
Sensitive Human Attribute(s)
(only for video)
- Gender
- Geography
- Language
- Age
- Culture
- Experience or Seniority
Intentionality
Intentionally Collected Attributes
Human attributes were labeled or collected as a part of the dataset creation process.
Field Name | Description |
---|---|
Sign Language | Participant’s signing style and dialect |
Gender | Participant’s Gender |
Age Range | Participant’s Age |
Additional Notes: By providing sign data, participants provided information about their signing style and preferred dialect.
Unintentionally Collected Attributes
Human attributes were not explicitly collected as a part of the dataset creation process but can be inferred using additional methods.
Field Name | Description |
---|---|
Geography | Participant’s geographic location |
Culture | Participant’s Ethnic Background |
Seniority | Participant’s signing proficiency |
Additional Notes: We did not intentionally collect the attributes listed above, but they may be (incorrectly) inferred from the videos. For instance, videos may be suggestive of the participant’s age or their signing proficiency. Such inferences may be incorrect since many of these attributes cannot be determined using visual cues alone and may depend on the participant’s self-identification.
Rationale
We intended to collect sign language phrase data; hence videos of the participant’s signing were collected. The collected attributes (both intentional and unintentional) may be inferred (though not always accurately) from the videos.
Risk(s) and Mitigation(s)
(video data) The direct risk with this type of video data is the participant’s identity being revealed. For this reason, we use anonymous identifiers. There is still some residual risk of a participant being identified through their face (in video) alone. Participants have signed a consent form (given in the Sensitivity of Data section) to address this concern.
Annotations & Labeling
Annotation Workforce Type
- Annotation Target in Data
Annotation Characteristic(s)
Annotation Type | Number |
---|---|
Number of unique annotations | 30,000 |
Total number of annotations | 30,000 |
Average annotations per example | 1 |
Annotation Description(s)
Description: Annotations were automatically generated with the data, since participants were prompted to record specific phrases in each session. These sign labels served as the target for the phrase recognition problem.