Assistant Phrase ASL Version 1.0

License

Assistant Phrase ASL v1.0 is licensed under CC BY 4.0. For more information, please view the full license.

Assistant Phrase ASL v1.0

The Deaf have difficulty using voice-based assistants: they cannot hear results presented orally, some users have little or no voice, and computer-based speech recognition can have difficulty recognizing a Deaf voice. Several publications have established that the Deaf community desires to communicate with these virtual assistants in sign language.

This dataset contains the landmarks (stored in the parquet format) extracted from videos of Deaf individuals signing phrases appropriate for controlling a virtual assistant. The text prompts for the phrases are taken from the public Presto dataset.

train.csv contains the text of each phrase captured, the name of the parquet file that contains the phrase, and the location of the phrase within that file.
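
For example, loading the phrase index and one phrase’s landmarks might look like the sketch below. The "path" and "phrase" column names are assumptions for illustration; check the actual header of train.csv.

  import pandas as pd

  # Load the phrase index: phrase text, parquet filename, and location in the file
  train = pd.read_csv("train.csv")

  # Load the landmark frames for the first phrase (column names are hypothetical)
  row = train.iloc[0]
  landmarks = pd.read_parquet(row["path"])
  print(row["phrase"], landmarks.shape)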

https://signdata.cc.gatech.edu

Data Card Author(s)

  • Thad Starner, Georgia Tech: Owner

Authorship

Publishers

Publishing Organization(s)

  • Georgia Institute of Technology
  • Deaf Professional Arts Network

Industry Type(s)

  • Academic - Tech
  • Not-for-profit - Tech

Contact Detail(s)

  • Publishing POC: Thad Starner
  • Affiliation: Georgia Institute of Technology
  • Contact: thad@gatech.edu

Funding Sources

Institution(s)

  • Deaf Professional Arts Network

Funding or Grant Summary(ies)

DPAN (Deaf Professional Arts Network) is a non-profit. Funding for this project came through non-restrictive gifts to DPAN from both public and private entities.

Additional Notes: Georgia Tech contributed to this project through course work and volunteer efforts.

Dataset Overview

Data Subject(s)

  • Sensitive Data about people (video)
  • Non-Sensitive Data about people

Dataset Snapshot

Category | Data
Size of Dataset | 3.9 GB (landmarks alone)
Total Number of Video Landmark Files | 19,904

Content Description

This dataset was collected from late 2023 to early 2024.

Descriptive Statistics

TBD

Sensitivity of Data

Sensitivity Type(s)

  • User Content
  • User Metadata
  • User Activity Data
  • Identifiable Data (video)
  • S/PII (video)

Field(s) with Sensitive Data

Intentionally Collected Sensitive Data

(S/PII were collected as a part of the dataset creation process.)

Field Name | Description
Participant Video | Video of participant (upper body captured)
Participant Sign | Video of participant performing isolated sign gestures

Security and Privacy Handling

Method: Participants were given a consent form. They were only allowed to record after providing consent for the following:

“The app will collect video and photographic images of Your face, torso, hands, and whatever is in view of the camera(s) along with associated camera metadata (such as color correction, focal length, etc.). … Beyond the video, the following data may be recorded:

  • The details of each Task, such as the type of Task that was done, signing certain words, or performing specific actions as instructed
  • Date and time information associated with the Tasks
  • Self identified gender
  • Self identified age range
  • Self-identified ethnicity
  • Self assessed sign language proficiency
  • Signing style information (such as general location where You learned, type of sign learned, age range when you started learning, signing community You are most closely associated with, etc)

As described earlier, if you consent, we will use photos or video clips where your face can be identified. We may use identifiable photos or video clips of you in written or oral presentations about this work and in publicly available on-line databases.”

Risk Type(s)

  • Direct Risk (video)
  • Residual Risk

Risk(s) and Mitigation(s)

Video: The direct risk is that participants’ visual features (their face and body) could be linked to their full names. To mitigate this risk, we use anonymized user IDs to identify users. Some residual risk remains: participants may still be identified by their faces alone, which is unavoidable with video data. We have participants sign consent forms acknowledging that they are creating a dataset intended for public use.

Dataset Version and Maintenance

Maintenance Status

Regularly Updated - New versions of the dataset have been or will continue to be made available.

Version Details

Current Version: 1.0

Release Date: Sept 2024

Maintenance Plan

Versioning: Major updates will be released as a new version, incremented to the nearest tenth from the previous version. E.g. if the current version is between 1.0 and 1.09, then a major update will be released as version 1.1. Major updates include the addition of new users and/or new signs. Minor updates are covered below.

Updates: If there are missing/extraneous/erroneous videos (error cases described below), any fixes will be released as a new version, incremented by 0.01. E.g. if the current version is 1.0, then any minor updates will be released as 1.01.
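
As an illustrative sketch only (not part of the release tooling), the two increment rules above work out as follows:

  import math

  def next_version(current: str, major: bool) -> str:
      """Hypothetical helper illustrating the versioning policy above."""
      value = float(current)
      if major:
          # Major update: round up to the next tenth, e.g. 1.0 or 1.07 -> 1.1
          return f"{math.floor(value * 10 + 1) / 10:.1f}"
      # Minor update: increment by 0.01, e.g. 1.0 -> 1.01
      return f"{value + 0.01:.2f}"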

Errors: Errors in the dataset include incorrectly labeled videos, missing videos, or extraneous videos. Missing videos include videos that the participant recorded but weren’t included in the final release. Extraneous videos include videos that only have a partial sign or no sign at all, but were included in the final dataset.

Feedback: We will accept feedback via thad@gatech.edu.

Next Planned Update(s)

Version affected: 1.0

Next data update: TBD

Next version: 1.1

Next version update: TBD

Expected Change(s)

Updates to Dataset: TBD

Data Points

Primary Data Modality

  • Video Data

Typical Data Point

A typical data point includes only the full motion associated with the signed phrase, with little to no empty space (i.e. no motion) at the beginning or the end of the video. The full phrase must be completed within roughly 5 seconds and a full view of the signing motion must be included.

Atypical Data Point

An atypical data point may include a lot of empty space (i.e. moments without any motion) at the beginning or end of the video. The full sign may take longer than 5 seconds to complete. The beginning or the end of the sign may be obscured by poor camera framing, though the sign should still be recognizable.
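
As an illustration, a screen for over-long clips could look like the sketch below. The "frame" column name and the 30 fps capture rate are assumptions, not documented properties of the parquet files.

  import pandas as pd

  ASSUMED_FPS = 30  # assumption: the actual capture rate may differ

  def is_overlong(parquet_path: str) -> bool:
      """Flag clips whose signing motion runs longer than roughly 5 seconds."""
      frames = pd.read_parquet(parquet_path)
      n_frames = frames["frame"].nunique()  # hypothetical column name
      return n_frames / ASSUMED_FPS > 5.0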

Motivations & Intentions

Motivations

Purpose(s)

  • Research
  • Production
  • Education

Domain(s) of Application

Educational Technology, Accessibility, Sign Language Recognition, Machine Learning, Computer Vision

Motivating Factor(s)

  • Developing Sign Language Recognition
  • Teaching Sign
  • Developing Educational Technology

Intended Use

Dataset Use(s)

  • Safe for research use

Suitable Use Case(s)

Suitable Use Case: Sign Language Recognition for interacting with virtual assistants

Suitable Use Case: Incorporating into larger datasets to train sign language recognition systems for mobile phone/tablet applications and games

In general, the data can be used for sign language recognition and related downstream applications.

Unsuitable Use Case(s)

Unsuitable Use Case: General Continuous Sign Language Recognition/Interpretation

Unsuitable Use Case: Sign to English Translation

The data is not intended for systems seeking to provide general interpretation of continuous sign language or translation from sign language to English. It is not representative of the complexity of ASL grammar used in general interaction; much more data would need to be collected.

Access, Retention, & Wipeout

Access

Access Type

  • External - Open Access

Retention

Duration

The dataset will be available for at least 5 years, and we currently have no plans to retire it.

Wipeout and Deletion

Policy

We do not have plans to retire the dataset, so there is no deletion policy or procedure.

Provenance

Collection

Method(s) Used

We collected data using our mobile recording app, Record These Hands. The app presented 10 phrases for recording in a single recording session. The entire session was captured on video, but the sign recordings happened during specific time intervals: participants were presented with a phrase to record and then tapped a record button to record themselves signing it. The timestamps corresponding to the recording intervals for each sign were saved in a separate file.

Methodology Detail(s)

Collection Method: Record These Hands App

Platform: Pixel Tablet

Dates of Collection: October 2023 - March 2024

Primary modality of collected data:

  • Video Data

Update Frequency for collected data:

  • Static

Source Description(s)

Participants were recruited by DPAN.

Collection Cadence

Static: Data was collected once from single or multiple sources.

Data Processing

Collection Method: Record These Hands app

Description: We split the individual sign recordings out of the full session recordings using scripts. The resulting split videos were named following this convention: “<participant_id>--<recording_start_time>-.mp4”.

Tools or libraries: Python, FFmpeg
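
A minimal sketch of such a splitting step is shown below; the (start, end) interval format and the helper itself are hypothetical stand-ins for the project’s actual scripts.

  import subprocess

  def split_session(session_mp4, participant_id, intervals):
      """Cut each sign recording interval out of the full session video.

      intervals: (start, end) pairs, in seconds, read from the separate
      timestamp file saved with each session recording.
      """
      for start, end in intervals:
          out = f"{participant_id}--{start}-.mp4"  # naming convention above
          subprocess.run(
              ["ffmpeg", "-i", session_mp4, "-ss", str(start), "-to", str(end),
               "-c", "copy", out],
              check=True,
          )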

Collection Criteria

Data Selection

  • Sign contributors were encouraged to correct any mistakes they made while recording the phrases

Data Inclusion

  • We included any videos that appeared to be of the correct size.

Data Exclusion

  • We excluded any videos that did not appear to be of the correct size.

Human and Other Sensitive Attributes

Sensitive Human Attribute(s)

(only for video)

  • Gender
  • Geography
  • Language
  • Age
  • Culture
  • Experience or Seniority

Intentionality

Intentionally Collected Attributes

Human attributes were labeled or collected as a part of the dataset creation process.

Field Name | Description
Sign Language | Participant’s signing style and dialect
Gender | Participant’s gender
Age Range | Participant’s age range

Additional Notes: By providing sign data, participants provided information about their signing style and preferred dialect.

Unintentionally Collected Attributes

Human attributes were not explicitly collected as a part of the dataset creation process but can be inferred using additional methods.

Field Name | Description
Geography | Participant’s geographic location
Culture | Participant’s ethnic background
Seniority | Participant’s signing proficiency

Additional Notes: We did not intentionally collect the attributes listed above, but they may be (incorrectly) inferred from the videos. For instance, videos may be suggestive of the participant’s age or their signing proficiency. Such inferences may be incorrect since many of these attributes cannot be determined using visual cues alone and may depend on the participant’s self-identification.

Rationale

We intended to collect sign language phrase data; hence, videos of the participants’ signing were collected. The collected attributes (both intentional and unintentional) may be inferred (though not always accurately) from these videos.

Risk(s) and Mitigation(s)

(video data) The direct risk with this type of video data is that a participant’s identity may be revealed. For this reason, we use anonymous identifiers. There is still some residual risk of a participant being identified through their face (in video) alone. Participants have signed a consent form (given in the Sensitivity of Data section) to address this concern.

Annotations & Labeling

Annotation Workforce Type

  • Annotation Target in Data

Annotation Characteristic(s)

Annotation Type | Number
Number of unique annotations | 30,000
Total number of annotations | 30,000
Average annotations per example | 1

Annotation Description(s)

Description: Annotations were automatically generated with the data, since participants were prompted to record specific phrases in each session. These sign labels served as the targets for the phrase recognition problem.

Annotation Distribution(s)

Validation Types

Method(s)

Description(s)

Evaluation Process(es)

Evaluation Result(s)