Popsign Version 1.0

License

Popsign v1.0 is licensed under CC BY 4.0. For more information, please view the full license.

Papers

PopSign ASL v1.0

95% of deaf children are born to hearing parents. Since many hearing parents do not know sign, these deaf children are at risk for language acquisition delays that can result in cognitive issues. We are making an educational smartphone game, PopSign, that helps hearing parents practice their signing vocabulary.

Our dataset is the largest collection of isolated sign videos collected using mobile phones. We are using the data to train recognition models for use in smartphone applications, including the PopSign game. PopSign and related educational technology teach hearing parents and deaf children to sign, reducing developmental problems.

https://signdata.cc.gatech.edu

Data Card Author(s)

  • Thad Starner, Georgia Tech: Owner
  • Rohit Sridhar, Georgia Tech: Contributor
  • Matthew So, Georgia Tech: Contributor
  • Gururaj Deshpande, Georgia Tech: Contributor

Authorship

Publishers

Publishing Organization(s)

  • Georgia Institute of Technology
  • Deaf Professional Arts Network

Industry Type(s)

  • Academic - Tech
  • Not-for-profit - Tech

Contact Detail(s)

Funding Sources

Institution(s)

  • Deaf Professional Arts Network

Funding or Grant Summary(ies)

DPAN (Deaf Professional Arts Network) is a non-profit. Funding for this project came through non-restrictive gifts to DPAN from both public and private entities.

Additional Notes: Georgia Tech contributed to this project through course work and volunteer efforts.

Dataset Overview

Data Subject(s)

  • Sensitive Data about people
  • Non-Sensitive Data about people

Dataset Snapshot

Category Data
Size of Dataset 1.1 TB
Total Number of Videos 200,686
Number of Game Videos 165,198
Total Number of Signs 250
Total Number of Signers 47
Average Videos Per Sign 803
Number of Video Quality Categories 3

Content Description

This dataset was collected from October 2022 to March 2023. Videos were sorted into three categories: game (videos which only contained the sign intended to be used for the PopSign game), unrecognizable (videos which clearly did not correspond to any sign and are not included), and variant (videos which contained signs that did not match the game sign).

Descriptive Statistics

Statistic Game Videos Per Sign Variant Videos Per Sign
count 250 250
mean 660 141
std 172 160
min 96 0
25% 557 24
50% 727 85
75% 807 267
max 868 679

Sensitivity of Data

Sensitivity Type(s)

  • User Content
  • User Metadata
  • User Activity Data
  • Identifiable Data
  • S/PII

Field(s) with Sensitive Data

Intentionally Collected Sensitive Data

(S/PII were collected as a part of the dataset creation process.)

Field Name Description
Participant Video Video of participant (upper body captured)
Participant Sign Video of participant performing isolated sign gestures

Security and Privacy Handling

Method: Participants were given a consent form. They were only allowed to record after providing consent for the following:

“The app will collect video and photographic images of Your face, torso, hands, and whatever is in view of the camera(s) along with associated camera metadata (such as color correction, focal length, etc.). … Beyond the video, the following data may be recorded:

  • The details of each Task, such as the type of Task that was done, signing certain words, or performing specific actions as instructed
  • Date and time information associated with the Tasks
  • Self identified gender
  • Self identified age range
  • Self-identified ethnicity
  • Self assessed sign language proficiency
  • Signing style information (such as general location where You learned, type of sign learned, age range when you started learning, signing community You are most closely associated with, etc)

As described earlier, if you consent, we will use photos or video clips where your face can be identified. We may use identifiable photos or video clips of you in written or oral presentations about this work and in publicly available on-line databases.”

Risk Type(s)

  • Direct Risk
  • Residual Risk

Risk(s) and Mitigation(s)

The direct risk involves participants’ visual features (their face and body) being linked to their full name. To mitigate this risk, we use anonymized user IDs to identify users. There is still some residual risk. Participants may still be identified using their faces alone. This risk is unavoidable with video data. We have participants sign consent forms acknowledging that they are creating a dataset intended for public use.

Dataset Version and Maintenance

Maintenance Status

Regularly Updated - New versions of the dataset have been or will continue to be made available.

Version Details

Current Version: 1.0

Release Date: Sept 2023

Maintenance Plan

Versioning: Major updates will be released as a new version, incremented to the nearest tenth from the previous version. For example, if the current version is between 1.0 and 1.09, then a major update will be released as version 1.1. Major updates include the addition of new users and/or new signs. Minor updates are covered below.

Updates: If there are missing/extraneous/erroneous videos (error cases described below), any fixes will be released as a new version, incremented by 0.01. E.g. if the current version is 1.0, then any minor updates will be released as 1.01.
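To make the numbering concrete, here is a minimal sketch of the scheme (purely illustrative; not part of the dataset's release tooling):

    import math

    def next_version(current: float, major: bool) -> float:
        """Major updates bump to the next tenth; minor fixes add 0.01."""
        if major:
            return round(math.floor(current * 10) / 10 + 0.1, 2)
        return round(current + 0.01, 2)

    assert next_version(1.0, major=True) == 1.1    # e.g. new signers or signs added
    assert next_version(1.05, major=True) == 1.1   # any 1.0x version majors to 1.1
    assert next_version(1.0, major=False) == 1.01  # e.g. fixed or removed videos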

Errors: Errors in the dataset include incorrectly labeled videos, missing videos, or extraneous videos. Missing videos include videos that the participant recorded but weren’t included in the final release. Extraneous videos include videos that only have a partial sign or no sign at all, but were included in the final dataset.

Feedback: We will accept feedback via our group email, popsigngame@gmail.com.

Next Planned Update(s)

Version affected: 1.0

Next data update: TBD

Next version: 1.1

Next version update: TBD

Expected Change(s)

Updates to Dataset: The current dataset includes 250 signs from the MacArthur-Bates CDI. We plan to include an additional 313 signs in the new version, recorded by a new set of participants.

Data Points

Primary Data Modality

  • Video Data

Typical Data Point

A typical data point includes only the full sign motion, with little to no empty space (i.e. no motion) at the beginning or the end of the video. The full sign must be completed within roughly 1 or 2 seconds, and a full view of the signing motion must be included. The sign must be the example variant provided in our in-house ASL Capture App.

Atypical Data Point

An atypical data point may include a lot of empty space (i.e. moments without any motion) at the beginning or end of the video. The full sign may take longer than 1 or 2 seconds to complete. The beginning or the end of the sign may be obscured by poor camera framing, though the sign should still be recognizable. Atypical data points also include signs that are alternate variants or are fingerspelled, rather than the example sign variant provided by the in-house ASL Capture App.
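As a rough illustration of the duration criterion above, the following sketch flags clips whose length falls well outside the typical 1-2 second window. The thresholds and the use of OpenCV here are assumptions for illustration, not part of the dataset tooling.

    # Sketch: flag clips whose duration falls well outside the ~1-2 s
    # window that typical data points occupy.  Thresholds are illustrative.
    import cv2

    def clip_duration_seconds(path: str) -> float:
        cap = cv2.VideoCapture(path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0        # fall back if FPS is missing
        frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
        cap.release()
        return frames / fps

    def looks_atypical(path: str, lo: float = 0.5, hi: float = 3.0) -> bool:
        duration = clip_duration_seconds(path)
        return duration < lo or duration > hi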

Motivations & Intentions

Motivations

Purpose(s)

  • Research
  • Production
  • Education

Domain(s) of Application

Educational Technology, Accessibility, Sign Language Recognition, Machine Learning, Computer Vision

Motivating Factor(s)

  • Teaching Sign
  • Developing Educational Technology
  • Developing Sign Language Recognition

95% of deaf children are born to hearing parents. The communication barrier can cause language deficiencies and cognitive issues in these children. We want to close the gap by developing interactive technologies that use sign language recognition to actively teach hearing parents sign.

Our dataset is collected on mobile phones and is designed to facilitate sign language recognition in mobile games to interactively teach sign. We also hope our example will bring wider focus on the use of interactive machine learning in general to improve educational technology.

Intended Use

Dataset Use(s)

  • Safe for research use

Suitable Use Case(s)

Suitable Use Case: Isolated Sign Language Recognition

Suitable Use Case: Isolated Sign Language Recognition for mobile phone applications and games

In general, the data can be used for isolated sign language recognition and related downstream applications.

Unsuitable Use Case(s)

Unsuitable Use Case: Continuous Sign Language Recognition

Unsuitable Use Case: Sign to English Translation

The data is not intended for use in continuous sign language recognition, or sign language to English translation.

Access, Retention, & Wipeout

Access

Access Type

  • External - Open Access

Retention

Duration

The dataset will be available for at least 5 years; we have no plans to retire it.

Wipeout and Deletion

Policy

We do not have plans to retire the dataset, so there is no deletion policy or procedure.

Provenance

Collection

Method(s) Used

We collected data from our mobile recording app, the ASL Capture App. The app presented 10 signs for recording in a single recording session. The entire session was captured on video, but the sign recordings happened during specific time intervals. The participants were presented a sign to record. They then tapped and held a record button to record themselves signing. The timestamps corresponding to the recording intervals for each sign were saved in a separate file.

Methodology Detail(s)

Collection Method: ASL Capture App

Platform: [Platform Name], Google Pixel 4a

Dates of Collection: [10 2022 - 02 2023]

Primary modality of collection data:

  • Video Data

Update Frequency for collected data:

  • Static

Source Description(s)

Participants were recruited by DPAN.

Collection Cadence

Static: Data was collected once from single or multiple sources.

Data Processing

Collection Method: ASL Capture App

Description: We split session recordings from the ASL Capture App using a Python script. The resulting split videos were named following this convention: “<participant_id>--<recording_start_time>-.mp4”.

Tools or libraries: Python, FFMPEG
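The exact splitting script is not distributed with the dataset; the sketch below shows the general FFmpeg-based approach, assuming a timestamp file with start and end columns and the naming convention above.

    # Sketch of splitting a session recording into per-sign clips with FFmpeg.
    # The timestamp-file columns and helper names are illustrative assumptions.
    import csv
    import subprocess

    def split_session(session_video: str, timestamp_csv: str, participant_id: str) -> None:
        with open(timestamp_csv, newline="") as f:
            for row in csv.DictReader(f):          # assumed columns: start, end (seconds)
                out = f"{participant_id}--{row['start']}-.mp4"
                subprocess.run(
                    ["ffmpeg", "-i", session_video,
                     "-ss", row["start"], "-to", row["end"],
                     "-c", "copy", out],
                    check=True,
                )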

Collection Criteria

Data Selection

  • We selected data based on whether a sign was recognizable or not, i.e. the sign could be clearly made out from the video. We further categorized whether the sign presented was the example sign we (the data collectors) intended for the participant to sign, or another variant.

Data Inclusion

  • We included any video that contained a discernible sign with most of the participant’s signing in frame.

Data Exclusion

  • We excluded any videos that did not contain a full sign from the participant. We also excluded videos where the participant was so far out of frame that the sign was unrecognizable.

Human and Other Sensitive Attributes

Sensitive Human Attribute(s)

  • Gender
  • Geography
  • Language
  • Age
  • Culture
  • Experience or Seniority

Intentionality

Intentionally Collected Attributes

Human attributes were labeled or collected as a part of the dataset creation process.

Field Name Description
Sign Language Participant’s signing style and dialect
Gender Participant’s Gender
Age Range Participant’s Age

Additional Notes: By providing isolated sign data, participants provided information about their signing style and preferred dialect.

Unintentionally Collected Attributes

Human attributes were not explicitly collected as a part of the dataset creation process but can be inferred using additional methods.

Field Name Description
Geography Participant’s geographic location
Culture Participant’s Ethnic Background
Seniority Participant’s signing proficiency

Additional Notes: We did not intentionally collect the attributes listed above, but they may be (incorrectly) inferred from the videos. For instance, videos may be suggestive of the participant’s age or their signing proficiency. Such inferences may be incorrect since many of these attributes cannot be determined using visual cues alone and may depend on the participant’s self-identification.

Rationale

We intended to collect isolated sign language data; hence videos of the participant’s signing were collected. The collected attributes (both intentional and unintentional) may be inferred (though not always accurately) from the videos.

Risk(s) and Mitigation(s)

The direct risk with this type of video data is that a participant’s identity could be revealed. For this reason, we use anonymous identifiers. There is still some residual risk of participants being identified through their faces alone. Participants have signed a consent form (given in the Data Sensitivity section) to address this concern.

Annotations & Labeling

Annotation Workforce Type

  • Annotation Target in Data

Annotation Characteristic(s)

Annotation Type Number
Number of unique annotations 250
Total number of annotations 210,598
Average annotations per example 1

Annotation Description(s)

Description: Annotations were automatically generated with the data, since participants were prompted to record specific signs in each session. These sign labels serve as the target for the Isolated Sign Language Recognition problem.

Annotation Distribution(s)

Sign Count
call (on phone) 869
hear 865
pen 864
man 863
hen 861

Above: We provide the top 5 sign annotations that occur in the dataset. Note that these counts do not include cases where the sign was unrecognizable in the video (these are post-validation counts). To understand our validation procedure, see the next section.

Validation Types

Method(s)

  • Annotation/Label Validation

Description(s)

(Validation Type)

Method: Each video was validated by a team of reviewers. Reviewers would first check whether the video contained a discernible sign and then check whether the sign was the example variant we provided. Only videos with a discernible sign are kept, and videos containing a different sign variant are preserved separately for linguistic analysis.

Platforms, tools, or libraries:

  • Python, openCV
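The internal review tool itself is not released; the following is a minimal sketch of the kind of per-clip review loop such a tool could implement, with key bindings and category names assumed for illustration.

    # Sketch of a per-clip review loop: play the clip on repeat until the
    # reviewer presses a key assigning it to a category.  Bindings assumed.
    import cv2

    CATEGORIES = {ord("g"): "game", ord("v"): "variant", ord("u"): "unrecognizable"}

    def review_clip(path: str) -> str:
        cap = cv2.VideoCapture(path)
        while True:
            ok, frame = cap.read()
            if not ok:                              # restart the clip
                cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
                continue
            cv2.imshow("validate", frame)
            key = cv2.waitKey(33) & 0xFF            # ~30 fps playback
            if key in CATEGORIES:
                cap.release()
                cv2.destroyAllWindows()
                return CATEGORIES[key]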

Description of Human Validators

Characteristic(s)

(Validation Type)

  • Unique validators: 15
  • Number of examples per validator: 14,774
  • Training provided: Yes
  • Expertise required: No

Description(s)

(Validation Type)

Training provided: We trained validators to watch videos for discernible signs and to detect sign variants. We also trained validators to use the validation tool.

Validator selection criteria: We selected validators who had interest in sign language research and who were technically proficient enough to use the Validation tool.

Gender(s)

(Validation Type)

  • Identifies as Male (60%)
  • Identifies as Female (40%)

Known Applications & Benchmarks

ML Application(s)

  • Isolated Sign Language Recognition
  • Mobile Applications using Sign Language Recognition

Evaluation Process(es)

We train an LSTM model designed to output a label (one of the 250 signs) on the training set and then compute accuracies on the validation/test sets.
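For reference, below is a minimal sketch of an LSTM classifier over per-frame landmark features, in the spirit of the model described above. The feature dimension (two MediaPipe Hands, 21 landmarks, 3 coordinates each), hidden size, and choice of PyTorch are assumptions rather than the exact architecture used.

    # Sketch: LSTM over per-frame landmark features -> one of 250 sign labels.
    # Sizes and framework are illustrative assumptions.
    import torch.nn as nn

    class SignLSTM(nn.Module):
        def __init__(self, feat_dim: int = 126, hidden: int = 256, num_signs: int = 250):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_signs)

        def forward(self, x):                       # x: (batch, frames, feat_dim)
            _, (h, _) = self.lstm(x)
            return self.head(h[-1])                 # logits over the 250 signs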

Evaluation Result(s)

Evaluation Results

  • Accuracy (over Total Videos): 82% (Val), 84% (Test)

We report accuracy with the denominator consisting of all originally recorded videos. For a small percentage of videos, MediaPipe did not generate features. Note that during PopSign gameplay, players naturally ensure that the hand tracking is showing the overlay skeleton on their hands before signing (i.e., the players are active participants in trying to get the recognition to work). Thus, for the current application, accuracy measured over the files with MediaPipe Hands features gives a better sense of the accuracy expected during gameplay.
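A small sketch contrasting the two denominators discussed above (all recorded videos versus only those for which MediaPipe Hands produced features); the record fields are assumed for illustration.

    # Sketch: accuracy over all recorded videos vs. over videos with features.
    # Each record is assumed to hold a ground-truth label and a prediction
    # that is None when MediaPipe Hands produced no features.
    def accuracies(records):
        correct = sum(1 for r in records if r["pred"] is not None and r["pred"] == r["label"])
        with_features = [r for r in records if r["pred"] is not None]
        acc_total = correct / len(records)          # denominator: all recorded videos
        acc_tracked = correct / len(with_features)  # denominator: videos with features
        return acc_total, acc_tracked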