PopSign Version 1.0
License
PopSign v1.0 is licensed under CC BY 4.0. For more information, please view the full license.
Papers
PopSign ASL v1.0
95% of deaf children are born to hearing parents. Since many hearing parents do not know sign, these deaf children are at risk for language acquisition delays that can result in cognitive issues. We are making an educational smartphone game, PopSign, that helps hearing parents practice their signing vocabulary.
Our dataset is the largest collection of isolated sign videos collected using mobile phones. We are using the data to train recognition models for use in smartphone applications, including the PopSign game. PopSign and related educational technology teach hearing parents and deaf children to sign, reducing developmental problems.
Dataset Link
https://signdata.cc.gatech.edu
Data Card Author(s)
- Thad Starner, Georgia Tech: Owner
- Rohit Sridhar, Georgia Tech: Contributor
- Matthew So, Georgia Tech: Contributor
- Gururaj Deshpande, Georgia Tech: Contributor
Authorship
Publishers
Publishing Organization(s)
- Georgia Institute of Technology
- Deaf Professional Arts Network
Industry Type(s)
- Academic - Tech
- Not-for-profit - Tech
Contact Detail(s)
- Publishing POC: Thad Starner
- Affiliation: Georgia Institute of Technology
- Contact: thad@gatech.edu
- Mailing List: popsigngame@gmail.com
- Website: popsign.org
Funding Sources
Institution(s)
- Deaf Professional Arts Network
Funding or Grant Summary(ies)
DPAN (Deaf Professional Arts Network) is a non-profit. Funding for this project came through non-restrictive gifts to DPAN from both public and private entities.
Additional Notes: Georgia Tech contributed to this project through course work and volunteer efforts.
Dataset Overview
Data Subject(s)
- Sensitive Data about people
- Non-Sensitive Data about people
Dataset Snapshot
Category | Data |
---|---|
Size of Dataset | 1.1 TB |
Total Number of Videos | 200,686 |
Number of Game Videos | 165,198 |
Total Number of Signs | 250 |
Total Number of Signers | 47 |
Average Videos Per Sign | 803 |
Number of Video Quality Categories | 3 |
Content Description
This dataset was collected from October 2022 to March 2023. Videos were sorted into three categories: game (videos which only contained the sign intended to be used for the PopSign game), unrecognizable (videos which clearly did not correspond to any sign and are not included), and variant (videos which contained signs that did not match the game sign).
Descriptive Statistics
Statistic | Game Videos Per Sign | Variant Videos Per Sign |
---|---|---|
count | 250 | 250 |
mean | 660 | 141 |
std | 172 | 160 |
min | 96 | 0 |
25% | 557 | 24 |
50% | 727 | 85 |
75% | 807 | 267 |
max | 868 | 679 |
Sensitivity of Data
Sensitivity Type(s)
- User Content
- User Metadata
- User Activity Data
- Identifiable Data
- S/PII
Field(s) with Sensitive Data
Intentionally Collected Sensitive Data
(S/PII were collected as a part of the dataset creation process.)
Field Name | Description |
---|---|
Participant Video | Video of participant (upper body captured) |
Participant Sign | Video of participant performing isolated sign gestures |
Security and Privacy Handling
Method: Participants were given a consent form. They were only allowed to record after providing consent for the following:
“The app will collect video and photographic images of Your face, torso, hands, and whatever is in view of the camera(s) along with associated camera metadata (such as color correction, focal length, etc.). … Beyond the video, the following data may be recorded:
- The details of each Task, such as the type of Task that was done, signing certain words, or performing specific actions as instructed
- Date and time information associated with the Tasks
- Self identified gender
- Self identified age range
- Self-identified ethnicity
- Self assessed sign language proficiency
- Signing style information (such as general location where You learned, type of sign learned, age range when you started learning, signing community You are most closely associated with, etc)
As described earlier, if you consent, we will use photos or video clips where your face can be identified. We may use identifiable photos or video clips of you in written or oral presentations about this work and in publicly available on-line databases.”
Risk Type(s)
- Direct Risk
- Residual Risk
Risk(s) and Mitigation(s)
The direct risk involves participants’ visual features (their face and body) being linked to their full name. To mitigate this risk, we use anonymized user IDs to identify users. There is still some residual risk. Participants may still be identified using their faces alone. This risk is unavoidable with video data. We have participants sign consent forms acknowledging that they are creating a dataset intended for public use.
Dataset Version and Maintenance
Maintenance Status
Regularly Updated - New versions of the dataset have been or will continue to be made available.
Version Details
Current Version: 1.0
Release Date: Sept 2023
Maintenance Plan
Versioning: Major updates will be released as a new version, incremented to the nearest tenth from the previous version. For example, if the current version is between 1.0 and 1.09, then a major update will be released as version 1.1. Major updates include the addition of new users and/or new signs. Minor updates are covered below.
Updates: If there are missing/extraneous/erroneous videos (error cases described below), any fixes will be released as a new version, incremented by 0.01. E.g. if the current version is 1.0, then any minor updates will be released as 1.01.
Errors: Errors in the dataset include incorrectly labeled videos, missing videos, or extraneous videos. Missing videos include videos that the participant recorded but weren’t included in the final release. Extraneous videos include videos that only have a partial sign or no sign at all, but were included in the final dataset.
Feedback: We will accept feedback via our group email, popsigngame@gmail.com.
Next Planned Update(s)
Version affected: 1.0
Next data update: TBD
Next version: 1.1
Next version update: TBD
Expected Change(s)
Updates to Dataset: The current dataset includes 250 signs from the MacArthur-Bates CDI. We plan to include an additional 313 signs in the new version, recorded by a new set of participants.
Data Points
Primary Data Modality
- Video Data
Typical Data Point
A typical data point includes only the full sign motion, with little to no empty space (i.e. no motion) at the beginning or the end of the video. The full sign must be completed within roughly 1 to 2 seconds, and a full view of the signing motion must be included. The sign must be the example variant provided in our in-house ASL Capture App.
Atypical Data Point
An atypical data point may include a lot of empty space (i.e. moments without any motion) at the beginning or end of the video. The full sign may take longer than 1 to 2 seconds to complete. The beginning or the end of the sign may be obscured by poor camera framing, though the sign should still be recognizable. Atypical data points also include signs that are alternate variants or are fingerspelled, rather than the example sign variant provided by the in-house ASL Capture App.
Motivations & Intentions
Motivations
Purpose(s)
- Research
- Production
- Education
Domain(s) of Application
- Educational Technology
- Accessibility
- Sign Language Recognition
- Machine Learning
- Computer Vision
Motivating Factor(s)
- Teaching Sign
- Developing Educational Technology
- Developing Sign Language Recognition
95% of deaf children are born to hearing parents. The communication barrier can cause language deficiencies and cognitive issues in these children. We want to close this gap by developing interactive technologies that use sign language recognition to actively teach hearing parents sign.
Our dataset is collected on mobile phones and is designed to facilitate sign language recognition in mobile games to interactively teach sign. We also hope our example will bring wider focus on the use of interactive machine learning in general to improve educational technology.
Intended Use
Dataset Use(s)
- Safe for research use
Suitable Use Case(s)
Suitable Use Case: Isolated Sign Language Recognition
Suitable Use Case: Isolated Sign Language Recognition for mobile phone applications and games
In general, the data can be used for isolated sign language recognition and related downstream applications.
Unsuitable Use Case(s)
Unsuitable Use Case: Continuous Sign Language Recognition
Unsuitable Use Case: Sign to English Translation
The data is not intended for use in continuous sign language recognition, or sign language to English translation.
Access, Retention, & Wipeout
Access
Access Type
- External - Open Access
Documentation Link(s)
- Dataset Website URL: https://signdata.cc.gatech.edu
Retention
Duration
The dataset will be available for at least 5 years; we have no plans to retire the dataset.
Wipeout and Deletion
Policy
We do not have plans to retire the dataset, so there is no deletion policy/procedure.
Provenance
Collection
Method(s) Used
We collected data with our mobile recording app, the ASL Capture App. The app presented 10 signs for recording in a single recording session. The entire session was captured on video, but the sign recordings happened during specific time intervals: participants were presented with a sign to record and then tapped and held a record button to record themselves signing. The timestamps corresponding to the recording intervals for each sign were saved in a separate file.
Methodology Detail(s)
Collection Method: ASL Capture App
Platform: Google Pixel 4a
Dates of Collection: 10/2022 - 02/2023
Primary modality of collected data:
- Video Data
Update Frequency for collected data:
- Static
Source Description(s)
Participants were recruited by DPAN
Collection Cadence
Static: Data was collected once from single or multiple sources.
Data Processing
Collection Method: ASL Capture App
Description: We split session recordings from the ASL Capture App using a Python script. The resulting split videos were named following a convention that begins with the participant ID (“<participant_id>-…”).
Tools or libraries: Python, FFMPEG
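The splitting script itself is not included in this card. Below is a minimal sketch of the general approach, assuming a per-session JSON file of recording intervals (the field names `sign`, `start`, and `end`, and the output naming, are hypothetical) and invoking FFMPEG through Python's subprocess module.

```python
# Hypothetical sketch of the session-splitting step: cut one session recording
# into per-sign clips with FFMPEG, given the saved recording intervals.
import json
import subprocess
from pathlib import Path

def split_session(session_video: Path, timestamps_json: Path, out_dir: Path) -> None:
    """Cut `session_video` into one clip per recorded sign.

    `timestamps_json` is assumed to hold a list of records like
    {"sign": "hear", "start": 12.4, "end": 14.1} (seconds into the session).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    intervals = json.loads(timestamps_json.read_text())
    participant_id = session_video.stem  # assumed naming; adjust as needed
    for i, rec in enumerate(intervals):
        out_path = out_dir / f"{participant_id}-{i:03d}-{rec['sign']}.mp4"
        subprocess.run(
            [
                "ffmpeg", "-y",
                "-i", str(session_video),
                "-ss", str(rec["start"]),
                "-to", str(rec["end"]),
                "-c", "copy",  # stream copy: fast, no re-encode (cuts land on keyframes)
                str(out_path),
            ],
            check=True,
        )

# Example: split_session(Path("session.mp4"), Path("session_times.json"), Path("clips/"))
```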
Collection Criteria
Data Selection
- We selected data based on whether a sign was recognizable or not, i.e. the sign could be clearly made out from the video. We further categorized whether the sign presented was the example sign we (the data collectors) intended for the participant to sign, or another variant.
Data Inclusion
- We included any videos that contained a discernible sign and most of the participant’s sign was in frame.
Data Exclusion
- We excluded any videos that did not contain a full sign from the participant. We also excluded videos where the participant was far enough out of frame that the sign was unrecognizable.
Human and Other Sensitive Attributes
Sensitive Human Attribute(s)
- Gender
- Geography
- Language
- Age
- Culture
- Experience or Seniority
Intentionality
Intentionally Collected Attributes
Human attributes were labeled or collected as a part of the dataset creation process.
Field Name | Description |
---|---|
Sign Language | Participant’s signing style and dialect |
Gender | Participant’s Gender |
Age Range | Participant’s Age |
Additional Notes: By providing isolated sign data, participants provided information about their signing style and preferred dialect.
Unintentionally Collected Attributes
Human attributes were not explicitly collected as a part of the dataset creation process but can be inferred using additional methods.
Field Name | Description |
---|---|
Geography | Participant’s geographic location |
Culture | Participant’s Ethnic Background |
Seniority | Participant’s signing proficiency |
Additional Notes: We did not intentionally collect the attributes listed above, but they may be (incorrectly) inferred from the videos. For instance, videos may be suggestive of the participant’s age or their signing proficiency. Such inferences may be incorrect since many of these attributes cannot be determined using visual cues alone and may depend on the participant’s self-identification.
Rationale
We intended to collect isolated sign language data; hence videos of the participant’s signing were collected. The collected attributes (both intentional and unintentional) may be inferred (though not always accurately) from the videos.
Risk(s) and Mitigation(s)
The direct risk with this type of video data is that a participant’s identity may be revealed. For this reason, we use anonymous identifiers. There is still some residual risk of participants being identified by their faces alone. Participants have signed a consent form (given in the Data Sensitivity section) to address this concern.
Annotations & Labeling
Annotation Workforce Type
- Annotation Target in Data
Annotation Characteristic(s)
Annotation Type | Number |
---|---|
Number of unique annotations | 250 |
Total number of annotations | 210,598 |
Average annotations per example | 1 |
Annotation Description(s)
Description: Annotations were automatically generated with the data, since participants were prompted to record specific signs in each session. These sign labels served as the target for the Isolated Sign Language Recognition problem.
Annotation Distribution(s)
Sign | Count |
---|---|
call (on phone) | 869 |
hear | 865 |
pen | 864 |
man | 863 |
hen | 861 |
Above: We provide the top 5 sign annotations that occur in the dataset. Note that these counts do not include cases where the sign was unrecognizable in the video (these are post-validation counts). To understand our validation procedure, see the next section.
Validation Types
Method(s)
- Annotation/Label Validation
Description(s)
(Validation Type)
Method: Each video was validated by a team of reviewers. Reviewers would first check whether the video contained a discernible sign and then check whether the sign was the example variant we provided. Only videos with a discernible sign are kept, while videos containing different sign variants are preserved separately for linguistic analysis.
Platforms, tools, or libraries:
- Python, OpenCV
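The validation tool itself is not reproduced in this card. The sketch below illustrates, under assumed key bindings and labels, how a Python/OpenCV review loop could step through a clip and record one of the three categories described above.

```python
# Minimal sketch of a review loop in the spirit of the validation tool
# (illustrative only; the actual tool and key bindings are not documented here).
from typing import Optional
import cv2

CATEGORIES = {ord("g"): "game", ord("v"): "variant", ord("u"): "unrecognizable"}

def review_clip(path: str) -> Optional[str]:
    """Play a clip and return the reviewer's category, or None if skipped."""
    cap = cv2.VideoCapture(path)
    label = None
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("review", frame)
        key = cv2.waitKey(30) & 0xFF  # ~30 ms per frame
        if key in CATEGORIES:
            label = CATEGORIES[key]
            break
        if key == ord("q"):  # skip this clip
            break
    cap.release()
    cv2.destroyAllWindows()
    return label
```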
Description of Human Validators
Characteristic(s)
(Validation Type)
- Unique validators: 15
- Number of examples per validator: 14,774
- Training provided: Yes
- Expertise required: No
Description(s)
(Validation Type)
Training provided: We trained validators to watch videos for discernible signs and to detect variants. We also trained validators in how to use the validation tool.
Validator selection criteria: We selected validators who had interest in sign language research and who were technically proficient enough to use the Validation tool.
Gender(s)
(Validation Type)
- Identifies as Male (60%)
- Identifies as Female (40%)
Known Applications & Benchmarks
ML Application(s)
- Isolated Sign Language Recognition
- Mobile Applications using Sign Language Recognition
Evaluation Process(es)
We train an LSTM model on the training set to output a label (one of the 250 signs) for each video, then compute accuracies on the validation and test sets.
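The exact architecture, features, and hyperparameters are not specified in this card. The following is a minimal PyTorch sketch of an LSTM classifier over per-frame landmark features that outputs one of the 250 sign labels; the feature dimensionality, layer sizes, and input layout are assumptions for illustration.

```python
# Illustrative sketch (not the production model): an LSTM classifier over
# per-frame landmark features that outputs one of the 250 sign labels.
import torch
import torch.nn as nn

NUM_SIGNS = 250
FEATURE_DIM = 84   # assumed: e.g. 2 hands x 21 landmarks x (x, y)

class SignLSTM(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(FEATURE_DIM, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, NUM_SIGNS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, FEATURE_DIM) landmark sequence for one clip
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])  # logits over the 250 signs

model = SignLSTM()
logits = model(torch.randn(8, 60, FEATURE_DIM))  # 8 clips, 60 frames each
pred = logits.argmax(dim=1)                      # predicted sign index per clip
```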
Evaluation Result(s)
- Accuracy (over Total Videos): 82% (Val), 84% (Test)
The accuracy above uses all of the originally recorded videos as its denominator. For a small percentage of videos, MediaPipe did not generate features. Note that during PopSign gameplay, players naturally ensure that the hand tracking is showing the overlay skeleton on their hands before signing (i.e., the players are active participants in trying to get the recognition to work). Thus, for the current application, accuracy measured only over the files with MediaPipe Hands features gives a better sense of the accuracy expected during gameplay.
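As a small illustration of the two denominators discussed above, the sketch below computes accuracy over all recorded videos and over only the videos for which MediaPipe generated features; the `results` mapping is a hypothetical stand-in for per-video evaluation output.

```python
# Sketch of the two accuracy definitions (input structure assumed):
# `results` maps video id -> (has_features, correct). Videos without MediaPipe
# features can never be predicted correctly, so they count against the first metric.
from typing import Dict, Tuple

def accuracies(results: Dict[str, Tuple[bool, bool]]) -> Tuple[float, float]:
    with_features = [correct for has_feat, correct in results.values() if has_feat]
    correct_total = sum(with_features)
    acc_over_all = correct_total / len(results)             # denominator: all recorded videos
    acc_over_featured = correct_total / len(with_features)  # denominator: videos with features
    return acc_over_all, acc_over_featured
```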