What if Speech Could Unlock Early Detection of Alzheimer’s Disease?

In the early stages, Alzheimer's symptoms can be subtle and mistaken for normal age-related cognitive changes. While many people and their families may not recognize the signs as indicative of a serious cognitive disorder, early diagnosis can result in better quality of life and more effective treatment for those with Alzheimer’s disease.

The key to unlocking early diagnosis of Alzheimer’s disease may lie in our speech. Ongoing research has shown that subtle changes in speech patterns may precede noticeable cognitive symptoms. Furthermore, the ubiquity of smartphones, tablets and smart-home devices offers the potential to capture speech data passively and securely, and to use it to predict or monitor cognitive decline without a visit to the doctor’s office or invasive testing.

SpeechDx is working to enable speech biomarkers

SpeechDx is a pre-competitive initiative of leading researchers from academia and industry. Our mission is to facilitate the development of speech-based biomarkers to identify potential Alzheimer’s disease and related dementias (ADRD) early on. We are building a broad and deep dataset of speech data across multiple languages that is longitudinal and linked to individuals who are clinically well-characterized. This initiative will provide researchers with high-quality, harmonized voice and clinical data to develop early and accurate diagnostic technology for ADRD.

Our vision of speech biomarkers to transform Alzheimer’s care

By building the world’s largest repository of longitudinal, harmonized speech and clinical data, SpeechDx is working to overcome the challenges and realize the promise of speech biomarkers. We envision a future where we can illuminate the path to accurate early detection of ADRD and, ultimately, better care for those with Alzheimer’s disease.

Contact us to join the initiative

SpeechDx is building the world’s largest dataset for developing and validating ADRD speech biomarkers. Contact us to join the partnership and gain access to SpeechDx data.

speechdxteam@alzdiscovery.org

About the SpeechDx Study

The SpeechDx study is a multi-site, observational study of 2,650 participants across several global clinical sites (details below). SpeechDx participants are diverse, well-characterized, and span the brain health spectrum. Each participant generously provides high-quality speech data every quarter for up to three years; this speech data is paired with in-depth clinical data and harmonized across SpeechDx partner sites.

SpeechDx Participant Details

Site	Principal Investigator	Primary Language	Cognitively Normal	Subjective Cognitive Decline	Mild Cognitive Impairment	Alzheimer's disease	Other (Frontotemporal degeneration, Parkinson's disease, not yet classified)	Total
Boston University	Dr. Rhoda Au	English	27	74	50	22	27	200
Emory University	Dr. Felicia Goldstein	English	225		113	45	67	450
Barcelona Brain Health Initiative	Dr. Javier Solana Sanchez	Spanish, Catalan	360	140				500
Barcelona Beta Brain Research Centre	Dr. Andreea Rădoi	Spanish, Catalan	141	249	10	0	0	400
Bogalusa Heart Study	Dr. Ileana De Anda-Duran	English	300	80	20			400
Ace Alzheimer Center	Dr. Sergi Valero Ventura	Spanish, Catalan		6	230	164		400
Other (TBA)	TBA	English	150		50	10	90	300
All			1203	549	473	241	184	2650
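
As a quick consistency check on the table above, the per-site counts can be tabulated and summed. This is only a minimal sketch: the group abbreviations and function names are illustrative, and blank cells in the table are assumed to mean zero participants.

```python
# Participant counts per site, copied from the table above.
# Column order: CN, SCD, MCI, AD, Other.
# Assumption: a blank cell in the table means zero participants.
GROUPS = ("CN", "SCD", "MCI", "AD", "Other")

COHORT = {
    "Boston University": (27, 74, 50, 22, 27),
    "Emory University": (225, 0, 113, 45, 67),
    "Barcelona Brain Health Initiative": (360, 140, 0, 0, 0),
    "Barcelona Beta Brain Research Centre": (141, 249, 10, 0, 0),
    "Bogalusa Heart Study": (300, 80, 20, 0, 0),
    "Ace Alzheimer Center": (0, 6, 230, 164, 0),
    "Other (TBA)": (150, 0, 50, 10, 90),
}


def site_totals(cohort):
    """Total participants per site (row sums)."""
    return {site: sum(counts) for site, counts in cohort.items()}


def group_totals(cohort):
    """Total participants per diagnostic group (column sums)."""
    columns = zip(*cohort.values())
    return dict(zip(GROUPS, (sum(col) for col in columns)))


if __name__ == "__main__":
    for site, total in site_totals(COHORT).items():
        print(f"{site}: {total}")
    print(group_totals(COHORT))
    print("All:", sum(site_totals(COHORT).values()))
```

Under that assumption, the row and column sums agree with the "Total" column and the "All" row.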

Data Collection and Harmonization

SpeechDx collects two types of data—speech data and clinical data. These two types of data are paired and harmonized across participating sites.

Speech data

Participants contribute speech data quarterly for up to three years via the SpeechDx app. The SpeechDx app battery comprises a total of nine speech-eliciting tasks and baseline assessments:

Baseline assessments

Questionnaire (PHQ-8)
Sleepiness scale (Karolinska)
Vigilance assessment (PVT)

Speech-eliciting tasks

Picture description 1
Open-ended questions (3x)
Picture description 1 recall
Picture description 2
Story recall task
Storytelling task
Storytelling task recall

Clinical data

SpeechDx participants are thoroughly characterized via regular clinical visits over the three-year duration of the study. Each SpeechDx participant undergoes thorough neuropsychological testing and MRI, and has a defined blood-biomarker amyloid status. Many participants additionally undergo PET imaging and have a defined cerebrospinal fluid status.

Data journey

The privacy of our participating subjects is paramount, and SpeechDx protects and deidentifies all participant data.

Speech data will be collected from each subject via a study-provided tablet once every quarter for the duration of the study. This data will be encrypted, but may contain identifiable voice data (for example, if a subject discloses their name or other personal information in the recording). To remove any personally identifiable information (PII), the encrypted voice data will be transferred to a secure backend server where any PII is manually spliced out of the recording, resulting in pseudonymized voice data free of any PII that the participant may inadvertently have shared. This voice data will then be securely transferred to the AD Curation Studio hosted by the Alzheimer’s Disease Data Initiative (ADDI).
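
To make the splicing step concrete, here is a minimal sketch of what removing reviewer-marked intervals from a recording could look like. It is not the SpeechDx pipeline: the function name, the representation of audio as a plain list of mono samples, and the interval format are all assumptions made for illustration, and the intervals themselves are assumed to come from a human reviewer.

```python
def splice_out(samples, sample_rate, pii_intervals):
    """Return a copy of `samples` with the marked time intervals removed.

    samples:       sequence of mono audio samples (hypothetical representation)
    sample_rate:   number of samples per second
    pii_intervals: list of (start_sec, end_sec) pairs marked by a reviewer;
                   intervals may overlap and may be given in any order
    """
    n = len(samples)
    kept = []
    cursor = 0  # index of the first sample not yet kept or removed
    for start_sec, end_sec in sorted(pii_intervals):
        if end_sec < start_sec:
            raise ValueError("interval end must not precede its start")
        # Convert seconds to sample indices, clipped to the recording length.
        start = min(n, max(0, round(start_sec * sample_rate)))
        end = min(n, max(0, round(end_sec * sample_rate)))
        if start > cursor:
            kept.extend(samples[cursor:start])
        cursor = max(cursor, end)
    kept.extend(samples[cursor:])
    return kept


if __name__ == "__main__":
    recording = list(range(10))  # toy recording: 10 samples at 2 samples/sec
    print(splice_out(recording, 2, [(1.0, 2.0)]))
```

The marked segments are cut rather than silenced, so the cleaned recording is shorter than the original. That matches "spliced out" as described above, but it is only one reading of the term.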

Within the AD Curation Studio, clinical sites will also contribute their pseudonymized clinical data. This data will be paired with the corresponding voice data and harmonized across sites.

A committee will review and approve any proposed use of the data to ensure protection of this data and compliance with ethical and legal requirements. Approved researchers will receive access to a protected, controlled environment within the AD Curation Studio to access and analyze the full, harmonized SpeechDx dataset. Data is not downloadable or otherwise exported from the AD Curation Studio, and no audio reproduction software is allowed to be installed on the workspace.

Join Us to Become a SpeechDx Partner

Be part of groundbreaking Alzheimer's research! Qualified researchers and institutions who wish to develop speech-based biomarker tools for ADRD can apply to join the SpeechDx initiative and gain access to SpeechDx data.

Our SpeechDx Partners

FAQ

What is SpeechDx?

SpeechDx is a partnership of researchers from academia and industry dedicated to advancing speech biomarkers for early Alzheimer's disease diagnosis.

What is the goal of SpeechDx?

The goal of SpeechDx is to facilitate the development of accurate and early speech-related diagnostic technology for Alzheimer's disease and related dementias (ADRD).

Who is SpeechDx for?

SpeechDx is for researchers, companies and other entities interested in contributing to Alzheimer's disease diagnosis and research through speech biomarker development.

How do I participate in SpeechDx?

To participate in SpeechDx, researchers and companies can apply to access the SpeechDx dataset for the purpose of speech biomarker development. SpeechDx is not currently enrolling additional participants wishing to contribute data.

How do I access SpeechDx data?

To access SpeechDx data, researchers and companies can apply for access to the SpeechDx dataset for the purpose of speech biomarker development. Applications will be assessed for alignment with SpeechDx’s mission and for compliance with all ethical requirements. Given the large philanthropic investment made for this study, the Alzheimer’s Drug Discovery Foundation (ADDF) will negotiate investment terms in line with its venture philanthropy model of investing in exchange for non-exclusive access to this dataset.

SpeechDx is not currently recruiting for new subjects or participating sites, but we welcome you to get in touch with us for more information at speechdx@alzdiscovery.org.

What type of data will SpeechDx members get access to?

SpeechDx members will have access to high-quality, longitudinal paired speech and clinical data from approximately 2,600 individuals, which are harmonized across global sites.

How will patient data be protected and deidentified?

Patient data will be safeguarded, deidentified, and handled in accordance with ethical and privacy standards to ensure confidentiality. Speech data will be encrypted but may contain identifiable voice data. To remove any personally identifiable information (PII), the encrypted voice data will be transferred to a secure backend where any PII is manually spliced out of the recording, resulting in pseudonymized voice data. This voice data will then be securely transferred to the AD Curation Studio hosted by the Alzheimer’s Disease Data Initiative (ADDI). Within the AD Curation Studio, clinical sites will also contribute their pseudonymized clinical data.

Can SpeechDx members use clinical data for non-speech analysis?

No – this data was collected for the specific purpose of speech-based analyses. Other use cases may violate the ethical conditions for data collection and will therefore not be considered. Further details about use of this dataset can be obtained through SpeechDx’s guidelines and approvals.

When will data be available for analysis to SpeechDx partners?

The first set of data will become available to SpeechDx partners in Q1 2025. Data will then be released in batches quarterly as longitudinal data collection continues until the end of the study (anticipated in Q4 2027).
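
For planning purposes, the release timeline described above can be enumerated quarter by quarter. This is a rough sketch, assuming one batch per quarter from Q1 2025 through the anticipated end of the study in Q4 2027, inclusive; the function name is illustrative and actual release dates may differ.

```python
def quarters(start_year, start_q, end_year, end_q):
    """List quarter labels from (start_year, start_q) to (end_year, end_q), inclusive."""
    first = start_year * 4 + (start_q - 1)
    last = end_year * 4 + (end_q - 1)
    return [f"Q{i % 4 + 1} {i // 4}" for i in range(first, last + 1)]


if __name__ == "__main__":
    # Assumption: one data batch per quarter, first in Q1 2025, last in Q4 2027.
    schedule = quarters(2025, 1, 2027, 4)
    print(len(schedule), "anticipated batches")
    print(", ".join(schedule))
```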