Front Neurosci. 2019 Nov 22:13:1267.

doi: 10.3389/fnins.2019.01267. eCollection 2019.

Generating Natural, Intelligible Speech From Brain Activity in Motor, Premotor, and Inferior Frontal Cortices

Christian Herff¹,², Lorenz Diener², Miguel Angrick², Emily Mugler³, Matthew C Tate⁴, Matthew A Goldrick⁵, Dean J Krusienski⁶, Marc W Slutzky³,⁷,⁸, Tanja Schultz²

Affiliations

¹ School of Mental Health & Neuroscience, Maastricht University, Maastricht, Netherlands.
² Cognitive Systems Lab, University of Bremen, Bremen, Germany.
³ Department of Neurology, Northwestern University, Chicago, IL, United States.
⁴ Department of Neurosurgery, Northwestern University, Chicago, IL, United States.
⁵ Department of Linguistics, Northwestern University, Chicago, IL, United States.
⁶ Biomedical Engineering Department, Virginia Commonwealth University, Richmond, VA, United States.
⁷ Department of Physiology, Northwestern University, Chicago, IL, United States.
⁸ Department of Physical Medicine & Rehabilitation, Northwestern University, Chicago, IL, United States.

PMID: 31824257
PMCID: PMC6882773
DOI: 10.3389/fnins.2019.01267

Christian Herff et al. Front Neurosci. 2019.

Abstract

Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that can preserve conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, we chose subsequent units of speech based on the measured ECoG activity to generate audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.

Keywords: BCI; ECoG; brain-computer interface; brain-to-speech; speech; synthesis.

Figures

**Figure 1**
Experimental Setup: ECoG and audible speech (light blue) were measured simultaneously while participants read words shown on a computer screen. We recorded ECoG data from inferior frontal (green), premotor (blue), and motor (purple) cortices.

**Figure 2**
Electrode grid positions for all six participants. Grids always covered areas in inferior frontal gyrus pars opercularis (IFG, green), ventral premotor cortex (PMv, blue), and ventral motor cortex (M1v, purple).

**Figure 3**
Speech Generation Approach: For each window of high gamma activity in the test data (top left), the cosine similarity to each window in the training data (center bottom) was computed. The window in the training data that maximized the cosine similarity was determined and the corresponding speech unit (center top) was selected. The resulting overlapping speech units (top right) were combined using Hanning windows to form the generated speech output (bottom right). Also see Supplementary Video 1.
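
The selection step described in the Figure 3 caption can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the high-gamma features have already been extracted as one vector per window, that each training window is paired with an equal-length audio snippet (its speech unit), and that consecutive output units are spaced by a fixed hop. The names `generate_speech`, `train_units`, and `hop` are hypothetical; the paper's window length, hop size, and feature extraction are not specified here.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Cosine similarity between every row of a (n, d) and every row of b (m, d)."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_unit @ b_unit.T  # shape (n, m)


def generate_speech(test_feats: np.ndarray, train_feats: np.ndarray,
                    train_units: np.ndarray, hop: int):
    """Hypothetical unit-selection sketch.

    test_feats:  (n_test, d) high-gamma feature vectors, one per test window
    train_feats: (n_train, d) high-gamma feature vectors, one per training window
    train_units: (n_train, unit_len) audio snippet aligned with each training window
                 (assumption: all units have the same length)
    hop:         samples between the starts of consecutive output units (assumption)
    """
    sims = cosine_similarity_matrix(test_feats, train_feats)
    best = np.argmax(sims, axis=1)  # most similar training window per test window

    unit_len = train_units.shape[1]
    taper = np.hanning(unit_len)  # Hanning window used to blend overlapping units
    out = np.zeros(hop * (len(best) - 1) + unit_len)
    for i, idx in enumerate(best):
        start = i * hop
        # Overlap-add the tapered speech unit at its position in the output
        out[start:start + unit_len] += taper * train_units[idx]
    return out, best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_feats = rng.normal(size=(10, 4))
    train_units = rng.normal(size=(10, 8))
    # Scaled copies of training windows 3, 7 and 1; cosine similarity ignores scale
    test_feats = 2.0 * train_feats[[3, 7, 1]]
    audio, chosen = generate_speech(test_feats, train_feats, train_units, hop=4)
    print(chosen, audio.shape)  # [3 7 1] (16,)
```

With a hop of half the unit length, as in the usage example, the Hanning tapers overlap by about 50%, so their summed gain stays roughly constant across the output; other hop values would change how smoothly units blend.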

**Figure 4**
Generation example: Examples of actual (top) and generated (bottom) audio waveforms **(A)** and spectrograms **(B)** of seven words spoken by participant 5. Similarities between the generated and actual speech are striking, especially in the spectral domain **(B)**. These generated examples can be found in Supplementary Audio 1.

**Figure 5**
Performance of our generation approach. **(A)** Correlation coefficients between the spectrograms of original and generated audio waveforms for the best (purple) and average (green) participant. While all regions yielded better-than-randomized results on average, M1v provided the most information for our reconstruction process. **(B)** Results of a listening test with 55 human listeners. Accuracies in the 4-option forced-choice intelligibility test were above chance level (25%, dashed line) for all listeners.
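
Panel (A) reports correlation coefficients between original and generated spectrograms. One common way to compute such a score, sketched here as an assumption rather than the authors' exact procedure, is a Pearson correlation per spectral bin over time, averaged across bins. The function names `spectral_correlations` and `mean_spectral_correlation` are hypothetical, and the spectrograms themselves (for example the mel-scaled ones mentioned for Figure 6) are assumed to be computed upstream.

```python
import numpy as np


def spectral_correlations(original: np.ndarray, generated: np.ndarray) -> np.ndarray:
    """Pearson r between original and generated spectrograms, one value per bin.

    Both inputs are (n_frames, n_bins) arrays, e.g. log-mel spectrograms
    computed elsewhere (assumption).
    """
    if original.shape != generated.shape:
        raise ValueError("spectrograms must have the same shape")
    o = original - original.mean(axis=0)  # center each bin over time
    g = generated - generated.mean(axis=0)
    num = (o * g).sum(axis=0)
    den = np.sqrt((o ** 2).sum(axis=0) * (g ** 2).sum(axis=0))
    return num / den


def mean_spectral_correlation(original: np.ndarray, generated: np.ndarray) -> float:
    """Average the per-bin correlations into a single score."""
    return float(np.mean(spectral_correlations(original, generated)))
```

Because Pearson correlation is invariant to per-bin gain and offset, this score rewards matching spectral shape over time rather than matching absolute level; a level-sensitive metric would need to be reported separately.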

**Figure 6**
Detailed decoding results. **(A)** Correlations between original and reconstructed (mel-scaled) spectrograms for all participants and electrode locations. Stars indicate significance levels (* larger than 95% of random activations, *** larger than 99.9% of random activations). M1v contains the most information for our decoding approach. **(B)** Detailed results for the best participant using all electrodes, with the entire temporal context (blue) and with only activity prior to the current moment (cyan), across all frequency coefficients. Shaded areas denote 95% confidence intervals. Reconstruction is reliable and above chance level (maximum of all randomizations, red) across all frequency ranges.
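
Figure 6 judges results against randomized activations: stars mark scores larger than a given share of the randomized scores, and chance level in panel (B) is the maximum over all randomizations. The comparison can be sketched as below, assuming the randomized scores (for instance correlations obtained by decoding from randomized activity) have already been computed; how the randomizations are generated is not described here, and the helper names are hypothetical.

```python
import numpy as np


def fraction_exceeded(actual: float, random_scores) -> float:
    """Fraction of randomized-baseline scores strictly below the actual score."""
    random_scores = np.asarray(random_scores, dtype=float)
    return float(np.mean(actual > random_scores))


def significance_stars(actual: float, random_scores) -> str:
    """'*' if larger than at least 95% of randomized scores, '***' if at least 99.9%."""
    level = fraction_exceeded(actual, random_scores)
    if level >= 0.999:
        return "***"
    if level >= 0.95:
        return "*"
    return ""


def above_chance(actual: float, random_scores) -> bool:
    """Chance level taken as the maximum over all randomizations (panel B)."""
    return actual > float(np.max(random_scores))
```

Note that the 99.9% threshold is only reachable with at least 1000 randomizations; with fewer, a single randomized score above the actual one already drops the level below 0.999.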

Cited by

Iterative alignment discovery of speech-associated neural activity.
Rabbani Q, Shah S, Milsap G, Fifer M, Hermansky H, Crone N. J Neural Eng. 2024 Aug 28;21(4):046056. doi: 10.1088/1741-2552/ad663c. PMID: 39194182. Free PMC article.
A bilingual speech neuroprosthesis driven by cortical articulatory representations shared between languages.
Silva AB, Liu JR, Metzger SL, Bhaya-Grossman I, Dougherty ME, Seaton MP, Littlejohn KT, Tu-Chan A, Ganguly K, Moses DA, Chang EF. Nat Biomed Eng. 2024 Aug;8(8):977-991. doi: 10.1038/s41551-024-01207-5. Epub 2024 May 20. PMID: 38769157.
Representation of internal speech by single neurons in human supramarginal gyrus.
Wandelt SK, Bjånes DA, Pejsa K, Lee B, Liu C, Andersen RA. Nat Hum Behav. 2024 Jun;8(6):1136-1149. doi: 10.1038/s41562-024-01867-y. Epub 2024 May 13. PMID: 38740984. Free PMC article.
A flexible intracortical brain-computer interface for typing using finger movements.
Shah NP, Willsey MS, Hahn N, Kamdar F, Avansino DT, Fan C, Hochberg LR, Willett FR, Henderson JM. bioRxiv [Preprint]. 2024 Apr 26:2024.04.22.590630. doi: 10.1101/2024.04.22.590630. PMID: 38712189. Free PMC article.
Online speech synthesis using a chronically implanted brain-computer interface in an individual with ALS.
Angrick M, Luo S, Rabbani Q, Candrea DN, Shah S, Milsap GW, Anderson WS, Gordon CR, Rosenblatt KR, Clawson L, Tippett DC, Maragakis N, Tenore FV, Fifer MS, Hermansky H, Ramsey NF, Crone NE. Sci Rep. 2024 Apr 26;14(1):9617. doi: 10.1038/s41598-024-60277-2. PMID: 38671062. Free PMC article.
