Gollapalli, S., Ang, B. & Ng, S. K. Identifying early maladaptive schemas from mental health question texts. In The 2023 Conference on Empirical Methods in Natural Language Processing (2023).
Zhou, W. et al. Identifying rare circumstances preceding female firearm suicides: Validating a large language model approach. JMIR Mental Health 10, e49359 (2023).
Olteanu, A., Castillo, C., Diaz, F. & Kıcıman, E. Social data: Biases, methodological pitfalls, and ethical boundaries. Front. Big Data 2, 13 (2019).
Navigli, R., Conia, S. & Ross, B. Biases in large language models: Origins, inventory, and discussion. ACM J. Data Inf. Quality 15, 1–21 (2023).
Chang, Y. et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 5(3), 1–45 (2023).
Obradovich, N. et al. Opportunities and risks of large language models in psychiatry. NPP-Digital Psychiatry Neurosci. 2, 219–221 (2024).
Jin, Y. et al. The applications of large language models in mental health: Scoping review. J. Med. Internet Res. 27, e69284 (2025).
Trifu, R. N., Nemeş, B., Bodea-Hategan, C. & Cozman, D. Linguistic indicators of language in major depressive disorder (MDD). An evidence based research. J. Evidence Based Psychother. 17, 105–128 (2017).
OECD and European Union. Promoting mental health in Europe: Why and how. OECD iLibrary (2018).
Rush, A. J., Carmody, T. & Reimitz, P.-E. The inventory of depressive symptomatology (IDS): Clinician (IDS-C) and self-report (IDS-SR) ratings of depressive symptoms. Int. J. Methods Psychiatr. Res. 9, 45–59 (2000).
World Health Organization. Depressive disorder (depression). (2023). Accessed June 15, 2025.
Cummins, N. et al. A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71, 10–49 (2015).
Zimmermann, J., Brockmeyer, T., Hunn, M., Schauenburg, H. & Wolf, M. First-person pronoun use in spoken language as a predictor of future depressive symptoms: Preliminary evidence from a clinical sample of depressed patients. Clin. Psychol. Psychother. 24, 384–391 (2017).
Qasim, A., Mehak, G., Hussain, N., Gelbukh, A. & Sidorov, G. Detection of depression severity in social media text using transformer-based models. Information 16, 114 (2025).
Ohse, J. et al. Zero-Shot Strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection. Comput. Speech Lang. 88, 101663 (2024).
Zhang, X., Liu, H., Zhang, Q., Ahmed, B. & Epps, J. SpeechT-RAG: Reliable depression detection in LLMs with retrieval-augmented generation using speech timing information. arXiv preprint arXiv:2502.10950 (2025).
Raffel, C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 1–67 (2020).
Radford, A. et al. Language models are unsupervised multitask learners. OpenAI blog 1, 9 (2019).
Muneton-Santa, G., Escobar-Grisales, D., López-Pabón, F. O., Pérez-Toro, P. A. & Orozco-Arroyave, J. R. Classification of poverty condition using natural language processing. Soc. Indicators Res. 162, 1413–1435 (2022).
Zhang, Y. et al. Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model. J. Affect. Disord. 355, 40–49 (2024).
Hargittai, E. Potential biases in big data: Omitted voices on social media. Soc. Sci. Comput. Rev. 38, 10–24 (2020).
Benton, A., Mitchell, M. & Hovy, D. Multitask learning for mental health conditions with limited social media data. In 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017. Proceedings of Conference (Association for Computational Linguistics, 2017).
Sheng, E., Chang, K.-W., Natarajan, P. & Peng, N. Societal biases in language generation: progress and challenges. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 4275–4293 (2021).
Timmons, A. C. et al. A call to action on assessing and mitigating bias in artificial intelligence applications for mental health. Perspect. Psychol. Sci. 18, 1062–1096 (2023).
Matcham, F. et al. Remote assessment of disease and relapse in major depressive disorder (RADAR-MDD): A multi-centre prospective cohort study protocol. BMC Psychiatry 19, 1–11 (2019).
Cummins, N. et al. Multilingual markers of depression in remotely collected speech samples: A preliminary analysis. J. Affect. Disord. 341, 128–136 (2023).
Kroenke, K. et al. The PHQ-8 as a measure of current depression in the general population. J. Affect. Disord. 114, 163–173 (2009).
Radford, A. et al. Robust speech recognition via large-scale weak supervision. In International conference on machine learning, 28492–28518 (PMLR, 2023).
Márquez, A. S. whisper-large-v3-turbo-es. (2024). Fine-tuned Whisper large-v3 for Spanish on Common Voice.
Bălan, D. A., Ordelman, R. J. & Truong, K. P. Evaluating the state-of-the-art automatic speech recognition systems for Dutch. In 34th Meeting of Computational Linguistics in The Netherlands, CLIN 2024 (2024).
Devlin, J., Chang, M., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186, (Association for Computational Linguistics, Minneapolis, Minnesota, 2019).
Han, L. & Zhang, X. Depression tendency detection method based on multi-feature fusion of BERT word embeddings. In International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2023), vol. 12941, 525–530 (SPIE, 2023).
Han, Z. et al. Spatial-temporal feature network for speech-based depression recognition. IEEE Trans. Cognitive Dev. Syst. 16(1), 308–318 (2023).
Wu, W., Zhang, C. & Woodland, P. C. Self-supervised representations in speech-based depression detection. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1–5 (IEEE, 2023).
Zaman, A. et al. A multilevel depression detection from twitter using fine-tuned RoBERTa. In 2023 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), 280–284 (IEEE, 2023).
Rasipuram, S., Bhat, J. H., Maitra, A., Shaw, B. & Saha, S. Multimodal depression detection using task-oriented transformer-based embedding. In 2022 IEEE Symposium on Computers and Communications (ISCC), 01–04 (IEEE, 2022).
Yang, K. et al. MentaLLaMA: Interpretable mental health analysis on social media with large language models. In Proceedings of the ACM Web Conference 2024, 4489–4500 (2024).
Lou, R. et al. Muffin: Curating multi-faceted instructions for improving instruction following. In The Twelfth International Conference on Learning Representations (2023).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824–24837 (2022).
Mishra, S., Khashabi, D., Baral, C. & Hajishirzi, H. Cross-task generalization via natural language crowdsourcing instructions. In ACL (2022).
Wang, Y. et al. Super-NaturalInstructions: Generalization via declarative instructions on 1600+ tasks. In EMNLP (2022).
Schäfer, R. CommonCOW: massively huge web corpora from CommonCrawl data and a method to distribute them freely under restrictive EU copyright laws. In Proceedings of the Tenth International Language Resources and Evaluation Conference (LREC’16), 4500–4504 (2016).
Ji, S. et al. MentalBERT: Publicly available pretrained language models for mental healthcare. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (LREC’22), 7184–7190 (2022).
Reese, S., Boleda, G., Cuadros, M., Padró, L. & Rigau, G. Wikicorpus: A word-sense disambiguated multilingual wikipedia corpus. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), (2010).
Touvron, H. et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst., 30 (2017).
Liu, Z., Lin, W., Shi, Y. & Zhao, J. A robustly optimized BERT pre-training approach with post-training. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, 1218–1227 (2021).
Purba, A. K. et al. Social media use and health risk behaviours in young people: Systematic review and meta-analysis. BMJ 383, e073552 (2023).
Pew Research Center. Teens, social media and technology 2023 (December 11, 2023). Accessed March 10, 2024.
Albert, P. R. Why is depression more prevalent in women?. J. Psychiatry Neurosci. 40(4), 219–221 (2015).
Cross, J. L., Choma, M. A. & Onofrey, J. A. Bias in medical AI: Implications for clinical decision-making. PLOS Digital Health 3, e0000651 (2024).
Shin, D., Kim, H., Lee, S., Cho, Y. & Jung, W. Using large language models to detect depression from user-generated diary text data as a novel approach in digital mental health screening: Instrument validation study. J. Med. Internet Res. 26, e54617 (2024).
Koudounas, A., Giobergia, F., Pastor, E. & Baralis, E. A contrastive learning approach to mitigate bias in speech models. In Interspeech 2024, 827–831 (2024).
