Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K. & Harshman, R. Indexing by latent semantic analysis. J. Am. Soc. Inform. Sci. 41, 391–407 (1990).
Alghamdi, R. & Alfalqi, K. A survey of topic modeling in text mining. Int. J. Adv. Comput. Sci. Appl. 6 (2015).
Boyd-Graber, J. et al. Applications of topic models. Foundations Trends Inform. Retrieval 11, 143–296 (2017).
Blei, D. M. Probabilistic topic models. Commun. ACM 55, 77–84 (2012).
Wu, X., Dong, X., Nguyen, T. T. & Luu, A. T. Effective neural topic modeling with embedding clustering regularization. In: International Conference on Machine Learning, 37335–37357 (PMLR, 2023).
Paul, M. J. & Dredze, M. Discovering health topics in social media using topic models. PLoS ONE 9, e103408 (2014).
Nguyen, T., Phung, D., Dao, B., Venkatesh, S. & Berk, M. Affective and content analysis of online depression communities. IEEE Trans. Affective Comput. 5, 217–226 (2014).
Nguyen, T. et al. Using linguistic and topic analysis to classify sub-groups of online depression communities. Multimedia Tools Appl. 76, 10653–10676 (2017).
Seo, H. & Song, M. An analysis of the discourse topics of users who exhibit symptoms of depression on social media. J. Korean Soc. Inform. Manage. 36, 207–226 (2019).
Sik, D., Németh, R. & Katona, E. Topic modelling online depression forums: beyond narratives of self-objectification and self-blaming. J. Mental Health 32, 386–395 (2023).
Liu, Y. et al. Monitoring COVID-19 pandemic through the lens of social media using natural language processing and machine learning. Health Inform. Sci. Syst. 9, 25 (2021).
Yu, L. et al. Detecting changes in attitudes toward depression on Chinese social media: a text analysis. J. Affective Disorders 280, 354–363 (2021).
Chandrasekaran, R., Kotaki, S. & Nagaraja, A. H. Detecting and tracking depression through temporal topic modeling of tweets: insights from a 180-day study. Npj Mental Health Res. 3, 1–10 (2024).
Hoyle, A. et al. Is automated topic model evaluation broken? the incoherence of coherence. Adv. Neural Inform. Processing Syst. 34, 2018–2033 (2021).
Doogan, C. & Buntine, W. Topic model or topic twaddle? re-evaluating semantic interpretability measures. In: North American Association for Computational Linguistics 2021, 3824–3848 (Association for Computational Linguistics (ACL), 2021).
Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learning Res. 3, 993–1022 (2003).
Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. Preprint at arXiv:2203.05794 (2022).
Meng, Y., Zhang, Y., Huang, J., Zhang, Y. & Han, J. Topic discovery via latent space clustering of pretrained language model representations. In: Proceedings of the ACM Web Conference 2022, 3143–3152 (2022).
Losada, D. E., Crestani, F. & Parapar, J. eRisk 2017: CLEF lab on early risk prediction on the internet: experimental foundations. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11–14, 2017, Proceedings 8, 346–360 (Springer, 2017).
Losada, D. E., Crestani, F. & Parapar, J. Overview of eRisk: early risk prediction on the internet. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 9th International Conference of the CLEF Association, CLEF 2018, Avignon, France, September 10–14, 2018, Proceedings 9, 343–361 (Springer, 2018).
Parapar, J., Martín-Rodilla, P., Losada, D. E. & Crestani, F. eRisk 2022: pathological gambling, depression, and eating disorder challenges. In: European Conference on Information Retrieval, 436–442 (Springer, 2022).
Paatero, P. & Tapper, U. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5, 111–126 (1994).
Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999).
Blei, D. M. & Lafferty, J. D. A correlated topic model of science. Ann. Appl. Statistics 1, 17–35 (2007).
Li, W. & McCallum, A. Pachinko allocation: DAG-structured mixture models of topic correlations. In: Proceedings of the 23rd international conference on Machine learning, 577–584 (2006).
Blei, D. M. & Lafferty, J. D. Dynamic topic models. In: Proceedings of the 23rd international conference on Machine learning, 113–120 (2006).
Bianchi, F., Terragni, S. & Hovy, D. Pre-training is a hot topic: Contextualized document embeddings improve topic coherence. Preprint at arXiv:2004.03974 (2020).
Reimers, N. & Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Preprint at arXiv:1908.10084 (2019).
Miao, Y., Yu, L. & Blunsom, P. Neural variational inference for text processing. In: International conference on machine learning, 1727–1736 (PMLR, 2016).
Srivastava, A. & Sutton, C. Autoencoding variational inference for topic models. Preprint at arXiv:1703.01488 (2017).
Xu, Y. et al. HyperMiner: Topic taxonomy mining with hyperbolic embedding. Adv. Neural Inform. Processing Syst. 35, 31557–31570 (2022).
Wang, D. et al. Representing mixtures of word embeddings with mixtures of topic embeddings. Preprint at arXiv:2203.01570 (2022).
Angelov, D. Top2Vec: Distributed representations of topics. Preprint at arXiv:2008.09470 (2020).
Wu, X., Nguyen, T., Zhang, D. C., Wang, W. Y. & Luu, A. T. FASTopic: A fast, adaptive, stable, and transferable topic modeling paradigm. Preprint at arXiv:2405.17978 (2024).
AlSumait, L., Barbará, D., Gentle, J. & Domeniconi, C. Topic significance ranking of LDA generative models. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7–11, 2009, Proceedings, Part I 20, 67–82 (Springer, 2009).
Newman, D., Lau, J. H., Grieser, K. & Baldwin, T. Automatic evaluation of topic coherence. In: Human language technologies: The 2010 annual conference of the North American chapter of the association for computational linguistics, 100–108 (2010).
Mimno, D., Wallach, H., Talley, E., Leenders, M. & McCallum, A. Optimizing semantic coherence in topic models. In: Proceedings of the 2011 conference on empirical methods in natural language processing, 262–272 (2011).
Aletras, N. & Stevenson, M. Evaluating topic coherence using distributional semantics. In: Koller, A. & Erk, K. (eds.) Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), 13–22 (Association for Computational Linguistics, Potsdam, Germany, 2013).
Röder, M., Both, A. & Hinneburg, A. Exploring the space of topic coherence measures. In: Proceedings of the eighth ACM international conference on Web search and data mining, 399–408 (2015).
Nikolenko, S. I. Topic quality metrics based on distributed word representations. In: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 1029–1032 (2016).
Fang, A., Macdonald, C., Ounis, I. & Habel, P. Using word embedding to evaluate the coherence of topics from Twitter data. In: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 1057–1060 (2016).
Rahimi, H. et al. Contextualized topic coherence metrics. Preprint at arXiv:2305.14587 (2023).
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. & Blei, D. Reading tea leaves: How humans interpret topic models. Adv. Neural Inform. Processing Syst. 22 (2009).
Losada, D. E., Crestani, F. & Parapar, J. Overview of eRisk at CLEF 2019: Early risk prediction on the internet (extended overview). Working Notes of CLEF 2019 Conference and Labs of the Evaluation Forum 4, 21 (2019).
Parapar, J., Martín-Rodilla, P., Losada, D. E. & Crestani, F. Overview of eRisk at CLEF 2021: Early risk prediction on the internet (extended overview). Working Notes of CLEF 2021 Conference and Labs of the Evaluation Forum 1, 864–887 (2021).
Parapar, J., Martín-Rodilla, P., Losada, D. E. & Crestani, F. Overview of eRisk 2023: Early risk prediction on the internet. In: International Conference of the Cross-Language Evaluation Forum for European Languages, 294–315 (Springer, 2023).
Coppersmith, G., Dredze, M., Harman, C., Hollingshead, K. & Mitchell, M. CLPsych 2015 shared task: Depression and PTSD on Twitter. In: Proceedings of the 2nd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, 31–39 (2015).
Milne, D. N., Pink, G., Hachey, B. & Calvo, R. A. CLPsych 2016 shared task: Triaging content in online peer-support forums. In: Proceedings of the third workshop on computational linguistics and clinical psychology, 118–127 (2016).
Lynn, V. et al. CLPsych 2018 shared task: Predicting current and future psychological health from childhood essays. In: Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, 37–46 (2018).
Zirikly, A., Resnik, P., Uzuner, O. & Hollingshead, K. CLPsych 2019 shared task: Predicting the degree of suicide risk in Reddit posts. In: Proceedings of the sixth workshop on computational linguistics and clinical psychology, 24–33 (2019).
Tsakalidis, A. et al. Overview of the CLPsych 2022 shared task: Capturing moments of change in longitudinal user posts. In: Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology, 184–198 (2022).
Chim, J. et al. Overview of the CLPsych 2024 shared task: Leveraging large language models to identify evidence of suicidality risk in online posts. In: Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024), 177–190 (2024).
Guntuku, S. C., Yaden, D. B., Kern, M. L., Ungar, L. H. & Eichstaedt, J. C. Detecting depression and mental illness on social media: an integrative review. Current Opinion Behav. Sci. 18, 43–49 (2017).
Chancellor, S. & De Choudhury, M. Methods in predictive techniques for mental health status on social media: a critical review. NPJ Digital Med. 3, 43 (2020).
Ríssola, E. A., Losada, D. E. & Crestani, F. A survey of computational methods for online mental state assessment on social media. ACM Trans. Comput. Healthcare 2, 1–31 (2021).
Chen, X. & Genc, Y. A systematic review of artificial intelligence and mental health in the context of social media. In: International Conference on Human-Computer Interaction, 353–368 (Springer, 2022).
Crestani, F., Losada, D. E. & Parapar, J. Early Detection of Mental Health Disorders by Social Media Monitoring: The First Five Years of the eRisk Project, 1018 (Springer Nature, 2022).
Ríssola, E. A., Parapar, J., Losada, D. E. & Crestani, F. A survey of the first five years of eRisk: Findings and conclusions. In: Early Detection of Mental Health Disorders by Social Media Monitoring: The First Five Years of the eRisk Project, 31–57 (Springer, 2022).
Shensa, A., Sidani, J. E., Dew, M. A., Escobar-Viera, C. G. & Primack, B. A. Social media use and depression and anxiety symptoms: A cluster analysis. Am. J. Health Behav. 42, 116–128 (2018).
Aragón, M. E., López-Monroy, A. P. & Montes-y Gómez, M. INAOE-CIMAT at eRisk 2019: Detecting signs of anorexia using fine-grained emotions. In: Working Notes of CLEF 2019 Conference and Labs of the Evaluation Forum (2019).
De Choudhury, M., Gamon, M., Counts, S. & Horvitz, E. Predicting depression via social media. In: Proceedings of the international AAAI conference on web and social media 7(1), 128–137 (2013).
Couto, M., Parapar, J. & Losada, D. E. Comparison of clustering algorithms for knowledge discovery in social media publications: A case study of mental health analysis. Procesamiento del lenguaje natural 73, 69–81 (2024).
Yang, P., Han, K. & Diesner, J. Topics, temporal patterns, and network characteristics of AI-related discourse on Reddit. In: International Conference on Advances in Social Networks Analysis and Mining, 333–344 (Springer, 2024).
Kerasiotis, M., Ilias, L. & Askounis, D. Depression detection in social media posts using transformer-based models and auxiliary features. Soc. Netw. Anal. Mining 14, 196 (2024).
Kanahuati-Ceballos, M. & Valdivia, L. J. Detection of depressive comments on social media using RNN, LSTM, and random forest: comparison and optimization. Soc. Netw. Anal. Mining 14, 44 (2024).
Naseem, U. et al. Incorporating historical information by disentangling hidden representations for mental health surveillance on social media. Soc. Netw. Anal. Mining 14, 9 (2023).
He, M., Bakker, E. M. & Lew, M. S. Dpd (depression detection) net: a deep neural network for multimodal depression detection. Health Inform. Sci. Syst. 12, 53 (2024).
Thushari, P. D. et al. Identifying discernible indications of psychological well-being using ML: explainable AI in Reddit social media interactions. Soc. Netw. Anal. Mining 13, 141 (2023).
Bao, E., Pérez, A. & Parapar, J. Explainable depression symptom detection in social media. Health Inform. Sci. Syst. 12, 47 (2024).
Kherwa, P. & Bansal, P. Topic Modeling: A Comprehensive Review. EAI Endorsed Trans. Scalable Inform. Syst. 7 (2019).
Curiskis, S. A., Drake, B., Osborn, T. R. & Kennedy, P. J. An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit. Inform. Proces. Manag. 57, 102034 (2020).
Blair, S. J., Bi, Y. & Mulvenna, M. D. Aggregated topic models for increasing social media topic coherence. Appl. Intell. 50, 138–156 (2020).
Zhao, H. et al. Topic modelling meets deep neural networks: a survey. Preprint at arXiv:2103.00498 (2021).
Laureate, C. D. P., Buntine, W. & Linger, H. A systematic review of the use of topic models for short text social media analysis. Artif. Intell. Rev. 56, 14223–14255 (2023).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. Preprint at arXiv:1802.03426 (2018).
Feldbauer, R. & Flexer, A. A comprehensive empirical comparison of hubness reduction in high-dimensional spaces. Knowl. Inform. Syst. 59, 137–166 (2019).
Sarkar, S. & Ghosh, A. K. On perfect clustering of high dimension, low sample size data. Preprint at arXiv:1612.09121 (2016).
Peng, D., Gui, Z. & Wu, H. Interpreting the curse of dimensionality from distance concentration and manifold effect. Preprint at arXiv:2401.00422 (2024).
Pestov, V. On the geometry of similarity search: dimensionality curse and concentration of measure. Preprint at arXiv:cs/9901004 (1999).
McInnes, L. et al. HDBSCAN: Hierarchical density based clustering. J. Open Sourc. Softw. 2, 205 (2017).
Losada, D. E., Crestani, F. & Parapar, J. eRisk 2020: Self-harm and depression challenges. In: European conference on information retrieval, 557–563 (Springer, 2020).
Hoyle, A., Goel, P. & Resnik, P. Improving neural topic models using knowledge distillation. Preprint at arXiv:2010.02377 (2020).
Fleiss, J. L. Measuring nominal scale agreement among many raters. Psychol. Bull. 76, 378–382 (1971).
Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977) (Standard reference for interpreting Kappa levels.).
McHugh, M. L. Interrater reliability: the kappa statistic. Biochem. Med. 22, 276–282 (2012).
Fernández-Pichel, M., Aragón, M. E., Saborido-Patiño, J. & Losada, D. E. Personality trait analysis during the COVID-19 pandemic: a comparative study on social media. J. Intell. Inform. Syst. 62, 117–142 (2023).
