Exploiting topic analysis models to explore psychological dimensions in social media data
