Department of Forensic Psychiatry, Psychiatric Hospital, University of Zurich, Switzerland
The rapid increase in the amount of available data within the field of psychiatry makes modern statistical analysis methods increasingly important. Machine learning offers a way to identify complex relationships within large data sets and to support scientific work. This article presents a brief overview of machine learning and, by way of example, discusses first clinically relevant results of such analyses in a patient population of a forensic psychiatric institution, with a focus on the treatment of schizophrenia spectrum disorders.
Machine learning and its value in psychiatry
In the following, a short introduction to the complex matter of machine learning, in particular in psychiatric research, will be provided. Further knowledge for the interested reader can be found in the following articles [1–8].
Clarifying the term machine learning is difficult and its definition is highly debated even among experts. Ultimately, it is the application of complex algorithms to statistics: due to technical progress with the increase in the processing power of computers, complex mathematical algorithms, some of which have been known in theory for a long time, can now be carried out in statistical analysis with ease.
From previous psychiatric research, it is apparent that the majority of statistical analyses have been conducted using null hypothesis significance testing (NHST) or regression models such as the generalised linear model (GLM) and survival analysis. These statistical approaches are applied to the data of interest, parameters are estimated, and conclusions are drawn regarding the research questions. Ultimately, a simple and understandable impression of the relationship between the input variables and responses is provided.
Although these methods have their justification and strengths, there are several limitations. (1) Statistical models are chosen in advance and may not fit the structure of the data and ultimately the research question. For example, linear regression assumes only linear relationships among the variables. (2) Only a limited number of variables can be investigated (to avoid alpha error accumulation). (3) NHST is not able to analyse the interaction of numerous variables with each other (as is often required in large data sets). (4) The result is essentially a p-value, which quantifies how probable data at least as extreme as those observed would be if the null hypothesis were true; it is not the probability that the hypothesis itself is true. As a result, research questions must be precisely defined and limited (i.e., unambiguous hypotheses must be formulated in advance).
Machine learning, on the other hand, does not apply an a priori statistical model, but rather focuses on identifying the statistical algorithm that best discovers existing patterns in a data set and then uses those patterns to predict future observations. The focus is not on estimating parameters, explaining the variability in the outcome or testing hypotheses, but on evaluating the performance of statistical algorithms, mainly by measuring how well they predict out-of-sample outcomes. However, it must be kept in mind that the focus of machine learning is on the detection of correlations and on prediction, not on the explanation of causal relationships. The identified algorithms may describe the existing data well but may not be applicable to other data sets or new situations; caution is therefore required when interpreting and generalising machine learning results [10, 11]. In summary, machine learning offers alternative possibilities: large datasets with a multitude of variables can be processed, complex and non-linear interrelationships can be analysed, the quality of a statistical model can be quantified in a differentiated way with various measures (accuracy, area under the curve, sensitivity, specificity) and, finally, simple and accurate predictive models can be derived from complex datasets. Psychiatric research in particular investigates poorly understood phenomena and complex multifactorial questions; the resulting data structures are therefore particularly suitable for machine learning, which may provide feasible models for everyday clinical decisions.
A rough illustration of the application steps follows. The dataset of interest is divided a priori into a training dataset and a test dataset. The training dataset is used for the subsequent machine learning steps, while the test dataset is initially left untouched. A machine learning algorithm is then trained on the training dataset. Numerous algorithms are available, and the most suitable one depends on the data structure and research question (e.g., support vector machines, trees, the k-nearest neighbours algorithm, neural networks).
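The split-then-train step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the studies discussed later: the toy data, the 70/30 split, the random seeds and the helper names `train_test_split` and `knn_predict` are all hypothetical, and k-nearest neighbours is chosen only because it is among the algorithms named above and is short to write down.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train_rows, features, k=3):
    """Return the majority label among the k training rows nearest to `features`."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    nearest = sorted(train_rows, key=squared_distance)[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical toy data: two well-separated groups of 50 cases each,
# stored as (features, label) pairs. Not taken from any study.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), 1) for _ in range(50)]

train_rows, test_rows = train_test_split(data)

# The test rows are touched only here, after "training" is complete.
correct = sum(knn_predict(train_rows, x) == y for x, y in test_rows)
test_accuracy = correct / len(test_rows)
print(f"{len(train_rows)} training rows, {len(test_rows)} test rows")
print(f"test accuracy: {test_accuracy:.2f}")
```

The point of the sketch is the order of operations: the held-out rows play no role in choosing or fitting the model and are used once, at the end, to estimate out-of-sample performance.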
During this algorithm training, various intermediate steps can be incorporated to improve data processing, such as the imputation of missing values or the initial reduction of variables. Regarding imputation, it must be noted that it can only be performed if the data are missing at random, and not if the value of the missing variable is related to the reason it is missing. The proportion of missing data that can be imputed depends on the type of missing values and the imputation method and can be up to 50%. After processing, identification and training of the algorithm on the training dataset, the final model is then applied to the still unmanipulated test dataset: the trained machine learning algorithm is used to predict an outcome based on the test data. Finally, a receiver operating characteristic (ROC) curve with corresponding performance measures is calculated and the quality of the model can be evaluated. In the course of this process, the variables considered to be the most important are identified and their significance for the respective question is quantified.
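Two of these steps lend themselves to a short sketch: imputation that is learned on the training data only, and the area under the ROC curve computed on the test data. The code below is an assumption-laden illustration. Mean imputation is used purely for brevity (the text does not name an imputation method), the AUC is computed through its rank interpretation rather than by drawing the curve, and all values and function names are hypothetical.

```python
def fit_mean_imputer(train_values):
    """Learn the fill-in value from the observed training values only."""
    observed = [v for v in train_values if v is not None]
    return sum(observed) / len(observed)

def impute(values, fill):
    """Replace missing entries (None) with the previously learned fill value."""
    return [fill if v is None else v for v in values]

def auc(scores, labels):
    """Probability that a random positive case scores higher than a random
    negative case; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical feature with missing entries.
train_feature = [2.0, None, 4.0, 6.0]
test_feature = [None, 5.0]

fill = fit_mean_imputer(train_feature)      # learned from training data only
train_imputed = impute(train_feature, fill)
test_imputed = impute(test_feature, fill)   # test data reuse the training fill

# Hypothetical predicted scores of a trained model on five test cases.
test_scores = [0.9, 0.8, 0.35, 0.6, 0.2]
test_labels = [1, 1, 1, 0, 0]
test_auc = auc(test_scores, test_labels)
print(f"fill value: {fill}, test AUC: {test_auc:.2f}")
```

Learning the fill value from the training data and merely applying it to the test data keeps the test set free of information leakage, which is what "still unmanipulated" amounts to in practice.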
The issue of "overfitting" must also be mentioned [13, 14]. Overfitting occurs when a machine learning model learns the details and noise in the training data (e.g., outliers) to such a degree that its ability to predict the investigated outcome in a new dataset is reduced. This means that the noise in the training data is incorporated into the machine learning algorithm. Consequently, the algorithm becomes less applicable to new data and the generalisability of the model to other samples or populations suffers. For this reason, machine learning algorithms must be carefully adjusted (“tuning”) in order to counteract overfitting: the data are artificially augmented in the process of algorithm training by various technical means (e.g., resampling methods). Finally, as should be common in research, the corresponding models have to be tested and optimised several times on different populations.
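One widely used resampling method is k-fold cross-validation, in which the training data are repeatedly divided into a fitting part and a held-out part. The sketch below assumes this particular method for illustration; the text above does not specify which resampling scheme is meant. The threshold "model", the data and all function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k consecutive folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, fit, predict, k=5):
    """Fit on k-1 folds, score on the remaining fold, and repeat for each fold."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        hits = sum(predict(model, xs[i]) == ys[i] for i in fold)
        scores.append(hits / len(fold))
    return scores

def fit_threshold(xs, ys):
    """Hypothetical one-feature 'model': midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

# Hypothetical, cleanly separable data.
xs = [1.0, 9.0, 2.0, 8.0, 1.5, 8.5, 2.5, 7.5, 3.0, 7.0]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

fold_scores = cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
print(f"accuracy per fold: {fold_scores}, mean: {mean_score:.2f}")
```

A large gap between the accuracy on the fitting folds and the accuracy on the held-out folds is a typical warning sign of overfitting; tuning then aims to narrow that gap before the model is ever applied to the test dataset.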
Machine learning is already applied in several research areas of psychiatry where a wealth of complex data is collected. These include, for example, studies on pharmaceuticals, psychotherapeutic interventions, neuroimaging and genetics.
A practical application of machine learning in the field of complex data analysis is presented as an example in the following, based on findings of our research group at the Department of Forensic Psychiatry at the University Hospital Zurich. To prevent accusations of self-plagiarism, we would like to point out in advance that the following section contains text passages from our previous studies, which could hardly be avoided.
As part of our research efforts on the topic of schizophrenia spectrum disorders (SSDs), violence and offending, we collected data from the medical files of 370 offender patients with SSD according to ICD-9 or ICD-10, who were judicially admitted to the Centre for Inpatient Forensic Therapies at the Zurich University Hospital of Psychiatry between 1982 and 2016. Over 500 different variables were collected from the medical records and included sociodemographic data, data on childhood and adolescence experiences, psychiatric history, criminal history, social and sexual functioning, details on the offence leading to forensic hospitalisation, prison data, particularities of the current hospitalisation and psychopathological symptoms (closely following the Positive and Negative Syndrome Scale; PANSS). These data were analysed with regard to various research questions using machine learning. The technical design of the machine learning procedure and the algorithm selection in our individual studies are complex and beyond the scope of this review. For detailed guidelines, the following literature and the references therein can be consulted [20, 21].
A summary of selected results, which may also have implications for clinical practice, follows below.
One of our exploratory studies addressed the distinction between violent and non-violent forensic psychiatrically hospitalised offenders with SSD. Understanding influences and relationships regarding the drivers of violent offending could inform the development of more accurate risk assessment and violence prevention strategies tailored to individuals with SSD, since it is one of the most common diagnoses in forensic institutions. In this study, gradient boosting was identified as the best-performing algorithm, together with ten indicative variables. Applied to the validation subset (30% of the total dataset), the model yielded a balanced accuracy of 67.83% and an area under the curve (AUC) of 0.76. The model showed a sensitivity of 72.73% (i.e., ability to correctly classify the cases of non-violent index offences) and a lower specificity of 62.92% (i.e., ability to correctly identify cases of violent index offences). Violent offenders were found to spend more time in forensic hospitalisation and prison settings. Our findings indicated a more severe manifestation of the psychotic disorder in violent offenders, as higher PANSS scores at admission and discharge, higher daily cumulative olanzapine-equivalent antipsychotic dosage at discharge and younger age at SSD diagnosis were found to be influential variables. Non-violent offenders were found to have more past criminal convictions than violent offenders, which may reflect a tendency for non-violent offenders to be perceived negatively through socially inappropriate and bizarre behaviour and to tend to commit minor offences, which may lead to more contact with the legal system but may also lead to more psychiatric treatment (e.g., probation) and regular supervision. Other factors found included more isolation in adulthood in non-violent offenders and childhood poverty in violent offenders. Other than PANSS scores and antipsychotic medication, all factors found are static and therefore have little or no modifiability.
However, a patient’s perception of these experiences may be modified through more individualised therapeutic approaches that take into account their unique histories.
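The performance measures reported for this model are linked by simple arithmetic, which the following sketch makes explicit. The confusion-matrix counts are hypothetical and serve only to show the definitions; the two percentages passed to `balanced_accuracy` in the last step are the sensitivity and specificity quoted above. Note that in this study the "positive" class is the non-violent index offence.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual positive cases that the model classifies as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Share of actual negative cases that the model classifies as negative."""
    return true_neg / (true_neg + false_pos)

def balanced_accuracy(sens, spec):
    """Mean of sensitivity and specificity; robust to unequal group sizes."""
    return (sens + spec) / 2

# Hypothetical confusion-matrix counts, for illustration only.
sens = sensitivity(true_pos=8, false_neg=2)   # 8 of 10 positives recognised
spec = specificity(true_neg=6, false_pos=4)   # 6 of 10 negatives recognised
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}, "
      f"balanced accuracy {balanced_accuracy(sens, spec):.2f}")

# Sensitivity and specificity (in %) as reported in the text.
reported = balanced_accuracy(72.73, 62.92)
print(f"balanced accuracy from the reported values: {reported:.2f}%")
```

Averaging the two reported percentages reproduces the stated balanced accuracy to within rounding, which shows that this measure weights both groups equally regardless of how many violent and non-violent cases the validation subset contains.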
In another study, we analysed whether the accumulation and type of stressors in the inpatient's history influenced the severity of an offence. Drawing on general strain theory [24–29], we analysed stressors that could be surveyed in our files and that are assumed to be probable drivers of offending. The antecedents of offending postulated by general strain theory, namely the impossibility of achieving positive stimuli, the elimination of positive stimuli and the presentation of negative stimuli, underline the role of factors that affect the probability of offending even independently of mental illness. Logistic regression revealed that more stressors were associated with a higher probability of committing a severe offence. Boosted classification trees were identified as the best machine learning algorithm; using the five most important variables, an AUC of 0.76 was attained, as compared with an AUC of 0.83 using all 21 predictor variables. Social isolation was again found to be related to minor offences, suggesting a division between internalising and externalising coping behaviours in dealing with stressors. Failure in school was also related to minor offences. Patients who had received coercive psychiatric treatment in the past and who were unemployed at the time of the index offence tended to commit violent offences. Furthermore, patients who had been separated from their caregivers in childhood or adolescence were found to be more prone to committing violent offences.
Our exploratory study on substance use disorder (SUD) addressed the relationship between SSD, offending behaviour, and substance abuse. The goal was to identify factors that distinguish SSD patients with and without SUD from those with cannabis abuse alone and to determine a predictive value for this distinction. Naïve Bayes was identified as the best algorithm and 15 variables were found to be influential. The variables were applied to the validation subsets (30% of the total dataset) and yielded a balanced accuracy of 67.02% and an AUC of 0.70 for model 1 (comparing SUD vs no SUD), and a balanced accuracy of 68.85% and an AUC of 0.78 for model 2 (comparing cannabis use disorder vs no SUD). The sensitivity (i.e., ability to correctly classify the actual cases of SUD or cannabis use disorder) was 94.03% for model 1 and 81.82% for model 2. The specificity (i.e., ability to correctly identify those having no SUD) was lower, at 40% for model 1 and 55.88% for model 2. Patients with SUD (including cannabis) tended to be younger when a psychiatric diagnosis was first recorded and at first entry into the federal central criminal registry than those without any SUD. Patients with any SUD were found to be less likely to be married, have children, or live in private (non-institutionalised) housing and more likely to work in low-wage jobs. We identified more behavioural and disciplinary problems in childhood and adolescence among patients with cannabis use disorder. They also tended to show more negative behaviour towards fellow patients and constant breaches of rules on the ward. Our study also found a higher symptom severity at admission in patients with SSD and any SUD and a longer duration of forensic psychiatric inpatient treatment in comparison with non-users. They were more frequently discharged with higher antipsychotic medication prescriptions.
Symptoms such as motor retardation, poor attention and poor impulse control tended to be more present in patients with cannabis use disorder. This study led to the conclusion that in the examined sample of patients, the main problem appeared to be the SUD itself and the distinction between cannabis use disorders and other SUDs was of less clinical relevance.
General latent variable modelling was employed in our study on differences between female and male offender patients with SSD. Since the latent variable gender was nominal, latent class analysis was applied, a statistical method specifically designed for the identification of unobservable (i.e., latent) classes within a data set. Latent class analysis identified the two groups based on all specified variables and found that gender can account for some differences between the two identified classes. The female-dominated class showed a higher probability of attempting or committing homicide and tended to more frequently target individuals with whom they had a close relationship. The female-dominated class seemed less likely to commit sexual offences and to be single, but they more frequently lived in separation or had been divorced. They were likely to be older at first diagnosis of SSD and first inpatient treatment, have experienced fewer psychiatric inpatient treatments, have fewer comorbidities, have been married, have a higher level of formal education and have not been homeless. Females in our sample were less likely than men to experience remission in psychopathology during inpatient treatment. Our study therefore challenges clinicians to help reduce the disadvantages of female offenders with SSD by recognising that women have different treatment needs than men in many ways and by adopting new treatment approaches to address specific treatment needs, such as treatment in a single-gender environment, since women may feel safer and more comfortable talking to other women about their experiences.
To gain a better understanding of the criminal behaviour of migrants with SSD and to identify factors that distinguish European from non-European migrant offender patients with SSD, we conducted yet another study employing machine learning algorithms. A tree algorithm achieved the best results in distinguishing European from non-European patients with an accuracy of 74.5%, an AUC of 0.75, a sensitivity of 75% (i.e., ability to correctly classify Europeans) and a specificity of 69% (i.e., ability to correctly identify non-Europeans). As compared with European individuals, more non-European offender patients suffered from poverty during childhood and adolescence. In contrast, fewer of the non-European patients suffered from social isolation during childhood and adolescence, compared with Europeans. These group differences were attributed to the fact that the migrants came from countries with weak economies (the majority from Africa or the Middle East) but with collectivist cultures having closer family structures and less isolation. Non-Europeans also showed more language problems during psychotherapy, were more likely to be engaged in only the simplest occupational therapy tasks and received a higher dose of antipsychotic medication at discharge, suggesting cultural and linguistic misunderstandings between migrants and mental health professionals. In terms of criminal and psychiatric factors, our findings demonstrated that the subgroup of non-European migrants may be quite similar to the rest of the offenders with SSD.
In summary, tree and naïve Bayes algorithms were shown to be the most efficient in our dataset with a variety of complex variables, i.e., they achieved the highest AUC. These algorithms are easy to implement, can handle continuous and categorical variables as well as a large number of predictors and are relatively robust against outliers.
The studies performed illustrate the strengths of machine learning: its wide range of possible applications as well as the ability to handle large datasets, to identify key variables among a multitude of possible predictors, to develop simple and reproducible models and to describe the performance of these models. Ultimately, the machine-learning models can help to identify specific individuals and their particular characteristics and needs at an early stage and can thus provide the foundation for personalised tailored treatment. It is important to emphasise that machine learning should not be used as a possible automated method for classifying psychiatric patients into groups (e.g., dangerous/non-dangerous, migrant/non-migrant) or even to predict future criminal or deviant behaviour. Models with sensitivity and specificity below 100% inevitably result in patients being mistakenly assigned to the wrong groups or incorrect predictions being made. In view of such sensitive areas, where human lives are ultimately at stake, an intense discourse is needed on how data can be collected and used, and which criteria machine-learning models must meet to be deemed useful.
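The consequence of imperfect sensitivity and specificity can be made concrete with a small calculation. The sketch below is illustrative only: the cohort of 100 patients and its 20/80 split are hypothetical assumptions, while the sensitivity and specificity are those reported above for the violent/non-violent model.

```python
def expected_errors(n_positive, n_negative, sensitivity, specificity):
    """Expected number of misclassified cases in each group."""
    missed = n_positive * (1 - sensitivity)        # positives labelled negative
    false_alarms = n_negative * (1 - specificity)  # negatives labelled positive
    return missed, false_alarms

# Hypothetical cohort: 20 positive and 80 negative cases.
missed, false_alarms = expected_errors(
    n_positive=20,
    n_negative=80,
    sensitivity=0.7273,   # reported sensitivity of 72.73%
    specificity=0.6292,   # reported specificity of 62.92%
)
print(f"expected misclassified positives: {missed:.1f}")
print(f"expected misclassified negatives: {false_alarms:.1f}")
```

Under these assumed group sizes, roughly 5 of the 20 positive cases and roughly 30 of the 80 negative cases would be assigned to the wrong group, which illustrates why such models cannot serve as an automated classification tool for individual patients.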
In the studies presented here, machine learning should be considered as an alternative statistical method for retrospective differentiation of individuals rather than a predictive modelling technique. Thus, our results can serve as a guide and generator of new hypotheses to optimise existing risk assessment and treatment, but need to be explored in further studies, preferably conducted prospectively.
No financial support and no other potential conflict of interest relevant to this article was reported.
Johannes Kirchebner, MD
Department of Forensic Psychiatry
University Hospital of Psychiatry
University of Zurich
1. Rutledge RB, Chekroud AM, Huys QJ. Machine learning and big data in psychiatry: toward clinical applications. Curr Opin Neurobiol. 2019 Apr;55:152–9. http://dx.doi.org/10.1016/j.conb.2019.02.006
2. Gillan CM, Whelan RA. What big data can do for treatment in psychiatry. Curr Opin Behav Sci. 2017;18:34–42. http://dx.doi.org/10.1016/j.cobeha.2017.07.003
3. Bzdok D, Meyer-Lindenberg A. Machine Learning for Precision Psychiatry: Opportunities and Challenges. Biol Psychiatry Cogn Neurosci Neuroimaging. 2018 Mar;3(3):223–30. http://dx.doi.org/10.1016/j.bpsc.2017.11.007
4. Janssen RJ, Mourão-Miranda J, Schnack HG. Making Individual Prognoses in Psychiatry Using Neuroimaging and Machine Learning. Biol Psychiatry Cogn Neurosci Neuroimaging. 2018 Sep;3(9):798–808. http://dx.doi.org/10.1016/j.bpsc.2018.04.004
5. Dwyer DB, Falkai P, Koutsouleris N. Machine Learning Approaches for Clinical Psychology and Psychiatry. Annu Rev Clin Psychol. 2018 May;14(1):91–118. http://dx.doi.org/10.1146/annurev-clinpsy-032816-045037
6. Iniesta R, Stahl D, McGuffin P. Machine learning, statistical learning and the future of biological research in psychiatry. Psychol Med. 2016 Sep;46(12):2455–65. http://dx.doi.org/10.1017/S0033291716001367
7. Oquendo MA, Baca-Garcia E, Artés-Rodríguez A, Perez-Cruz F, Galfalvy HC, Blasco-Fontecilla H, et al. Machine learning and data mining: strategies for hypothesis generation. Mol Psychiatry. 2012 Oct;17(10):956–9. http://dx.doi.org/10.1038/mp.2011.173
9. Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci. 2001;16(3):199–231. http://dx.doi.org/10.1214/ss/1009213726
10. Pearl J. Causal inference in statistics: an overview. Stat Surv. 2009;3:96–146.
11. Wager S, Athey S. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. J Am Stat Assoc. 2018;113(523):1228–42. http://dx.doi.org/10.1080/01621459.2017.1319839
12. Scheffer J. Dealing with missing data. Res Lett Inf Math Sci. 2002;3:152–60.
13. Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014 Oct;11(10):e1001744. http://dx.doi.org/10.1371/journal.pmed.1001744
14. Studerus E, Ramyead A, Riecher-Rössler A. Prediction of transition to psychosis in patients with a clinical high risk for psychosis: a systematic review of methodology and reporting. Psychol Med. 2017 May;47(7):1163–78. http://dx.doi.org/10.1017/S0033291716003494
16. Chekroud AM, Bondar J, Delgadillo J, Doherty G, Wasil A, Fokkema M, et al. The promise of machine learning in predicting treatment outcomes in psychiatry. World Psychiatry. 2021 Jun;20(2):154–70. http://dx.doi.org/10.1002/wps.20882
18. World Health Organization. ICD-10: international statistical classification of diseases and related health problems: tenth revision. Geneva: World Health Organization; 2004.
20. Murphy KP. Machine learning: a probabilistic perspective. MIT Press; 2012.
21. James G, et al. An introduction to statistical learning. Vol. 112. Springer; 2013.
22. Sonnweber M, Lau S, Kirchebner J. Violent and non-violent offending in patients with schizophrenia: exploring influences and differences via machine learning. Compr Psychiatry. 2021 May;107:152238. http://dx.doi.org/10.1016/j.comppsych.2021.152238
23. Kirchebner J, et al. Stress, schizophrenia, and violence: a machine learning approach. J Interpers Violence. 2022;•••:0886260520913641.
24. Dohrenwend BP, Dohrenwend BS. Social and cultural influences on psychopathology. Annu Rev Psychol. 1974;25(1):417–52. http://dx.doi.org/10.1146/annurev.ps.25.020174.002221
25. Link NW, Cullen FT, Agnew R, Link BG. Can general strain theory help us understand violent behaviors among people with mental illnesses? Justice Q. 2016;33(4):729–54. http://dx.doi.org/10.1080/07418825.2015.1005656
26. Moon B, Blurton D, McCluskey JD. General strain theory and delinquency: focusing on the influences of key strain characteristics on delinquency. Crime Delinq. 2008;54(4):582–613. http://dx.doi.org/10.1177/0011128707301627
27. Silver E. Mental disorder and violent victimization: the mediating role of involvement in conflicted social relationships. Criminology. 2002;40(1):191–212. http://dx.doi.org/10.1111/j.1745-9125.2002.tb00954.x
28. Agnew R. Building on the foundation of general strain theory: specifying the types of strain most likely to lead to crime and delinquency. J Res Crime Delinq. 2001;38(4):319–61. http://dx.doi.org/10.1177/0022427801038004001
29. Agnew R. General strain theory: current status and directions for further research. In: Taking stock. Routledge; 2017. p. 101–23.
30. Patterson A, Sonnweber M, Lau S, Günther MP, Seifritz E, Kirchebner J. Schizophrenia and substance use disorder: characteristics of coexisting issues in a forensic setting. Drug Alcohol Depend. 2021 Sep;226:108850. http://dx.doi.org/10.1016/j.drugalcdep.2021.108850
31. Muthén BO. Beyond SEM: general latent variable modeling. Behaviormetrika. 2002;29(1):81–117. http://dx.doi.org/10.2333/bhmk.29.81
32. Huber DA, Lau S, Sonnweber M, Günther MP, Kirchebner J. Exploring Similarities and Differences of Non-European Migrants among Forensic Patients with Schizophrenia. Int J Environ Res Public Health. 2020 Oct;17(21):7922. http://dx.doi.org/10.3390/ijerph17217922
Published under the copyright license
“Attribution – Non-Commercial – NoDerivatives 4.0”.
No commercial reuse without permission.