Machine learning is the process of teaching a computer system to make accurate predictions from data without explicitly programming it to do so; instead, the computer ‘learns’ by itself through experience. Over the past couple of decades, machine learning has made multiple inroads into our daily lives: banks use it to identify suspicious transactions; Amazon and Netflix use it to recommend books and movies; email providers use it to filter out spam. And now, public health researchers are starting to apply machine learning methods to explore very large data sets, such as the Sax Institute’s 45 and Up Study.
In one recent study, researchers used machine learning techniques to predict the development of type 2 diabetes and identify the factors involved, basing their work on health data from more than 230,000 participants in the 45 and Up Study.
The study identified machine learning models that performed much better than the conventional regression models commonly used to predict diabetes and identify risk factors, predicting diabetes with 73% to 80% accuracy over periods of up to ten years. One key benefit of the machine learning models was that they were based solely on self-reported information from Study participants rather than on biomarkers. In other words, they did not need the blood samples that conventional prediction tools generally require, which are costly to organise and must be collected by health professionals.
The researchers’ models identified BMI as the most significant factor contributing to the development of diabetes. Obese participants were twice as likely to develop diabetes over ten years, and the highest-performing model predicted that if the BMI of obese and overweight participants could be reduced to a healthy range, the ten-year probability of a new diabetes diagnosis would fall from 8.3% of participants to just 2.8%.
Another research project applied machine learning techniques to data from the 45 and Up Study to look at the shared risk factors for cancer and mental disorders. The researchers, from the University of Melbourne and other institutions, used a machine learning model to work through 48 potential risk factors for multimorbidity involving the two conditions, the first study of its kind. They found that Study participants with cancer were 3.41 times more likely to develop mental disorders than participants without cancer, and that participants with mental disorders were 3.06 times more likely to develop cancer than people without mental disorders.
Although it’s hard to tease out cause and effect between these two conditions, the researchers were able to identify some shared risk factors, such as female gender, smoking, psychological distress, low fruit intake, hypertension, arthritis, asthma and diabetes.
According to the authors of this second paper, machine learning has come of age in population health: it is now possible to automate complex tasks on big data sets that previously could only have been done with substantial human labour. But risks remain, the authors explain, particularly in developing methods that are “explainable, that respect privacy and that make accurate causal inferences”.