Abstract
Randomized controlled trials (RCTs) are used to study the safety and efficacy of new treatments by comparing patient outcomes in an intervention group with those in a control group. Traditionally, RCTs rely on statistical analyses to assess differences between the two groups. However, such analyses are generally not designed to assess the impact of the intervention at the individual level. In this paper, we explore machine learning models in conjunction with an RCT to make personalized predictions for a depression treatment intervention in which patients were longitudinally monitored with wearable devices. We formulate individual-level prediction in the intervention and control groups of an RCT as a multi-task learning (MTL) problem and propose a novel MTL model specifically designed for RCTs. Instead of training separate models for the intervention and control groups, the proposed MTL model is trained on both groups, effectively enlarging the training dataset. We develop a hierarchical model architecture that aggregates data from different sources and different longitudinal stages of the trial, which allows the MTL model to exploit the commonalities and capture the differences between the two groups. We evaluate the MTL approach in an RCT involving 106 patients with depression, who were randomized to receive an integrated intervention treatment. The proposed MTL model outperforms both single-task models and the traditional multi-task model in predictive performance, a promising step toward using data collected in RCTs to develop predictive models for precision medicine.
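To make the idea of training one model on both trial arms concrete, the following is a minimal sketch of generic hard parameter sharing: a trunk shared by both groups plus one output head per group. It is not the hierarchical architecture proposed in the paper. The layer sizes, the synthetic data, and the names `init_params`, `forward`, `mse`, and `train_step` are all assumptions made for illustration.

```python
import numpy as np

def init_params(n_features, n_hidden, n_tasks, seed=0):
    """Shared trunk (W, b) plus one linear head (row of V, entry of c) per group."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, size=(n_features, n_hidden)),
        "b": np.zeros(n_hidden),
        "V": rng.normal(0.0, 0.1, size=(n_tasks, n_hidden)),
        "c": np.zeros(n_tasks),
    }

def forward(params, X, task):
    """Return the shared representation and the prediction from each row's own head."""
    H = np.tanh(X @ params["W"] + params["b"])
    y_hat = np.sum(H * params["V"][task], axis=1) + params["c"][task]
    return H, y_hat

def mse(params, X, y, task):
    _, y_hat = forward(params, X, task)
    return float(np.mean((y_hat - y) ** 2))

def train_step(params, X, y, task, lr=0.05):
    """One full-batch gradient step on the mean squared error."""
    H, y_hat = forward(params, X, task)
    err = 2.0 * (y_hat - y) / len(y)           # dLoss/dy_hat, shape (n,)

    # Head gradients: each sample contributes only to its own group's head.
    grad_V = np.zeros_like(params["V"])
    grad_c = np.zeros_like(params["c"])
    np.add.at(grad_V, task, err[:, None] * H)
    np.add.at(grad_c, task, err)

    # Trunk gradients: samples from both groups contribute.
    dZ = err[:, None] * params["V"][task] * (1.0 - H ** 2)
    grad_W = X.T @ dZ
    grad_b = dZ.sum(axis=0)

    params["W"] -= lr * grad_W
    params["b"] -= lr * grad_b
    params["V"] -= lr * grad_V
    params["c"] -= lr * grad_c

# Synthetic stand-in for trial data (assumption): group 0 = control, 1 = intervention.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
group = rng.integers(0, 2, size=200)
# Outcome has a component common to both groups and one specific to group 1.
y = X[:, 0] + np.where(group == 1, 0.5 * X[:, 1], 0.0)

params = init_params(n_features=4, n_hidden=8, n_tasks=2)
initial_loss = mse(params, X, y, group)
for _ in range(300):
    train_step(params, X, y, group)
final_loss = mse(params, X, y, group)
print(f"MSE before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In this sketch the trunk receives gradients from every patient, while each head is updated only by patients in its own group. That is one simple way to share what the groups have in common while still modeling how they differ.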
Index Terms
- Multi-Task Learning for Randomized Controlled Trials: A Case Study on Predicting Depression with Wearable Data