SciELO - Scientific Electronic Library Online

 
Finisterra - Revista Portuguesa de Geografia

Print version ISSN 0430-5027

Abstract

GIOIA, Thamy Barbara; BARROS, Juliana Ramalho and SILVA, Renato Rodrigues da. Socioeconomic factors and machine learning algorithms applied to neglected diseases risk prediction. Case study in the municipalities of the Goiás State and Federal District, Brazil. Finisterra [online]. 2022, n.121, pp.109-123. Epub Dec 31, 2022. ISSN 0430-5027. https://doi.org/10.18055/finis28635.

Analyzing the relationship between socioeconomic variables and neglected tropical diseases can help managers design public policies to reduce the number of cases. The objective of this study was to use machine learning algorithms to evaluate which socioeconomic variables are most important for classifying the risk of three neglected diseases: leprosy, cutaneous leishmaniasis, and dengue. Three decision-tree-based algorithms were evaluated: Random Forest (RF), XGBoost, and C5.0. The study area comprised the municipalities of the state of Goiás and the Federal District, Brazil. For the dengue risk classes, both RF and XGBoost achieved accuracy values above 0.6, and both identified low income, literacy, and race as the most important predictive variables. For the leprosy risk classes, all three algorithms achieved accuracy above 0.6 and identified water supply, literacy, race, and housing as important variables. For the cutaneous leishmaniasis risk classes, the algorithms achieved accuracy below 0.4, which made it infeasible to evaluate candidate predictive variables for the model. The three algorithms showed similar predictive performance, although RF performed slightly better. The most important socioeconomic variables for predicting dengue and leprosy risk classes were similar.

Keywords: Neglected tropical diseases; social determinants; XGBoost; Random Forest; C5.0.
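The workflow summarized in the abstract (train a decision-tree ensemble to predict municipal risk classes, report accuracy, then rank socioeconomic variables by importance) can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestClassifier` and uses purely synthetic data: the feature names (`low_income`, `literacy`, `water_supply`, `housing`), the three risk classes, the tertile-based class definition, and all hyperparameters are hypothetical choices for illustration, not the authors' data or pipeline. It also uses impurity-based importances, which is only one of several importance measures; the abstract does not say which measure the study used. XGBoost and C5.0 are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical socioeconomic indicators (synthetic values, NOT the study's data).
FEATURES = ["low_income", "literacy", "water_supply", "housing"]


def make_synthetic_data(n_municipalities=300, seed=0):
    """Generate synthetic municipality-level features and three risk classes.

    By construction the class depends mainly on 'low_income' (plus noise),
    so a reasonable model should rank that feature first.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_municipalities, len(FEATURES)))
    score = X[:, 0] + 0.1 * rng.normal(size=n_municipalities)
    # Assumed class definition: split the score into tertiles,
    # giving low (0), medium (1) and high (2) risk.
    cuts = np.quantile(score, [1 / 3, 2 / 3])
    y = np.digitize(score, cuts)
    return X, y


def evaluate_random_forest(X, y, seed=0):
    """Fit a random forest and return (test accuracy, importance ranking)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Impurity-based importances (normalized to sum to 1), most important first.
    ranking = sorted(
        zip(FEATURES, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return accuracy, ranking


if __name__ == "__main__":
    X, y = make_synthetic_data()
    acc, ranking = evaluate_random_forest(X, y)
    print(f"accuracy = {acc:.2f}")
    for name, importance in ranking:
        print(f"{name:>13s}: {importance:.3f}")
```

With three balanced classes, a model that guesses at random scores about 0.33, which gives some context for why accuracies below 0.4 leave little basis for interpreting variable importance.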
