Decoding Heart Disease: A Comprehensive Data Analysis

Introduction

Our project centers on modeling and predicting heart disease, a major contributor to the global health burden. Heart attacks in particular are a leading cause of death. According to the Centers for Disease Control and Prevention (CDC), one person in the United States dies from cardiovascular disease every 36 seconds.

Data Access

We obtained the data from the CDC's Behavioral Risk Factor Surveillance System (BRFSS) 2020 annual data (retrieved 21 April 2022 from https://www.cdc.gov/brfss/annual_data/annual_2020.html), which is freely available to the public. The dataset contains 401,958 records and 279 variables collected through annual telephone surveys. Because the survey oversamples particular demographic segments, we addressed the resulting potential bias with weighted modeling.
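
To illustrate why weighting matters, the minimal sketch below compares an unweighted and a weighted prevalence estimate on a toy sample in which one group is oversampled and therefore receives smaller weights. The function name and all values are hypothetical; this is not the project's code or BRFSS data.

```python
# Minimal sketch: how survey weights change a prevalence estimate.
# The outcomes and weights below are made-up toy values, not BRFSS data.


def weighted_prevalence(outcomes, weights):
    """Return the weighted share of respondents with outcome == 1.

    outcomes: list of 0/1 values (1 = reported a heart attack)
    weights:  list of non-negative survey weights, one per respondent
    """
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(y * w for y, w in zip(outcomes, weights)) / total_weight


# Toy sample where the respondents with outcome 1 are oversampled,
# so each of them is down-weighted relative to the others.
outcomes = [1, 1, 1, 0, 0]
weights = [0.5, 0.5, 0.5, 2.0, 2.0]

unweighted = sum(outcomes) / len(outcomes)         # 3 / 5
weighted = weighted_prevalence(outcomes, weights)  # 1.5 / 5.5
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

Here the naive estimate (0.600) overstates the outcome because the oversampled group dominates the raw counts; the weighted estimate (about 0.273) restores each group's intended share.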

Data Curation

We reduced the dataset to the variables most relevant to our question, removing extraneous ones and keeping key health indicators and socioeconomic factors. The curated dataset includes the following attributes:

- Heart Attack (target variable)
- Coronary Heart Disease
- Stroke Incidents
- Gender (Sex)
- Geographic Region (State)
- Ethnicity (Race)
- Age Distribution
- Body Mass Index (BMI)
- Smoking Patterns
- Alcohol Consumption
- Asthma Prevalence
- Kidney Disease Incidence
- Cancer (All Types)
- Skin Cancer Reports
- Chronic Obstructive Pulmonary Disease (COPD)
- General Health Assessment
- Physical Well-being
- Mental Health Evaluation
- Diabetes Prevalence
- Exercise Habits
- Marital Status (Marital_Status)
- Income Levels (Income_Level)
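
The curation step can be sketched as a simple column filter over survey records. This is a hypothetical illustration: the column names (including `Survey_Weight`) are assumed labels modeled on the attribute list above, not official BRFSS variable names, and dropping rows with a missing target is an assumption about how missing outcomes might be handled.

```python
# Minimal sketch of the curation step: keep only the chosen variables
# and drop respondents whose target value is missing.
# Column names are hypothetical labels, not official BRFSS variable names.

KEEP_COLUMNS = ["Heart_Attack", "Sex", "Age", "BMI", "Smoking", "Survey_Weight"]
TARGET = "Heart_Attack"


def curate(records, keep=KEEP_COLUMNS, target=TARGET):
    """Return new records restricted to `keep`, skipping rows with no target."""
    curated = []
    for row in records:
        if row.get(target) is None:
            continue  # outcome unknown, so the row cannot be used for training
        curated.append({col: row.get(col) for col in keep})
    return curated


raw = [
    {"Heart_Attack": 1, "Sex": "F", "Age": 67, "BMI": 31.2,
     "Smoking": "former", "Survey_Weight": 1.4, "Unused_Var": 99},
    {"Heart_Attack": None, "Sex": "M", "Age": 45, "BMI": 24.8,
     "Smoking": "never", "Survey_Weight": 0.9, "Unused_Var": 12},
]

rows = curate(raw)
print(len(rows), sorted(rows[0]))
```

Keeping the survey weight alongside the predictors matters here: if it were filtered out with the other extraneous variables, the weighted modeling step would no longer be possible.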

Hypothesis Exploration

Central to our investigation is scrutinizing the factors (“factor X”) commonly cited as causes of heart attacks. Using a range of predictive models, we aim to identify the variables, acting individually or in combination, that most strongly predict heart attacks. The CDC dataset provides the basis for testing this hypothesis.
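
One common way to test whether variables act in combination is to add interaction terms, the products of pairs of predictors, so that a linear model such as logistic regression can assign a separate coefficient to their joint effect. The sketch below is a hypothetical illustration of that idea, not necessarily how this project modeled combined effects; the function name, feature names, and values are made up.

```python
# Minimal sketch: expand a feature dict with pairwise interaction terms,
# so a linear model can estimate joint ("in combination") effects.
# Feature names and values are illustrative only.
from itertools import combinations


def add_interactions(features):
    """Return a copy of `features` plus the product of every pair of features."""
    expanded = dict(features)
    # Sorting the items makes the generated interaction names deterministic.
    for (name_a, val_a), (name_b, val_b) in combinations(sorted(features.items()), 2):
        expanded[f"{name_a}*{name_b}"] = val_a * val_b
    return expanded


person = {"Smoker": 1, "Diabetes": 1, "BMI": 31.0}
print(add_interactions(person))
```

With k predictors this adds k(k - 1)/2 columns, so in practice one would usually restrict interactions to a few pairs motivated by prior knowledge.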

Some of the models tested include:
- Logistic regression
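
Logistic regression models the log-odds of the outcome as a linear function of the predictors, log(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk. Combined with the weighted modeling mentioned under Data Access, each respondent's contribution to the log-likelihood is multiplied by their survey weight. The following is a minimal, dependency-free sketch of that idea fitted by batch gradient descent, assuming weights are used as per-respondent sample weights; the function names, toy data, learning rate, and epoch count are hypothetical and not from this project. In practice a library routine would typically be used, for example scikit-learn's `LogisticRegression.fit`, which accepts a `sample_weight` argument.

```python
# Minimal sketch of survey-weighted logistic regression fitted by
# batch gradient descent. Toy data only; lr and epochs are arbitrary.
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_weighted_logistic(X, y, w, lr=0.1, epochs=2000):
    """Minimize the weighted negative log-likelihood.

    X: list of feature rows (lists of floats), y: 0/1 labels,
    w: non-negative survey weights. Returns (intercept, coefficients).
    """
    n_features = len(X[0])
    b = 0.0
    coefs = [0.0] * n_features
    total_w = sum(w)
    for _ in range(epochs):
        grad_b = 0.0
        grad = [0.0] * n_features
        for row, label, weight in zip(X, y, w):
            p = sigmoid(b + sum(c * x for c, x in zip(coefs, row)))
            err = weight * (p - label)  # weighted residual
            grad_b += err
            for j, x in enumerate(row):
                grad[j] += err * x
        # Divide by the total weight so the step size does not grow with sample size.
        b -= lr * grad_b / total_w
        coefs = [c - lr * g / total_w for c, g in zip(coefs, grad)]
    return b, coefs


def predict_proba(b, coefs, row):
    """Predicted probability of the outcome for one feature row."""
    return sigmoid(b + sum(c * x for c, x in zip(coefs, row)))


# Toy data: one binary feature (e.g. smoker = 1) and a heart-attack label.
X = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
y = [0, 0, 1, 1, 1, 0]
w = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

b, coefs = fit_weighted_logistic(X, y, w)
print(f"P(event | feature=0) = {predict_proba(b, coefs, [0.0]):.2f}")
print(f"P(event | feature=1) = {predict_proba(b, coefs, [1.0]):.2f}")
```

With equal weights, the fitted probabilities converge to the group-wise event rates (about 0.33 and 0.67 here); changing the weights shifts them toward the weighted rates. Weighting this way corrects the point estimates, but valid standard errors for a complex survey design require dedicated survey-analysis methods.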

Conclusion

This project is a sustained effort to understand the complexities of heart disease. Grounded in data and methodological rigor, it seeks to identify the factors most strongly associated with heart attacks. The findings could inform better prevention and care strategies.

Brook Tilahun
Associate Sequencing Scientist II

My research interests include multi-omics analysis, single cell genomics and neurodegeneration.