
Heritage Health Prize Competition

Introduction to HHP

The Heritage Health Prize is a $3 million reward for the team that can best “identify patients who will be admitted to a hospital within the next year, using historical claims data.” The motivation is clear when you consider that over $30 billion was spent on unnecessary hospital admissions alone in 2006. An accurate model would let health care providers administer more personalized care, reducing both unnecessary admissions and medical spending as a whole.

The prize is hosted by Kaggle, a website where teams of researchers tackle machine learning problems in a competitive environment. Kaggle provides relevant datasets as well as quantitative feedback for predictions made. The team that produces the most accurate model within the time frame wins the competition and is typically compensated in return for a description of their method.

Dataset

https://foreverdata.org/1015/index.html

(Due to GitHub's file size limits, the archived dataset is stored in data.zip. Please extract it into the ./data folder.)

Evaluation metric

To measure model performance, we use RMSLE (Root Mean Squared Logarithmic Error):

$\epsilon = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(p_i + 1) - \log(a_i + 1) \right)^2}$

Where:

  1. i is a member;
  2. n is the total number of members;
  3. p_i is the predicted number of days spent in hospital by member i in the test period;
  4. a_i is the actual number of days spent in hospital by member i in the test period.
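As a reference sketch (the rmsle helper below is illustrative, not code from this repository), the metric can be computed with NumPy:

```python
import numpy as np

def rmsle(predicted, actual):
    """Root Mean Squared Logarithmic Error over all members."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    # log1p(x) = log(x + 1), stable for the many zero-day members
    return np.sqrt(np.mean((np.log1p(p) - np.log1p(a)) ** 2))

# Predicted vs. actual days in hospital for three members
print(rmsle([0, 2, 5], [0, 3, 4]))  # ≈ 0.197
```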

Task Lists

1. Data preparation

  • Importing data
  • Data cleaning (Handling missing values, handling categorical and continuous data,...)
  • Merging files
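A minimal sketch of these steps with pandas follows. The file and column names (Members.csv, Claims.csv, DaysInHospital_Y2.csv, MemberID, AgeAtFirstClaim) follow the public HHP release but should be treated as assumptions; adjust them to whatever data.zip actually contains.

```python
import pandas as pd

# File and column names follow the public HHP release; treat them as
# assumptions and adjust to the extracted data.
members = pd.read_csv("data/Members.csv")
claims = pd.read_csv("data/Claims.csv")
days_y2 = pd.read_csv("data/DaysInHospital_Y2.csv")

# Age arrives as string ranges ("0-9", ..., "80+"); map them to midpoints.
age_map = {"0-9": 5, "10-19": 15, "20-29": 25, "30-39": 35, "40-49": 45,
           "50-59": 55, "60-69": 65, "70-79": 75, "80+": 85}
members["Age"] = members["AgeAtFirstClaim"].map(age_map)  # NaN where missing

# One simple claims feature: the number of claims per member.
claim_counts = (claims.groupby("MemberID").size()
                      .rename("ClaimCount").reset_index())

# Merge member attributes, claim features, and the Year-2 target.
df = (members.merge(claim_counts, on="MemberID", how="left")
             .merge(days_y2, on="MemberID", how="inner"))
df["ClaimCount"] = df["ClaimCount"].fillna(0)
```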

2. Feature extraction

  • Handling outliers
  • Exploring data (Visualizations, Distribution of features,...)
  • Feature scaling (Standardisation, Mean Normalisation, Min-Max Scaling,... )
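A sketch of outlier clipping and two of the scalers named above, using scikit-learn. The feature matrix X would normally come from the data-preparation step; a random stand-in is used here so the example runs on its own.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Random stand-in for the numeric feature matrix built in step 1.
X = np.random.default_rng(0).lognormal(size=(1000, 2))

# Winsorise outliers to each feature's 1st-99th percentile range.
lo, hi = np.percentile(X, [1, 99], axis=0)
X = np.clip(X, lo, hi)

# Standardisation: zero mean, unit variance per feature.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each feature mapped into [0, 1].
X_mm = MinMaxScaler().fit_transform(X)
```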

3. Predictive modelling

  • Linear Regression
  • Ridge Regression
  • Support Vector Regression
  • Random Forests
  • Logistic Regression
  • Neural Networks
  • Gradient Boosting Machines
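An illustrative scikit-learn sketch for two of these models; the hyperparameters are placeholders, the features and labels are random stand-ins, and rmsle is the helper from the evaluation-metric sketch. Fitting on log1p of the target makes the squared-error objective line up with RMSLE:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Random stand-ins for the engineered features and DaysInHospital labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = np.clip(rng.poisson(0.5, 2000), 0, 15).astype(float)  # labels capped at 15

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (Ridge(alpha=1.0),
              GradientBoostingRegressor(n_estimators=100, max_depth=3)):
    # Fit on log1p(y) so squared error in log space matches RMSLE.
    model.fit(X_tr, np.log1p(y_tr))
    pred = np.clip(np.expm1(model.predict(X_te)), 0, 15)
    print(type(model).__name__, rmsle(pred, y_te))
```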

4. Models summary

  • Model cross-validation (Stratified K-Fold)
  • Hyperparameter tuning
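A sketch of tuning with scikit-learn, reusing the stand-in X and y from the previous example. Plain KFold is substituted here, since scikit-learn's StratifiedKFold expects class labels; stratifying on a binned or binarized target is a common workaround omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV, KFold

# Negated RMSLE, because scikit-learn searches maximise their scorer.
def neg_rmsle(y_true, y_pred):
    y_pred = np.clip(y_pred, 0, None)  # guard against negative predictions
    return -np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0, 100.0]},
    scoring=make_scorer(neg_rmsle),
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, "RMSLE:", -search.best_score_)
```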

5. Ensemble methods

  • Model selection (observing correlations between model predictions)
  • Ensembling
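A sketch of both steps: inspect the correlation between two models' held-out predictions, then grid-search a blend weight in log space. The prediction arrays are random stand-ins, and rmsle is the helper defined earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for held-out predictions from two models and the true labels;
# in practice these come from the fitted models above.
y_te = np.clip(rng.poisson(0.5, 500), 0, 15).astype(float)
pred_a = np.clip(y_te + rng.normal(0, 1, 500), 0, 15)
pred_b = np.clip(y_te + rng.normal(0, 1, 500), 0, 15)

# Weakly correlated predictors are the ones worth ensembling.
print(np.corrcoef(pred_a, pred_b))

# Weighted blend in log space; the weight comes from a coarse grid search.
best_w, best_score = 0.0, np.inf
for w in np.linspace(0, 1, 21):
    blend = np.expm1(w * np.log1p(pred_a) + (1 - w) * np.log1p(pred_b))
    score = rmsle(blend, y_te)
    if score < best_score:
        best_w, best_score = w, score
print(best_w, best_score)
```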

Summary of Individual Predictors

Predictor                     Score (RMSLE)
Linear Regression             0.4850
Ridge Regression              0.4844
Support Vector Regression     0.4783
Random Forests                0.4846
Logistic Regression           0.5065
Neural Networks               0.48
Gradient Boosting Machines    0.4790

Ensemble method

Ensemble result: 0.00

Optimal constant value + Ensemble (Classification predictors): 0.4653
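The “optimal constant value” baseline falls straight out of the metric: the constant c minimising RMSLE satisfies log(c + 1) = mean of log(a_i + 1), because that is where the derivative of the mean squared log error vanishes. A sketch (the helper below is illustrative, not the repository's code):

```python
import numpy as np

def optimal_constant(actual):
    # Minimising mean (log1p(c) - log1p(a_i))^2 over c gives
    # log1p(c) = mean(log1p(a_i)), i.e. c = expm1(mean(log1p(a_i))).
    a = np.asarray(actual, dtype=float)
    return np.expm1(np.log1p(a).mean())

print(optimal_constant([0, 0, 0, 1, 2, 15]))  # ≈ 1.14
```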
