🖥️🔬 Project employing data science and machine learning techniques to visualise and classify an open dataset of benign and metastatic breast cancer samples.

magg01/Predicting_metastatic_breast_cancer

Predicting_metastatic_breast_cancer

A data science and machine learning project to visualise and classify an open dataset of benign and metastatic breast cancer samples.

Medical diagnosis is an area of intense research for machine learning algorithms. Currently, many diagnoses, for conditions ranging from cancer to infections, require a trained technician or medical professional to analyse image data, whether from X-rays, scans, cell culture images or many other sources, before a diagnosis can be made. If machine learning algorithms and pipelines can be written that take either the raw images or features extracted from them and reach a diagnosis at least as accurately as a trained professional, then much time and many resources could be saved, freeing those highly trained individuals to perform other tasks.

The dataset I have chosen to work with is the Breast Cancer Wisconsin (Diagnostic) dataset, downloaded from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic).
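As a quick way to get the same data into Python, the sketch below assumes scikit-learn is installed and uses its bundled copy of the Wisconsin Diagnostic data (`load_breast_cancer`) rather than the file downloaded from UCI. Note that scikit-learn labels the two classes "malignant" (0) and "benign" (1).

```python
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy of the UCI Breast Cancer Wisconsin
# (Diagnostic) data stands in for the manually downloaded file.
data = load_breast_cancer(as_frame=True)
df = data.frame  # 30 numeric feature columns plus a "target" column

print(df.shape)                    # (569, 31)
print(data.target_names.tolist())  # ['malignant', 'benign']
print(df["target"].value_counts())  # class balance: 0 = malignant, 1 = benign
```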

The objectives of the project are to take this dataset and perform the pre-processing steps needed to get the data into a suitable state for training a machine learning algorithm; to perform statistical analysis on the different features of the data and analyse their distributions; and to identify how the dimensionality of the dataset can be reduced to its most critical components.
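The pre-processing, per-class statistics and dimensionality-reduction steps above could look roughly like the following. This is a minimal sketch assuming scikit-learn and pandas; standard scaling and PCA with a 95% explained-variance threshold are illustrative choices, not necessarily the ones used in the project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Pre-processing: confirm there are no missing values, then standardise each
# feature to zero mean and unit variance so that large-valued features
# (e.g. "mean area") do not dominate variance-based methods such as PCA.
assert not X.isna().any().any()
X_scaled = StandardScaler().fit_transform(X)

# Statistical summary: per-class mean of every feature, a first look at
# which features differ most between malignant and benign samples.
class_means = X.groupby(y).mean().T
class_means.columns = data.target_names
print(class_means.head())

# Dimensionality reduction: keep the smallest number of principal
# components that together explain at least 95% of the variance
# (the 95% threshold is an assumption chosen for illustration).
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape[1], "components explain",
      round(pca.explained_variance_ratio_.sum(), 3), "of the variance")
```

Fitting the scaler and PCA on the full dataset is fine for exploration, but for the classification step they should be re-fitted inside each cross-validation fold to avoid leaking test data into training.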

Finally, the project aims to select, train and test a machine learning algorithm that distinguishes between the metastatic and benign samples in the dataset, and to generate a performance score for that model. To be clear, I will only be using data from within this dataset, but cross-validation allows the model to be trained and tested on different parts of it.
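One way to train, test and score a classifier with cross-validation is sketched below. Logistic regression is an assumed stand-in, since the README does not name the algorithm that was selected; the 5-fold stratified split and accuracy metric are likewise illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling and PCA sit inside the pipeline so they are re-fitted on each
# training fold only, preventing information leaking from the test fold.
# Assumption: logistic regression as the classifier.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    LogisticRegression(max_iter=1000),
)

# Stratified 5-fold cross-validation keeps the malignant/benign ratio
# roughly the same in every fold; each sample is tested exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("accuracy per fold:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
```

Reporting the mean and spread across folds gives a more honest success score than a single train/test split, because every sample is used for testing once.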

Acknowledgments

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
