This research applies machine learning methods to predicting blood donor behavior across repeated donations. Using primary data collected from a mobile blood donation vehicle in Taiwan, it aims to improve the forecasting of annual blood donations and help ensure a constant blood supply.
Our methodology involves:
- Data Inspection: Examining the dataset to get an overview of its form and content, as the first step before cleaning.
- Data Loading: Loading the data and making it more suitable for analysis by handling missing values and scaling the features.
- Model Selection: Applying feature selection and using TPOT to find the best machine learning pipeline.
- Model Evaluation: Comparing models by the Area Under the Receiver Operating Characteristic Curve (AUC).
- Normalization: Log-transforming the training data to improve model performance.
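The normalization and evaluation steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used in the study: the function names (`highest_variance_column`, `log_normalize`, `auc`) are hypothetical, the toy rows, labels, and scores are invented for the example, and the choice to log-transform the highest-variance column is an assumption, since the text only says that a logarithm is applied to the training data.

```python
import math
from statistics import pvariance


def highest_variance_column(rows):
    """Return the index of the column with the largest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([row[i] for row in rows]) for i in range(n_cols)]
    return max(range(n_cols), key=variances.__getitem__)


def log_normalize(rows, col):
    """Return a copy of rows with column `col` replaced by its natural log.

    Assumes every value in that column is strictly positive.
    """
    return [
        [math.log(v) if i == col else v for i, v in enumerate(row)]
        for row in rows
    ]


def auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
# Hypothetical columns: months since last donation, number of donations,
# total volume donated.
rows = [[2, 50, 12500], [0, 13, 3250], [1, 16, 4000], [4, 4, 1000]]

col = highest_variance_column(rows)    # the volume column dominates the variance
normalized = log_normalize(rows, col)  # compress its scale with a logarithm

# Hypothetical labels (1 = donated again) and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
print(auc(labels, scores))  # 0.75
```

The pairwise definition of AUC is quadratic in the number of examples, which is acceptable for a sketch; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice on real data.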
After normalization, the logistic regression model achieved an AUC of 0.7900, slightly better than the 0.7400 obtained by the TPOT model. The results highlight data pre-processing and the choice of an appropriate prediction model as very important factors in predictive analytics. This research improves the understanding of donor behavior and helps secure a stable blood supply; it may also save more lives through timely treatment.