Internship – DCPT (Medical Image Classification Pipeline for Breast Cancer Research)

Project Owner: DCPT, Aarhus University Hospital
Developed: 2024
Type: Full-Stack Application
Role: Machine Learning Engineer (Solo Developer)

Challenge

The objective was to build an automated system that processes and classifies large chest image datasets collected from multiple hospitals for breast cancer treatment analysis.

Process

Dataset Analysis

Investigated imaging datasets from multiple hospitals to understand inconsistencies in file naming, metadata availability, and image quality.
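
As an illustration of this kind of audit, here is a minimal sketch that counts file extensions and flags file names that break a naming convention. The convention (`<hospital>_<patient-id>_<view>.<ext>`), the function name `audit_filenames`, and the sample names are all hypothetical and are not taken from the project.

```python
import re
from collections import Counter
from pathlib import PurePath

# Hypothetical convention for illustration: <hospital>_<patient-id>_<view>.<ext>
EXPECTED = re.compile(r"^[A-Za-z]+_\d+_[a-z]+\.(png|jpg|jpeg)$")


def audit_filenames(names):
    """Count extensions (case-insensitive) and list names that break the assumed convention."""
    extensions = Counter(PurePath(name).suffix.lower() for name in names)
    nonconforming = [name for name in names if not EXPECTED.match(name)]
    return extensions, nonconforming


# Made-up sample names; a real audit would walk the dataset directories.
sample = ["hospitalA_001_front.png", "IMG 0042.JPG", "hospitalB_17_left.jpg"]
extensions, nonconforming = audit_filenames(sample)
print(dict(extensions))
print(nonconforming)
```

A report like this makes it easier to decide which naming rules each hospital's export needs before any preprocessing starts.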

Pipeline Design

Designed modular Python pipelines responsible for preprocessing, metadata parsing, classification, and dataset restructuring.
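
One way such a modular design could look is a list of small stage functions applied in order to each image record. This is only a sketch: the stage names, the record fields, and the placeholder label are assumptions for illustration, and the real classification stage would call the trained model.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Stage = Callable[[Record], Record]


def parse_metadata(record: Record) -> Record:
    """Derive the hospital from the file name (hypothetical naming scheme)."""
    stem = record["filename"].rsplit(".", 1)[0]
    return {**record, "hospital": stem.split("_", 1)[0]}


def classify(record: Record) -> Record:
    """Placeholder stage: a real pipeline would run the trained classifier here."""
    return {**record, "label": "unclassified"}


def plan_destination(record: Record) -> Record:
    """Compute where the file would go in the restructured dataset."""
    destination = f"{record['hospital']}/{record['label']}/{record['filename']}"
    return {**record, "destination": destination}


def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    """Apply each stage in order, passing the enriched record along."""
    for stage in stages:
        record = stage(record)
    return record


result = run_pipeline(
    {"filename": "hospitalA_001_front.png"},
    [parse_metadata, classify, plan_destination],
)
print(result["destination"])
```

Because every stage takes and returns a record, stages can be tested in isolation and reordered or replaced without touching the others.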

Model Training

Trained and optimized a YOLOv8 classification model, experimenting with preprocessing and augmentation techniques to improve generalization.
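
For readers unfamiliar with YOLOv8 classification, a training run with the Ultralytics package might be set up roughly as follows. The weights file, epoch count, image size, augmentation values, and dataset path are assumptions chosen for illustration, not the settings used in this project.

```python
def build_train_args(data_dir, epochs=50, image_size=224):
    """Collect training settings in one place; all values are illustrative assumptions."""
    return {
        "data": data_dir,      # dataset root with train/ and val/ class subfolders
        "epochs": epochs,
        "imgsz": image_size,
        "fliplr": 0.5,         # probability of a horizontal flip
        "hsv_v": 0.4,          # brightness jitter strength
    }


def train_classifier(data_dir, weights="yolov8n-cls.pt"):
    """Fine-tune a pretrained YOLOv8 classification model on data_dir."""
    # Imported lazily so the settings helper works without the third-party package.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args(data_dir))


args = build_train_args("datasets/example")  # hypothetical path
print(args)
```

Calling `train_classifier("datasets/example")` would start training, provided `ultralytics` is installed and the folder exists. Whether horizontal flips are appropriate depends on whether left/right orientation carries meaning in the images.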

Validation & Testing

Evaluated model performance using precision, recall, and F1-score, followed by manual verification across datasets from different hospitals.
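
For reference, the three metrics can be computed for a single class from true and predicted labels as in the sketch below. The class names and the label lists are made up for illustration and are not results from the project.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative for "front".
y_true = ["front", "front", "side", "side", "front"]
y_pred = ["front", "side", "side", "front", "front"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="front")
print(precision, recall, f1)
```

Running the same function separately on each hospital's subset is one simple way to spot a site where the model generalizes poorly.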

Results

This project produced a reliable machine learning pipeline for automated medical image classification and dataset organization. By standardizing datasets from multiple hospitals and automating processing workflows, the system provides a scalable foundation for future BCCT research.

Stack

Python
YOLOv8
PyTorch
NumPy
Pandas
OpenCV

Conclusion

The project demonstrated how Python, YOLOv8, and PyTorch can be combined into a scalable, modular medical image classification pipeline that offers high precision and is easy to use.