Deep Learning-Driven Image Intelligence for Robotics and Clinical Diagnostics
Authors
Islam, Md Sazidul
Type
Thesis
Language
en
Abstract
This thesis presents a comprehensive investigation of deep learning applications across two critical domains: intelligent robotics and clinical diagnostics. The work addresses fundamental challenges in real-time human-robot interaction and medical image classification through practical, resource-efficient implementations.
In the robotics domain, we developed a real-time face recognition and control system for a hexapod robot, operated through a PyQt5-based GUI with offline voice feedback. Using DeepFace with the FaceNet model, the system achieves 93.02% recognition accuracy with response times of 380-420 ms on Raspberry Pi hardware. Our implementation demonstrates effective edge deployment of deep learning-based facial recognition combined with robot control, sensor monitoring, and obstacle detection through an intuitive graphical interface. The architecture uses multi-threaded processing and TCP/IP communication: the client-side GUI manages movement controls, sensor monitoring, and recognition operations, while the server side handles hardware interfacing and command execution on the Raspberry Pi. This modular client-server design supports scalability and maintainability and keeps concurrent operations (video streaming, face recognition, and robot control) responsive, all at a total hardware cost of approximately $150, democratizing access to advanced human-robot interaction for educational and research applications.
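The client-server split described above can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the line-based text protocol, the `MOVE`/`STOP` command names, and all helper names (`execute_command`, `CommandHandler`, `RobotServer`, `send_commands`) are hypothetical and not taken from the thesis, and `execute_command` only returns an acknowledgement where a real server would drive the hexapod's servos and sensors.

```python
import socket
import socketserver
import threading

# Hypothetical command vocabulary; the thesis does not publish its protocol.
MOVES = {"forward", "backward", "left", "right"}

# Serialises access to the (simulated) hardware when several clients connect.
HARDWARE_LOCK = threading.Lock()


def execute_command(command: str) -> str:
    """Stand-in for the server-side hardware layer: validate and acknowledge."""
    verb, _, arg = command.partition(" ")
    if verb == "MOVE" and arg in MOVES:
        # A real server would send gait commands to the servo controller here.
        return f"OK MOVE {arg}"
    if verb == "STOP" and not arg:
        return "OK STOP"
    return f"ERR {command}"


class CommandHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread per client; one newline-terminated command per line."""

    def handle(self) -> None:
        for raw in self.rfile:  # iterates until the client closes the connection
            command = raw.decode("utf-8").strip()
            if not command:
                continue
            with HARDWARE_LOCK:
                reply = execute_command(command)
            self.wfile.write((reply + "\n").encode("utf-8"))


class RobotServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # handler threads do not block interpreter exit


def send_commands(host: str, port: int, commands: list[str]) -> list[str]:
    """Client side (e.g. called from a GUI worker thread): send commands, collect replies."""
    replies = []
    with socket.create_connection((host, port), timeout=5) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            for command in commands:
                stream.write(command + "\n")
                stream.flush()
                replies.append(stream.readline().strip())
    return replies


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for this local demonstration.
    with RobotServer(("127.0.0.1", 0), CommandHandler) as server:
        host, port = server.server_address
        threading.Thread(target=server.serve_forever, daemon=True).start()
        print(send_commands(host, port, ["MOVE forward", "STOP", "JUMP"]))
        # prints ['OK MOVE forward', 'OK STOP', 'ERR JUMP']
        server.shutdown()
```

`ThreadingTCPServer` gives each connection its own thread, so a video-streaming client and a control client could be served concurrently; the lock is one simple way to stop two clients from issuing hardware commands at the same instant.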
In the medical domain, we address three critical barriers to clinical AI adoption: lack of uncertainty quantification, poor minority-class performance, and high computational requirements. Using the HAM10000 dataset (10,015 dermoscopic images across 7 diagnostic categories, with a 58:1 ratio between the largest and smallest classes), we developed an uncertainty-aware Swin Transformer system that achieves 87.82% test accuracy and 90.15% validation accuracy. Through Monte Carlo Dropout integration, the model provides confidence-calibrated predictions: it reaches 97% accuracy on the high-confidence cases, which make up 80% of all cases (80% coverage), while flagging the remaining uncertain cases for expert review. A triple-strategy imbalance-handling approach combining weighted sampling, class-weighted focal loss, and label smoothing yields an average minority-class F1-score of 83.8%, with no class falling below 77%. Memory optimization techniques reduce peak VRAM usage to 8 GB and training costs to $3.15, enabling deployment on consumer hardware.
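The Monte Carlo Dropout triage step can be illustrated without a deep learning framework. The sketch below is a toy under stated assumptions: a hypothetical linear classifier with hand-picked weights stands in for the Swin Transformer, dropout is applied to its input features, and the 0.9 confidence threshold is illustrative rather than the value used in the thesis. It shows the general idea: keep dropout active at inference, average the softmax outputs of many stochastic passes, and refer inputs whose averaged confidence falls below a threshold to an expert.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_pass(x, weights, bias, p_drop, rng):
    """One forward pass with (inverted) dropout left ON, as MC Dropout requires."""
    keep = 1.0 - p_drop
    h = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


def mc_dropout_predict(x, weights, bias, p_drop=0.2, passes=30, seed=0):
    """Average `passes` stochastic predictions; return label, confidence, entropy."""
    rng = random.Random(seed)
    runs = [stochastic_pass(x, weights, bias, p_drop, rng) for _ in range(passes)]
    mean = [sum(run[c] for run in runs) / passes for c in range(len(bias))]
    label = max(range(len(mean)), key=mean.__getitem__)
    entropy = -sum(p * math.log(p) for p in mean if p > 0)  # predictive entropy
    return label, mean[label], entropy


def triage(x, weights, bias, threshold=0.9, **kwargs):
    """Accept confident predictions automatically; flag the rest for review."""
    label, confidence, _ = mc_dropout_predict(x, weights, bias, **kwargs)
    decision = "auto" if confidence >= threshold else "refer to expert"
    return decision, label, confidence


if __name__ == "__main__":
    # Hypothetical 2-class model: features 0-4 vote for class 0, features 5-9 for class 1.
    W = [[1.0] * 5 + [-1.0] * 5, [-1.0] * 5 + [1.0] * 5]
    b = [0.0, 0.0]
    clear_case = [1.0] * 5 + [0.0] * 5  # evidence for class 0 only
    ambiguous_case = [1.0] * 10         # conflicting evidence
    print(triage(clear_case, W, b))
    print(triage(ambiguous_case, W, b))
```

In a PyTorch model the same effect is usually obtained by keeping the dropout modules in training mode at inference while the rest of the network stays in evaluation mode. Raising the threshold lowers coverage but should raise accuracy on the accepted cases, which is the trade-off summarized by an accuracy-at-coverage figure.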
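The three imbalance strategies can be sketched in plain Python. The class counts below are the published totals for the full HAM10000 dataset (they reproduce the 10,015 images and the roughly 58:1 ratio between nv, with 6,705 images, and df, with 115). The inverse-frequency weighting, gamma = 2, epsilon = 0.1, and the particular way the focal term is combined with smoothed targets are assumptions for illustration, since the abstract does not give the thesis's exact formulas or hyperparameters; all function names are hypothetical.

```python
import math

# Published HAM10000 class counts (10,015 images in total).
HAM10000_COUNTS = {
    "akiec": 327, "bcc": 514, "bkl": 1099, "df": 115,
    "mel": 1113, "nv": 6705, "vasc": 142,
}
CLASSES = sorted(HAM10000_COUNTS)  # fixed class-index order


def imbalance_ratio(counts):
    return max(counts.values()) / min(counts.values())


def sample_weight(label, counts):
    """Weighted sampling: weight 1/n_c per image, so each class is drawn
    equally often in expectation (the weights of a class sum to 1)."""
    return 1.0 / counts[label]


def class_alphas(counts):
    """Class weights for the loss: inverse frequency, normalised to mean 1."""
    inv = [1.0 / counts[c] for c in CLASSES]
    mean_inv = sum(inv) / len(inv)
    return [w / mean_inv for w in inv]


def focal_loss_smoothed(probs, target, alphas, gamma=2.0, epsilon=0.1):
    """Class-weighted focal loss evaluated against label-smoothed targets.

    probs:  predicted class probabilities (already softmaxed), all > 0
    target: index of the true class
    """
    k = len(probs)
    loss = 0.0
    for c, p in enumerate(probs):
        q = (1.0 - epsilon) * (c == target) + epsilon / k  # smoothed target
        # (1 - p)^gamma down-weights classes the model already predicts well.
        loss -= alphas[c] * q * (1.0 - p) ** gamma * math.log(p)
    return loss


if __name__ == "__main__":
    print(round(imbalance_ratio(HAM10000_COUNTS), 1))  # 58.3
    alphas = class_alphas(HAM10000_COUNTS)
    print(dict(zip(CLASSES, (round(a, 2) for a in alphas))))
```

In PyTorch, per-image weights like these would typically be passed to `torch.utils.data.WeightedRandomSampler`. Because weighted sampling and a class-weighted loss both counteract the imbalance, their strengths interact; the sketch does not attempt to reproduce how the thesis tuned them.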
Both systems demonstrate that sophisticated deep learning models can be deployed effectively on resource-constrained platforms while maintaining high performance, transparency, and accessibility. This work provides practical frameworks for trustworthy AI in educational robotics and clinical decision support, contributing methodologies applicable across diverse real-world applications.
Publisher
Clayton State University
