Deep Learning-Driven Image Intelligence for Robotics and Clinical Diagnostics

Authors

Islam, Md Sazidul

Type

Thesis

Language

en

Abstract

This thesis investigates deep learning applications in two domains: intelligent robotics and clinical diagnostics. The work addresses challenges in real-time human-robot interaction and medical image classification through practical, resource-efficient implementations. In the robotics domain, we developed a real-time face recognition and control system for a Hexapod robot using a PyQt5-based GUI with offline voice feedback. Leveraging DeepFace with the FaceNet model, the system achieves 93.02% accuracy and response times of 380-420 ms on Raspberry Pi hardware. The implementation demonstrates effective edge deployment of deep learning-based facial recognition combined with robot control, sensor monitoring, and obstacle detection through an intuitive graphical interface. The architecture employs multi-threaded processing and TCP/IP communication: the client-side GUI manages movement controls, sensor monitoring, and recognition operations, while the server side handles hardware interfacing and command execution on the Raspberry Pi. This modular client-server design supports scalability, maintainability, and responsive concurrent operations (video streaming, face recognition, and robot control) at a total hardware cost of approximately $150, making advanced human-robot interaction accessible for educational and research applications.
In the medical domain, we address three barriers to clinical AI adoption: lack of uncertainty quantification, poor minority class performance, and high computational requirements. Using the HAM10000 dataset (10,015 dermoscopic images across 7 diagnostic categories with a 58:1 class imbalance), we developed an uncertainty-aware Swin Transformer system that achieves 87.82% test accuracy and 90.15% validation accuracy. Through Monte Carlo Dropout integration, the model provides confidence-calibrated predictions, achieving 97% accuracy on high-confidence cases (80% coverage) while flagging uncertain cases for expert review. A triple-strategy imbalance handling approach that combines weighted sampling, class-weighted focal loss, and label smoothing yields an average minority class F1-score of 83.8%, with no class falling below 77%. Memory optimization techniques reduce peak VRAM usage to 8 GB and training costs to $3.15, enabling deployment on consumer hardware. Both systems demonstrate that sophisticated deep learning models can be deployed effectively on resource-constrained platforms while maintaining high performance, transparency, and accessibility. This work provides practical frameworks for trustworthy AI in educational robotics and clinical decision support, contributing methodologies applicable to other real-world applications.
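The coverage and high-confidence accuracy figures in the abstract come from a selective-prediction setup: predictions above some confidence level are accepted and the rest are flagged for review. The sketch below illustrates that idea only and is not the thesis code. It assumes that each image has been passed through the network several times with dropout active, that confidence is the largest entry of the averaged probability vector, and that 0.8 is the acceptance threshold. The function names and the toy data are hypothetical.

```python
from statistics import fmean


def mc_dropout_summary(passes):
    """Average T stochastic probability vectors for one image.

    passes: list of T lists, each a softmax output from one forward pass
    with dropout left active. Returns (predicted_class, confidence).
    """
    num_classes = len(passes[0])
    mean_probs = [fmean(p[c] for p in passes) for c in range(num_classes)]
    pred = max(range(num_classes), key=mean_probs.__getitem__)
    return pred, mean_probs[pred]


def selective_report(all_passes, labels, threshold=0.8):
    """Accept predictions with confidence >= threshold, flag the rest.

    Returns (coverage, accuracy_on_accepted, flagged_indices).
    """
    accepted, flagged = [], []
    for idx, (passes, label) in enumerate(zip(all_passes, labels)):
        pred, conf = mc_dropout_summary(passes)
        if conf >= threshold:
            accepted.append((pred, label))
        else:
            flagged.append(idx)
    coverage = len(accepted) / len(labels)
    if accepted:
        accuracy = sum(p == y for p, y in accepted) / len(accepted)
    else:
        accuracy = float("nan")
    return coverage, accuracy, flagged


# Toy two-class data, invented purely for illustration.
toy_passes = [
    [[0.90, 0.10], [0.95, 0.05], [0.85, 0.15]],  # consistently class 0
    [[0.60, 0.40], [0.40, 0.60], [0.50, 0.50]],  # passes disagree
    [[0.10, 0.90], [0.20, 0.80], [0.05, 0.95]],  # consistently class 1
]
toy_labels = [0, 1, 1]
coverage, accuracy, flagged = selective_report(toy_passes, toy_labels)
```

In this toy run the second image is the one routed to review, since its passes disagree. Other uncertainty scores, such as predictive entropy or the variance across passes, could be substituted for the maximum mean probability without changing the surrounding logic.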

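One way to combine two of the three imbalance strategies named in the abstract, class-weighted focal loss and label smoothing, is sketched below for a single sample. This is an assumed formulation, not the one confirmed by the thesis: the smoothed target spreads a fraction `smoothing` uniformly over all classes, each term is scaled by the focal factor (1 - p)^gamma, and the result is multiplied by the weight of the true class. The default values of `gamma` and `smoothing` and the example weights are illustrative only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, class_weights, gamma=2.0, smoothing=0.1):
    """Class-weighted focal loss with label smoothing for one sample.

    Assumed form: -w[target] * sum_c q_c * (1 - p_c)**gamma * log(p_c),
    where q is the smoothed one-hot target.
    """
    probs = softmax(logits)
    num_classes = len(logits)
    loss = 0.0
    for c, p in enumerate(probs):
        one_hot = 1.0 if c == target else 0.0
        q = (1.0 - smoothing) * one_hot + smoothing / num_classes
        loss -= q * (1.0 - p) ** gamma * math.log(p)
    return class_weights[target] * loss


# Illustrative values: a fairly confident, correct prediction for class 0.
example_logits = [2.0, 0.5, -1.0]
uniform = [1.0, 1.0, 1.0]
plain_ce = focal_loss(example_logits, 0, uniform, gamma=0.0, smoothing=0.0)
focal = focal_loss(example_logits, 0, uniform, gamma=2.0, smoothing=0.0)
weighted = focal_loss(example_logits, 0, [3.0, 1.0, 1.0], gamma=2.0, smoothing=0.0)
```

With `gamma=0` and `smoothing=0` the expression reduces to ordinary cross-entropy, which makes a convenient sanity check. Raising `gamma` shrinks the loss on well-classified samples, and the class weight scales the loss for rarer classes.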
Publisher

Clayton State University
