Authors: Zungu, ML; Zwane, Skhumbuzo G; Adigun, MO
Date accessioned: 2026-02-25
Date available: 2026-02-25
Date issued: 2025-11
ISBN: 978-1-0492-3850-0
URI: http://hdl.handle.net/10204/14710
Abstract: Deploying artificial intelligence models on edge devices is challenging because of limited computational power, memory, and energy. Model compression reduces the size and computational requirements of AI models, enabling their deployment on smartphones, IoT sensors, and other edge devices. This paper evaluates three major compression techniques, quantization, pruning, and tensor decomposition, applied to the YOLOv8n model for object detection, and compares their impact on accuracy, model size, inference speed, and energy efficiency. Findings indicate that quantization achieves the best balance, reducing model size and improving inference speed with minimal accuracy loss; pruning yields high energy savings but at the cost of accuracy; and tensor decomposition provides a balanced trade-off. The research underscores the importance of compression in enabling real-time Edge AI applications and identifies quantization as the most suitable technique for efficient edge deployment.
Access: Fulltext
Language: en
Keywords: Edge AI; Model compression; Pruning; Quantization; Tensor decomposition; YOLOv8n
Title: Evaluating model compression techniques for efficient edge AI deployment
Type: Conference Presentation
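To illustrate the quantization idea the abstract highlights, the sketch below shows symmetric per-tensor INT8 post-training quantization of a single weight tensor in plain Python. This is a minimal, hypothetical example of the general technique, not the paper's actual YOLOv8n pipeline; the function names and sample weights are illustrative.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization: each float32
# weight (4 bytes) is mapped to an int8 value (1 byte) via one shared
# scale factor, giving roughly a 4x reduction in stored model size.
# Illustrative only; not the paper's actual compression pipeline.

def quantize_int8(weights):
    """Map float weights to int8 codes using one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0  # largest magnitude -> 127
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]

weights = [0.82, -1.27, 0.05, 0.33, -0.91]
codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)
# The recovered weights differ from the originals only by small rounding
# error, which is why quantization typically costs little accuracy.
```

The small round-trip error in `recovered` is the source of quantization's accuracy loss; because the error per weight is bounded by half the scale, the loss is usually minimal, consistent with the paper's finding that quantization offers the best size/speed/accuracy balance.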