Keywords: Deep learning, vision transformer, fish species classification, convolutional neural network, intelligent automatic detection.
Abstract
As marine environments face escalating threats, accurate and efficient Fish Species Classification (FSC) has become crucial for fisheries management, biodiversity preservation, and ecological surveillance. Given the substantial volume of georeferenced fish photographs gathered daily by fishermen, artificial intelligence (AI) and computer vision (CV) technologies offer significant potential to automate their analysis through species recognition and classification. This study investigates Deep Learning (DL) techniques combined with appearance-based feature selection to determine fish species from images automatically and precisely. The research uses a large collection of aquatic fish images covering diverse species, sizes, and ecological settings. Conventional DL models struggle to capture long-range dependencies and require fixed input sizes, which makes them less adaptable when processing images of varying dimensions. The Vision Transformer (VT) mitigates these limitations through the Self-Attention Mechanisms (SAM) of the transformer model. This paper applies a VT to the FSC problem and presents Intelligent Automatic Detection and FSC in Marine Environment (IAD-FSC-ME). The VT's efficacy is evaluated against pre-trained Convolutional Neural Network (CNN) models: VGG19, DenseNet121, ResNet50v2, InceptionV3, and Xception. The experiments use an open data set (Fish4Knowledge), with both the object detection and classification systems extended to cover subtropical fish species of interest. The VT outperformed results previously reported in the literature, attaining 99.14% accuracy in FSC.
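To make the role of self-attention concrete, the following is a minimal illustrative sketch and not the IAD-FSC-ME implementation. It assumes a toy grayscale image, a single attention head, and randomly initialised projection matrices; the names `split_into_patches`, `self_attention`, and `random_matrix` are hypothetical helpers introduced only for this illustration. Components of a full VT (learned patch embedding, class token, position embeddings, multiple heads, and MLP blocks) are deliberately omitted.

```python
import math
import random


def split_into_patches(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # every patch scored against every patch
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]   # toy 8x8 image
tokens = split_into_patches(image, patch=4)                    # 4 patches, 16 values each
d_model = 8                                                    # assumed projection width
w_q, w_k, w_v = (random_matrix(16, d_model, rng) for _ in range(3))
output, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over all patches, so every patch can draw on information from any other patch regardless of spatial distance. The same projection matrices also apply unchanged to a longer or shorter token sequence, although in practice position embeddings typically need to be adapted when the input resolution changes.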