COMBINATION OF RETENTIVE NETWORKS AND VISIONS TRANSFORMERS FOR FACIAL EMOTION RECOGNITION IN IMAGE AND VIDEO

Authors

  • VELI DEMIR
  • AKUP GENÇ

Abstract

The field of video sentiment analysis has grown significantly with continuous advances in artificial intelligence (AI) and machine learning (ML). Understanding and interpreting human emotions in videos remains a rapidly developing area of deep interest.

The integration of Retentive Networks and Vision Transformers has opened a new path in sentiment analysis from videos, showing capabilities and potential beyond those of traditional models. This article discusses the advantages, results, and future prospects that these AI models offer in the field of video sentiment analysis.

An illustrative comparative analysis is presented showing how the combination of Retentive Networks and Vision Transformers outperforms other models in terms of accuracy, adaptability, and scalability. Although these AI models have so far been explored primarily in the context of images, their potential for application to video processing and more nuanced sentiment analysis is considerable.

Index Terms

  1. Retentive Networks: Technological models known for their capabilities, primarily used for natural language processing tasks.
  2. Vision Transformers: Image processing models with improved adaptability to different input resolutions that provide scalability and efficiency.
  3. Video Sentiment Analysis: The process of examining, understanding, and interpreting emotions found in video data using machine learning and artificial intelligence-supported tools.
  4. Natural Language Processing (NLP): The field of artificial intelligence that involves the interaction between computers and humans through natural language.
  5. Image Processing: The process of performing operations on an image to obtain an improved image or extract useful information from it.
  6. Artificial Intelligence (AI) and Machine Learning (ML): AI is a branch of computer science that emphasizes the development of intelligent machines that think and work like humans, while ML is a branch of AI in which machines are given access to data and use that data to learn on their own.

How to Cite

DEMIR, V., & GENÇ, A. (2025). COMBINATION OF RETENTIVE NETWORKS AND VISIONS TRANSFORMERS FOR FACIAL EMOTION RECOGNITION IN IMAGE AND VIDEO. TPM – Testing, Psychometrics, Methodology in Applied Psychology, 32(S7 (2025): Posted 10 October), 678–685. Retrieved from https://tpmap.org/submission/index.php/tpm/article/view/2226