Table of Contents

Chapter 1 Introduction

1.1 Background

Emotion recognition is a growing and rapidly advancing area of study that has great potential in a wide range of fields, such as computer-human interaction, psycho- logical monitoring, and customised user experiences. Conventional approaches to emotion recognition have mainly depended on external indicators like facial expres- sions, speech, and physiological signals. These methods utilize observable behaviors to infer emotional states, which, while useful, often face limitations in accuracy and reliability. Observable behaviors can be easily masked, altered, or influenced by external factors, leading to inaccuracies in emotion detection.

1.2 Limitations of Traditional Methods

Traditional emotion recognition techniques often rely on facial expressions, verbal sounds, and physiological signals such as heart rate and pores. While these methods have been extensively researched and applied, they possess inherent limitations:

Facial Expressions: Facial expressions can be consciously controlled or sup- pressed, making it difficult to accurately gauge genuine emotional states. Cul- tural differences also play a significant role in how emotions are expressed
Speech: Emotional cues in speech can be ambiguous and influenced by con- text, language, and individual speaking styles. Background noise and speech disorders can further complicate accurate emotion
Physiological Signals: Physiological responses such as heart rate variability and skin conductance are influenced by a variety of factors beyond emotional states, such as physical activity and environmental

1.3 Emotions

Emotion plays an essential role in promoting communication and interaction among individuals. It has significant effects on the brain, which subsequently affects various cognitive and physiological processes either directly or indirectly [3, 4, 5, 6]. Emo- tions are an essential component of human existence, influencing our experiences, choices, social connections, and overall state of being. Consequently, comprehend- ing and precisely identifying emotions holds significant significance for a wide range of applications in personal and professional settings.

1.3.1 Emotion Models

Characterizing emotions is a complex task, primarily due to the challenge of dividing the emotional space. Emotions can be categorized into two main models: discrete and dimensional. Each model offers a different perspective on how emotions are represented and understood

Discrete Model

In 1971, Paul Ekman proposed a distinct structure of emotion, which highlights the cross-cultural universality of specific emotions. Ekman’s research involved analyz- ing the relationship between human emotions and facial expressions among subjects from diverse cultural backgrounds [7]. He proposed the presence of a universal set of fundamental emotions, which are recognizable regardless of cultural differences. These basic emotions include:

Surprise
Pleasure
Anger
Fear
Sadness
Disgust

These emotions are distinguished from each other through various physiological the- ories and are universally experienced and expressed in similar ways. According

to the discrete model, at any given time, if a person is experiencing an emotional change, it can be categorized as one of these basic emotions.

Dimensional Model

In 1999, James A. Russell introduced the dimensional model of emotion, which is based on a cognitive theory [8]. Unlike the discrete model, which identifies specific emotions, the dimensional model represents emotions on a two-dimensional plane. This model generalizes emotions by mapping them according to two key dimen- sions: valence and arousal.

Valence: This dimension measures the degree of positivity or negativity asso- ciated with an emotion. The spectrum spans from disagreeable (negative) to enjoyable (positive).
Arousal: This dimension represents the degree of activation or intensity of an emotion. The range of emotional states varies from tranquilly (minimal arousal) to excitement (strong arousal).

The combination of these two dimensions creates a comprehensive space where all emotions can be positioned. For instance, emotions like excitement are characterized by high arousal and positive valence, while emotions like sadness are characterized by low arousal and negative valence.

Figure 1.1 illustrates the valence-arousal emotion model with its two-dimensional space. In this model, each emotion can be plotted according to its valence and arousal levels, allowing for a nuanced representation of emotional states. The fig-

Figure 1.1: Valence-Arousal Emotion Model [1]

Figure 1.2 illustrates a color wheel that visually represents various emotions with their essential impacts, further demonstrating the dimensional model’s application.

These models provide different frameworks for understanding and categorizing emotions, with the discrete model focusing on universally recognized basic emo- tions and the dimensional model offering a more flexible representation of emotional states. Both models are widely utilized in research and have important consequences for disciplines such as neuroscience, psychology, and artificial intelligence.

1.4 Affective Computing

In the past few decades, affective computing has become an important area of re- search, with the goal of improving communication between machines and human.

Figure 1.2: Plutchik’s Wheel of Emotions [2] [9]. The main objective of this discipline is to create systems capable of identify- ing and reacting to human emotions, thus enhancing HCI. Due to advancements in HCI, it is now crucial for machines to understand and react to the emotional states of users. This ability is crucial for developing interfaces that are more intuitive and user-friendly.

1.4.1 Advancement and Significance

Affective computing aims to close the gap among human emotions and machine responses. The industry has experienced substantial expansion as a result of its capacity to enhance interactions by making them more authentic and effective. The capacity to perceive and comprehend emotions can have a substantial influence on

multiple areas, such as communication, planning, and learning.

1.5 Emotion Recognition

Emotion detection is a crucial element in the field of affective computing. This involves determining an individual’s emotional state by observing different signs that include facial expressions, voice pronunciation, body movements, and physiological signals. Out of these, physiological signals such as EEG are especially valuable because they are non-invasive and have a high temporal resolution [10, 11].

1.5.1 Emotion Detection Using EEG

The objective of this study is to develop a model that can accurately identify emo- tions by analysing EEG data. EEG signals provide a highly accurate assessment of brain activity, which makes them an outstanding tool for identifying emotional states. Through the analysis of these signals, it is feasible to categorise various emotions, which can subsequently be utilised to enhance the responsiveness and adaptability of human-computer interactions.

The Function and Placement of EEG Electrodes

In 1924, Hans Berger pioneered the development of EEG, a method that detects electrical activity in the brain. This activity, often referred to as the ”Berger wave,” represents the rhythmic alpha waves generated by neuron and synaptic excitations within the cerebral cortex. These electrical impulses are critical for understanding

the central mechanisms of brain function.

EEG works by measuring the electrical signals produced by neurons in the cerebral cortex. During an EEG recording session, electrodes are placed on the scalp following the internationally recognized 10-20 system, ensuring standardized and reliable data collection. This method enables the non-invasive monitoring of brain activity, making EEG a widely used and dependable technique for neurological analysis.

Recording Techniques: Invasive vs. Non-Invasive

EEG signals can be recorded using either invasive or non-invasive techniques. Inva- sive recordings involve placing electrodes directly on the cerebral cortex, providing high-precision data but requiring surgical procedures. Non-invasive recordings, on the other hand, capture brain activity from the scalp and are more commonly used due to their simplicity and safety

Frequency Bands of EEG Signals

The EEG signal encompasses a frequency range from 0.5 to 50 Hz. This range is divided into five distinct rhythms, each associated with different brain activities:

Delta () Waves: 5 – 4 Hz, associated with deep sleep stages.
Theta () Waves: 4 – 8 Hz, linked to light sleep, relaxation, and meditative
Alpha () Waves: 8 – 13 Hz, indicative of relaxed wakefulness and
Beta () Waves: 13 – 30 Hz, related to active thinking, problem-solving, and
Gamma () Waves: 30 – 50 Hz, connected to high-level information processing and cognitive functioning

These frequency bands provide a comprehensive framework for analyzing EEG data, enabling the identification of various cognitive and neurological states

Figure 1.3: EEG Rhythm

1.6 Advantages of EEG-Based Emotion Recognition

EEG data offers a promising alternative by providing a direct assessment of brain activity. EEG is a technique that detects the electrical signals produced by the brain. It does this by placing electrodes on the scalp, without the need for surgery. This rich source of information captures the underlying neural processes associated with emotional states, allowing for a more precise and reliable identification of emotions.

Table 1.1: Frequency bands of EEG Signal

Band	Frequency (Hz)	Interpretation
Delta	0.5 – 4	Delta waves are the slowest and highest in am- plitude. They are observed often in infants and are related to the deepest levels of relaxation. They generally present in the occipital and tem- poral lobes.
Theta	4 – 8	Theta waves are slow waves that present when a person is calm or sleepy.
Alpha	8 – 14	Alpha waves are higher-amplitude, medium fre- quency waves. They appear when a person is performing some brain activity.
Beta	14 – 30	Beta waves are low-amplitude, high-frequency waves. This wave reflects an attentive status of the brain and is generally seen when a person is in an excited state.
Gamma	30 – 50	Gamma waves are connected to high-level in- formation processing and cognitive functioning.

EEG-based emotion recognition leverages the nuances of brain activity to classify emotions, bypassing the limitations of external indicators.

EEG-based methods have several advantages:

Direct Measurement: EEG captures electrical activity directly from the brain, providing a more immediate and less interpretable measure of emotional states compared to external
Non-Intrusive: Modern EEG systems are relatively non-intrusive and can be used in various settings without causing significant discomfort to the
Rich Data Source: EEG data provides a comprehensive view of brain activity, enabling the detection of subtle and complex patterns associated with different emotional states.

1.7 Importance of Accurate Emotion Recognition

The importance of accurate emotion recognition cannot be overstated. In human- computer interaction, understanding user emotions can lead to more intuitive and responsive interfaces, enhancing user satisfaction and engagement. For instance, adaptive systems can modify their responses based on the user’s emotional state, providing a more personalized experience. This can be particularly beneficial in educational technologies, where understanding student emotions can lead to better- tailored learning experiences, or in customer service, where systems can respond empathetically to user frustration.

In mental health monitoring, emotion recognition can offer objective measures for diagnosing and tracking emotional disorders, such as depression and anxiety. By providing real-time insights into a patient’s emotional state, healthcare providers can tailor interventions more effectively, improving treatment outcomes. Emotion recognition can also play a role in detecting early signs of emotional distress, en- abling timely intervention and support.

1.8 Challenges in EEG-Based Emotion Recognition

Despite its potential, EEG-based emotion recognition faces substantial challenges. Traditional machine learning models and simpler neural networks often struggle to capture the complex spatial and temporal patterns inherent in EEG signals. These models typically rely on manually engineered features and basic temporal analyses, leading to suboptimal accuracy and robustness.

Key challenges include:

Complex Patterns: EEG signals are characterized by their complex spatial and temporal patterns, which are influenced by various physiological and psy- chological factors. Capturing these patterns accurately is crucial for effective emotion recognition.
Feature Extraction: Manually engineering features from EEG data is a labor- intensive process that may not fully capture the richness of the Traditional methods often fall short in representing the intricate dynamics of brain activity.
Temporal Dependencies: Emotions are not static and evolve over time. Mod- els need to account for these temporal dependencies to accurately track and classify emotional states.
Data Variability: EEG signals can vary significantly between individuals due to differences in brain structure, physiology, and environmental Models must be robust enough to handle this variability.

1.9 Need for Advanced Models

Therefore, there is a need for advanced models that can automatically extract and integrate these intricate patterns to improve emotion recognition performance. Deep learning methods, especially CNN and RNN based approaches, present encouraging answers to these problems. While RNNs, particularly LSTM networks, are appro- priate for capturing temporal dependencies, CNNs are excellent at obtaining spatial features using EEG spectrograms.

By combining these architectures, it is possible to develop a model that effec- tively captures both the spatial and temporal patterns in EEG data, leading to more accurate and robust emotion recognition. Additionally, incorporating advanced acti- vation functions such as Swish can enhance model performance by improving gra- dient flow and convergence.

1.10 Motivation

The motivation behind this research stems from the significant potential of EEG- based emotion recognition to enhance various applications. In human-computer in- teraction, accurately recognizing users’ emotional states can lead to more intuitive and responsive interfaces. In mental health monitoring, it can provide objective mea- sures for diagnosing and tracking emotional disorders, offering valuable insights for therapeutic interventions. Additionally, personalized user experiences can be signif- icantly improved by adapting systems based on real-time emotional feedback.

The current landscape of emotion recognition using EEG data presents several challenges. Current models frequently lack the complexity necessary to fully rep- resent the complex temporal and spatial patterns found in EEG signals. Traditional machine learning approaches depend heavily on feature engineering, which may not fully represent the complexity of EEG data. Simpler neural networks, while au- tomating feature extraction, often fail to account for the temporal dependencies crit- ical for understanding emotional states over time. These limitations result in lower accuracy and limited generalizability of the models

To address these challenges, there is a pressing need for advanced models that can seamlessly integrate spatial and temporal information from EEG data. In order to close this gap, the proposed research will create a hybrid model which combines the advantages of RNNs and CNNs. This approach leverages CNNs for spatial fea- ture extraction from EEG spectrograms and RNNs, particularly LSTM networks, for capturing temporal dependencies. By doing so, the model aims to achieve higher accuracy and robustness in emotion recognition.

1.11 Objective

The main goal of this work is to combine CNN and RNN to create a reliable model for emotion recognition from EEG data. The goal of this hybrid model is to combine the best aspects of both architectures: RNNs, especially LSTM networks, for cap- turing temporal dependencies, and CNNs for extracting spatial features using EEG spectrograms. Furthermore, it is suggested to use Swish activation functions in order to improve the model’s performance.

The specific goals of this study are:

To preprocess and normalize EEG data, transforming it into a suitable format for model training.
To design and implement a hybrid CNN-RNN-Bi-LSTM-Attention model for emotion recognition.
To train and optimize the model using a comprehensive EEG dataset, ensuring high accuracy and robustness.
To evaluate the model’s performance against baseline models and analyze its strengths and limitations.

1.12 Overview of the Project

In this project, we have developed a novel approach to emotion recognition utilizing a hybrid model that integrates CNN and RNN. The model leverages the capabilities of CNNs to extract spatial features from EEG spectrograms, which are graphical representations of EEG signals over time. These spatial features capture the complex patterns inherent in EEG data.

Following the extraction of spatial features, the model employs LSTM layers to capture temporal dependencies. Because LSTM networks can maintain long- term dependencies and handle the vanishing gradient issue that traditional RNNs frequently face, they are particularly suitable for the purpose. By combining these two architectures, the model can effectively capture both spatial and temporal pat- terns in the EEG data, leading to more accurate emotion recognition.

Additionally, the model incorporates Swish activation functions to enhance per- formance. Swish, a self-gated activation function, has been shown to outperform traditional activation functions like ReLU in various deep learning tasks. Its smooth and non-monotonic properties contribute to improved gradient flow and model con- vergence, resulting in better performance.

We have implemented a comprehensive preprocessing pipeline to ensure the quality and uniformity of the input data. This involves normalizing the EEG signals,

generating spectrograms to capture the time-frequency representation, and mapping labels based on valence, arousal, and dominance ratings. These preprocessing steps are crucial for improving the reliability and accuracy of the emotion recognition process.

The hybrid CNN-RNN-Bi-LSTM-Attention model was trained on an extensive EEG dataset, and hyperparameters were optimized through cross-validation. The model was evaluated against baseline models, showing a significant improvement in accuracy and robustness. The results indicate that the model effectively captures complex patterns in EEG data, leading to a recognition accuracy of 92

The discussion highlights the model’s durability and its capacity for utilization in real-time monitoring of emotions and diagnosing mental health disorders. Fu- ture endeavors will focus on enhancing the model for immediate application and improving its overall efficiency.

1.13 Thesis Organization

This thesis is organized into six chapters, each detailing a critical aspect of the re- search:

Chapter 1: Introduction: Provides an overview of the background, motiva- tion, and objectives of the study.
Chapter 2: Literature Review: Reviews existing research on emotion recog- nition using EEG data, highlighting key studies and identifying research
Chapter 3: Proposed Methodology: Describes the proposed hybrid model, including data preprocessing, model architecture, and training
Chapter 4: Experimental Setup and Analysis: Details the dataset used, pre- processing steps, experimental setup, and analysis of
Chapter 5: Results and Discussion: Presents the experimental results, com- pares them with baseline models, and discusses the implications and limita-
Chapter 6: Conclusion and Future Work: Summarizes the findings, draws conclusions, and suggests directions for future

This structured approach ensures a comprehensive exploration of the research prob- lem, providing valuable insights into the potential of EEG-based emotion recogni- tion.

Chapter 2 Literature Review

2.1 Introduction

Emotion recognition using EEG signals has become a significant area of research, offering profound insights into the neural correlates of emotional states. This chap- ter reviews existing literature on EEG-based emotion recognition, highlighting key studies, methodologies, and findings. The objective is to identify research gaps and establish a foundation for the proposed hybrid CNN-RNN-Bi-LSTM-Attention model.

2.2 Overview of EEG-Based Emotion Recognition

EEG-based emotion recognition leverages the electrical activity of the brain to iden- tify emotional states. This approach provides a direct and non-invasive method of assessing brain activity, making it a valuable tool for various applications, includ- ing mental health monitoring, human-computer interaction, and personalized user experiences.

2.3 Key Studies and Methodologies

2.3.1 Deep Learning Approaches

Algarni et al. (2022) explore a deep learning-based approach for emotion recogni- tion using Bi-LSTM networks. Their methodology involves the utilization of EEG signals to train the Bi-LSTM model, achieving high accuracy in detecting arousal, valence, and liking emotions. The study demonstrates the effectiveness of deep learning techniques in capturing complex temporal patterns in EEG data. The re- searchers highlight that traditional models often fail to account for the bidirectional nature of emotional experiences, where past and future contexts are equally impor- tant. By employing Bi-LSTM, they could effectively model these dependencies, leading to a significant improvement in recognition performance. The study also discusses the pre-processing steps involved, including normalization and artifact re- moval, which are crucial for ensuring the quality of the input EEG data [12].

Bazgir et al. (2018) present a robust emotion recognition system based on the valence/arousal model. The researchers employed advanced signal processing tech- niques, including DWT and PCA, to extract features from EEG signals. Their system demonstrated exceptional precision using classifiers such as SVM, KNN, and ANN. The use of DWT allowed for the decomposition of EEG signals into different fre- quency bands, capturing both time and frequency domain features. PCA helped in reducing the dimensionality of the feature space, minimizing computational complexity while retaining critical information. This combination of techniques enabled the system to achieve high accuracy and robustness in emotion classification [13].

Houssein et al. (2022) provide a comprehensive review of EEG-based emo- tion recognition methods, focusing on feature extraction and machine learning tech- niques. They highlight the potential of deep learning algorithms to address chal- lenges in emotion recognition, emphasizing the need for multi-channel EEG signal analysis to capture the complexities of emotional states. The review covers various deep learning architectures, including CNNs and RNNs, and their applications in emotion recognition. The authors discuss the strengths and limitations of different approaches, providing insights into how deep learning can automate feature extrac- tion and improve the representation of spatial and temporal patterns in EEG data. Their findings underscore the importance of integrating multiple channels and using sophisticated models to enhance recognition performance [14].

2.3.2 Feature Extraction and Classification

Liu et al. (2021) review methodologies for emotion recognition using EEG sig- nals, emphasizing the importance of feature extraction and classification techniques. They delve into a number of strategies, including modern deep learning architectures and conventional machine learning techniques, highlighting the necessity of reliable feature extraction to increase the accuracy of emotion recognition. The review high- lights that manually engineered features often fail to capture the full complexity of EEG signals. Instead, automated feature extraction methods that leverage the hier- archical structure of deep learning models can provide a more comprehensive representation of the data. The authors also explore different classification techniques, such as SVMs, decision trees, and neural networks, comparing their effectiveness in various emotion recognition tasks [15].

Gannouni et al. (2021) introduce novel methods for enhancing emotion recog- nition using EEG signals. Their research focuses on the adaptive selection of chan- nels and the identification of epochs during emotional states, showcasing promis- ing outcomes in identifying emotions across diverse categories. The authors pro- pose an adaptive channel selection method that dynamically selects the most rele- vant EEG channels for emotion recognition, thereby improving classification perfor- mance. Additionally, their epoch identification technique helps in isolating the most informative segments of EEG data, further enhancing the recognition accuracy. This approach addresses the challenge of data variability by focusing on the most sig- nificant features and reducing noise, leading to more accurate and reliable emotion classification [16].

2.3.3 Applications in Mental Health and Human-Computer Interaction

Al-Nafjan et al. (2017) classify human emotions from EEG signals using a deep neural network. Their study demonstrates the application of EEG-based emotion recognition in mental health monitoring, providing insights into emotional disorders and potential interventions. The authors highlight the potential of EEG-based sys- tems to offer objective measures for diagnosing and tracking emotional disorders, such as depression and anxiety. Their method, which uses deep neural networks to classify different emotional states with high accuracy, is a useful tool for mental

health experts. The study also discusses the practical implications of using EEG- based emotion recognition in clinical settings, including the potential for real-time monitoring and intervention [17].

Du et al. (2020) provide a technique for recognising emotions in EEG signals to evaluate activities. The study highlights the application of EEG-based emotion recognition in human-computer interaction, emphasizing its potential to enhance user experiences in gaming environments. The authors develop a system that cap- tures players’ emotional states in real-time, allowing game developers to adapt game dynamics based on the detected emotions. This approach not only improves user en- gagement but also provides valuable feedback for game design and development. The study also explores the technical challenges involved in implementing real-time emotion recognition, such as latency and computational efficiency, and proposes so- lutions to address these issues [18].

Mudgal et al. (2020) discuss the advancements in BCIs and their applications in neurosciences, including emotion recognition. They address the challenges and potential of BCIs in various fields, underscoring the significance of accurate emo- tion recognition for improving user interaction. The authors review different BCI systems and their applications, highlighting the role of EEG-based emotion recog- nition in enhancing the functionality and user experience of BCIs. Their discussion points to the need for more sophisticated models that can accurately decode emo- tions from EEG signals, providing a foundation for developing more intuitive and responsive BCIs. The study also considers the ethical and practical considerations of using BCIs for emotion recognition, such as privacy and user consent [19]

Table 2.1: Summary of Key Studies on EEG-Based Emotion Recognition

Citation

Title

Methodology Used

Limitations

[12]

Deep Learning-Based

Approach for Emotion Recognition Using EEG Signals Using Bi-LSTM

Bi-LSTM network, EEG signals,

normalization, artifact removal

Limited by the need for

extensive preprocessing and potential overfitting

[13]

Emotion Recognition

with Machine Learning Using EEG Signals

DWT, PCA, SVM, KNN, ANN

High computational

cost due to feature extraction and dimen- sionality reduction

[14]

A comprehensive ex-

amination of human emotion detection from EEG-based brain- computer interface employing machine learning

Review of feature extraction, multi-

channel EEG analysis, deep learn- ing algorithms

General review, lacks

specific implemen- tation details and experimental validation

[15]

Review on Emotion

Recognition Based on Electroencephalogra- phy

Review of traditional and deep

learning methods, feature extrac- tion techniques

Focused more on the-

oretical aspects, lim- ited practical applica- tions discussed

[16]

Emotion detection us-

ing electroencephalog- raphy signals and a zero-time windowing- based epoch estimation and relevant electrode identification

Adaptive channel selection, epoch

identification, diverse emotion cat- egories

Potentially com-

plex implementation, requires high computa- tional resources

[17]

Classification of Hu-

man Emotions from (EEG) Signal using Deep Neural Network

Deep neural network, EEG signals

for mental health monitoring

Limited by the need for

large datasets and po- tential overfitting

[18]

An Emotion Recogni-

tion Method for Game Evaluation Based on Electroencephalogram

Real-time emotion detection, game

dynamics adaptation, EEG signals

Challenges in real-time

implementation, poten- tial latency issues

[19]

Brain computer inter-

face advancement in neurosciences: applica- tions and issues

Review of BCI systems and appli-

cations, EEG-based emotion recog- nition

Lacks detailed imple-

mentation and experi- mental results, focuses on general overview

2.4 Challenges in EEG-Based Emotion Recognition

Despite significant advancements, EEG-based emotion recognition faces several chal- lenges. Accurate emotion recognition depends on being able to capture the complex temporal and spatial patterns observed in EEG signals. Traditional models often fail to represent these intricate dynamics adequately. Manually engineering features from EEG data is labour-intensive and may not fully capture the richness of the data. Advanced feature extraction techniques are needed to improve model performance.

Emotions evolve over time, necessitating models that can account for these temporal dependencies. RNNs, particularly LSTM networks, are well-suited for this task but require sophisticated training and optimization. Additionally, EEG signals vary significantly between individuals due to differences in brain structure, phys- iology, and environmental factors. Models must be robust enough to handle this variability to generalize well across different subjects.

2.5 Addressing the Challenges: Advanced Models and Techniques

To address these challenges, recent studies have proposed various advanced models and techniques. Combining CNNs and RNNs has shown promise in capturing both spatial and temporal patterns in EEG data. For instance, Li et al. (2016) developed a convolutional recurrent neural network for emotion recognition, demonstrating improved performance by leveraging both architectures [20].

A hybrid model of GCNs as well as LSTM for EEG emotion detection was proposed by Yin et al. (2020), demonstrating the possibility of fusing various neu- ral network architectures to improve model performance. Their approach leverages the strengths of GCNs in capturing spatial dependencies and LSTMs in modeling temporal dynamics, resulting in improved emotion recognition accuracy [21].

Integrating EEG data with other physiological signals can improve emotion recognition accuracy. Ranganathan et al. (2016) explored multimodal emotion recognition using deep learning architectures, demonstrating the benefits of com- bining EEG with other data sources. Their multimodal approach integrates EEG with physiological signals such as heart rate and skin conductance, providing a more comprehensive understanding of emotional states and improving classification per- formance [22].

Incorporating advanced activation functions such as Swish has been shown to enhance model performance. Swish improves gradient flow and convergence, lead- ing to better learning outcomes. Studies by Algarni et al. (2022) and Bazgir et al. (2018) have demonstrated the way Swish works to enhance deep learning models’ ability to recognise emotions [12, 13].

2.6 Gaps in Research and Future Directions

While significant progress has been made, several research gaps remain. Most stud- ies focus on offline analysis of EEG data, highlighting the need for models that can perform real-time emotion recognition with minimal latency. Addressing class imbalance in EEG datasets is crucial for improving model performance across all

emotion categories. Methods like weighted loss functions and data augmentation can help reduce this problem.

Ensuring that models generalize well across different subjects and conditions is a significant challenge. Future research should focus on developing robust mod- els that can handle variability in EEG signals. Combining EEG data with other physiological signals, such as heart rate and skin conductance, can provide a more comprehensive understanding of emotional states. Future studies should explore multimodal approaches to enhance emotion recognition accuracy.

Chapter 3

3.1 Overview

Current models for emotion recognition using EEG data frequently face numerous challenges, primarily because they are unable to fully harness the intricate and multi- faceted characteristics of EEG signals. Conventional machine learning models often rely on manually designed features, which might not accurately represent the com- plex spatial and temporal patterns inherent in EEG data. These manually engineered features typically include statistical measures such as mean, variance, and frequency domain features. While useful, these features may fail to capture subtle yet impor- tant patterns in the data.

Furthermore, less complex neural networks, although they automate the process of feature extraction, frequently neglect the temporal relationships that are essential for comprehending emotional states over time. Basic neural networks might learn from raw data but often lack the depth required to capture intricate spatial charac- teristics. Additionally, they may not possess the necessary structure to accurately

represent temporal dependencies, which are crucial for understanding how emotions evolve.

These constraints result in suboptimal performance, characterized by reduced accuracy and a limited ability to generalize to novel data. The inability to cap- ture the full spectrum of spatial and temporal features in EEG signals means that conventional models may overlook critical information needed for precise emotion recognition. Consequently, the performance of these models is often compromised, leading to inaccuracies in identifying emotional states and challenges in applying the models to new and diverse datasets.

In summary, the limitations of current models for emotion recognition using EEG data stem from their reliance on manually crafted features and the insufficient complexity of basic neural networks. These factors hinder the ability to fully exploit the rich, multi-dimensional nature of EEG signals, resulting in less accurate and less generalizable emotion recognition systems.

Emotion detection, commonly referred to as facial emotion recognition, is a captivating field within artificial intelligence and computer vision. It entails the identification and interpretation of human emotions based on facial expressions. The accuracy and efficacy of emotion detection have far-reaching applications in various domains, including human-computer interaction, customer feedback analysis, and mental health monitoring. CNNs and RNNs have emerged as transformative tools in this domain, significantly enhancing our ability to decode and interpret emotional cues from facial images.

3.2 Proposed Hybrid Model

In order to address these difficulties, we suggest employing a hybrid model that in- tegrates CNNs and RNNs. This approach will allow us to accurately record and analyse both the spatial and temporal characteristics of EEG data. The CNN module examines the spatial characteristics of EEG signal spectrograms by utilising its ca- pacity to independently obtain spatial hierarchies from the input data. CNNs exhibit exceptional efficiency in analysing data representations that resemble images, such as spectrograms, owing to their utilisation of convolutional filters. The filters in the network detect local patterns, including edges, textures, and complex shapes, as the network goes through its layers.

On the other hand, the LSTM layers in the RNN model are specifically de- signed to handle sequential data, allowing them to effectively capture the temporal relationships found in EEG signals. LSTMs are equipped with gating mechanisms that enable them to store information for long durations by regulating the flow of information. This makes them appropriate for analysing sequences that occur over time.

In addition, we utilise Swish activation functions to optimise the performance of the model by enhancing the flow of gradients and facilitating convergence during the training process. The Swish activation function has been demonstrated to surpass conventional activation functions such as ReLU by enabling a more seamless gradi- ent flow. Our proposed hybrid model combines the strengths of CNNs and RNN net- works. By using advanced activation functions, our model aims to provide accurate and reliable analysis of EEG data. It effectively handles the spatial and temporal complexities that are inherent in these signals.

3.3 CNN Model

The CNN model is a powerful deep learning architecture designed to categorize ob- jects into distinct classes based on their spatial properties [23]. CNNs are particularly effective for evaluating and processing image data, making them suitable for appli- cations such as image and video recognition, decision support, picture classification, segmentation, computer vision, text analysis, central nervous system linkages, and economic time series analysis.

3.3.1 Key Features and Structure of CNNs

Shared-Weight Design

CNNs utilize a shared-weight architecture, where the same weights (or filters) are applied across different parts of the input image. This design mimics the Fourier transform by scanning the input data with convolutional filters, capturing transla- tionally invariant features. The shared weights help reduce the number of param- eters and computational complexity, enabling CNNs to learn efficiently from large datasets

Multilayer Perceptrons (MLPs)

CNN variants include multilayer perceptrons, which are fully connected neural net- works where each neuron in one layer connects to every neuron in the next layer. Although MLPs are powerful, they often suffer from overfitting, especially with large input sizes like images. Regularization techniques such as dropout and weight decay are employed to mitigate this issue.

Feature Extraction with Convolutional Layers

The core component of a CNN is the convolutional layer, which consists of a se- ries of filters (or kernels) that slide over the input image to extract spatial features. Each filter is convolved with the input image to produce a feature map, highlighting specific patterns such as edges, textures, and more complex shapes. Convolutional layers enable the network to capture hierarchical features, from low-level details in initial layers to high-level abstractions in deeper layers.

Figure 3.1: Overview of the CNN architecture

Pooling Layers

To reduce the spatial dimensions of feature maps and the number of parameters, pooling layers are introduced. Pooling layers perform down-sampling operations, such as max pooling or average pooling. Max pooling selects the maximum value from a group of activations, while average pooling computes the mean value. Pool- ing layers help to make the feature maps invariant to small translations and distor- tions in the input image.

Input Representation and Processing

CNNs process input images represented as 3D tensors with dimensions correspond- ing to the number of samples, height, width, and channels (e.g., RGB channels for colour images). The convolutional layers apply filters to these input tensors, creating feature maps with reduced dimensions. The hyperparameters of the convolutional layers, such as filter size, stride, and padding, are carefully chosen to balance com- putational efficiency and feature extraction quality.

Fully Connected Layers and Activation Functions

After the convolutional and pooling layers, the network typically includes fully con- nected layers that integrate the extracted features to perform classification. These layers use activation functions such as ReLU to introduce non-linearity and im- prove the model’s ability to learn complex patterns. Regularization techniques, like dropout, are applied to prevent overfitting and improve generalization.

3.3.2 Convolutional Layer Mechanics

Receptive Field

Each convolutional neuron processes a small region of the input image, known as the receptive field. This localized processing allows CNNs to detect specific features within confined areas of the image, which are then combined across layers to form a comprehensive representation

Weight Sharing and Local Connectivity

By sharing weights across different regions of the input image, CNNs reduce the number of parameters and enhance learning efficiency. Local connectivity ensures that each neuron focuses on a specific spatial region, contributing to the hierarchical feature extraction process.

Filter and Feature Map Generation

Filters are small-sized matrices that slide over the input image, performing element- wise multiplication and summation to produce feature maps. These feature maps highlight the presence of specific patterns, such as edges or textures, within the input image

3.3.3 Pooling Layers

Max Pooling

Max pooling selects the maximum value from each region of the feature map, effec- tively reducing the dimensionality while retaining the most important features. This operation provides translation invariance and robustness to minor variations in the input image

Average Poolinng

Average pooling computes the average value within each region of the feature map, offering a smoother and more generalized feature representation. Although less commonly used than max pooling, average pooling can be beneficial in certain sce- narios.

3.3.4 Regularization Techniques

Dropout

Dropout is a regularization technique that randomly sets a fraction of neurons to zero during training, preventing the network from relying too heavily on specific neurons and promoting better generalization.

Weight Decay

Weight decay adds a penalty term to the loss function, proportional to the sum of the squared weights,

encouraging the network to learn smaller weights and reducing overfitting.

3.3.5 Advanced Architectures

FCNs

By adding convolutional layers in place of fully connected layers, FCNs expand CNNs’ capabilities to handle input images of any size and perform operations like image segmentation

Residual Networks (ResNets)

ResNets introduce shortcut connections that bypass one or more layers, allowing gradients to flow more easily during training and enabling the construction of very deep networks without the vanishing gradient problem.

Global and Local Pooling

Global pooling computes a single value (e.g., max or average) across the entire fea- ture map, reducing the data to a single vector. Local pooling, in contrast, computes values over smaller regions, preserving more spatial information.

3.4 RNN Model

RNNs address the disadvantages of FFNNs, specifically their inefficiency in pro- cessing sequential data [24]. RNNs, unlike FFNNs, are specifically designed to address the limitations of processing only the current input without taking into ac- count previous inputs. Sequential data analysis is achieved by integrating present and previous inputs, which is made possible by their internal memory. Figure 3.2 depicts the differences in information propagation between RNNs and FFNNs.

Figure 3.2: Recurrent neural network and feed-forward neural network

RNNs are utilised in various domains including image captioning [25], time se- ries prediction [26], machine translation [27], as well as natural language processing (NLP) [28, 29]. The networks are classified into four categories: one-to-one, many- to-one, one-to-many, and many-to-many [30]. Figure 3.3 classifies these types of RNNs, while Fig

3.5 Model Architecture of Hybrid Model

The algorithm depicted in Figure ?? provides a comprehensive breakdown of the precise steps involved in the proposed hybrid CNN-RNN model for emotion recog- nition utilising EEG data. The algorithm initiates by preprocessing the data, which involves normalising the EEG signals and converting them into spectrograms. Sub- sequently, valence, arousal, and dominance ratings are associated with emotion la- bels through mapping. The model is built by combining convolutional layers to abstract spatial features as well as the Bidirectional LSTM layers to capture tempo- ral dependencies. The last layers are completely connected and employ the Swish activation function to improve performance. The training process incorporates early stopping and model checkpointing, while the evaluation step employs accuracy, con- fusion matrix, classification report, and F1 scores to assess the model’s efficacy.

3.5.1 Input Representation

The input to our model is structured as a 3D tensor representing spectrograms of EEG signals, denoted as X ∈ RN×T ×F , where:

N is the number of samples,
T is the number of time steps,
F is the number of frequency

This representation allows the model to capture both temporal and frequency-domain characteristics of EEG signals simultaneously.

3.5.2 Convolutional Layers

Convolutional layers are pivotal in extracting spatial features from the spectrograms. The process begins with the application of k filters of size ( f , f ) with a stride:

Z₁ = Conv2D(X,W₁) + b₁

where W₁ and b₁ are the weights and biases of the first convolutional layer, respec- tively. The Swish activation function is then applied:

A₁ = Z₁ · σ(Z₁)

Algorithm 1 Proposed Hybrid CNN-RNN Model for Emotion Recognition 1: Input: EEG data X RN×T ×F , Valence, Arousal, Dominance ratings 2: Output: Emotion labels Y

∈

3: Step 1: Data Preprocessing

4: for each EEG sample Xi do

5: Normalize EEG signal

6: Generate spectrogram Si from Xi

7: Map Valence, Arousal, Dominance ratings to emotion label yi

8: end for

9: Step 2: Model Architecture

10: Convolutional Layers

←

11: Z1 Conv2D(S,W1, b1)

← · { }

12: A1 Z1 σ(Z1) Swish Activation

←

13: P1 MaxPooling2D(A1)

←

14: Z2 Conv2D(P1,W2, b2)

← · { }

15: A2 Z2 σ(Z2) Swish Activation

←

16: P2 MaxPooling2D(A2)

←

17: P2 Dropout(P2, 0.1)

18: Recurrent Layers 19: F ← Flatten(P₂) 20: R ← Reshape(F)

21: Ht ← LSTM(Rt,Wh,Uh, bh)

22: →−Ht ← LSTM(Rt )

←

23: ←H−t LSTM(Rt )

← { }

24: Ht [→−Ht , ←H−t ] Bidirectional LSTM

25: Fully Connected Layers

← ⊕

26: D1 Dense(F Ht,Wd, bd )

← · { }

27: Ad D1 σ(D1) Swish Activation

←

28: Ad Dropout(Ad, 0.2)

29: Output Layer

←

30: Y Softmax(D₂)

31: Step 3: Training and Optimization

32: Compile model with Adam optimizer and categorical cross-entropy loss

33: Train model with early stopping and model checkpointing

34: Step 4: Evaluation

Here, σ(Z₁) denotes the sigmoid activation function, enhancing the non-linear trans- formations of the convolutional outputs. Subsequent convolutional layers follow a

similar structure, where each layer’s output Z_i₊₁ serves as the input to the next layer after applying convolution and activation:

Z_i₊₁ = Conv2D(A_i,W_i₊₁) + b_i₊₁ Ai+1 = Zi+1 · σ(Zi+1)

3.5.3 Max-Pooling Layers

Max-pooling layers are incorporated after each convolutional layer to reduce spatial dimensions:

P_i = MaxPooling2D(A_i)

Pooling helps in retaining essential features while reducing computational complex- ity and preventing overfitting.

3.5.4 Recurrent Layers

Upon processing through the convolutional layers, the output is transformed into a flattened and restructured format suitable for input to the LSTM layers. This transformation ensures that the LSTM can effectively capture temporal relationships within the data. The modified input R is fed into the LSTM layer as follows:

H_t = LSTM(R_t,W_h,U_h, b_h)

In this context, H_t represents the hidden state at time t, W_h represents the weights,

U_h represents the recurrent weights, and b_h represents the biases of the LSTM layer.

We also use a Bidirectional LSTM to capture dependencies in both forward and backward directions:

→−H_t = LSTM(R_t)

←H−t = LSTM(R_t)

H_t = [→−H_t, ←H−t ]

This bidirectional approach ensures that the network can efficiently utilise context from both preceding and succeeding states, leading to improved understanding and prediction of temporal sequences in EEG data.

3.5.5 Fully Connected Layers

The results of the CNN and LSTM layers are merged and then used as input for fully connected layers.

F = Flatten(P)

D₁ = Dense(F ⊕ H,W_d, b_d) A_d = D₁ · σ(D₁)

By incorporating both spatial and temporal features, these layers allow the model to learn complex relationships between the input features and the output classes.

3.5.6 Output Layer

The ultimate dense layer produces the probabilities for emotion classification using a SoftMax activation function:

Y = Softmax(D₂)

The SoftMax function guarantees that the output values fall within the range of [0, 1] and add up to 1, which can be interpreted as probabilities.

3.5.7 Training and Optimization

The model is compiled with the categorical cross-entropy loss function and opti- mized using the Adam optimizer:

−

L = ∑ y_i log(y^∧i )

i=1

where y_i is the true label and y^∧i is the predicted probability.

Early stopping and model checkpointing are utilized to mitigate overfitting and ensure the preservation of the optimal model:

Early_Stopping(monitor = ’val_loss’,

patience = 30, restore_best_weights = True)

Early stopping is a method employed to halt the training process when the val- idation loss ceases to improve, hence preventing overfitting. Model checkpointing

Figure 3.6: Architecture of the proposed CNN-RNN hybrid model is a method that preserves the most optimal model at various stages of the training process. This guarantees that the ultimate model utilized for assessment is the one that attained the utmost performance on the validation set.

Chapter 4

Results and Discussions

This section discusses the analytical tools and metrics employed to evaluate the per- formance of the proposed hybrid CNN-RNN model in accurately classifying emo- tional states based on EEG data. This comprehensive approach ensures robustness and reliability in the interpretation and validation of results, contributing to advance- ments in EEG-based emotion recognition research.

4.1 Dataset

The dataset used in this study consists of EEG recordings made by people who were subjected to a variety of emotional stimuli. Throughout each recording session, the brain’s electrical activity is captured using multiple channels, allowing for a comprehensive observation of neural responses to emotional experiences.

Every EEG recording is carefully marked with three essential emotional di- mensions: valence, arousal, and dominance. These annotations function as accurate markers for linking the EEG signals with specific emotion labels. Valence refers to the degree of positivity or negativity of an emotional experience, arousal measures the level of intensity of the emotion, and dominance represents the perceived level of control over the situation.

The dataset comprises recordings from a total of 465 sessions. The catego- rization of each session is based on eight distinct emotional states, which are deter- mined by particular combinations of valence, arousal, and dominance values. The emotional states act as the definitive labels against which the model’s predictions are assessed and evaluated.

4.2 Data Preprocessing

Pre-processing the EEG data is crucial to ensure the quality and consistency of input data for our model. This process involved several key steps:

4.2.1 Pre-processing Steps

Normalization: The EEG signals were normalized to a standard range of 0 to 1. This step eliminates baseline drift and noise, ensuring uniformity across different recordings and preparing the data for effective training with neural
Spectrogram Generation: The EEG signals were transformed into spectro- grams using the This technique computes the frequency composition of EEG signals over time, providing a detailed time-frequency representation.

Specifically, we used a window size of 125 samples and an overlap of 62 sam- ples in the STFT calculation:

S(t, f ) = . ∑ x[n] · w[n − t] · e⁻ ^j²^π ^f ⁿ.

N−1 .

.n=0 .

The valence, arousal, and dominance evaluations were transformed into dis- crete emotion labels using a pre-established mapping algorithm. This function classifies the emotional state into one of eight categories based on the values of the ratings. For example, when valence, arousal, and dominance are all high, they correspond to label 0. Conversely, when valence, arousal, and dominance are all low, they correspond to label 1.

4.2.2 Experimental Setup

The experimental procedure entailed dividing the dataset, configuring the model, and establishing the training parameters.

Data Splitting: The dataset was divided into two sets, with 75% of the data used for training and 25% used for testing. This ensures that an adequate amount of the data is used to train the model, reserving a separate set for eval- uation to assess its generalization
Model Configuration: The proposed CNN-RNN hybrid model was configured with the following components:

– Convolutional Layers: Two convolutional layers with 32 and 128 filters respectively, followed by Swish activation functions and Max-Pooling layers. These layers were responsible for extracting spatial features from the spectrograms.

Z_i = Conv2D(A_i₋₁,W_i) + b_i

A_i = Z_i · σ(Z_i)

P_i = MaxPooling2D(A_i)

Flattening and Reshaping: The output from the convolutional layers was flattened and reshaped to prepare it for the LSTM

F = Flatten(P)

R = Reshape(F, shape = (63, 78))

LSTM Layers: The architecture comprises of an LSTM layer and a Bidi- rectional LSTM layer, each containing 64 units. This architecture is pur- posefully designed to capture the temporal dependencies inherent in the

H_t = LSTM(R_t,W_h,U_h, b_h)

H_t = [→−H_t, ←H−t ]

Dense Layers: The final classification is performed by fully connected layers consisting of 128 units, which are then followed by Swish activation

D₁ = Dense(F ⊕ H,W_d, b_d)

A_d = D₁ · σ(D₁)

Y = Softmax(D₂)

Training Parameters: The model was compiled using the Adam optimizer with a learning rate of 0.001 and the categorical cross-entropy loss func- To mitigate overfitting and retain the best model according to valida- tion loss, early stopping was employed with a patience of 30 epochs, along with model checkpointing.

4.2.3 Evaluation Metrics

The model’s performance was assessed using the following metrics:

Accuracy: The model’s overall accuracy on the test set is determined by the proportion of correctly predicted samples to the total number of
A confusion matrix was created to visually represent the performance across various emotion classes. The matrix was standardized to display the proportion of accurate categorizations for each class.

Confusion Matrix_i _j = Confusioni j

∑ j Confusion_i _j

Classification Report: A comprehensive classification report was generated, which includes precision, recall, and F1-score for each

Precision = TP

TP + FP

Recall = TP

TP + FN

F1-score = 2 Precision × Recall

Precision + Recall

F1 Score: The F1 scores, which take into account both precision and recall, were calculated using a weighted mean to assess the balance across all

These metrics offer a thorough assessment of the model’s performance, show- casing its strengths and weaknesses across various categories. Our methodology, which includes detailed equations and rigorous pre-processing, guarantees that our model accurately captures the intricate patterns in EEG data, resulting in precise and dependable emotion recognition.

4.2.4 Analysis

The experimental findings showcased that the suggested CNN-RNN hybrid model exhibited superior performance compared to conventional models in the task of emo- tion recognition using EEG data. The model attained a 96.6% overall accuracy on the test set. The confusion matrix demonstrated a high level of accuracy in classifying most emotion categories, although there was some confusion between emotions that had similar valence and arousal levels. The classification report exhibited elevated precision, recall, and F1-scores, specifically for the dominant emotion categories. The average F1 score was 0.968, suggesting a well-balanced performance across all categories. The utilisation of Swish activation functions resulted in improved con- vergence and model performance, as demonstrated by the training and validation curves. The early stopping mechanism successfully mitigated overfitting, thereby ensuring that the model consistently achieved high performance on unseen data.

Chapter 5

Results and Discussion

5.1 Results

The performance of the proposed CNN-RNN hybrid model was evaluated on the test set, and the results are summarized below.

Accuracy: The model demonstrated a remarkable accuracy of 96.6% on the test set, highlighting the effectiveness of the hybrid approach in precisely clas- sifying emotions using EEG data.

• Confusion Matrix:

Figure 5.2 displays the normalised confusion matrix. The matrix displays the accuracy rates for each emotion class, indicating the percentage of correctly classified samples. The values along the diagonal represent the true positive rates. The elevated values along the diagonal indicate that the model exhibits strong perfor- mance across the majority of emotion categories.

Confusion Matrix_i _j = Confusioni j

∑ j Confusion_i _j

• Classification Report:

Table 5.1 presents a comprehensive classification report that includes precision, recall, and F1-scores for each emotion class. The model demonstrated exceptional precision and recall rates, especially for the prominent classes such as ”Happy,” ”Sad,” and ”Excited.”

The average F1 score was 0.968, suggesting a well-balanced performance across

Table 5.1: Precision, Recall, and F1-Score for each emotion class and overall performance of the proposed CNN-RNN hybrid model.

Class	Precision	Recall	F1-Score
Happy	0.97	0.95	0.96
Sad	0.96	0.98	0.97
Excited	0.96	0.97	0.97
Afraid	0.98	0.95	0.96
Overall	0.97	0.97	0.97

all categories. The weighted F1 score, which takes into consideration the distribution of classes, was 0.965, providing additional evidence of the model’s resilience.

5.2 Discussion

The experimental results unequivocally establish the superiority of the proposed CNN-RNN hybrid model compared to traditional models in emotion recognition using EEG data. The model’s high accuracy, precision, recall, and F1-scores demon- strate its ability to effectively capture the intricate spatial and temporal patterns in EEG signals, resulting in precise emotion classification. An important advantage of the proposed model is its capacity to incorporate both spatial and temporal characteristics. The model utilises CNNs to extract spa- tial features and LSTMs to capture temporal dependencies, effectively combining the advantages of both architectures. This integration enables the model to compre- hend the complex patterns in EEG data that are essential for emotion recognition. The CNN component efficiently captures spatial hierarchies from the spectrograms, while the LSTM component preserves long-term dependencies and mitigates the vanishing gradient problem, thereby improving the model’s temporal understand- ing.

The utilisation of Swish activation functions is an additional noteworthy con- tribution to the performance of the model. Swish, a self-gated activation function, enhances the flow of gradients and the process of convergence, leading to improved learning and increased accuracy. The efficacy of this activation function has been demonstrated to surpass that of conventional activation functions such as ReLU in diverse deep learning tasks, and our findings support these conclusions.

The implementation of thorough preprocessing procedures was essential in guaranteeing the production of input data of superior quality, thereby enhancing the model’s resilience and precision. Normalisation eliminated the gradual change in the baseline and unwanted random variations, while spectrogram generation produced a diverse collection of characteristics for the CNN to extract spatial attributes. By meticulously aligning valence, arousal, and dominance ratings with specific emotion labels, the model was trained using precise and representative data.

Although the model performed well overall, it exhibited slightly reduced pre- cision and recall for less prominent classes. This phenomenon can be ascribed to the disparity in class distribution within the dataset. Classes with a smaller number of samples typically exhibit lower performance metrics, as the model may not have been exposed to a sufficient number of examples to learn effectively. To rectify this disparity, employing methods like data augmentation or weighted loss functions has the potential to enhance the model’s accuracy on these specific categories.

The intricate nature of the hybrid model, which incorporates both CNN and LSTM layers, leads to increased computational demands. Although the complex- ity is essential for accurately representing the intricate patterns in the EEG data, it also implies that the model necessitates substantial computational resources for both training and inference. To address this problem, one possible approach is to optimise the architectural structure of the model and explore more efficient training methods, enhancing the model’s usability for real-world implementations. Although the current model shows impressive accuracy in a controlled experimental environment, additional validation is needed to ensure its effectiveness in real-time emotion recognition systems. Real-time applications require precise accuracy and minimal latency, as well as the ability to handle dynamic environmental changes with sta- bility. Subsequent investigations should prioritise the optimisation of the dynamic performance model and evaluate its ability to withstand different scenarios, such as diverse user conditions, fluctuating levels of noise, and various types of emotional stimuli.

In terms of evaluation metrics, the hybrid model consistently demonstrated su- perior performance compared to both traditional machine learning algorithms and simpler neural networks, surpassing the performance of baseline models. Baseline models often struggle to capture the temporal dependencies and complex spatial characteristics in EEG data, leading to lower accuracy and F1 scores. The hybrid model’s ability to integrate spatial and temporal information gives it a significant advantage in accurately identifying emotions from EEG signals.

The success of the proposed CNN-RNN hybrid model creates numerous prospects for future research. Potential areas for additional research encompass enhancing the model’s ability to perform in real-time, addressing the issue of class imbalance, and evaluating the model’s applicability to other domains within affective computing.

Moreover, by incorporating this model with supplementary physiological signals, such as heart rate or galvanic skin reaction, there is the possibility to enhance the effectiveness of emotion identification systems. The integration of EEG with other physiological data in multimodal approaches has the potential to improve accuracy and provide a more comprehensive understanding of emotional states.

Chapter 6 CONCLUSION

6.1 Conclusion

This study presents a novel approach for identifying emotions through the analy- sis of EEG data. The technique employs a hybrid model that integrates CNNs and RNNs. By combining these two architectures, the model can effectively capture both the spatial and temporal patterns in the EEG signals with precision. As a con- sequence, there is an enhancement in the precision and resilience of emotion clas- sification. The model we proposed demonstrated a significant improvement com- pared to traditional models, achieving an overall accuracy of 96.6% on the test set. The model exhibits a notable degree of accuracy, comprehensiveness, and overall effectiveness, as evidenced by its elevated precision, recall, and F1-scores for the majority of emotion categories. This indicates that the model exhibits a high level of reliability in accurately discerning and differentiating various emotional states. The model’s performance was greatly enhanced by incorporating Swish activation functions, which improved gradient flow and convergence during training. The in-

CHAPTER 6. CONCLUSION

put data underwent thorough preprocessing procedures, including normalisation and spectrogram creation, to ensure its quality and uniformity. The meticulous correla- tion of valence, arousal, and dominance ratings with distinct emotion labels estab- lished a strong basis for training the model. These steps were essential in attaining the observed enhancements in performance. In addition to these achievements, the study also pinpointed specific areas that require further enhancement, such as rec- tifying the imbalance in class representation and fine-tuning the model to be more suitable for real-time usage. The relatively lower performance observed in less dom- inant classes’ highlights the need to utilise techniques such as data augmentation or weighted loss functions to improve the training of the model on underrepresented classes. Furthermore, the hybrid model’s computational complexity emphasises the necessity for additional optimisation in order to enhance its suitability for practical, real-time application.

REFERENCES

[1] S. Jirayucharoensak, S. Pan-Ngum, and P. Israsena, “Eeg-based emotion recog- nition using deep learning network with principal component based covariate shift adaptation,” The Scientific World Journal, vol. 2014, 2014.
[2] R. Plutchik, “The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice,” American scientist, vol. 89, pp. 344–350, 2001.
[3] L. F. Barrett, B. Mesquita, K. N. Ochsner, and J. J. Gross, “The experience of emotion,” Annu. Rev. Psychol., vol. 58, pp. 373–403, 2007.
[4] A. Dzedzickis, A. Kaklauskas, and V. Bucinskas, “Human emotion recogni- tion: Review of sensors and methods,” Sensors, vol. 20, p. 592, 2020.
[5] W. M. Vanderlind, Y. Millgram, A. R. Baskin-Sommers, M. S. Clark, and
J. Joormann, “Understanding positive emotion deficits in depression: From emotion preferences to emotion regulation,” Clinical psychology review, vol. 76, p. 101826, 2020.
[6] T. Dalgleish and M. Power, Handbook of cognition and emotion. John Wiley Sons, 2000.
[7] P. Ekman, W. V. Friesen, and S. S. Tomkins, “Facial affect scoring technique: A first validity study,” Semiotica, vol. 3, pp. 37–58, 1971.
[8] J. A. Russell and L. F. Barrett, “Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant,” Journal of personal- ity and social psychology, vol. 76, p. 805, 1999.
[9] J. Tao and T. Tan, “Affective computing: A review,” in International Confer- ence on Affective computing and intelligent interaction, 2005, pp. 981–995.
[10] L. F. Haas, “Hans berger (1873–1941), richard caton (1842–1926), and elec- troencephalography,” Journal of Neurology, Neurosurgery Psychiatry, vol. 74, pp. 9–9, 2003.

60 REFERENCES

[11] S. Siuly, Y. Li, and Y. Zhang, EEG and its background. Springer, 2016, pp. 3–21.
[12] M. Algarni, F. Saeed, T. Al-Hadhrami, F. Ghabban, and M. Al-Sarem, “Deep learning-based approach for emotion recognition using eeg signals using bi- lstm,” Sensors (Basel), vol. 22, no. 8, p. 2976, Apr 2022.
[13] O. Bazgir, Z. Mohammadi, and S. A. H. Habibi, “Emotion recognition with ma- chine learning using eeg signals,” in 2018 25th National and 3rd International Iranian Conference on Biomedical Engineering (ICBME), 2018, pp. 1–5.
[14] E. H. Houssein, A. Hammad, and A. A. Ali, “Human emotion recognition from eeg-based brain–computer interface using machine learning: a comprehensive review,” Neural Comput Applic, vol. 34, pp. 12 527–12 557, 2022.
[15] H. Liu, Y. Zhang, Y. Li, and X. Kong, “Review on emotion recognition based on electroencephalography,” Front. Comput. Neurosci., vol. 15, p. 758212, 2021.
[16] S. Gannouni, A. Aledaily, and K. Belwafi, “Emotion detection using electroen- cephalography signals and a zero-time windowing-based epoch estimation and relevant electrode identification,” Sci Rep, vol. 11, p. 7071, 2021.
[17] A. Al-Nafjan, M. Hosny, A. Al-Wabil, and Y. Al-Ohali, “Classification of hu- man emotions from (eeg) signal using deep neural network,” Int. J. Adv. Com- put. Sci. Appl., vol. 8, pp. 419–425, 2017.
[18] G. Du, W. Zhou, C. Li, D. Li, and P. X. Liu, “An emotion recognition method for game evaluation based on electroencephalogram,” IEEE Trans. Affect. Com- put., vol. 10, p. 598, 2020.
[19] S. K. Mudgal, S. K. Sharma, J. Chaturvedi, and A. Sharma, “Brain computer interface advancement in neurosciences: applications and issues,” Interdisci- plinary Neurosurgery, vol. 20, p. 100694, 2020.
[20] X. Li, D. Song, P. Zhang, G. Yu, Y. Hou, and B. Hu, “Emotion recognition from multi-channel eeg data through convolutional recurrent neural network,” in Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016, pp. 352–359.
[21] Y. Yin, X. Zheng, B. Hu, Y. Zhang, and X. Cui, “Eeg emotion recognition using fusion model of graph convolutional neural networks and lstm,” Appl. Soft Comput., vol. 100, p. 106954, 2020.
[22] H. Ranganathan, S. Chakraborty, and S. Panchanathan, “Multimodal emotion recognition using deep learning architectures,” in 2016 IEEE Winter Confer- ence on Applications of Computer Vision (WACV), 2016, pp. 1–9.

61 REFERENCES

[23] B. Chakravarthi, S.-C. Ng, M. R. Ezilarasan, and M.-F. Leung, “Eeg-based emotion recognition using hybrid cnn and lstm classification,” Frontiers in Computational Neuroscience, vol. 16, 2022.
[24] M. K. Chowdary, J. Anitha, and D. J. Hemanth, “Emotion recognition from eeg signals using recurrent neural networks,” Electronics, vol. 11, no. 2387, 2022.
[25] X. Liu, Q. Xu, and N. Wang, “A survey on deep neural network-based image captioning,” Vis. Comput., vol. 35, pp. 445–470, 2019.
[26] C. L. Giles, S. Lawrence, and A. C. Tsoi, “Noisy time series prediction using recurrent neural networks and grammatical inference,” Mach. Learn., vol. 44, pp. 161–183, 2001.
[27] S. P. Singh, A. Kumar, H. Darbari, L. Singh, A. Rastogi, and S. Jain, “Ma- chine translation using deep learning: An overview,” in Proceedings of the 2017 International Conference on Computer, Communications and Electronics (Comptelix), 2017, pp. 162–167.
[28] S. Pattanayak, Natural language processing using recurrent neural networks. Berkeley, CA, USA: Apress, 2017, pp. 223–278.
[29] W. Yin, K. Kann, M. Yu, and H. Schu¨tze, “Comparative study of cnn and rnn for natural language processing,” arXiv, vol. arXiv:1702.01923, 2017.
[30] L. Medsker and L. C. Jain, Recurrent Neural Networks: Design and Applica- tions. Boca Raton, FL, USA: CRC Press, 1999.

Have any thoughts?

Share your reaction or leave a quick response — we’d love to hear what you think!