As social media has become an integral part of daily life, its widespread use also carries the risk of exposure to harmful content. Artificial intelligence (AI) has proven to be a useful tool for identifying and removing such content. Detecting hateful content in memes, however, is especially challenging: the meaning of a meme arises from the combination of image and text and cannot be inferred from either part alone. An AI system intended to detect harmful memes must therefore, like a human reader, have a broad understanding of their content and context. To address this problem, a project was undertaken to automatically classify memes as hateful or benign by combining textual information, image features, and additional data obtained from online entity recognition.

The paper uses the multimodal dataset from the Hateful Meme Detection Challenge 2020. Even modern vision-language models struggle to match non-expert human performance on this dataset because it contains confounding examples, such as benign, contrastive, or contrary memes, which underscores the difficulty of the task. To reach high accuracy, models need a deeper understanding of language, imagery, current affairs, and the interactions among the modalities. The proposed approach classifies memes using text, images, and information acquired through the online entity identification process. The paper also examines the shortcomings of the proposed method and directions for improving it. Lacking real-world knowledge, the models struggle to correctly identify individuals' traits and to classify racial or religious groups. They also have difficulty with memes referring to grief, violence, and disability, as well as with religious practices, traditional attire, political and social references, and cultural norms.

The proposed architecture handles text and images simultaneously through two parallel streams trained with cross-attention; both streams build on the bidirectional multi-head attention paradigm (a minimal sketch of this dual-stream design is given below). The paper also describes the preprocessing pipeline required for the proposed design. The study was conducted in two phases. Phase-1 achieved an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.71 and an accuracy of 0.74 on the hateful memes dataset. In Phase-2, the Hateful Meme Detection dataset was expanded with additional memes to improve coverage; on this enlarged dataset, the model reached an AUROC of 0.8108 and an accuracy of 0.7352 on the test unseen split, and an AUROC of 0.7555 and an accuracy of 0.7650 on the dev unseen split.

Although the proposed approach shows encouraging results, the paper acknowledges the project's limitations. The approach relies mainly on linguistic and visual features, which limits its ability to detect offensive memes with subtle or complex content. The models also require large amounts of training data to improve their accuracy, and such data can be difficult to obtain in real-world settings. The paper emphasizes the need for objectivity, accountability, and transparency in the development and deployment of these algorithms, as well as the ethical dilemmas raised by the use of artificial intelligence for content moderation.
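To make the dual-stream idea concrete, the sketch below shows one way such an architecture could be assembled in PyTorch, assuming pre-extracted text token embeddings and image region features as inputs. The layer sizes, block depth, pooling, and classification head are illustrative assumptions, not the exact configuration described in the paper.

```python
# Minimal sketch of a dual-stream meme classifier with cross-attention.
# Inputs are assumed to be pre-extracted text token embeddings and image
# region features; dimensions and depth are illustrative assumptions.
import torch
import torch.nn as nn


class CrossAttentionBlock(nn.Module):
    """One bidirectional multi-head attention block: each stream attends
    to the other modality, then passes through a small feed-forward layer."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.txt_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_i = nn.LayerNorm(dim)
        self.ff_t = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.ff_i = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, txt, img):
        # Text queries attend over image regions, and vice versa.
        t_att, _ = self.txt_to_img(query=txt, key=img, value=img)
        i_att, _ = self.img_to_txt(query=img, key=txt, value=txt)
        txt = self.norm_t(txt + t_att)
        img = self.norm_i(img + i_att)
        return txt + self.ff_t(txt), img + self.ff_i(img)


class DualStreamMemeClassifier(nn.Module):
    """Two parallel streams (text, image) fused by cross-attention and
    pooled into a single hateful/non-hateful logit."""

    def __init__(self, dim: int = 512, heads: int = 8, depth: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            [CrossAttentionBlock(dim, heads) for _ in range(depth)]
        )
        self.classifier = nn.Linear(2 * dim, 1)

    def forward(self, txt_feats, img_feats):
        # txt_feats: (batch, n_tokens, dim)   img_feats: (batch, n_regions, dim)
        for block in self.blocks:
            txt_feats, img_feats = block(txt_feats, img_feats)
        pooled = torch.cat([txt_feats.mean(dim=1), img_feats.mean(dim=1)], dim=-1)
        return self.classifier(pooled).squeeze(-1)  # raw logit per meme


# Example with random tensors standing in for real encoder outputs.
model = DualStreamMemeClassifier()
logits = model(torch.randn(4, 32, 512), torch.randn(4, 36, 512))
prob_hateful = torch.sigmoid(logits)
```

Letting each stream attend to the other modality mirrors the paper's description of bidirectional multi-head attention across text and image, so that neither the caption nor the picture is classified in isolation.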
In summary, the study delivers a method for automatically identifying offensive memes by combining language, image feature data, and online object recognition. Although the method shows promising results, several issues still need to be addressed before the models' accuracy and effectiveness can be improved. Future work in this area could incorporate additional modalities, such as audio or video, to raise model performance. Improving the models' understanding of social and cultural context could also help them identify offensive content more accurately. Ultimately, artificial intelligence (AI) must be applied carefully and transparently in content moderation to ensure that these developments are used ethically and responsibly.
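For concreteness, the AUROC and accuracy figures quoted for the dev unseen and test unseen splits can be computed from a model's per-meme predictions with scikit-learn, as in the minimal sketch below. The prediction file names and column layout are assumptions made purely for illustration and are not artifacts of the original study.

```python
# Minimal sketch of computing the reported metrics (AUROC and accuracy)
# for the dev unseen and test unseen splits. The CSV files and their
# columns ("label", "proba") are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score, accuracy_score

for split in ("dev_unseen", "test_unseen"):
    # Each hypothetical CSV holds the true label (0/1) and the model's
    # predicted probability that the meme is hateful.
    preds = pd.read_csv(f"predictions_{split}.csv")
    auroc = roc_auc_score(preds["label"], preds["proba"])
    acc = accuracy_score(preds["label"], (preds["proba"] >= 0.5).astype(int))
    print(f"{split}: AUROC={auroc:.4f}, accuracy={acc:.4f}")
```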