Generative AI and Its Models: Exploring the New Era

AI is rapidly taking over in this technology-driven world. The evolution of AI has given rise to many fields, and generative AI is one of them. It is widely used nowadays because it is fast and versatile. When we hear the name, the first thing that comes to mind is that it generates, and that is indeed its main purpose. Generative AI creates new ideas and content such as photos, videos, animation, and 3D models when given a variety of inputs.

History of Generative AI: Eliza in 1966 to ChatGPT

The story starts in 1966, when the world witnessed the birth of Eliza, known as the first chatbot ever built by humans. Created by Joseph Weizenbaum at MIT, Eliza was a groundbreaking experiment for its time, enabling human-computer interaction. It didn’t understand the context of a conversation the way humans do or, for that matter, the way ChatGPT does. Instead, it merely created an illusion of conversation by rephrasing the user’s statement as a question back to them. For example, if a user typed, “I am feeling sad,” it would respond, “Why do you feel sad?” Many variations of the chatbot were made, and one of the most well-known is called DOCTOR.
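
The rephrasing trick described above can be sketched in a few lines. This is only an illustration of the idea, not Weizenbaum's actual script: the pattern, the fallback reply, and the function name `eliza_reply` are all hypothetical.

```python
import re

def eliza_reply(statement):
    """Turn an 'I am feeling X' statement into a question (hypothetical rule)."""
    # Capture whatever follows "I am feeling", ignoring a trailing "." or "!".
    match = re.match(r"i am feeling (.+?)[.!]?$", statement.strip(), re.IGNORECASE)
    if match:
        return f"Why do you feel {match.group(1)}?"
    # No rule matched, so fall back to a generic prompt to keep the user talking.
    return "Please tell me more."

reply = eliza_reply("I am feeling sad.")
fallback = eliza_reply("Hello there")
print(reply)
print(fallback)
```

The program never models what "sad" means. It only matches a text pattern and reflects it back, which is why the conversation is an illusion.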

Workflow of Gen AI. Image Credit: cloudfront

Recurrent neural networks 1986.

On to the late 20th century, when we saw the emergence of neural networks. After Eliza in 1966, creating chatbots almost became a trend in various institutions and tech companies. Among the new approaches, recurrent neural networks, or RNNs, were the first to arrive in 1986, and they gained instant popularity. Unlike traditional feed-forward neural networks, where information flows in one direction, RNNs could remember previous inputs in their internal state, or memory, which allowed them to keep track of a continued conversation.
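
To make the idea of an internal state concrete, here is a minimal sketch of a recurrent step with a single scalar state. The weights `w_x`, `w_h`, and `b` are arbitrary values chosen for illustration, not trained parameters.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new state mixes the current input with the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell and return the state after every step."""
    h = 0.0  # the memory starts empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Same last two inputs, different first input.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print(states_a[-1], states_b[-1])
```

The two sequences end with identical inputs, yet the final states differ, because the first input is still echoing through the state. A feed-forward network that sees only the current input could not tell them apart.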

Long short-term memory 1997.

LSTMs are a specialized type of recurrent neural network. Their primary advantage was the ability to remember information over long sequences, overcoming the short-term memory limitation of RNNs. The LSTM's unique architecture introduced the concept of gates, specifically the input gate and output gate, with a forget gate added in later versions. These gates determine how much information should be memorized, discarded, or output at each time step.
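
The gating idea can be sketched with a scalar LSTM cell. This sketch assumes the now-standard formulation with input, forget, and output gates, and the parameter names and values in `params` are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p holds hypothetical weights and biases."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: how much to memorize
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: how much to keep
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: how much to expose
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate new content
    c = f * c_prev + i * g   # long-term cell state
    h = o * math.tanh(c)     # output at this time step
    return h, c

# A large forget-gate bias keeps the gate close to 1, so old memory is retained.
params = {"wi": 0.5, "ui": 0.5, "bi": 0.0,
          "wf": 0.5, "uf": 0.5, "bf": 10.0,
          "wo": 0.5, "uo": 0.5, "bo": 0.0,
          "wg": 0.5, "ug": 0.5, "bg": 0.0}
h, c = lstm_step(0.0, 0.0, 1.0, params)
print(h, c)
```

With the forget gate nearly open, the cell state passes through the step almost unchanged, which is how information can survive across long sequences.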

Gated recurrent units 2014

Gated recurrent units, or GRUs, came in 2014. They were designed to combat the vanishing gradient problem, allowing them to retain long-term dependencies in sentences. GRUs simplified the gating by using only two gates: an update gate, which determines how much of the previous information to keep versus how much new information to consider, and a reset gate, which determines how much of the previous information to forget. This not only reduces processing time but also improves computational efficiency.
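
Here is a minimal scalar sketch of the two gates. It assumes the convention in which the update gate `z` is the fraction of the old state that is kept (some libraries flip this), and every weight in `make_params` is a made-up value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One step of a scalar GRU; p holds hypothetical weights and biases."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # The reset gate scales how much of the old state feeds the candidate.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate according to the update gate.
    return z * h_prev + (1.0 - z) * h_cand

def make_params(update_bias):
    return {"wz": 0.5, "uz": 0.5, "bz": update_bias,
            "wr": 0.5, "ur": 0.5, "br": 0.0,
            "wh": 0.5, "uh": 0.5, "bh": 0.0}

kept = gru_step(1.0, 0.7, make_params(10.0))       # gate near 1: old state survives
replaced = gru_step(1.0, 0.7, make_params(-10.0))  # gate near 0: candidate takes over
print(kept, replaced)
```

Compared with an LSTM, there is no separate cell state and one gate fewer, which is where the savings in computation come from.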

Attention mechanism 2014.

LSTM and GRU architectures were great at retaining context, but when the relevant context was far away from the current word, NLP problems needed something more. That gave birth to the concept of attention, which was published in a 2014 paper. The introduction of the attention mechanism marked a significant paradigm shift in sequence modeling, offering a fresh perspective compared to previous architectures.
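
The core of attention is a weighted lookup: score every position against a query, turn the scores into weights, and average the values. The sketch below uses dot-product scoring, which is one common choice and an assumption here, since the text does not say which scoring function was used. The vectors are toy data.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention over a short sequence of key/value vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [0.0, 4.0]
weights, context = attend(query, keys, values)
print(weights, context)
```

Every position is compared with the query directly, so a relevant word contributes just as easily from far away as from next door. Nothing has to be carried step by step through a recurrent state.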

Rise of LLMs-ChatGPT 2018

With the success of Transformers, the attention-based architecture introduced in 2017, the next logical step was scaling, and this kick-started with Google’s BERT model, which was released in 2018. Unlike previous models, which processed text either left to right or right to left, BERT was designed to consider both directions simultaneously, hence the name Bidirectional Encoder Representations from Transformers. Pre-trained on vast amounts of text, BERT was the first proper foundational language model that could be fine-tuned for specific tasks. Later on, OpenAI released its GPT-2 model and Google released its T5 model in 2019, followed by GPT-3 in 2020. Between then and August 2024, which is the present time, many more language models have been released.

How does Generative AI work?

Gen AI is a subset of deep learning. Gen AI models use deep learning to learn patterns and representations from existing data, and then generate novel outputs that resemble the training data. An important concept in Gen AI is the LLM, or Large Language Model. An LLM is a model trained on a vast amount of text data to understand and generate human-like text. LLMs can perform a wide range of language tasks, such as essay writing, answering questions, or even coding. Prompt engineering is a major part of Gen AI, as it directs the AI with effective prompts to get the desired outputs. It involves understanding the model’s capabilities and limitations. Effective prompts provide clear instructions, examples, and context to get the best outputs.
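
As a small illustration of the three ingredients of an effective prompt named above (instructions, examples, and context), here is a sketch that assembles them into one string. The layout, the labels, and the helper name `build_prompt` are assumptions made for this example. No particular model requires this format.

```python
def build_prompt(instruction, context, examples, query):
    """Assemble a few-shot prompt from an instruction, context, and worked examples."""
    parts = [f"Instruction: {instruction}", f"Context: {context}"]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f"Example {i}\nInput: {example_input}\nOutput: {example_output}")
    # End with the real query and an open "Output:" for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_prompt(
    instruction="Classify the sentiment of the input as positive or negative.",
    context="The inputs are short customer reviews.",
    examples=[("I love this phone.", "positive"),
              ("The battery died in a day.", "negative")],
    query="The screen is gorgeous.",
)
print(prompt)
```

The resulting text would then be sent to whichever LLM is in use. Keeping the pieces separate makes it easy to vary one of them and compare the outputs.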

Types of Generative AI Models

There are different types of Gen AI models, each suited to different domains of content generation. Some of them are listed below.

Generative Adversarial Networks (GANs)

A GAN is a combination of two neural networks, namely a generator and a discriminator. The generator produces artificial data modeled on the real data available. Its purpose is to fool the discriminator, whose main job is to distinguish between the generated data and the genuine data. High-quality results are obtained because the competition between the two networks pushes both to improve. GANs are evolving to be very dynamic and versatile, with skills in image synthesis and other tasks.
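
The competition between the two networks can be expressed through their losses. The sketch below only scores a single pair of discriminator outputs and does not train anything. It assumes binary cross-entropy and the commonly used "non-saturating" generator loss, and the probabilities fed in are made-up numbers.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one predicted probability."""
    eps = 1e-12  # avoid log(0)
    return -(label * math.log(prob + eps) + (1.0 - label) * math.log(1.0 - prob + eps))

def discriminator_loss(d_real, d_fake):
    """The discriminator wants real data scored as 1 and generated data as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """The generator wants its output to be scored as real (label 1)."""
    return bce(d_fake, 1.0)

# d_fake is the discriminator's belief that a generated sample is real.
weak_generator = (discriminator_loss(0.9, 0.1), generator_loss(0.1))
strong_generator = (discriminator_loss(0.9, 0.9), generator_loss(0.9))
print(weak_generator, strong_generator)
```

When the generator fools the discriminator, its own loss falls while the discriminator's loss rises. Each network's gain is the other's loss, and that tension is what drives both to improve.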

Diffusion Model

Imagine we have an image. We add a bit of noise to it, and then we keep adding noise until the image becomes unrecognizable. Now, what if we could reverse this process? The basic idea of diffusion models is to transform a noisy image into a coherent image. The approach has been successful, particularly in the domain of image generation. It involves two steps, namely forward diffusion and reverse diffusion. Moreover, these models offer high quality and are large-scale, so they are often categorized as foundation models.
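
The forward half of the process, adding noise step by step, needs no training and can be sketched directly. The constant noise level `0.02`, the step count, and the four-value "image" are assumptions for illustration. The reverse half requires a trained neural network and is not shown.

```python
import math
import random

def forward_diffusion(x0, betas, rng):
    """Add Gaussian noise step by step; returns the image at every step."""
    x = list(x0)
    trajectory = [x]
    for beta in betas:
        # Shrink the current image slightly and mix in fresh noise.
        x = [math.sqrt(1.0 - beta) * xi + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for xi in x]
        trajectory.append(x)
    return trajectory

def signal_fraction(betas):
    """Scale factor applied to the original image after all steps."""
    keep = 1.0
    for beta in betas:
        keep *= 1.0 - beta
    return math.sqrt(keep)

rng = random.Random(0)
betas = [0.02] * 200             # hypothetical constant noise schedule
image = [1.0, -1.0, 0.5, 0.0]    # stand-in for pixel values
trajectory = forward_diffusion(image, betas, rng)
remaining = signal_fraction(betas)
print(trajectory[-1], remaining)
```

After enough steps only a small fraction of the original signal is left and the result is close to pure noise. Reverse diffusion is the learned process of walking this trajectory backwards.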

Flow Model

These Gen AI models learn from a given dataset by interpreting its underlying structure. They analyze the dataset to understand the probability distribution of different events and values. After learning the probability distribution, the model can create fresh data with components and characteristics similar to those in the given dataset. In addition, flow models are often faster and more efficient than some other generative models.
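
A toy version of this idea is a one-dimensional affine flow, which shifts and scales a standard normal distribution to match the data. The sketch assumes the normalizing-flow formulation, which the text above does not name explicitly. Real flow models stack many learned invertible layers, and the dataset here is invented.

```python
import math
import random
import statistics

def fit_affine_flow(data):
    """Fit x = mu + sigma * z with z ~ N(0, 1); for this flow the best fit is mean and std."""
    return statistics.fmean(data), statistics.pstdev(data)

def log_prob(x, mu, sigma):
    """Exact log-density of x under the flow, via the change-of-variables rule."""
    z = (x - mu) / sigma                                    # invert the transformation
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)  # log N(z; 0, 1)
    return log_base - math.log(sigma)                        # subtract log |dx/dz|

def sample(mu, sigma, rng, n):
    """Create fresh data by pushing base noise through the transformation."""
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]

data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
mu, sigma = fit_affine_flow(data)
new_points = sample(mu, sigma, random.Random(0), 5)
print(mu, sigma, new_points)
```

Because the transformation is invertible, the same model can both score how likely a value is and generate new values in a single pass.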

Variational Autoencoders (VAEs)

Variational autoencoders use a method in which, instead of mapping an input to a fixed vector, you map your input to a distribution. The only thing that is different in a variational autoencoder is that the normal bottleneck vector C is replaced by two separate vectors. One represents the mean of the distribution and the other represents its standard deviation. Whenever you need a vector to feed through your decoder network, you sample one from this distribution.
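
The sampling step can be sketched on its own. The encoder and decoder networks are left out, and the `mean` and `std` vectors below are made-up stand-ins for what an encoder would output.

```python
import random

def sample_latent(mean, std, rng):
    """Draw one latent vector: z = mean + std * noise, element by element."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]

mean = [0.5, -1.0]  # hypothetical encoder output: centre of the distribution
std = [0.1, 0.0]    # hypothetical encoder output: spread in each dimension
rng = random.Random(42)
z1 = sample_latent(mean, std, rng)
z2 = sample_latent(mean, std, rng)
print(z1, z2)
```

The same input yields a slightly different latent vector on each draw, except in the dimension whose standard deviation is zero. Writing the sample as mean plus scaled noise, rather than drawing it directly, is what lets the mean and standard deviation be trained by backpropagation.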