A guide for SEOs to understand large language models (LLMs).
Should I use large language models when researching keywords? Can these models think? Is ChatGPT my friend?
This guide is for those who have asked themselves these questions.
This article covers what SEOs should know about large language models, natural language processing and more.
Large language models, natural language processing and more, in simple terms
You can either tell someone exactly how to do something, or you can show them examples and hope they figure it out on their own.
In computer science, the first is programming and the second is machine learning. Within machine learning, training on labeled examples is called supervised learning, while finding patterns in unlabeled data is called unsupervised learning.
Natural language processing (NLP) is the process of breaking text down into numbers and then analyzing it with computers.
As these systems become more sophisticated, they go from analyzing patterns within words to analyzing the relationships between words.
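To make the "text into numbers" idea concrete, here is a minimal sketch (my own illustration, not any particular library's approach) that maps each word in a tiny corpus to an integer ID:

```python
# A minimal sketch of "breaking text down into numbers": mapping each word in
# a tiny corpus to an integer ID. Real NLP pipelines use far richer encodings.
sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

vocab = {}
encoded = []
for sentence in sentences:
    ids = []
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)   # assign the next unused ID
        ids.append(vocab[word])
    encoded.append(ids)

print(vocab)    # e.g. {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'dog': 5, 'rug': 6}
print(encoded)  # e.g. [[0, 1, 2, 3, 0, 4], [0, 5, 2, 3, 0, 6]]
```

Once text is numbers, a computer can start finding patterns in those numbers.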
A natural language machine-learning model that is unsupervised can be trained using a variety of datasets.
If you train a language model only on reviews of the movie Waterworld, it will be good at writing (or understanding) reviews of Waterworld.
If you train it only on the two positive reviews I gave Waterworld, it will only understand those positive reviews.
LLMs are neural networks with billions of parameters (or more). These models are so large that they are more generalized: they are trained not only on positive and negative Waterworld reviews, but also on Wikipedia articles, news sites and other sources.
Context is a big part of machine learning projects: a model works well on things within its training context and poorly on things outside it.
If you show a machine learning model trained on cats a picture of a bug and ask it to identify it, it won't do well.
That's why it's hard for most models to generalize: so many problems fall outside their context.
LLMs appear to be, and are, far more generalized than other machine learning projects. The sheer volume of data they're trained on and their ability to crunch millions of relationships between words is the reason for this.
Transformers are one of the technologies that allows this.
What are transformers?
Transformers are a type of neural network architecture that has revolutionized NLP.
Before transformers, most NLP models relied on recurrent neural networks (RNNs), which process text sequentially, one word at a time. That approach had limitations: it was slow and struggled to handle long-range dependencies in text.
Transformers changed that.
In 2017, Vaswani and colleagues published a landmark paper, "Attention Is All You Need," which introduced the transformer architecture.
Transformers process text in parallel instead of sequentially, which lets them capture long-range dependencies far better.
Previous architectures included RNNs and long short-term memory (LSTM) networks.
These models were (and still are) used for tasks that involve sequences of data, such as speech or text.
But they have a flaw: they can only handle one piece of data at a time. That slows them down and limits how much data they can be trained on. This sequential processing is what severely limits these models.
Attention mechanisms were introduced as a new way to process sequence data. They let a model examine all the pieces of data at once and decide which ones matter most.
That proved useful in many situations, but most models that used attention still paired it with recurrent processing.
In essence, they could weigh all the data at once but still had to process it in order. Vaswani et al. asked in their paper: what if we used only the attention mechanism?
Attention lets the model focus on the most relevant parts of the input as it processes it. When we read a sentence, we pay more attention to some words than others, depending on context and what we're trying to understand.
When a transformer processes a sequence, it calculates a score for each word based on how important it is to the overall meaning of the sequence.
This model uses the scores to determine the importance of the words in the sequence. It then focuses more on important words, and less on unimportant ones.
This attention mechanism allows the model to capture long-range dependencies between words, even if they are far apart. It does this without processing the entire input sequence in a sequential manner.
The transformer is a powerful tool for tasks involving natural language processing, because it can understand a sentence quickly and accurately.
Take the example of a transformer model processing the sentence "The cat sat on the mat."
Each word is represented as a vector, a sequence of numbers, via an embedding matrix. Say the embeddings of each word are as follows:
- The: [0.2, 0.1, 0.3, 0.5]
- cat: [0.6, 0.3, 0.1, 0.2]
- sat: [0.1, 0.8, 0.2, 0.3]
- on: [0.3, 0.1, 0.6, 0.4]
- the: [0.5, 0.2, 0.1, 0.4]
- mat: [0.2, 0.4, 0.7, 0.5]
The transformer then calculates a score for every word in the sentence based on its relationship with each of the other words.
It does this by taking the dot product of each word's embedding with the embeddings of the other words in the sentence.
To calculate the scores for the word "cat," we take the dot product of its embedding with the embedding of each other word:
- "The cat": 0.2*0.6 + 0.1*0.3 + 0.3*0.1 + 0.5*0.2 = 0.28
- "cat sat": 0.6*0.1 + 0.3*0.8 + 0.1*0.2 + 0.2*0.3 = 0.38
- "cat on": 0.6*0.3 + 0.3*0.1 + 0.1*0.6 + 0.2*0.4 = 0.35
- "cat the": 0.6*0.5 + 0.3*0.2 + 0.1*0.1 + 0.2*0.4 = 0.45
- "cat mat": 0.6*0.2 + 0.3*0.4 + 0.1*0.7 + 0.2*0.5 = 0.41
These scores show how relevant each word is to "cat." The transformer uses them as weights in a weighted sum of the word embeddings.
That weighted sum is a context vector for the word "cat" that takes every word in the sentence into account. The process is repeated for each word in the sentence.
Imagine the transformer drawing a line between every pair of words in the sentence based on these calculations, with some lines fainter than others.
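Here is a rough Python sketch of that calculation, using the toy embeddings above. Real transformers also use learned query/key/value projections, scale the scores and include the word itself, so treat this as an illustration of the idea rather than the full mechanism:

```python
# A minimal sketch of the dot-product attention scores from the example above.
# The embeddings are the toy values from the article, not real learned vectors.
import numpy as np

embeddings = {
    "The": [0.2, 0.1, 0.3, 0.5],
    "cat": [0.6, 0.3, 0.1, 0.2],
    "sat": [0.1, 0.8, 0.2, 0.3],
    "on":  [0.3, 0.1, 0.6, 0.4],
    "the": [0.5, 0.2, 0.1, 0.4],
    "mat": [0.2, 0.4, 0.7, 0.5],
}

cat = np.array(embeddings["cat"])
scores = {}
for word, vec in embeddings.items():
    if word != "cat":
        scores[word] = float(np.dot(cat, np.array(vec)))  # dot product with "cat"

# Softmax turns the raw scores into weights that sum to 1.
weights = np.exp(list(scores.values()))
weights /= weights.sum()

# The context vector for "cat" is the weighted sum of the other embeddings.
context = sum(w * np.array(embeddings[word])
              for word, w in zip(scores.keys(), weights))

print(scores)   # e.g. {'The': 0.28, 'sat': 0.38, 'on': 0.35, 'the': 0.45, 'mat': 0.41}
print(context)  # the context vector for "cat"
```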
The transformer uses only attention, without recurrent processing, which makes it faster and able to handle far more data.
GPT transformers
You might remember that when Google announced BERT, it boasted that search would now be able to understand the full context of an input. This is similar to how GPT uses transformers.
Let’s use a simple analogy.
Imagine a thousand monkeys sitting at a keyboard.
Each monkey randomly hits keys on the keyboard to generate letters and symbols.
Some strings may be complete nonsense while others could resemble real sentences or words.
One of the circus trainers noticed that a monkey had written “To be or not to be” on a piece of paper. The trainer gave the monkey a sweet treat.
Other monkeys start to try and imitate this successful monkey in hopes of getting their own treat.
Some monkeys produce more coherent and better text as time goes on, while others produce gibberish.
The monkeys will eventually be able to recognize and mimic coherent patterns within text.
LLMs have an edge over the monkeys because they are first trained on billions and billions of pieces of text. They can see the patterns in that text, and they can also see the relationships and vectors between those pieces of text.
They can then use these patterns and relationships to create new text that is similar to natural language.
GPT (Generative Pre-trained Transformer) is a language model that uses transformers to generate natural language text.
The system was trained using a large amount of text taken from the Internet, which enabled it to learn patterns and relationships among words and phrases.
The model works by taking a prompt, a few short words of text, and using transformers to predict which words are most likely to come next based on the patterns it learned from its training data.
The model generates text word-by-word, using context from the previous words as a guide for the next.
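As a concrete (if simplified) illustration of that loop, here is a sketch using the open source GPT-2 model via the Hugging Face transformers library; OpenAI's GPT works on the same principle, just behind an API:

```python
# A minimal sketch of how a GPT-style model generates text token by token,
# using the open source GPT-2 model as a stand-in. Greedy decoding only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The cat sat on the"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

for _ in range(5):  # generate five more tokens
    with torch.no_grad():
        logits = model(input_ids).logits
    next_token_logits = logits[0, -1]                 # scores for the next token only
    next_token_id = torch.argmax(next_token_logits)   # greedy: pick the most likely token
    input_ids = torch.cat([input_ids, next_token_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```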
GPT in Action
GPT can produce natural language texts that are highly coherent and context-relevant.
It can be used in many ways, including to generate product descriptions and answer customer service questions. You can use it creatively to create poetry or short stories.
It is however only a model of language. The data used to train it can be outdated or incorrect.
- It has no source of information.
- It cannot search the internet.
- It does not "know" anything.
It simply guesses what the next word is likely to be.
Here are some examples:
In OpenAI's playground, I fed in the first line from the classic Handsome Boy Modeling School song 'Holy Calamity [[Bear Witness II]]'.
I submitted it so we could see the probabilities for both my input and the output tokens. Let's look at each part.
The most likely next words/tokens are Spirit, Roman and Ghost.
The top six results only cover 17.29%, which means there are about 82% of other possible outcomes that we cannot see.
Here’s a quick overview of the inputs and outputs.
Temperature controls how likely the model is to pick words other than the one with the highest probability. Top P controls how the pool of candidate words is chosen.
For the input "Holy Calamity," top P determines which cluster of tokens [Ghost, Roman, Spirit] the model picks from, while temperature determines how likely it is to stray from the most likely token for the sake of variety.
The higher the temperature, the more likely the model is to choose a less likely token.
A high top P and a high temperature make things wild: the model chooses from a large pool of candidates (high top P) and is more likely to pick surprising tokens (high temperature).
A selection of high temp, high P responses
A high temperature but a low top P will pick surprising options from a small pool of possibilities:
Lowering the temperature makes the model choose the most likely next tokens.
I think that playing with these probabilities will give you an insight into the way these models work.
Underneath it all, the model is looking at a collection of possible next selections based on what has come before.
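If you want to poke at these knobs yourself, here is a minimal sketch (my own toy implementation, not OpenAI's) of how temperature and top-p sampling can work over a made-up next-token distribution:

```python
# A minimal sketch of temperature and top-p (nucleus) sampling over a toy
# next-token distribution. The tokens and scores below are made up.
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=0.9):
    # Temperature rescales the scores: higher temperature flattens the
    # distribution, making less likely tokens more likely to be chosen.
    scaled = np.array(logits) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-p keeps only the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalizes over that set.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]
    kept_probs = probs[keep] / probs[keep].sum()

    return np.random.choice(keep, p=kept_probs)

tokens = ["Ghost", "Spirit", "Roman", "Cow", "Banana"]
logits = [4.0, 3.8, 3.5, 1.0, 0.2]
print(tokens[sample_next_token(logits, temperature=1.2, top_p=0.8)])
```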
What does it mean in reality?
LLMs are a simple way to transform inputs into outputs.
People have made jokes about how different that is from normal people.
But LLMs don't "know" anything. They don't retrieve information; they guess what comes next based on the words that came before.
Think of an apple. What do you think of?
You can try rotating one in your head.
You may remember the sweet smell of an apple orchard, the sweetness of a Pink Lady, and so on.
You might think of Steve Jobs.
Now let's see what the prompt "think about an apple" returns.
By now, you've probably heard the phrase "stochastic parrots" thrown around.
"Stochastic parrot" is a term used to describe LLMs like GPT. A parrot mimics the sounds it hears.
LLMs are like parrots in that they take in information (words) and output something that resembles what they've heard. They're also stochastic, meaning they use probability to guess what comes next.
LLMs can recognize patterns and relationships, but they have no deeper understanding of the words they see. That's why they're so good at generating natural-sounding text without understanding its content.
Uses for an LLM
LLMs are good at more generalist tasks.
You can show one some text and it can perform a task on it without any specific training.
It can handle a variety of creative tasks, such as writing an outline, and it can even do sentiment analysis.
It can do things like write code. It can be used to complete many different tasks.
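For example, here is a hedged sketch of asking an LLM to do zero-shot sentiment analysis through OpenAI's chat completions HTTP API; the model name and prompt wording are illustrative choices, not a recommendation:

```python
# A minimal sketch of zero-shot sentiment analysis via OpenAI's chat
# completions endpoint. Set OPENAI_API_KEY in your environment first.
import os
import requests

review = "Waterworld is a soggy mess, but I couldn't stop watching it."

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",  # illustrative model choice
        "messages": [
            {"role": "user",
             "content": f"Label the sentiment of this review as positive, negative or mixed:\n{review}"}
        ],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```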
It’s still based on patterns and probability. There will be times that it detects patterns in your inputs that you are unaware of.
That can be a good thing (seeing patterns humans can't) or a bad thing (why did it respond like that?).
It also has no access to data sources, which is a problem for SEOs who use it to find keywords to rank for.
It cannot look at traffic data for a specific keyword. It has no keyword data beyond the fact that those words exist.
ChatGPT offers a language model that you can use right away for a variety of tasks. It’s not without its caveats.
Other ML models can be used for a variety of purposes
For some purposes, other NLP techniques and algorithms perform better than LLMs.
Take keyword extraction as an example.
If I use TF-IDF or another keyword extraction technique to pull keywords out of a text corpus, I know exactly what calculations go into that technique.
The results are standard and reproducible, and I know they are specific to that corpus.
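As an illustration of that reproducibility, here is a small sketch of TF-IDF keyword extraction with scikit-learn; the mini-corpus is made up for the example:

```python
# A minimal sketch of reproducible keyword extraction with TF-IDF,
# using scikit-learn. The documents here are made-up examples.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "Cumulative layout shift measures visual stability of a page.",
    "Core Web Vitals include layout shift, input delay and paint timing.",
    "Keyword extraction pulls the most distinctive terms from a corpus.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(corpus)
terms = vectorizer.get_feature_names_out()

# Print the top 3 highest-scoring terms for each document.
for i, doc in enumerate(corpus):
    row = tfidf[i].toarray().ravel()
    top = row.argsort()[::-1][:3]
    print(doc[:40], "->", [terms[j] for j in top])
```

Run it twice on the same corpus and you get the same keywords; the calculation is fully transparent.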
If you ask an LLM like ChatGPT for keyword extraction, you won't get the keywords extracted from the corpus. GPT will give you what it believes a "corpus + keyword extraction" response should look like.
The same goes for clustering and sentiment analysis. You aren't getting the fine-tuned result from the parameters you set; you're getting what is probable based on similar tasks.
LLMs also lack current and accurate information. They often can't search the internet, and they parse information as statistical representations of tokens. These factors limit what an LLM can know and retain.
These models cannot think. This article only uses the word “think”, because it is difficult to avoid using it when discussing these processes.
Even when discussing fancy stats, the tendency is to anthropomorphise.
This means that you cannot trust an LLM with any task that requires “thought.”
You're relying on a statistical analysis of how hundreds of internet weirdos have responded to similar tokens.
If you would trust the denizens of the internet with a task, you can use an LLM for it. Otherwise…
When ML models cause harm
A Belgian man reportedly ended his life after an AI chatbot named Eliza "encouraged" him to sacrifice himself to stop climate change, as reported by Euronews. Combinations of factors can lead to real harm, including:
- People anthropomorphizing these responses.
- Believing the models are infallible.
- Using them in places where humans need to be in the loop.
- And more.
You may think, "I'm an SEO. I don't work on systems that can kill anyone!"
But Google promotes concepts like E-A-T and YMYL for a reason.
Does Google do that just to frustrate SEOs, or because it doesn't want to be held responsible for harm caused by its results?
Even systems with strong knowledge bases can cause harm.
Google's knowledge carousel for "flowers safe for dogs and cats" shows daffodils, despite the fact that they are toxic to cats.
Imagine you’re generating content at scale for a veterinary site using GPT. You enter a few keywords and send pings to the ChatGPT API.
A freelancer who is not an expert in the subject area reads all of the results. They miss a problem.
You publish the results, and one of them encourages cat owners to buy daffodils.
You killed someone’s cat.
Maybe you never find out. Maybe the cat owner never even knows it was your site.
Perhaps other vet websites will start to do the same and feed off of each other.
Google’s top result for the question “are daffodils poisonous to cats?” is a website that says they aren’t.
Other freelancers "fact-check" by reading more AI-generated content, which can run pages and pages long. These systems now all carry incorrect information.
When discussing the current AI boom, I often bring up the Therac-25, a well-known case study in software-caused harm.
It was among the first radiation therapy machines to rely on software interlocks rather than hardware ones. A software bug meant some patients received massive radiation overdoses, many times what they should have gotten.
What I find most interesting is what happened after the company voluntarily recalled and inspected these machines.
They assumed that because the software was advanced and "infallible," the problem must lie with the machine's mechanical parts.
They fixed the mechanisms, never checked the software, and the Therac-25 stayed on the market.
FAQs and Misconceptions
Why is ChatGPT lying to me?
Some of the most influential people on Twitter, and some of our greatest minds, have complained that ChatGPT "lied" to them. That comes from a few linked misconceptions:
- That ChatGPT "wants" things.
- That it has a solid knowledge base.
- That the technologists behind it have some agenda beyond "make money" and "build a cool product."
Biases are baked into every aspect of your daily life, and so are exceptions to those biases.
Most software developers today are men; I am a software developer and a woman.
An AI trained on that reality alone would assume every software developer is a man, which is not the case.
A famous example is Amazon's AI recruiting tool, which was trained on the resumes of successful Amazon employees.
It ended up penalizing resumes that mentioned women's colleges, even though those applicants could have been very successful.
ChatGPT uses layers of fine-tuning to counteract these biases. That's why you get the response "As an AI language model, I cannot…"
Some workers from Kenya were required to look through hundreds of prompts and responses, searching for hate speech and slurs.
A fine-tuning layer is then created.
Why can't you get it to insult Joe Biden? Why will it make sexist jokes about men but not about women?
It isn't "liberal bias" that stops ChatGPT from using slurs; it's thousands of layers of fine-tuning telling it not to.
Ideally, ChatGPT would be completely neutral about the world, but it also has to reflect that world.
It’s the same problem that Google is facing…
Often, the truth, what makes someone happy, and a good response to a question are very different.
Why does ChatGPT generate fake citations for me?
Fake citations are another thing I hear about a lot. Why are some of them real and some fake? Why does the model point to real websites but made-up pages on those sites?
You can hopefully figure this out by understanding how statistical models operate.
If you skipped the long explanation above, here's a short one:
You are an AI language model. You’ve been trained with a lot of web content.
You are told to write about something technological – say, Cumulative layout shift.
You may not have seen many CLS articles in training, but you know what they are and you know the general format of a technology article.
You start writing your response and hit a point where, based on your familiarity with technical writing, the next thing in the sentence should be a URL.
You know from other CLS articles that Google and GTmetrix are often cited, so those are easy.
You also know that CSS Tricks is frequently linked in web articles.
This is how ALL URLs are built, not only the fake URLs:
That GTmetrix article does exist, but it exists in the output because that string of tokens was likely to appear at the end of this sentence.
GPT models and other similar models are unable to distinguish between real and fake citations.
To verify citations, the model would need other sources (a knowledge base, Python tooling, etc.) to check and compare its results against.
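One hedged sketch of what that checking could look like: take the URLs a model produced and ask the live web whether they resolve at all. The URLs below are illustrative placeholders, not real model output:

```python
# A minimal sketch of sanity-checking model-generated citations by testing
# whether each URL actually resolves. The URLs here are illustrative only.
import requests

candidate_urls = [
    "https://web.dev/articles/cls",
    "https://example.com/totally-made-up-cls-guide",
]

for url in candidate_urls:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=5)
        status = resp.status_code
    except requests.RequestException:
        status = None
    ok = status is not None and status < 400
    print(url, "->", "reachable" if ok else "not found / unreachable")
```

A reachable URL still isn't proof the page says what the model claims, but it catches the pages that were invented outright.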
What is a “Stochastic parrot”?
It's worth repeating: "stochastic parrots" describes what happens when large language models appear to be generalist while really just parroting statistics.
To an LLM, nonsense and reality are the same thing. It sees the world the way an economist does, as a collection of numbers and statistics describing reality.
You've heard the saying, "There are three kinds of lies: lies, damned lies, and statistics."
LLMs consist of a lot of statistics.
LLMs may seem coherent, but that's largely because we read anything that appears human as human.
The chatbot format also hides much of what goes into making GPT's responses appear coherent.
I'm a software developer, and the results of using LLMs to debug code are extremely variable. If it's a problem people have had often before, an LLM can fix it.
If it's a novel problem, or one that makes up only a tiny portion of its corpus, it won't.
Is GPT better than a search engine?
I phrased that deliberately. I don't believe GPT is better than a search engine; what worries me is people using ChatGPT as if it were one.
ChatGPT’s ability to obey instructions is one of its most important features. It will do almost anything you ask.
Remember: the statistically likely next word in a sentence is not the same as the truth.
If you ask it a question it has no answer to, but phrase it in a way that obliges it to respond, you'll get a bad answer.
A response tailored to you is comforting, but the world contains a wealth of different experiences.
An LLM treats all inputs the same, yet some people have more experience, and their responses will be better than a blend of everyone else's responses.
A single expert is more valuable than a hundred think pieces.
Is AI rising up? Is Skynet finally here?
Koko was a gorilla who was taught sign language. Linguistics researchers did piles of studies suggesting that apes could be taught language.
Then Herbert Terrace discovered that the apes were not putting sentences together or using words at all, but simply mimicking their human handlers.
ELIZA was an early chatbot that played the role of a machine therapist.
People saw it as a real person: someone they could trust and who cared about them. They asked researchers for time alone with it.
The human brain is wired to respond to language. When people hear something communicate, they assume there is thought behind it.
LLMs are impressive, but what they really show off is the breadth of human achievement.
They don’t have a will. They cannot escape. They cannot try to take over the entire world.
They are mirrors: they reflect people, and more specifically the user.
The only "thought" involved is a statistical representation of the collective unconscious of the internet.
Did GPT learn a language all by itself?
Sundar Pichai claimed on 60 Minutes that Google's language model learned Bengali it was never trained on.
But the model was trained on texts that included Bengali. It is wrong to say it "spoke a foreign tongue it had never been trained to speak."
AI can sometimes do unexpected things. But that is to be expected.
There will always be instances when patterns or statistics reveal surprising results.
This shows that the marketing and C-suite people who peddle AI and ML do not understand the system’s workings.
I’ve heard people who are very intelligent talk about AGI and other futuristic stuff.
Maybe I'm just a simple country ML ops engineer, but it demonstrates how much hype, promise and science fiction gets thrown around when discussing these systems.
Elizabeth Holmes was crucified because she made promises that couldn’t be kept.
Making impossible promises is part of startup culture. The main difference between today's AI hype and Theranos is that Theranos couldn't fake it for very long.
Is GPT a black box? What happens to the data I put into GPT?
GPT, as a design, is not a "black box": the source code for GPT-J and GPT-Neo is publicly available.
OpenAI's GPT, however, is a black box. OpenAI has not released its model and probably won't, much as Google does not release its algorithm.
That isn't because the algorithm is dangerous. If it were, OpenAI wouldn't sell API subscriptions to any idiot with a computer. It's because of the value of that proprietary codebase.
When you use OpenAI's tools, you feed your inputs into their API. Everything you put in, OpenAI receives.
People have already violated HIPAA by using OpenAI's GPT on patient data to write notes and other documents. That information has been fed into the system and will be difficult to remove.
Because so many people don't understand this, there is likely a great deal of sensitive data sitting in there, waiting to come out.
Why is GPT trained on hate speech?
The text corpus GPT is trained on includes hate speech.
OpenAI needs its models to be able to recognize hate speech, so the training corpus has to include some of those terms.
OpenAI says it has scrubbed out much of that hate speech, but the source documents include 4chan and plenty of hate sites.
Crawl the web, absorb the bias.
There is no way to avoid it entirely. How can something recognize or understand hatred, bias and violence if none of it appears in its training data?
And how can a machine agent that picks the next token of a sentence based on statistics understand and avoid implicit and explicit biases?
GPT responding to some questions from Wikipedia's list of common misconceptions.
TL;DR
AI is in the middle of a boom characterized by hype and misinformation. That doesn't mean there aren't legitimate uses: the technology is incredible and useful.
But the way the technology is marketed, and the ways people use it, can lead to misinformation, plagiarism and even direct harm.
Don't use LLMs for anything where a life is at stake. Use a different algorithm when LLMs aren't the best tool. Don't be fooled by the hype.
Understanding LLMs – and what they are not – is essential
I highly recommend this Adam Conover interview with Emily Bender and Timnit Gebru.
LLMs are powerful tools when used properly. There are many ways to use them, and just as many ways to abuse them.
ChatGPT is not your friend. It's a bunch of statistics. And artificial general intelligence (AGI) isn't "here" yet.