Can search engines detect AI content?
AI tools have had a dramatic impact on digital marketers over the last year, particularly those who work in SEO.
Marketers have used AI to help with content creation, but the results are mixed.
One question comes up frequently, ethical issues aside: “Can search engines detect my AI content?”
This question matters because, if the answer is “no,” it moots a lot of other questions about whether and how AI should be used.
Machine-generated content has a long history
Machine-generated or machine-assisted content creation may seem like a new phenomenon, but it is neither new nor inherently a bad thing.
News websites compete to break stories first. For years, they have used data from sources such as stock exchanges and seismometers to publish content faster.
It’s entirely factually accurate to have a robot publish an article stating:
- A [magnitude] earthquake was detected at [time] on [date] morning in [location], the first since [date of last event]. “More news to come.”
These updates are helpful for the reader, who needs to know this information quickly.
On the other hand, we have seen many blackhat uses of machine-generated content.
Google has condemned the use of Markov chains to generate low-effort text for years, classifying it as automatically generated content with “no value added.”
The meaning of the phrase “no value added” is interesting and, for many, a source of confusion.
What value can LLMs bring?
The popularity of AI content has soared thanks to GPTx large language models (LLMs) and ChatGPT, a fine-tuned AI chatbot built on them.
Without going into deep technical detail, there are two important points to understand about these tools:
Generated text is chosen from a probability distribution.
- If you write “Being an SEO is fun because…”, the LLM looks at all the preceding tokens and uses its training data to predict the most likely next word. You can think of it as a highly advanced version of your phone’s predictive text.
ChatGPT is a generative AI.
- Its output is non-deterministic: there is a random element, so it may respond differently to the same prompt. (A minimal sketch illustrating both points follows this list.)
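To make these two points concrete, here is a minimal, purely illustrative Python sketch of next-token sampling. The vocabulary and probabilities are invented for illustration; a real LLM computes this distribution over tens of thousands of tokens with a trained neural network.

```python
import random

# Invented next-token distribution for the prompt
# "Being an SEO is fun because ..." — a real LLM derives these
# probabilities from a trained model, not a hardcoded table.
next_token_probs = {
    "every": 0.30,
    "you": 0.25,
    "the": 0.20,
    "Google": 0.15,
    "rankings": 0.10,
}

def sample_next_token(probs: dict[str, float], temperature: float = 1.0) -> str:
    """Sample one token from the distribution.

    Higher temperature flattens the distribution (more randomness);
    lower temperature sharpens it toward the single most likely token.
    """
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

# Point 1: the next word is drawn from a probability distribution.
# Point 2: sampling is random, so the same prompt can yield
# different continuations on different runs.
for _ in range(3):
    print("Being an SEO is fun because", sample_next_token(next_token_probs))
```

Run it a few times and the continuations differ, which is exactly the non-determinism described above.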
Once you understand these two points, it becomes clear that ChatGPT has no knowledge or understanding in the traditional sense. This is the source of its “hallucinations,” as its errors are sometimes called.
Many example outputs show how this approach produces incorrect results and causes ChatGPT to contradict itself repeatedly.
The possibility of hallucinations is a serious concern when it comes to “adding value” in AI-written texts.
It’s not easy to fix this problem without a change in the way LLMs produce text.
This is important to keep in mind, particularly for Your Money or Your Life (YMYL) topics, where incorrect information can seriously damage people’s lives or finances.
This year, major publications such as Men’s Health and CNET have been caught publishing factually incorrect AI-generated information.
Google also has a problem with YMYL content in its own Search Generative Experience (SGE).
Google has said it will be cautious with generated answers, even stating it “won’t show an answer to a query about giving Tylenol to a child because it’s in the medical area” – yet the SGE will do exactly that if you simply ask the question.
Google’s SGE and MUM
Google clearly believes that machine-generated content can be used to answer user queries. Google has been hinting at this since they announced MUM in May 2021.
One problem MUM was positioned to solve is that, according to Google’s data, users issue an average of eight queries for complex tasks.
An initial query teaches the searcher something new, which prompts related searches and new pages to answer them.
Google’s proposal: what if it could anticipate the user’s follow-up queries and generate one complete answer from the knowledge in its index?
While this may work well for users, it would also eliminate many of the “long-tail,” zero-volume keyword strategies SEOs use to gain a foothold in the SERPs.
If Google can identify which queries are suitable for AI-generated answers, many of them can be “solved” directly.
This raises a question…
- Why would Google send a searcher to your webpage of pre-generated answers when it could keep the user in its own search ecosystem and generate the answer itself?
Google has an incentive to keep users within its ecosystem. We’ve seen many approaches to this goal, from featured snippets to letting users search for flights directly in the SERPs.
If Google decides your generated text is worth no more than what it can produce itself, the question becomes one of cost versus benefit.
Can Google generate more long-term revenue by absorbing the cost of generation and making users wait for a response, versus sending them quickly and cheaply to a webpage it already knows exists?
AI content detection
With the rapid growth of ChatGPT, dozens of “AI content detectors” have appeared. These let users paste in text and receive a percentage score.
Different detectors calculate this score slightly differently, but they all report the same kind of result: a level of certainty that the provided text was generated by AI.
The way the percentage is labeled can be confusing, e.g., “75% AI / 25% human.”
This is often misinterpreted as “the text was 75% written by an AI and 25% by a human,” when it actually means “I’m 75% certain that an AI wrote all of this text.”
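As a purely schematic illustration of what that single number represents (the output format below is invented, not any real detector’s API):

```python
# Hypothetical detector output for one passage: a single classifier
# confidence over the WHOLE text, not a word-by-word breakdown.
result = {"ai_confidence": 0.75}

# Misreading: "75% of this text was written by AI."
# Actual meaning: "the detector is 75% sure the entire text is AI-written."
label = f"{result['ai_confidence']:.0%} AI / {1 - result['ai_confidence']:.0%} Human"
print(label)  # prints: 75% AI / 25% Human
```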
Some have offered advice on how to “tweak” input text to pass AI detectors.
A double exclamation point (!!), for example, is a very human characteristic. Add one to AI-generated text, and the detector may score it “99%+ human.”
This is then interpreted as having “tricked” the detector.
But the passage is, of course, still 100% AI-written.
This false conclusion, that you can “fool” AI detectors, is often stretched into the assumption that search engines like Google cannot detect AI content either, giving website owners a false sense of security.
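To see why that sense of security is misplaced, consider the deliberately naive, hypothetical detector sketched below. It scores passages on two shallow surface features; real detectors use trained classifiers or language-model statistics, but the same lesson applies: a surface edit can swing the score without changing who actually wrote the text.

```python
import statistics

def toy_ai_confidence(text: str) -> float:
    """Confidence that the whole passage is AI-written.

    A deliberately naive, invented detector for illustration only.
    It leans on two shallow signals often associated with AI text:
    uniform sentence lengths and a lack of informal punctuation.
    """
    sentences = [s for s in text.replace("!", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    # Low variation in sentence length ("low burstiness") pushes the score up.
    burstiness = statistics.pstdev(lengths) / statistics.mean(lengths)
    score = 0.9 - burstiness * 0.5
    # A single informal "!!" swings this naive detector toward "human."
    if "!!" in text:
        score -= 0.6
    return max(0.0, min(1.0, score))

ai_text = ("SEO is a valuable discipline. It helps websites gain visibility. "
           "It requires consistent effort over time.")
print(f"{toy_ai_confidence(ai_text):.0%} AI")                     # high: "this is AI"
print(f"{toy_ai_confidence(ai_text + ' Great stuff!!'):.0%} AI")  # swings toward "human"
```

The “!!” trick lowers the score, but the text’s provenance is unchanged: a brittle classifier was nudged, nothing more.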
Google’s AI policies and actions
Google’s AI statements have always been vague, giving them some wiggle room in terms of enforcement.
Updated guidelines published on Google Search Central this year explicitly say:
We focus on quality content rather than the production methods.
Danny Sullivan, Google Search Liaison, jumped into the Twitter conversation to confirm that “we haven’t said AI is bad”.
Google provides specific examples of how AI can produce helpful content, such as sports scores and weather forecasts.
Google’s spam policies are also clear: creating content “for the sole purpose of manipulating search rankings” is a violation.
Google has a lot of experience fighting SERP manipulation. It claims that SpamBrain and other system improvements have rendered 99% of searches spam-free, covering UGC spam, scraping, content generation, cloaking and more.
Many people have run tests to see how Google reacts to AI content and where the quality line is drawn.
I created an unsupervised, GPT-3-based website with 10,000 pages answering questions about video games.
The site grew quickly and steadily with minimal links, reaching thousands of visitors each month.
Then, during two system updates in 2022, Google suddenly and almost entirely suppressed the site. The first was the helpful content update.
[Chart: Google Search Console data from the AI test site]
I’m not using this experiment to claim that AI content can never be effective.
What it showed me is that, at the time, Google:
- Did not classify unsupervised GPT-3 content as “quality” content.
- Could detect and suppress such results using a variety of signals.
You need to ask a better question to get the best answer
“Can search engines detect AI content?” is probably the wrong question to ask.
At best, it reflects a short-term perspective.
LLMs struggle to produce “high-quality content” on most topics, judged by factual accuracy and the ability to meet Google’s E-E-A-T criteria.
AI has made significant inroads by providing answers to queries that previously had no content, but this opportunity may fade as Google pursues its loftier long-term goals with SGE.
Google’s knowledge systems will answer many long-tail queries directly, rather than sending users off to a multitude of small sites.