Google uses a ChatGPT-like technology to detect spam and AI content and rank websites.

The headline is slightly misleading, but only insofar as it uses the term ChatGPT.

Calling the technology “ChatGPT-like,” rather than by its technical name, lets you, the reader, know immediately what type of technology I am referring to. (Also, the technical name wouldn’t have been as clickable …)

This article will focus on an older but still relevant Google paper, “Generative models are unsupervised predictors of page quality: A colossal-scale study.”

What’s the paper all about?

Let’s begin with how the authors themselves describe it. They introduce the topic as follows:

“Many people have expressed concern about the dangers of neural text generators in the wild due to their ability to create human-looking text at large scale.

Recently, machine-generated content on the internet has been monitored by classifiers that can distinguish between machine-generated and human text. However, little work has been done to apply these classifiers to other purposes, despite their appealing property of not requiring labels – they only require a corpus and a generative model. We show that page quality can be classified using off-the-shelf human-versus-machine discriminators. Texts that are machine-generated often appear unintelligible or incoherent. We apply the classifiers to half a million English webpages to understand the prevalence of low page quality.”

They are basically saying that the same classifiers built to detect AI-generated copy can also be used to detect low-quality content, with no new models required.
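To make that idea concrete, here is a minimal, purely illustrative sketch (not the paper’s actual pipeline): an off-the-shelf detector that returns the probability a page is machine-generated is reused, unchanged, as a low-quality flag. The detector callable and the 0.5 cutoff are stand-ins of my own, not values from the paper.

from typing import Callable

def flag_low_quality(pages: list[str],
                     detector: Callable[[str], float],
                     threshold: float = 0.5) -> list[tuple[str, float]]:
    """Reuse an AI-text detector's score, unchanged, as a low-quality proxy.

    `detector` is any off-the-shelf classifier returning P(machine-generated)
    in [0, 1]; no page-quality labels are needed anywhere.
    """
    flagged = []
    for page in pages:
        score = detector(page)   # probability the page looks machine-generated
        if score >= threshold:   # illustrative cutoff, not from the paper
            flagged.append((page, score))
    return flagged

The surprising part, in the paper’s framing, is that nothing in this loop is retrained: the detector’s own output doubles as the quality signal.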

This leaves us with a crucial question:

Is this causation (i.e., the system flags low-quality content because it is genuinely good at detecting it) or correlation (i.e., is a lot of spam simply produced with crude methods and tools that happen to be easy to detect)?

Before we get into that, let’s look at what the authors did and what they concluded.

The set-up

They used the following set-up as a reference (the figure is not reproduced here; in it, red and purple mark non-AI-generated content).

I am happy to say that this paper was not generated by GPT.


A side note about spam prevalence

I want to briefly mention some of the interesting discoveries the authors made. Figure 3 of the paper illustrates one such finding:

Pay attention to the score below each graph: the closer it is to 1.0, the more likely the content is to be spam. We can see that low-quality documents have been a growing problem since 2017, spiking in 2019.

They also found that low-quality content had a greater impact on certain sectors than others. (Remember, a higher score indicates a higher likelihood of spam).

A few of these confused me at first; others made perfect sense.

Books and literature came as a surprise. So did health, until the authors explained that sites selling Viagra and other adult health products count as “health,” and essay farms count as “literature.”

The findings

Beyond the discussion of sectors and the 2019 spike, the authors found many interesting things that SEOs can learn from. This matters all the more as we begin to use tools like ChatGPT.

The authors don’t claim that this is the be-all and end-all solution; they simply suggest it as a starting point. I’m certain the bar has been raised in the last few years.

Note about AI-generated content

Language models have also evolved over the years. GPT-3 was not yet available when this paper was published, so the detectors the researchers used were based on GPT-2, a far less capable model.

GPT-4 is expected to be available soon, and Google’s Sparrow is due for release later in the year. Not only is the technology improving on both sides (search engines versus content generators), but combinations of tools are becoming easier to come by, and the battleground more competitive.

Will Google be able to detect Sparrow and GPT-4 content? Maybe.

What if Sparrow generated the content and it was then sent to GPT-4 with a prompt to rewrite it?

Remember that this paper relies on autoregressive models: they score each word based on how expected it is given the words that precede it.
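As a rough illustration (my own sketch, not the paper’s code), the snippet below scores a passage with GPT-2, the same family of model the researchers’ detectors were built on. It turns the model’s average per-token surprise into perplexity: the lower the number, the more predictable the text looks word by word. It assumes the torch and Hugging Face transformers packages are installed.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Passing the same ids as labels makes the model return the mean
    # next-token cross-entropy over the sequence (an autoregressive score).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

# Lower perplexity = more "expected" text under the model.
print(perplexity("The cat sat on the mat."))
print(perplexity("Buy cheap quality content essay farm best now."))

A detector built on this kind of model works from these per-token scores, which is part of why its judgment is tied to how well the generator predicts the next word.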

AI detection may become less reliable as models grow more sophisticated and plan out full ideas rather than just the next few words.

However, AI-based detection may still be the best option for catching poor content.

