What can disrupt the future of generative AI?

These days, much is said about how generative artificial intelligence could eliminate jobs. Far less consideration is given to how people could put generative AI out of work. They could, and they very well may.

GenAI, and the foundation models it is built on, sit at the peak of Gartner's hype cycle. Gartner's model suggests these tools may be on the verge of a "trough of disillusionment" before emerging a few years later onto a "plateau of productivity."

However, there's a case to be made that the trough of disillusionment will swallow genAI for good. Users face the real risk of being unable to trust an "intelligence" that is essentially amoral and unconscious. They also have to worry about copyright issues and privacy concerns.

Let's take them in order.

A national Do Not Scrape registry?

Publishers monetize content, and they don't want third parties monetizing it without permission, especially since they have likely already paid for it. Professional authors monetize their writing. Neither wants third parties profiting from their work without remunerating the creator. Everything said here about written content applies equally to video, graphics and other creative work.

Copyright laws protect authors and publishers from direct theft. These laws don't work well against genAI, because a crawler draws on so many different sources that its final output may not closely resemble any one of them (although that can happen).

Publishers are actively searching for ways to stop LLMs from scraping their content. It's a difficult technical challenge.
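One common first step, assuming a publisher controls its own robots.txt file, is to disallow the user agents that known AI crawlers announce themselves with (GPTBot is OpenAI's documented crawler; CCBot is Common Crawl's). Note that compliance is entirely voluntary, which is part of why this remains a hard problem:

```
# robots.txt sketch: refuse known AI crawlers while leaving search bots alone.
# Only works against crawlers that choose to honor robots.txt.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Search indexing remains permitted.
User-agent: Googlebot
Allow: /
```

This illustrates the dilemma mentioned below: robots.txt lets you distinguish GPTBot from Googlebot, but it offers no technical enforcement against a crawler that simply ignores it.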

In the video, MarTech contributor Greg Krehbiel explains how publishers might try to stop LLMs. He makes the case for changing terms and conditions to lay the groundwork for future lawsuits, though he admits none of his ideas are guaranteed to work. Can you stop Google from crawling your website to grab content without also stopping it from crawling to index you in search results? And lawsuits are expensive.

How about a regulatory solution? Remember those annoying telemarketing calls? The National Do Not Call Registry put a stop to them. Telemarketers who kept calling registered numbers risked heavy fines from the FTC.

Registering domains on a national Do Not Scrape registry may be more difficult, but it's possible to see how such a regulatory strategy might work. Would every infraction be detected? Of course not. The same is true of, say, GDPR. GDPR works not because every violation is caught, but because severe sanctions can be imposed on those who are found to have violated it.

It’s already too late. GenAI already has the data

But hasn't the horse already left the barn, regardless of whether a regulatory or technical fix stops genAI from taking content? LLMs have already been trained on inconceivably huge datasets. Error-prone as they are, there's a sense in which they seem to know it all.

They know about the past, but only up to a point. ChatGPT-4 was pre-trained on data through September 2021, which means there's a great deal it doesn't know. Think of everything that has happened since then.

Dig deeper: Artificial Intelligence: A beginner’s guide

GenAI uses algorithms to predict the next best piece of text to generate, based on the millions of pieces of text it was trained on. It is "intelligent" in that it can improve itself based on feedback and responses (no human has to tweak the algorithms).
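The core mechanism can be caricatured in a few lines of code: count which word follows which in a training corpus, then always emit the most likely successor. This is a toy bigram sketch, not how production LLMs actually work (they use neural networks over subword tokens), but the "predict the next piece of text from training data" idea is the same:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict:
    """Count, for each word, how often each following word appears after it."""
    words = corpus.split()
    model = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1
    return model

def predict_next(model: dict, word: str):
    """Return the most frequent successor of `word`, or None if unseen."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None

# Tiny illustrative "training set" (an assumption for the example).
corpus = "it is raining today and it is cold today it is raining"
model = train_bigrams(corpus)
print(predict_next(model, "it"))  # "is" follows "it" most often in this corpus
```

The key point the article makes follows directly: a model like this can only ever predict from what was in its training text. It has no way to check whether it is actually raining.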

This is why genAI can't learn anything about the world outside of its training data. It supports the claim made by philosophers such as Donald Davidson [1] that AI has no causal relationship with the world. When I want to know whether it's raining, I don't consult datasets. I look out the window. Technically speaking, genAI has superb syntax (grammar), yet it is a stranger to semantics (meaning).

This means AI relies on creatures like us who are causally connected to the world. We can tell whether it's raining, whether there's a full moon or whether Jefferson wrote the Declaration of Independence. So far, genAI has relied on past human activity. To stay relevant, it must depend on ongoing human activity.

If LLMs can't scrape human-created content in the future, they won't be able to update, correct or augment their datasets. It might take a while, but their demise would be more or less assured.

Please, don’t touch my PII.

Aside from the desire of publishers, authors and other content creators to keep genAI away from their work, there's another very real issue it faces in the near future. In scraping gigabytes upon gigabytes of data, it must ensure the data contains no personally identifiable information (PII) or other data types protected by current regulations.
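To illustrate why this is hard, here is a regex-based sketch that redacts two of the easiest PII patterns (email addresses and US-style phone numbers) from scraped text. The patterns and placeholder tokens are assumptions for the example; real PII (names, street addresses, combinations of quasi-identifiers) does not yield to simple pattern matching, so this is a sketch of the problem, not a compliance tool:

```python
import re

# Two deliberately simple patterns. Real PII detection needs far more than regexes.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace obvious PII patterns with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

sample = "Contact jane.doe@example.com or call 555-867-5309 for details."
print(redact_pii(sample))
```

Even this trivial scrubber shows the scale issue: it must run over every document scraped, and anything it misses (and it will miss plenty) becomes a potential regulatory liability baked into the training set.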

Suffice it to say that European courts tend to be more sympathetic to citizens' rights than to big tech's profits.

And we haven't even touched on trust and safety. Gartner's Afraz Jaffri, a specialist on the AI hype cycle, addressed these concerns in a recent conversation.

First, there is the issue of trust. There's still a feeling that, even with external regulations and standards in place, it is very difficult to control model outputs or guarantee they are correct. That is a major obstacle.




[Image: What is the future of genAI? The Gartner hype cycle]


Will all of this flip the off switch?

It's easy to say genAI is here to stay; many people have. A technology development of this magnitude, even one that isn't completely novel, is unlikely to be ignored or forgotten. At the very least, organizations will continue to apply these capabilities to their own datasets, or to carefully vetted external datasets, and that will satisfy many important use cases.

But the chances of genAI being disrupted, constrained and greatly altered by a combination of regulatory barriers, legal challenges and trust issues, as well as obstacles as yet unknown, are far above zero.

  1. Donald Davidson, "Turing's Test," Mind 59, 1950.

The post What can disrupt the future of generative AI? appeared first on MarTech.
