How to remove sensitive client data from Google’s index

Better keyword rankings. Increased traffic. Conversions from organic search. These are the KPIs typically used to measure SEO results.

Yet some consultants and agencies manage SEO campaigns for clients without considering a crucial element:

Preventing confidential content from appearing in Google’s search results.

Neglecting this can lead to a breach of trust or costly litigation, either of which could end the client relationship.

This is avoidable if you understand how easily client data can enter Google’s search index, and how to prevent it.

Below is a look at an important search indexing issue many SEOs overlook: the accidental exposure of client information on Google, and how to deindex that content.

What I did when I found sensitive information

I am a full-time, independent SEO consultant. Since 2018, I have worked with various midsize companies to improve organic search results.

When running a technical SEO audit, I use Google’s site search operator by entering site:domain.com. It lets me quickly see how URLs, titles, and snippets appear across different page categories.

I look for patterns in what gets indexed, sometimes adding keywords to the operator to narrow the results.
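For example, here are a few of the kinds of queries I might run during an audit (the domain, subdomain, and keywords are placeholders, substitute your client’s):

```
site:example.com
site:example.com inurl:login
site:example.com "confidential"
site:data.example.com
```

Combining site: with inurl: or a quoted phrase is often how the surprising results surface.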

With most clients, I notice dev/testing/staging sites getting indexed, thin content diluting link equity, harming search efficacy, or causing keyword cannibalization, and paid landing pages that were never intended to rank.

With SaaS clients, however, I’ve noticed a more alarming pattern: pages getting indexed under subdomains that no one in marketing or product had ever thought about.

The most innocent are customer subdomains that let clients customize their login experience (e.g., client.example.com).

Even so, a customer may not want their name appearing in search results. Depending on how you look at it, that exposure could differentiate the product, or reveal it as a point of vulnerability.

In more serious cases, I’ve found web-based forms that collect data from specific individuals.

Without password protection, anyone could access, and even modify, those form fields.

These issues are not directly related to organic search performance, but I flag them immediately because so much is at stake.

In several cases, this became an “all hands on deck” problem, and I was asked to remove the data from search results as quickly as possible.

One CEO noted that his security consultants had never raised the possibility, yet it was uncovered quickly by a simple step many SEOs perform in every audit.

To be fair, it can take a lot of searching to find this type of page.

But consider the strange searches your clients, or their customers, might run. Never forget that 15% of Google’s daily search queries have never been searched before!

Even if it never becomes a legal problem, sensitive information in search results, especially if a client finds it first, can still damage your relationship.


Why is this data on Google?

All it takes is a single, unobtrusive link from anywhere on the internet pointing at a page for a search engine to discover and index it.

Awareness is half the battle. Once you have identified the pages that need to be removed, you can start the removal process immediately.

How to deindex Google content quickly

Start by looking for patterns in the URLs Google displays that contain sensitive data.

It’s not uncommon for the web-based version of a SaaS product to live on a subdomain such as data.example.com. Use the site search operator to scan through the results.

You can also view all URLs indexed by Google in Search Console’s Page indexing report.

This may not catch everything. You may find it helpful to contact the product team, who can point you to more.

Double-check the URLs

Inspect the URLs with the URL Inspection tool in GSC. If they no longer appear where you originally found them, confirm their indexing status with a fresh site: search.

Consider all URL variations that can canonicalize to the version you see in search results.

If you remove only the canonical version, alternate versions of the URL may remain indexed.
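To build a checklist of those variations, a small sketch like the following can enumerate the common ones, http vs. https, www vs. non-www, and trailing slash (the example domain is a placeholder; real sites may have more variants, such as URL parameters):

```python
from urllib.parse import urlsplit, urlunsplit

def url_variants(url: str) -> set[str]:
    """Enumerate common variations of a URL that may be indexed
    separately: http/https, www/non-www, and trailing slash."""
    scheme, netloc, path, query, frag = urlsplit(url)
    # Normalize the host, then produce both www and bare forms.
    host = netloc[4:] if netloc.startswith("www.") else netloc
    hosts = {host, "www." + host}
    # Produce both slashed and unslashed path forms.
    base = path.rstrip("/") or "/"
    paths = {base, base.rstrip("/") + "/"}
    return {
        urlunsplit((s, h, p, query, frag))
        for s in ("http", "https")
        for h in hosts
        for p in paths
    }

for variant in sorted(url_variants("https://data.example.com/report")):
    print(variant)
```

Each variant can then be checked in the URL Inspection tool or included in a removal request.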

Then create a new request in the GSC Removals tool, using either a URL prefix pattern (the second radio button under “New Request”), a likely subdomain, or a list of individual URLs.

For a small number of pages, the URL Inspection tool can speed up removal and confirm each page’s current status, though you must submit them one by one. You can also use Microsoft Bing’s Block URLs tool; Bing is much smaller than Google, but it works much the same way.

Keep in mind that removals made this way only last about six months.

They also won’t stop the problem from recurring, or address other search engines, so there is one last step to complete.

How to permanently remove content from Google

There are two methods that can be used.

1. Add a noindex meta robots tag to those pages

Ask your web developer to add this to all affected page templates.
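In HTML, the tag goes in the head of each affected template:

```html
<!-- In the <head> of every affected page template -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the same effect comes from sending an `X-Robots-Tag: noindex` HTTP response header, which requires access to the server configuration.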

Note: Don’t use robots.txt rules that disallow crawling here (images are an exception), as they won’t fix the problem. A disallow blocks crawling, but not indexing, and it can even prevent Google from ever seeing the noindex tag on a page it has already indexed.
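To make the distinction concrete, a rule like the following (the path is illustrative) keeps Googlebot from fetching the page, but leaves any existing index entry in place, often as a bare URL with no snippet:

```
# robots.txt — blocks crawling, NOT indexing; do not rely on this for removal
User-agent: *
Disallow: /private/
```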

2. Gate the content

Password-protecting the files or pages ensures that only authorized users can access them. It is another reliable way to keep content from being displayed on Google.
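As one illustration, on an Apache server HTTP Basic Auth can be enabled with an .htaccess file (the realm name and file path here are placeholders; your host’s setup may differ):

```apacheconf
# .htpasswd must be created beforehand, e.g.: htpasswd -c /var/www/.htpasswd alice
AuthType Basic
AuthName "Restricted area"
AuthUserFile /var/www/.htpasswd
Require valid-user
```

Nginx offers the equivalent via its auth_basic and auth_basic_user_file directives. Because the server now answers with 401 Unauthorized, crawlers cannot read the content at all.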

Keeping sensitive information out of search results

After taking these steps, you can be confident that pages with sensitive client data will not reappear in Google’s search results. Removals usually take effect within a day.
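To spot-check that the noindex tag actually made it into each template’s rendered HTML, a small parser sketch like this can help (it only covers the meta tag, not the X-Robots-Tag header):

```python
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Scan HTML for a <meta name="robots"> tag whose content includes noindex."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots" and \
                    "noindex" in d.get("content", "").lower():
                self.noindex = True

def has_noindex(html: str) -> bool:
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex
```

Feeding it the HTML of each affected page (fetched however you like) confirms whether the tag is present before you assume the removal will stick.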

Still, always tell your clients what happened. Nothing on the web ever completely disappears.

The article “How do I remove sensitive client information from Google’s search engine” first appeared on Search Engine Land.
