Troubleshooting Warning Indexed Although Blocked By Robots.txt - poprevaeng


Troubleshooting the "Indexed, though blocked by robots.txt" warning - In the new version of Google Search Console, an "Indexed, though blocked by robots.txt" warning often appears, especially for blogs on the Blogger platform.

If we check the affected URLs, they turn out to all be Search pages: label search pages and the older-posts navigation pages.

These pages get indexed despite being blocked because many Blogger blogs use a robots.txt file like the following:




User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourdomain.com/sitemap.xml

The robots.txt above disallows crawling of all Search pages (anything under /search).

However, because these search pages are linked from within the blog - by breadcrumbs, menus, label widgets, and next/prev navigation - Google still discovers the URLs and can index them without crawling their content.
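As a quick check, Python's stdlib `urllib.robotparser` can show which URLs those default rules block. This is just a sketch; the domain and paths are placeholders:

```python
# Parse Blogger's default robots.txt rules and check which URLs they block.
from urllib import robotparser

rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Label search pages fall under /search and are disallowed for all bots.
print(rp.can_fetch("*", "https://www.yourdomain.com/search/label/SEO"))   # -> False
# Regular post URLs remain crawlable.
print(rp.can_fetch("*", "https://www.yourdomain.com/2019/01/post.html"))  # -> True
```

Being disallowed only stops crawling; it does not stop indexing of a URL that Google discovers through links, which is exactly what triggers the warning.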

To fix the warning, allow bots to crawl these pages so that Google can see a noindex directive on them, instead of relying on robots.txt to keep them out of search results.

If you are using a robots.txt like the one above, replace it with the following:


User-agent: *
Disallow:

Sitemap: https://www.yourdomain.com/sitemap.xml
Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: https://www.yourdomain.com/feeds/posts/default
Sitemap: https://www.yourdomain.com/sitemap-pages.xml

Replace yourdomain.com in the code above with your blog's domain.

If your blog has more than 500 posts, add a new Sitemap line:

Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=501&max-results=500


Likewise, if you have more than 1,000 posts, add another line:

Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=1001&max-results=500
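The pattern above can be generated mechanically: one Sitemap line per block of 500 posts, with start-index advancing by 500 each time. A small sketch (the domain and post count are placeholders):

```python
# Generate the paginated atom.xml Sitemap lines for a Blogger blog.
def sitemap_lines(domain: str, post_count: int, page_size: int = 500) -> list:
    """Return one 'Sitemap:' line per block of page_size posts."""
    lines = []
    start = 1
    while start <= post_count:
        lines.append(
            f"Sitemap: https://{domain}/atom.xml"
            f"?redirect=false&start-index={start}&max-results={page_size}"
        )
        start += page_size
    return lines

# A blog with 1,200 posts needs three lines (start-index 1, 501, 1001).
for line in sitemap_lines("www.yourdomain.com", 1200):
    print(line)
```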

Then add the following noindex meta tags inside the <head> section of your blog template. They tell bots not to index archive, search, and label pages, so those pages will not appear on Google's search results pages:


<b:if cond='data:view.isArchive'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
<b:if cond='data:blog.searchQuery'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
<b:if cond='data:blog.searchLabel'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
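To sanity-check that the tags actually render, you can scan a saved copy of a page's HTML for the robots meta tag using Python's stdlib parser. The sample markup below is illustrative:

```python
# Find the robots meta tag in a page's HTML with the stdlib HTML parser.
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Records the content attribute of <meta name='robots'>, if present."""
    def __init__(self):
        super().__init__()
        self.robots_content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "robots":
            self.robots_content = attrs.get("content")

# In practice, feed the HTML of a rendered label or archive page here.
sample = "<head><meta content='noindex,noarchive' name='robots'/></head>"
finder = RobotsMetaFinder()
finder.feed(sample)
print(finder.robots_content)  # -> noindex,noarchive
```

If the tag is missing on a label or archive page, the template condition is not matching and should be rechecked.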

Also make sure you are not using the Blogger Archive widget.

Once all of the above is done, submit your new robots.txt through Google's robots.txt Tester tool so that Google picks up the change quickly.

Then open Search Console, start validation for the "Indexed, though blocked by robots.txt" warning, and keep monitoring the report.

I will update this post later with further results.
