Disallow: /search/ in robots.txt, but soft 404s are still showing in GWT and Google search?
-
Hi guys, I've already added the following rule to robots.txt to prevent search engines from crawling the dynamic pages produced by my website's search feature: Disallow: /search/. But soft 404s are still showing in Google Webmaster Tools. Do I need to wait? (It's been almost a week since I added the rule to my robots.txt.) Thanks, JC
-
You could also look at using the meta robots noindex tag (<meta name="robots" content="noindex">) on /search/ pages, rather than just blocking them in robots.txt, as this will remove existing URLs from the index.
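A minimal sketch of how the noindex suggestion might be wired into a site, assuming the search results live under /search/ as in the question. The helper names (robots_directive, meta_robots_tag, x_robots_header) are hypothetical and not part of any framework; the X-Robots-Tag response header is shown as an alternative way to send the same directive.

```python
# Hypothetical helpers: pick the robots directive for a request path.
# Assumption: internal search result pages all live under /search/.
NOINDEX_PREFIXES = ("/search/",)


def robots_directive(path: str) -> str:
    """Return the robots directive that should accompany this path."""
    if path.startswith(NOINDEX_PREFIXES):
        # Keep the page out of the index, but still let links be followed.
        return "noindex, follow"
    return "index, follow"


def meta_robots_tag(path: str) -> str:
    """Render the <meta> tag to place inside the page's <head>."""
    return f'<meta name="robots" content="{robots_directive(path)}">'


def x_robots_header(path: str) -> tuple[str, str]:
    """Return the equivalent HTTP response header as a (name, value) pair."""
    return ("X-Robots-Tag", robots_directive(path))


if __name__ == "__main__":
    print(meta_robots_tag("/search/?q=banners"))
    print(x_robots_header("/products/banners"))
```

One thing to keep in mind: a crawler has to fetch a page to see its noindex directive, so the tag will not be read on URLs that robots.txt blocks from crawling.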
-
Glad to help
-
Thanks a lot Dan!
-
That is a good recommendation, but ultimately search engines make the final decision on crawl frequency. Take a look at 'Crawl Stats' in GWT; this will give you an idea of how often your site is crawled.
-
Is the waiting time related to the crawl frequency of the URLs in my sitemap?
Thanks Dan, appreciate it.
-
You will probably need to wait a little longer - it depends on how often your site usually gets crawled and indexed.
However, robots.txt does not always stop search engines from indexing your pages. It stops them from crawling a page on your site, but it does not stop them from indexing that URL. If they find links to it from external sites, the URL may still appear in the SERPs.
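To check that the rule itself does what you expect, here is a rough sketch using Python's standard urllib.robotparser. The "User-agent: *" line and the example.com URLs are assumptions (the question only quotes the Disallow line), and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; only the Disallow line comes from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/search/?q=banners",
        "https://www.example.com/products/",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```

This only answers "may this URL be crawled?". It says nothing about whether the URL is already in the index or may be indexed from external links, which is the distinction made above.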
Related Questions
-
New website - not showing in Google?
This site was launched 3 days ago, bimcosupply.com and I'm trying to get it to show in Google just for a branded search for the moment (Bimco, Bimco Corporation, etc). The old site is still showing in search, bimcoplumbingsupplies.com instead. This site was taken down a while back. I set up a redirect for the domain in cPanel, and also set individual pages to redirect in WordPress on the bimcosupply.com site. I've verified the site in Google Search Console, submitted a sitemap and did URL inspection on each page. Each page is showing as indexed, though now when I search site:bimcosupply.com not all pages are there, and there are two results for the home page, one "https" and one "http." (Before today, all of the pages were showing so not sure what changed). I know this new domain does not have any (or very little) domain authority yet, but I would have thought that the site should display for branded search by now. So I'm concerned that something is wrong with the site, how the redirects are set up, etc. that is preventing it from displaying. Could anyone take a look and help me figure this out please?
Technical SEO | | browncreative0 -
How to use Google search console's 'Name change' tool?
Hi there, I'm having trouble performing a 'Name change' for a new website (rebrand and domain change) in Google Search Console. Because the 301 redirects are in place (a requirement of the name change tool), Google can no longer verify the site, which means I can't complete the name change. To me, step two (301 redirect) conflicts with step three (site verification) - or is there a way to perform a 301 redirect and have the tool verify the old site? Any pointers in the right direction would be much appreciated. Cheers, Ben
Technical SEO | | cmscss0 -
Image Search / sudden drop in traffic
One of our sites in Germany had a very sudden drop in traffic (starting Oct. 7th). The site gets most of its organic traffic from Image Search. Checking in Search Console revealed that: search volume for keywords increased in that period; our average position is stable; our click rate dropped dramatically. We double-checked: searching the keyword in "anonymous mode" still showed our results for the main keywords in top image positions (first 2 rows). As an example (see attached screenshot GjlV8CW.jpg), a keyword had an average click rate of 1%, then dropped to 0.06% while the position remained stable. Germany is still using the "old" version of image search (unlike the rest of the world), which gives the site preview rather than just the image slider when you click on a result in image search. Our first thought was that this had changed, but it seems that it didn't. Any ideas what might cause this dramatic drop in click rate? There have been no major technical modifications on the site for the last 2 months. Thanks, Dirk
Technical SEO | | DirkC0 -
Google Search Console Site Map Anomalies (HTTP vs HTTPS)
Hi, I've just done my usual Monday morning review of clients' Google Search Console (previously Webmaster Tools) dashboards and was disturbed to see that for one client the Sitemaps section is reporting 95 pages submitted yet only 2 indexed (when I looked last week it was reporting the expected level of indexed pages). It says the sitemap was submitted on the 10th March and processed yesterday. However, 'Index Status' is showing a graph of growing indexed pages up to and including yesterday, when they numbered 112 (so it looks like all pages are indexed after all). Also, the 'Crawl Stats' section is showing 186 pages crawled on the 26th. Then it's listing sub-sitemaps, all of which are non-HTTPS (http), which seems very strange since the site is HTTPS and has been for a few months now, and the main sitemap index URL is HTTPS: https://www.domain.com/sitemap_index.xml
The sub-sitemaps are:
http://www.domain.com/marketing-sitemap.xml
http://www.domain.com/page-sitemap.xml
http://www.domain.com/post-sitemap.xml
There are no 'Sitemap Errors' reported, but there are 'Index Error' warnings for the above post-sitemap, copied below: "When we tested a sample of the URLs from your Sitemap, we found that some of the URLs were unreachable. Please check your webserver for possible misconfiguration, as these errors may be caused by a server error (such as a 5xx error) or a network error between Googlebot and your server. All reachable URLs will still be submitted."
Also, for the sitemap URLs below: "Some URLs listed in this Sitemap have a high response time. This may indicate a problem with your server or with the content of the page" for: http://domain.com/en/post-sitemap.xml AND https://www.domain.com/page-sitemap.xml AND https://www.domain.com/post-sitemap.xml
I take it from all the above that the HTTPS sitemap is mainly fine, that despite the reported 0 pages indexed in the GSC sitemap section they are in fact indexed as per the main 'Index Status' graph, and that somehow some HTTP sitemap elements have been accidentally attached to the main HTTPS sitemap and are causing these problems. What's the best way forward to clean up this mess? Resubmitting the HTTPS sitemap sounds like the right option, but seeing as the master URL indexed is an HTTPS URL, I can't see it making any difference until the HTTP aspects are deleted/removed. But how do you do that, or even check that's what's needed? Or should Google just sort this out eventually? I see the graph in 'Crawl > Sitemaps > WebPages' is showing a consistent blue line of submitted pages, but the red line of indexed pages drops to 0 for 3 - 5 days every 5 days or so. So fully indexed pages are being reported for 5-day stretches, then zero for a few days, then indexed for another 5 days, and so on!? Many thanks, Dan
Technical SEO | | Dan-Lawrence0 -
How could you make a URL/Breadcrumb structure appear different in Google than when you click into site?
I'm seeing a competitor who is able to make their URL/breadcrumb structure appear different in Google than on the site. Google shows a 3-4 category silo for the page, but once clicked the page is off root. How could you do this?
Technical SEO | | TicketCity0 -
How to remove my CDN subdomains from Google search results?
A few months ago I moved all my WordPress images onto a subdomain. After I purchased a CDN service, I moved those images back to my root domain. I added User-agent: * Disallow: / to my CDN domain. But now, when I perform a site search on Google, I find that my CDN subdomains are indexed by Google. I think this will cause a duplicate content issue. I have already been hit by the Panguin. How do I remove these search results from Google? Should I add my CDN domain to Webmaster Tools to submit a URL removal request? The problem is, if I use cdn.mydomain.com it shows my www.mydomain.com. My blog: http://goo.gl/58Utt Site search result: http://goo.gl/ElNwc
Technical SEO | | Godad1 -
Duplicate content /index.php/ issues
I'm having some duplicate content issues with Google. I've already got my .htaccess file working just fine as far as I can tell. Rewriting works great, and by using the site you'd never end up on a page with /index.php. However I do notice that on ANY page of the site you could add /index.php and get the same page i.e.: www.mysite.com/category/article and www.mysite.com/index.php/category/article Would both return the same page. How can I 301 or something similar all /index.php pages to the non index.php version? I have no desire for any page on my site to have index.php in it, there is no use to it. Having quite the hard time figuring this out. Again this is basically just for the robots, the URL's the users see are perfect, never had an issue with that. Just SEOMOZ reporting duplicate content and I've verified that to be true.
Technical SEO | | b18turboef1 -
Google cache is not showing for my page
Hello everyone, I have an issue with one of my pages. My category page (http://www.bannerbuzz.com/custom-vinyl-banners.html) was cached regularly in the past, but for some time now the cached result has not been shown in the SERP. I have also fetched this link in Google Webmaster Tools, but can't get the result; it shows the following message: 404. That's an error. The requested URL /search?q=cache%3A http%3A//www.bannerbuzz.com/custom-vinyl-banners.html was not found on this server. That's all we know. My category page rank is 2 and its keyword is first on google.com, so I am a little bit worried about this cache issue. Can someone please tell me why this is happening? Is this a temporary issue? Please help me solve this cache issue so that my page is cached regularly again in future. Thanks
Technical SEO | | CommercePundit0