Google Crawler Error / restricting crawling
-
Hi
On a Magento instance we manage, there is an advanced search. As part of the ongoing enhancement of the instance, we altered the advanced search options so that there are fewer, more relevant ones.
The issue is that Google has crawled and catalogued the advanced search with the now-removed options in the query string. Google keeps crawling these out-of-date advanced searches, and these stale searches now return a 500 error.
Currently Google is attempting to crawl these pages twice a day.
I have implemented the following to stop this:
1. Requested that the URL be removed via Webmaster Tools, selecting the directory option with the URI:
http://www.domian.com/catalogsearch/advanced/result/
2. Added Disallow to robots.txt
Disallow: /catalogsearch/advanced/result/*
Disallow: /catalogsearch/advanced/result/
3. Added rel="nofollow" to the links on the site that point to the advanced search.
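One way to sanity-check the robots.txt rule from step 2 is to run it through Python's standard `urllib.robotparser`. This is only a rough sketch: the `User-agent: *` group and the `?name=shirt` query string are assumptions for illustration and are not taken from the real site. Python's parser matches plain path prefixes and does not interpret the `*` wildcard, so only the prefix rule is tested here; as far as I know, Google treats a trailing `*` as redundant anyway.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the prefix rule from step 2 inside a
# catch-all user-agent group (the group line is an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch/advanced/result/
"""


def is_blocked(url, user_agent="Googlebot"):
    """Return True if the rules above forbid user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical stale search URL; the query string is made up.
stale_url = "http://www.domian.com/catalogsearch/advanced/result/?name=shirt"
# The advanced search form itself, which the rule should leave crawlable.
form_url = "http://www.domian.com/catalogsearch/advanced/"

print(is_blocked(stale_url))
print(is_blocked(form_url))
```

If the first check does not report the stale URL as blocked, the rule is probably sitting in the wrong user-agent group or the path prefix does not match.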
Below is a list of the links Google is crawling or attempting to crawl: 12 links, each crawled twice a day and each resulting in a 500 status.
Can anything else be done?
-
Seems like you've done everything right. You could also add a meta robots "NOINDEX, FOLLOW" tag to those pages.
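To confirm the tag is actually being served, something like the following could be run against the page HTML. It is a minimal sketch using Python's standard `html.parser`; the sample markup is invented for illustration and is not the site's real template. Keep in mind that a crawler can only read this tag on pages it is allowed to fetch and that load successfully, so it has no effect on URLs blocked in robots.txt or returning a 500.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of lower-cased meta robots directives in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Invented sample page head carrying the suggested tag.
SAMPLE = (
    '<html><head><meta name="robots" content="NOINDEX, FOLLOW">'
    "</head><body></body></html>"
)

print("noindex" in robots_directives(SAMPLE))
```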
I'd also double-check the "Linked from" referrers in Webmaster Tools, just to make sure you haven't missed any live followed links pointing to those pages.
When did you submit the removal request, and what is its status (approved, denied, pending)? Another question: are those pages in Google's index?