Google Crawler Error / restricting crawling
-
Hi
On a Magento instance we manage there is an advanced search. As part of the ongoing enhancement of the instance, we altered the advanced search options so there are fewer of them and they are more relevant.
The issue is that Google has crawled and catalogued the advanced search with the now-removed options in the query string. Google keeps crawling these out-of-date advanced searches, and these stale searches now return a 500 error.
Currently Google is attempting to crawl these pages twice a day.
I have implemented the following to stop this:
1. Requested removal of the URL via Webmaster Tools, selecting the directory option and using the URI:
http://www.domian.com/catalogsearch/advanced/result/
2. Added Disallow to robots.txt
Disallow: /catalogsearch/advanced/result/*
Disallow: /catalogsearch/advanced/result/
3. Added rel="nofollow" to the links on the site that point to the advanced search.
Below is a list of the links it is crawling or attempting to crawl: 12 links, each crawled twice a day and each resulting in a 500 status.
Can anything else be done?
-
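Seems like you've done everything right. You could also add a meta robots "NOINDEX, FOLLOW" tag to those pages. Bear in mind that Googlebot can only read that tag on pages it is allowed to crawl, so it has no effect while robots.txt blocks those URLs.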
I'd also double-check the "linked from" data in Webmaster Tools, just to make sure you haven't missed any live followed links pointing to those pages.
When did you submit the removal request, and what is its status (approved, denied, pending)? Another question: are those pages in Google's index?
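If it helps, the two Disallow rules from the question can be sanity-checked offline. Below is a minimal sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line and the `?name=shirt` query string are assumptions made for illustration, since the post shows only the Disallow lines and not the actual crawled URLs. Python's parser does plain prefix matching and does not implement Google's `*` wildcard extension, so in this check it is the second rule that matches. Under Google's own matching a trailing `*` is redundant anyway, so the two rules should behave the same there.

```python
from urllib.robotparser import RobotFileParser

# Rules from the question. The "User-agent: *" line is an assumption:
# the original post shows only the two Disallow lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch/advanced/result/*
Disallow: /catalogsearch/advanced/result/
"""


def is_blocked(url, user_agent="Googlebot"):
    """Return True if the rules above disallow crawling the given URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical stale search URL; the query string is invented for illustration.
stale = "http://www.domian.com/catalogsearch/advanced/result/?name=shirt"
# The advanced search form itself, which the rules should leave crawlable.
form = "http://www.domian.com/catalogsearch/advanced/"

print("stale result blocked:", is_blocked(stale))
print("search form blocked:", is_blocked(form))
```

This only confirms that the rules cover the paths you expect. It says nothing about whether Google has re-fetched robots.txt yet, or whether the URLs are still in the index.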