Is there any benefit to optimising Google's crawl?
-
Hello,
I have an ecommerce site, and all of its pages are crawled and indexed by Google.
However, some pages are reachable at more than one URL, for example www.sitename.com/product-name.html and www.sitename.com/category/product-name.html.
All of these pages carry a canonical tag pointing to the simplest URL, so Google indexes only one version. The duplicate URLs are not indexed, but Google still crawls them.
My question is: is there any benefit to stopping Google from crawling these duplicate URLs?
Google crawls around 1,500 pages a day on my site, but there are only 800 real pages, and they are all indexed. There is no particular problem at the moment, so is it worth changing anything?
Thanks
-
Hi!
Have you noindexed the duplicate pages too? Google still has to crawl a page to see the noindex tag, so it won't stop the crawling outright, but it at least gives Google another signal that those pages don't matter, and they may get crawled less often over time.
Obviously it's not a catch-all, as there's only so much you can do to tell Google not to crawl a page. If the duplicate URL is linked to internally (which it sounds like it is), Google will often crawl it anyway despite the canonical, because the internal links suggest the page is important to your site.
It may be worth testing this on a few pages to see whether it has an impact.
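One way to run that test is to count Googlebot requests per URL in your server logs before and after the change. Below is a rough sketch, assuming your server writes the common "combined" access log format; the function name and regex are illustrative rather than taken from any particular tool, and since user agents can be spoofed the counts are only approximate.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e.
# IP - - [date] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per path whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("path")] += 1
    return counts
```

Comparing the counts for the duplicate URLs over a week or two on either side of the change should show whether the crawl pattern actually moved.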
-
Hi there!
From my experience, the best results I ever achieved for a client came when we consolidated all URLs into a single-URL solution. Canonicals are amazing, no doubt. But I've seen canonical tags ignored in cases where the canonical structure isn't 100% 'correct.'
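If you want to spot-check that each duplicate really declares the URL you expect, something like the following could help. It is only a sketch using Python's standard-library HTML parser; the class and function names are made up for illustration, and it takes the page HTML as a string, so fetching the pages is left to you.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attr = dict(attrs)
            if (attr.get("rel") or "").lower() == "canonical":
                self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Running this over both versions of a product page and confirming they return the same preferred URL is a quick way to catch the 'not 100% correct' cases.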
If you can have your website navigation and your internal/XML sitemaps reinforce your preferred URL, that would certainly reduce the number of URLs Google crawls. Then, if you permanently (301) redirect all the now non-navigable URLs to the single preferred URL, you should see a boost in traffic, because all of the authority is consolidated into a single page that is now reinforced throughout your entire website.
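To give an idea of what the redirect list could look like, here is a small sketch that builds a duplicate-to-preferred mapping. It assumes, as in the question's example, that the preferred URL is always the product file name at the site root; the function names are hypothetical, and your platform's real URL rules may well differ.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url):
    """Collapse /category/product-name.html to /product-name.html.

    Assumption: the last path segment identifies the product.
    Query strings and fragments are dropped.
    """
    parts = urlsplit(url)
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return urlunsplit((parts.scheme, parts.netloc, "/" + last_segment, "", ""))


def redirect_map(urls):
    """Return {duplicate_url: preferred_url} for URLs that need a 301."""
    mapping = {}
    for url in urls:
        target = preferred_url(url)
        if target != url:
            mapping[url] = target
    return mapping
```

The resulting mapping can then be turned into whatever redirect rules your server or platform supports.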
If that's not possible, and you have to keep multiple URLs on your site because of budget or platform constraints, then yes, let Google crawl them. If you block them, the algorithm won't be able to see the canonical tags on them.
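If you are unsure whether your robots.txt is already blocking the duplicates, Python's standard `urllib.robotparser` module can check that. This is a minimal sketch; the helper name is made up, and you pass in the robots.txt content yourself.

```python
from urllib.robotparser import RobotFileParser


def googlebot_can_crawl(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)
```

For example, with a hypothetical rule set of `User-agent: *` followed by `Disallow: /category/`, the /category/ version of a product page would come back as blocked, which means Google could never read its canonical tag.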
So in short: if you have a means to reduce the number of duplicates and redirect them, awesome. If you don't, opening them up to Google is good, too.
For more information on making sure your canonical structure is set up properly, check out this Moz blog post: https://moz.com/blog/rel-confused-answers-to-your-rel-canonical-questions