Are there any negative side effects of having millions of URLs on your site?
-
After a site upgrade, we found that we have over 3.7 million URLs on our site. Many of these URLs are due to the facet options. Each facet combination yields a different URL. However, we need to do a deeper analysis into these URLs to see if this is the only reason why so many are returning.
Does anyone know if there are any negatives of having so many URLs crawled, other than the fact that Google only spends so much time crawling a site? Is the number of URLs something that should be concerning?
Any insight appreciated!
-
Agree with the points above with one exception. Yes, you have to find a way to deal with duplicate and quality content at scale. Yes, Robots.txt, nofollow links and index sitemaps are your friends. I would not use rel=canonical unless I had to. Better to get those extra pages de-indexed and then not let Google crawl the urls with the extra parameters to start with. Why waste Google's time in crawling pages that are just resorted versions of another? If you use the directives wisely you probably "only" have 200,000 pages worth crawling if you have that many sort parameters.
Good luck!
-
I'll echo Robert's concern about duplicate content. If those facet combinations are creating many pages with very similar content, that could be an issue for you.
If, let's say, there are 100 facet combinations that create essentially the same basic page content, then consider taking facet elements that do NOT substantially change the page content, and use rel=canonical to tell Google that those are all really the same page. For instance, let's say one of the facets is packaging size, and product X comes in boxes of 1, 10, 100, or 500 units. Let's say another facet is color, and it comes in blue, green, or red. Let's say the URLs for these look like this:
www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1
www.mysite.com/product.php?pid=12345&color=green&pkgsize=10
www.mysite.com/product.php?pid=12345&color=red&pkgsize=100
You would want to set the rel=canonical on all of these to:
www.mysite.com/product.php?pid=12345
Be sure that your XML sitemap, your on-page meta robots, and your rel=canonicals are all in agreement. In other words, if a page has meta robots "noindex,follow", it should NOT show up in your XML sitemap. If the pages above have their rel=canonicals set as described, then your sitemap should contain www.mysite.com/product.php?pid=12345 and NONE of the three example URLs with the color and pkgsize parameters above.
-
There are several concerns to be addressed with this scenario:
- Organization
This is going to be very difficult to keep track of. If you are well-organized or the pages will not need much adjusting, this is probably okay.
- Duplicate Content
This is going to be a pain the behind. That being said, most site auditing tools will allow you to make adjustments as necessary.
- Broken Links
With a site of this size, broken links and 404's are going to be inevitable. This could lead to some negative SEO impacts and will have to be kept on top of.
- Hacking
This is a big reason why some sites have enormous numbers of URLs. This would likely be the biggest concern on my mind and worth looking in to. Going through that many pages will be impossible, so it might be worth taking a look at the link profile and determining where most of your links are coming from. If these are coming from spammy sites, you may have a problem there.
All this being said, the size of a website is normally not a cause for concern. Just make sure that your main pages (Home, Landing Pages) are properly handled and optimized and you shouldn't have too much trouble. I would add that unwieldy htaccess files (large ones) can result in slower loading times, which can impact your rankings with Google.
Let me know if there is anything specific concerning you and I will be happy to help. Congrats on the upgrade and hope it works out!
Rob
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Stuck with canonical URL - main site vs categorys?
Hello, I started to doubt myself. We have a classified advertisements website. On the main www.website.com page, almost all the advertisements are shown. Now we take those advertisements and also split them into categorys Category 1 / category 2 / category 3 / category 4 Now all those categories almost always have the same content as www.website.com except a bit less (because X amount of content is now divided also to 4-5 groups) For raking should i actually tell google that those categories are a copy of www.website.com or they should still be as they are?
Technical SEO | | advertisingcloud0 -
Does incomplete or private WHOIS information have negative effect on SEO?
I wondered if having incomplete WHOIS information has a negative effect on rankings? Also I know it's possible to keep WHOIS information private, so I wondered if google would think this suspicious and rank a site not as highly as it should? Thanks!
Technical SEO | | Silktide0 -
Internal Ads on A Site
We serve ads on our site using a sub-domain. All ads use a re-direct from ads.domain before redirecting users to the proper, normal, internal url. Most the content on our home page is ad block driven. Is it possible and does it make sense to enter the sub-domain as url parameter in Google Webmaster tools, letting Google know that this is something to be ignored. Many thanks
Technical SEO | | CeeC-Blogger0 -
Site Blacklisted
Good morning. Just done my WMT ritual morning check and one of my sites has been blacklisted for malware. It's a wordpress site - I've run various scans, e.g. http://sitecheck.sucuri.net/scanner/ and also installed wordfence and scanned with that and wordfence produced some offending files which I have now deleted. I've also installed website defender in the hope that it wont happen again. I'm pretty good with staying on top of updates and rarely let a few days pass without upgrading new version of wordpress or plugins etc. I've also checked my users to make sure no new admins or anything and also changes passwords. I've asked for a review from Google and just wondered how long these reviews take? Also, has anybody got any advice, is there anything else I should be doing? Thanks
Technical SEO | | littlesthobo0 -
Can we use our existing site content on new site?
We added 1000s of pages unique content on our site and soon after google release penguin and we loose our ranking for major keywords and after months of efforts we decided to start a new site. If we use all the existing site content on new domain does google going to penalized the site for duplicate content or it will be treated as unique? Thanks
Technical SEO | | mozfreak0 -
Why is this site ranking so good?
Site in question: http://bit.ly/aBvVbm Our main competitor in the UK seems to be ranking extremely good for the keyword "jigsaw puzzles" even though their linking profile doesn't seem all that great? They mainly have site-wide links on 2 of their other ecommerce sites which seem to be given them their ranking power as this equals to 100's of links. Does sitewide links on 2 sites really give this much ranking power or am I missing something?
Technical SEO | | Tonyy30 -
Mobile Site Domain/URL Structure
We are currently building a mobile optimised version of our main website and I had some questions with regard to SEO. 1. Is it best to structure the domain as: m.yourdomain.com yourdomain/m 2. It is correct to place rel="cannonical" on the mobile pages and to have only the main site indexed? Thanks in advance and links or books on mobile seo you can direct me to that would be greatly appreciated. Phil
Technical SEO | | Phily0 -
Redirect from old wordpress site to new php site? Best approach
Hi I have two websites one legacy site done in wordpress the other in php. However I would like to merge the two together and remove the wordpress site. However it has a good link profile and the pages rank well. What is the best approach to do a 301 redirect from the old site with all its pages pointing to the homepage of the new site? If so what's the best way to do this in wordpress? Many thanks
Technical SEO | | ocelot0