What to do with an extremely high number of URLs on your site?
-
Here is the situation:
The site has tons of business and personal profiles. The information needed to be categorized, so directories were created in an attempt to keep the URL structure clean. For example:
www.abc.com/product/um/name-here/city-name/state/lastname:3458765
Each profile has a unique ID number, and for some reason there needed to be a category for each user; in this case /um/ stands for user name.
The Webmaster Tools steps to resolve this say to use rel=canonical, which can be done for the /um/ directory, but I am concerned about the bot not being able to find the pages beyond that directory, such as the associated profile name, city, and state. So my ultimate question is: if I use rel=canonical, will the rest of the content not get crawled or indexed?
-
This is not what the canonical tag is intended for.
The personal profiles will most likely be very low-content duplicates of each other, like these, which are indexed and should not be:
If pages deeper in that folder are good content worthy of being indexed, then:
a) Add noindex,follow to these profile pages.
b) Add index,follow to the deeper pages.
That will keep the bots crawling through the profile pages to the deeper folders with the content you want indexed.
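As a rough illustration of options a) and b), the sketch below builds the robots meta tag that would go in each page's head. The function name and the `is_thin_profile` flag are hypothetical; how a site decides which pages count as thin profiles is up to its own templates.

```python
def robots_meta_tag(is_thin_profile: bool) -> str:
    """Return a robots meta tag for a page's <head>.

    is_thin_profile is a hypothetical flag set by the site's own logic.
    """
    # "noindex" asks engines to keep the page out of the index;
    # "follow" asks them to keep following the links on it.
    directives = "noindex, follow" if is_thin_profile else "index, follow"
    return f'<meta name="robots" content="{directives}">'


print(robots_meta_tag(True))   # tag for a thin profile page
print(robots_meta_tag(False))  # tag for a deeper page worth indexing
```

The tag only works if the page can be crawled, so a page carrying it should not also be blocked in robots.txt.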
You can also disallow the /um/ (user name) folder and allow the deeper folders with robots.txt directives. We were just discussing this:
http://www.seomoz.org/q/allow-or-disallow-first-in-robots-txt
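To see how a Disallow on the folder and an Allow on deeper paths might interact, here is a minimal sketch of a rule checker. It assumes the precedence described in RFC 9309 and used by Google (the longest matching pattern wins, and Allow wins a tie) and it ignores User-agent grouping. The rules and the idea that "deeper" means at least two more path segments are assumptions for illustration, not the site's real configuration.

```python
import re

# Hypothetical rules: block the /um/ folder, allow paths at least two levels deeper.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product/um/
Allow: /product/um/*/*/
"""


def parse_rules(robots_txt):
    """Collect (directive, pattern) pairs; this sketch ignores User-agent groups."""
    rules = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key in ("allow", "disallow") and value:
            rules.append((key, value))
    return rules


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best = None
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = parse_rules(ROBOTS_TXT)
print(is_allowed("/product/um/name-here", rules))
print(is_allowed("/product/um/name-here/city-name/state/lastname:3458765", rules))
```

Under this precedence the order of the Allow and Disallow lines does not matter. Parsers that apply the first matching rule instead, or that do not support wildcards, can give different answers, so it is worth testing rules in the search engine's own robots.txt tester.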
-
Does everything need to be indexed? If not, perhaps the personal profiles could be noindexed. Let the search engines crawl all of your content, but only have them index pages that provide value to the SERPs.
Only use rel=canonical if the content on the different URLs is exactly the same. Using it incorrectly can cause content to not be indexed.
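One way to apply that rule is to emit a canonical tag only when two pages really carry the same content. The sketch below treats "exactly the same" as identical after collapsing whitespace and lowercasing, which is an assumption; the function names and the example URL are hypothetical.

```python
import hashlib


def content_fingerprint(html_body: str) -> str:
    """Hash the body after collapsing whitespace and lowercasing (an assumed notion of 'same')."""
    normalized = " ".join(html_body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def canonical_tag(page_body: str, preferred_body: str, preferred_url: str):
    """Return a rel=canonical tag only if both bodies match; otherwise return None."""
    if content_fingerprint(page_body) == content_fingerprint(preferred_body):
        return f'<link rel="canonical" href="{preferred_url}">'
    return None


# Hypothetical usage: two copies of the same profile versus a different profile.
preferred = "https://www.abc.com/product/um/name-here"
print(canonical_tag("<p>Jane  Doe, Plumber</p>", "<p>Jane Doe, plumber</p>", preferred))
print(canonical_tag("<p>John Roe, Electrician</p>", "<p>Jane Doe, plumber</p>", preferred))
```

Profiles that merely look similar would return None here and are better handled with noindex than with a canonical tag.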
Related Questions
-
Can you help by advising how to stop a URL from referring to another URL on my website with a 404 error please?
How to stop a URL from referring to another URL on my site. I'm getting a 404 error on a referred URL which is (https://webwritinglab.com/know-exactly-what-your-ideal-clients-want-in-8-easy-steps/[null id=43484])referred from URL (https://webwritinglab.com/know-exactly-what-your-ideal-clients-want-in-8-easy-steps/) The referred URL is the URL page that I want and I do not need it redirecting to the other URL as that's presenting a 404 error. I have tried saving the permalink in WordPress and recreated the .htaccess file and the problem is still there. Can you advise how to fix this please? Is it a case of removing the redirect? Is this advisable and how do I do that please? Thanks
Technical SEO | | Nichole.wynter20200 -
In Facebook when i place my site URL the image does not load?
In Facebook when I place my site URL the image does not load? It loads some generic image or logo but not the other image that's related to the page. Is there any tag we need to add to the website so the image loads? Is it good to use a tag such as this for the description? <meta property="og:description" content="Some data" />
Technical SEO | | bsharath0 -
Mobile site backlinks?
Hello, Our mobile site redirects to desktop in a desktop browser and vice versa; however, they are different sites. This said, shouldn't the backlinks for our mobile site be the same as for our desktop site since one redirects to the other. We show no backlinks in my analysis? Any help or insight would be extremely appreciated! Thank you!
Technical SEO | | lfrazer1 -
Why is there a difference in the number of indexed pages shown by GWT and site: search?
Hi Moz Fans, I have noticed that there is a huge difference between the number of indexed pages of my site shown via site: search and the one that shows Webmaster Tools. While searching for my site directly in the browser (site:), there are about 435,000 results coming up. According to GWT there are over 2.000.000 My question is: Why is there such a huge difference and which source is correct? We have launched the site about 3 months ago, there are over 5 million urls within the site and we get lots of organic traffic from the very beginning. Hope you can help! Thanks! Aleksandra
Technical SEO | | aleker0 -
AJAX and High Number Of URLS Indexed
I recently took over as the SEO for a large ecommerce site. Every Month or so our webmaster tools account is hit with a warning for a high number of URLS. In each message they send there is a sample of problematic URLS. 98% of each sample is not an actual URL on our site but is an AJAX request url that users are making. This is a server side request so the URL does not change when users make narrowing selections for items like size, color etc. Here is an example of what one of those looks like Tire?0-1.IBehaviorListener.0-border-border_body-VehicleFilter-VehicleSelectPanel-VehicleAttrsForm-Makes We have over 3 million indexed URLs according to Google because of this. We are not submitting these urls in our site maps, Google Bot is making lots of AJAX selections according to our server data. I have used the URL Handling Parameter Tool to target some of those parameters that are currently set to let Google decide and set it to "no urls" with those parameters to be indexed. I still need more time to see how effective that will be but it does seem to have slowed the number of URLs being indexed. Other notes: 1. Overall traffic to the site has been steady and even increasing. 2. Google bot crawls an average of 241000 urls each day according to our crawl stats. We are a large Ecommerce site that sells parts, accessories and apparel in the power sports industry. 3. We are using the Wicket frame work for our website. Thanks for your time.
Technical SEO | | RMATVMC0 -
Structure of urls
Hello from Athens, Greece. We have to implement the following project and I need your help: We will build a company guide for the whole country and local company guides for each city for the same client. The information in the country guide is the sum of the information in the local guides, so when a user is at the country guide he sees information from companies in all cities, and when the user is at a city guide he sees info only for that city. The problem is the structure of the URL we should have. Should the presentation page of each company have the structure domain.gr/id/company or city.domain.gr/id/company, with one being canonical to the other? Is this good for SEO? Should both URLs be included in the sitemap? Thank you
Technical SEO | | herculesopa0 -
Canonical URL
I previously set the canonical Url in google web masters to the non www version, when I check my on page opt, it tells me that I have a critical issue with this. Should I change it in google web masters back to the www version? if so is there the possibility of negative results? Or is there a better way to deal with this? Note, I have inbound links pointing to both types.
Technical SEO | | bronxpad0 -
What happens when a link goes to a dead url on my site?
I noticed in Open Site Explorer, I have several incoming links going to dead urls because i re-organized my site. For example, there might be an incoming link to: sample.php?ID=8 The problem is that I moved the file to /subdir1 so it would be nice if it could link to /subdir1/sample.php?ID=8 BUT, on top of that, I have also changed the url to seo-friendly urls. So, really, it should link to /Category_Descripton/ProductName/8 and then get re-written to /subdir1/sample.php?ID=8 So, what are the implications of having these incoming links to dead urls other than the bad user experience. What are the implications from an SEO standpoint? What's the best way to fix this? Thanks.
Technical SEO | | webtarget0