AJAX and High Number of URLs Indexed
-
I recently took over as the SEO for a large ecommerce site. Every month or so our Webmaster Tools account is hit with a warning for a high number of URLs. Each message includes a sample of problematic URLs. About 98% of each sample is not an actual URL on our site but an AJAX request URL that users are making. These are server-side requests, so the URL does not change when users make narrowing selections for items like size, color, etc. Here is an example of what one of those looks like:
Tire?0-1.IBehaviorListener.0-border-border_body-VehicleFilter-VehicleSelectPanel-VehicleAttrsForm-Makes
We have over 3 million indexed URLs according to Google because of this. We are not submitting these URLs in our sitemaps; according to our server data, Googlebot itself is making lots of AJAX selections. I have used the URL parameter handling tool to target some of the parameters that were set to let Google decide, and changed them to "no URLs" so that URLs with those parameters are not indexed. I still need more time to see how effective that will be, but it does seem to have slowed the number of URLs being indexed.
Other notes:
1. Overall traffic to the site has been steady and even increasing.
2. Googlebot crawls an average of 241,000 URLs each day according to our crawl stats. We are a large ecommerce site that sells parts, accessories and apparel in the power sports industry.
3. We are using the Wicket framework for our website.
Thanks for your time.
-
Axial Dev,
Thanks for responding. I have considered the robots.txt disallow; however, my worry comes from several videos by Matt Cutts explaining that, now that Googlebot can make AJAX requests, the best practice is to allow it to do so. That is why I have not added an all-around disallow to our robots.txt file. But Googlebot is clearly having trouble on our site telling the difference between a server-side AJAX request and an actual URL that should be indexed.
Below is Matt Cutts's plea to allow JavaScript to be crawled; there are a few others out there as well.
Does anyone else have experience with AJAX server side requests being indexed and how they combated the issue?
-
You could try using a robots.txt with a wildcard to stop Google from visiting those URLs:
Disallow: /*Tire?
or
Disallow: /*?0
It would help to see a full URL example (and matching categories).
See the "URL matching based on path values" section of https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
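To check which URLs each rule would catch before putting it live, here is a minimal Python sketch of wildcard matching as I understand it: `*` matches any run of characters, a trailing `$` anchors the end, and `?` is a literal character. The leading `/` on the sample path and the `/Helmet` path are my assumptions, since I have not seen a full URL. The helper names are made up for illustration, and the sketch ignores `Allow` rules and rule precedence.

```python
import re
from typing import Iterable


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the URL, and every other character (including '?') is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# Sample from the question, with an assumed leading "/".
SAMPLE = (
    "/Tire?0-1.IBehaviorListener.0-border-border_body-VehicleFilter-"
    "VehicleSelectPanel-VehicleAttrsForm-Makes"
)

# Hypothetical paths, used only to compare the two rules.
CANDIDATES = [SAMPLE, "/Tire", "/Tire?page=2", "/Helmet?0-1.IBehaviorListener.0-x"]

if __name__ == "__main__":
    for rule in ("/*Tire?", "/*?0"):
        for path in CANDIDATES:
            print(rule, path[:40], is_blocked(path, [rule]))
```

Under these assumptions, `/*Tire?` would also block any other query string on the Tire page (for example a paging parameter), while `/*?0` catches the listener-style requests on every category but leaves other query strings alone. Neither rule touches the plain `/Tire` URL.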