How does Google index pagination variables in Ajax snapshots? We're seeing random huge variables.
-
We're using the Google snapshot method to index dynamic Ajax content. Some of this content is from tables using pagination. The pagination is tracked with a var in the hash, something like:
#!home/?view_3_page=1
We're seeing all sorts of calls from Google now with huge numbers for these URL variables that we are not generating with our snapshots. Like this:
#!home/?view_3_page=10099089
These aren't trivial since each snapshot represents a server load, so we'd like these vars to only represent what's returned by the snapshots.
Is Google generating random numbers going fishing for content? If so, is this something we can control or minimize?
-
Thanks for the great replies all. Just to clarify, this is the page we're referencing:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=
You can see the one pagination var "next" that points here:
http://www.knackhq.com/business-directory-user-demo/?escaped_fragment=home/?view_3_page=2
As you can see this is pretty simple. There's only one potential variable (the "prev" and "next" links) for introducing these huge numbers and that's pretty limited. We tested the Google URLs up and down the app and haven't seen anything that would send it fishing for larger numbers. But Google keeps hammering us with:
GET /business-directory-user-demo/?escaped_fragment=home/?view_3_page=1000251
For now we're trying to respond to those with 404s and hope they eventually die.
Unfortunately we can't avoid hashbangs.
-
This seems to do this only for parameters that it has decided "changes, re-orders, or narrows content." They may also crawl things that look like URLs in Javascript even when it's part of a function, but it doesn't seem like that's what's happening in this case.
Depending on the setup of the site, you can either manually configure the variable in WMT (don't do this if the parameter is material), write a clever robots.txt rule (e.g. to block anything after a number of digits after the parameter), or (the best solution) re-work the system to generate URLs that don't rely on parameters.
I'm not sure I understand why the server is rendering a page if the URL isn't supposed to exist. Depending on your server config, you may also be able to return a 404 and make a rule for which (valid) pages to render. From there you can just ignore the 404 errors until Google figures it out.
I think that's the best I can do without seeing the site.
-
I agree with Federico. I've seen Google go fishing with URL parameters (?param=xyz) and I've seen it with AJAX and hashbangs as well. How far they take this and when they choose to apply it doesn't seem to follow a consistent pattern . You can see some folks on StackExchange discussing this, too: http://webmasters.stackexchange.com/questions/25560/does-the-google-crawler-really-guess-url-patterns-and-index-pages-that-were-neve
-
Awesome, thanks for looking into it. We've gotten nowhere with any kind of answer.
-
Hi There
I'm an associate here at Moz, and have asked the other associates if they might know the answer, as this one's a little outside of my experience. Please follow up and let us know if you don't hear from anyone.
Thanks!
-Dan
-
We also noticed some weird crawls last year using random numbers at the end of the URL, checking in google webmaster tools we saw that most of those urls were reported as not found, checking from where the link came from google listed some of our URLs, but didn't had any link to those URLs google was trying to fetch. After 2 or 3 months those crawls stopped. We never knew from where Google got those URLs...
-
Hi Federico, thanks for the response.
Unfortunately this is an SEO solution for a third-party JavaScript product, so removing the hash isn't an option.
I'm still interested in knowing if this is a formal Google practice and if there's some way to control or mitigate this.
-
I think you are right. Google is fishing for content. I would find a solution to make those URL friendly by removing the hash and using some URL rewrite and pushState to paginate that content instead.
Here's a previous question that may help: http://moz.com/community/q/best-way-to-break-down-paginated-content
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I've screwed up. Domain pointers I forgot about. Think I am getting dinged by google.
Hey all. I setup some domain pointers for a client 8 years ago and now think they are hurting them. I am afraid google thinks it duplicate content. They are pointers so you can get to the same page using other domain names. Is my best approach to do a 301 redirect on them? The client is on a shared host so I have to use the web.config file. The site is pretty small so doing it for the 10+ pages is not that big of a deal. My question is this? When should I drop those pointers from the website altogether?
Intermediate & Advanced SEO | | DougDeVore0 -
Google Index Constantly Decreases Week over Week (for over 1 year now)
Hi, I recently started working with two products (one is community driven content), the other is editorial content, but I've seen a strange pattern in both of them. The Google Index constantly decreases week over week, for at least 1 year. Yes, the decrease increased 🙂 when the new Mobile version of Google came out, but it was still declining before that. Has it ever happened to you? How did you find out what was wrong? How did you solve it? What I want to do is take the sitemap and look for the urls in the index, to first determine which are the missing links. The problem though is that the sitemap is huge (6 M pages). Have you find out a solution on how to deal with such big index changes? Cheers, Andrei
Intermediate & Advanced SEO | | andreib0 -
I was wondering, do you know when you see updated results for a sporting event in the google search.
I was wondering, do you know when you see updated results for a sporting event in the google search. Are those the result of structured data?
Intermediate & Advanced SEO | | mycujoo0 -
Dfferent url of some other site is shown by Google in cace copy of our site's page
Hi, When i check cached copy of url of my site http://goo.gl/BZw2Zz , the url in cache copy shown by Google is of some other third party site. Why is Google showing third party url in our site's cached url. Did any of you guys faced any such issue. Regards,
Intermediate & Advanced SEO | | vivekrathore0 -
Can Google index PDFs with flash?
Does anyone know if Google can index PDF with Flash embedded? I would assume that the regular flash recommendations are still valid, even when embedded in another document. I would assume there is a list of the filetype and version which Google can index with the search appliance, but was not able to find any. Does anyone have a link or a list?
Intermediate & Advanced SEO | | andreas.wpv0 -
Malicious site pointed A-Record to my IP, Google Indexed
Hello All, I launched my site on May 1 and as it turns out, another domain was pointing it's A-Record to my IP. This site is coming up as malicious, but worst of all, it's ranking on keywords for my business objectives with my content and metadata, therefore I'm losing traffic. I've had the domain host remove the incorrect A-Record and I've submitted numerous malware reports to Google, and attempted to request removal of this site from the index. I've resubmitted my sitemap, but it seems as though this offending domain is still being indexed more thoroughly than my legitimate domain. Can anyone offer any advice? Anything would be greatly appreciated! Best regards, Doug
Intermediate & Advanced SEO | | FranGen0 -
Indexed Pages in Google, How do I find Out?
Is there a way to get a list of pages that google has indexed? Is there some software that can do this? I do not have access to webmaster tools, so hoping there is another way to do this. Would be great if I could also see if the indexed page is a 404 or other Thanks for your help, sorry if its basic question 😞
Intermediate & Advanced SEO | | JohnPeters0 -
Does Google index url with hashtags?
We are setting up some Jquery tabs in a page that will produce the same url with hashtags. For example: index.php#aboutus, index.php#ourguarantee, etc. We don't want that content to be crawled as we'd like to prevent duplicate content. Does Google normally crawl such urls or does it just ignore them? Thanks in advance.
Intermediate & Advanced SEO | | seoppc20120