Does Googlebot Read Session IDs?
-
I did a raw export from Ahrefs yesterday, and one of our sites has 18,000 backlinks coming from the same site. They're all the same link, just with a different session ID. The structure of the URL is:
[website].com/resources.php?UserID=10031529
We have 18,000 of these, each with a different ID.
Does Google read each of these as a unique backlink, or does it realize there's just one link and that the session ID is throwing it off? I've read differing opinions when researching this, so I'm hoping the Moz community can give some concrete answers.
-
The safest bet is to set up canonicals that point to the page without the parameter, so that even if Google does read the session IDs, it will understand that they all relate to the canonical URL. Honestly, I'm not 100% sure whether Google reads those session IDs either, and I have seen conflicting information. I do know it reads other parameters as separate URLs: I had a few issues with the way one of our sites handled products (sometimes it was ?model=, sometimes ?prod_id=, and some old products also had ?sku=). Adding the canonicals will solve this problem if it exists, and if it doesn't exist, it won't hurt to have a self-referential canonical sitting in the code in case someone scrapes your site.
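For anyone who wants to see what "the page minus the parameter" looks like in practice, here is a rough Python sketch. It is only an illustration under a few assumptions: the domain example.com stands in for the elided [website].com, the list of session-style parameter names (SESSION_PARAMS) is my own guess beyond the UserID from the question, and the helper names canonical_url and canonical_tag are made up for this example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that only carry session state. "userid" comes from
# the question; the other names are hypothetical examples.
SESSION_PARAMS = {"userid", "sessionid", "sid", "phpsessid"}


def canonical_url(url: str) -> str:
    """Return the URL with session-style query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL without the dropped parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


# Placeholder URLs modelled on the structure in the question.
urls = [
    "https://example.com/resources.php?UserID=10031529",
    "https://example.com/resources.php?UserID=10031530",
    "https://example.com/resources.php?UserID=10031531&page=2",
]

# Collapsing an export this way shows how many distinct pages are really linking.
unique = sorted({canonical_url(u) for u in urls})
print(unique)
print(canonical_tag(urls[0]))
```

The same normalization can be run over a backlink export to count distinct linking pages rather than distinct session IDs. Whether Google collapses them the same way is exactly the open question here, so treat the count as your own estimate, not as what Google sees.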
-
You have to stay informed and keep a close eye on how search engine bots handle this kind of thing.
Related Questions
-
Uppercase/Lowercase Reading As Duplicate Permalinks
I cannot figure out if this is an actual SEO issue or just a crawl reader error. I use Screaming Frog to crawl my site and use their SEO features. When I look at page titles and duplicates, it shows all our pages twice: one version with one letter capitalized and the other without. I don't REALLY have duplicate permalinks, do I? I also noticed that when I paste both permalinks into some open site explorers, the specs show for the all-lowercase permalink, but nothing is found for the capitalized "duplicate" permalink. Below I included a few screenshots. Thank you Moz Fam!
Intermediate & Advanced SEO | LindsayE
-
Display None (Read More) Implementation
Hi Mozzers, This question has been asked a few times over the years, but opinion seems to have changed drastically and I wanted to get an updated opinion from sources I trust. On my category pages I have content above products. The content can push the products too far down, and if placed below it is never viewed. To battle this I wanted to implement a "Read More" button so I could keep a couple hundred words there and expand it to the rest of the content if the user wanted. If not, the products would remain near the top of the screen for better conversion. I have implemented this on this page to test whether it affects my keyword rankings before I go site-wide. But I also wanted an opinion on whether this practice is OK. The example page with it implemented can be found here. The content I'm hiding isn't huge here, but on other pages it could be more. Is there a set ratio of text I should aim to keep / hide? Any pitfalls I should watch out for? I know Google crawls the hidden content as it's in the source code, but should I be wary of a penalty if too much is hidden?
Intermediate & Advanced SEO | ATP
-
Content with Read More..?
How does Google see content that's static on the page versus content that has a "see more" or "read more" tag, where the content collapses and expands on a mouse click, provided that the complete content is readable in the source code view and crawlable by spiders?
Intermediate & Advanced SEO | welcomecure
-
Would you rate-control Googlebot? How much crawling is too much crawling?
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day. A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors. I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is. Questions to Enterprise SEOs:
* Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. This indicates there is some upper limit - which we perhaps haven't reached - but which would stabilize once reached.
* We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have. Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back?
* What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate.
Thanks
Intermediate & Advanced SEO | lzhao
-
Problem with Google reading https homepage?
Hi Moz Community, In July, we changed our homepage to https via a 301 redirect from http (the only page on our site with https). Our homepage receives an A grade in the 'On Page Grader' by Moz for our desired keyword. We have increased our backlink efforts directly to our homepage since we switched to the SSL homepage. However, our search ranking for our specific keyword still has not improved. Is there something we could have missed when doing the 301 redirect (submitting a new sitemap, changing robots.txt files, or anything else?) that has resulted in Google not correctly accessing the https version? (The https page has been indexed by Google.) Any help would be greatly appreciated.
Intermediate & Advanced SEO | G.Anderson
-
Https Homepage Redirect & Issue with Googlebot Access
Hi All, I have a question about Google correctly accessing a site that has a 301 redirect to https on the homepage. Here's an overview of the situation, and I'd really appreciate any insight from the community on what the issue might be:
Background Info: My homepage is set up as a 301 redirect to an https version of the homepage (some users log in, so we need the SSL). Only 2 pages on the site are under SSL and the rest of the site is http. We switched to the SSL in July but have not seen any change in our rankings, despite increased efforts on backlinks and content output. Even though Google has indexed the SSL page of the site, it appears that it is not linking up the SSL page with the rest of the site in its search and tracking. Why do we think this is the case?
The Diagnosis:
1) When we do a Google Fetch on our http homepage, it appears that Google is only reading the 301 redirect instructions (as shown below) and is not finding its way over to the SSL page, which has all the correct page title and meta information.
<code>HTTP/1.1 301 Moved Permanently
Date: Fri, 08 Nov 2013 17:26:24 GMT
Server: Apache/2.2.16 (Debian)
Location: https://mysite.com/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 242
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

<title>301 Moved Permanently</title>
# Moved Permanently
The document has moved [here](https://mysite.com/).
* * *
<address>Apache/2.2.16 (Debian) Server at mysite.com</address></code>
2) When we view a list of external backlinks to our homepage, it appears that the backlinks built after we switched to the SSL homepage have been separated from the backlinks built before the SSL. Even on Open Site, we are only seeing the backlinks that were achieved before we switched to the SSL, and we are not able to track any backlinks that have been added since the switch. This leads us to believe that the new links are not adding any value to our search rankings.
3) When viewing Google Webmaster, we receive no information about our homepage, only about the non-https pages. I added an https account to Google Webmaster, and in that version we ONLY receive information about our homepage (and the other SSL page on the site).
What Is The Problem? My concern is that we need to do something specific with our sitemap or with the 301 redirect itself in order for Google to read the whole site as one entity and to receive the reporting/backlinks as one site. Again, Google is indexing all of our pages, but it seems to be doing so in a disjointed way that is breaking down the link juice and value being built up by our SSL homepage. Can anybody help? Thank you for any advice or input you might be able to offer. -Greg
Intermediate & Advanced SEO | G.Anderson
-
Lots of incorrect urls indexed - Googlebot found an extremely high number of URLs on your site
Hi, any assistance would be greatly appreciated. Basically, our rankings and traffic have been dropping massively, and Google recently sent us a message stating "Googlebot found an extremely high number of URLs on your site". This first alerted us to the problem: for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish URLs, giving us duplication everywhere, which Google is obviously penalizing us for in the form of dropping rankings. Our developer is trying to find the root cause of this, but my concern is: how do we get rid of all these bogus URLs? If we use GWT to remove URLs, it's going to take years. We have just amended our robots.txt file to exclude them going forward, but they have already been indexed, so I need to know: do we put a 301 redirect on them and also an HTTP 404 code to tell Google they don't exist? Do we also put a noindex on the pages? What is the best solution? A couple of examples of our problems: in Google, type site:bestathire.co.uk inurl:"br" and you will see 107 results. This is one of many lots we need to get rid of. Also, site:bestathire.co.uk intitle:"All items from this hire company" shows 25,300 indexed pages we need to get rid of. Another thing that would help tidy this mess up going forward is improving our pagination. Our site uses rel=next and rel=prev but no canonical. As a belt-and-braces approach, should we also put canonical tags on our category pages where there is more than one page? I was thinking of doing it on page 1 of our most important pages, or the View All page, or both. What's the general consensus? Any advice on both points is greatly appreciated. Thanks, Sarah.
Intermediate & Advanced SEO | SarahCollins
-
Why specify robots instead of googlebot for a Panda affected site?
Daniweb is the poster child for sites that have recovered from Panda. I know one strategy she mentioned was de-indexing all of her tagged content, for example: http://www.daniweb.com/tags/database Why do you think more Panda-affected sites aren't specifying 'googlebot' rather than 'robots', so they can keep capturing traffic from Bing & Yahoo?
Intermediate & Advanced SEO | nicole.healthline