Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Why are bit.ly links being indexed and ranked by Google?
-
I did a quick search for "site:bit.ly" and it returns more than 10 million results.
Given that bit.ly links are 301 redirects, why are they being indexed in Google and ranked according to their destination?
I'm working on a similar project to bit.ly and I want to make sure I don't run into the same problem.
-
Given that Chrome and most header checkers (even older ones) are processing the 301s, I don't think a minor header difference would throw off Google's crawlers. They have to handle a lot.
I suspect it's more likely that either:
(a) There was a technical problem the last time they crawled (which would be impossible to see now, if it had been fixed).
(b) Some other signal is overwhelming or negating the 301 - such as massive direct links, canonicals, social, etc. That can be hard to measure.
I don't think it's worth getting hung up on the particulars of Bit.ly's index. I suspect many of these issues are unique to them. I also expect problems will expand with scale. What works for hundreds of pages may not work for millions, and Google isn't always great at massive-scale redirects.
-
Here's something more interesting.
Bitly vs tiny.cc
I used http://web-sniffer.net/ to grab the headers of both and with bitly links, I see an HTTP Response Header of 301, followed by "Content", but with tiny.cc links I only see the header redirect.
Two links I'm testing:
Bitly response:
Content (0.11 <acronym title="KibiByte = 1024 Byte">KiB</acronym>)
<title></span>bit.ly<span class="tag"></title> <a< span="">href="https://twitter.com/KPLU">moved here</a<>
-
I was getting 301->403 on SEO Book's header checker (http://tools.seobook.com/server-header-checker/), but I'm not seeing it on some other tools. Not worth getting hung up on, since it's 1 in 70M.
-
I wonder why you're seeing a 403, I still see a 200.
http://www.wlns.com/story/24958963/police-id-adrian-woman-killed-in-us-127-crash
200: HTTP/1.1 200 OK
- Server IP Address: 192.80.13.72
- ntCoent-Length: 60250
- Content-Type: text/html; charset=utf-8
- Server: Microsoft-IIS/6.0
- WN: IIS27
- P3P: CP="CAO ADMa DEVa TAIa CONi OUR OTRi IND PHY ONL UNI COM NAV INT DEM PRE"
- X-Powered-By: ASP.NET
- X-AspNet-Version: 4.0.30319
- wn_vars: CACHE_DB
- Content-Encoding: gzip
- Content-Length: 13213
- Cache-Control: private, max-age=264
- Expires: Wed, 19 Mar 2014 21:38:36 GMT
- Date: Wed, 19 Mar 2014 21:34:12 GMT
- Connection: keep-alive
- Vary: Accept-Encoding
-
I show the second one (bit.ly/O6QkSI) redirecting to a 403.
Unfortunately, these are only anecdotes, and there's almost no way we could analyze the pattern across 70M indexed pages without a massive audit (and Bitly's cooperation). I don't see anything inherently wrong with their setup, and if you noticed that big of a jump (10M - 70M), it's definitely possible that something temporarily went wrong. In that case, it could take months for Google to clear out the index.
-
I looked at all 3 redirects and they all showed a single 301 redirect to a 200 destination for me. Do you recall which one was a 403?
Looking at my original comment in the question, last month bit.ly had 10M results and now I'm seeing 70M results, which means there was a [relatively] huge increase with indexed shortlinks.
I also see 1000+ results for "mz.cm" which doesn't seem much strange, since mz.cm is just a CNAME to the bitly platform.
I found another URL shortner which has activity, http://scr.im/ and I only saw the correct pages being indexed by Google, not the short links. I wonder if the indexing is particular to bitly and/or the IP subnet behind bitly links.
I looked at another one, bit.do, and their shortlinks are being indexed. Back to square 1.
-
One of those 301s to a 403, which is probably thwarting Google, but the other two seem like standard pages. Honestly, it's tough to do anything but speculate. It may be that so many people are linking to or sharing the short version that Google is choosing to ignore the redirect for ranking purposes (they don't honor signals as often as we like to think). It could simply be that some of them are fairly freshly created and haven't been processed correctly yet. It could be that these URLs got indexed when the target page was having problems (bad headers, down-time, etc.), and Google hasn't recrawled and refreshed those URLs.
I noticed that a lot of our "mz.cm" URLs (Moz's Bitly-powered short domain) seem to be indexed. In our case, it looks like we're chaining two 301s (because we made the domain move last year). It may be that something as small as that chain could throw off the crawlers, especially for links that aren't recrawled very often. I suspect that shortener URLs often get a big burst of activity and crawls early on (since that's the nature of social sharing) but then don't get refreshed very often.
Ultimately, on the scale of Bit.ly, a lot can happen. It may be that 70M URLs is barely a drop in the bucket for Bit.ly as well.
-
I spot checked a few and I noticed some are only single 301 redirects.
And looking at the results for site:bit.ly, some even have breadcrumbs ironically enough.
Here are a few examples
<cite class="_md">bit.ly/M5onJO</cite>
None of these should be indexed, but for some reason they are.
Presently I see 70M pages indexed for "bit.ly"
I see almost 600,000 results for "bitly.com"
-
It looks like bit.ly is chaining two 301s: the first one goes to feedproxy.google.com (FeedProxy is like AdSense for feeds, I think), and then the second 301 goes to the destination site. I suspect this intermediary may be part of the problem.
-
I wasn't sure on this one, but found this on readwrite.com.
"Bit.ly serves up links to Calais and gets back a list of the keywords and concepts that the linked-to pages are actually about. Think of it as machine-performed auto tagging with subject keywords. This structured data is much more interesting than the mere presence of search terms in a full text search."
Perhaps this structured data is submitted to Google?? Any other ideas?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing Request - Typical Time to Complete?
In Google Search Console, when you request the (re) indexing of a fetched page, what's the average amount of time it takes to re-index and does it vary that much from site to site or are manual re-index request put in a queue and served on a first come - first serve basis despite the site characteristics like domain/page authority?
Intermediate & Advanced SEO | | SEO18050 -
Newly designed page ranks in Google but then disappears - at a loss as to why.
Hi all, I wondered if you could help me at all please? We run a site called getinspired365.com (which is not optimised) and in the last 2 weeks have tried to optimise some new pages that we have added. For example, we have optimised this page - http://getinspired365.com/lifes-a-bit-like-mountaineering-never-look-down This page was added to Google's index via webmaster tools. When I then did a search for the full quote it came back 2nd in Google's search. If I did a search for half the quote (Life is a bit like mountaineering) it also ranked 2nd. We had another quote page that we'd optimised that displayed similar behaviour (it ranked 4th). But then for some reason when I now do the search it doesn't rank in the top 100 results. This, despite, an unoptimised "normal" page ranking 4th for a search such as: Thousands of geniuses live and die undiscovered. So our domain doesn't seem to be penalised as our "normal" pages are ranking. These pages aren't particularly well designed from an SEO standpoint. But our new pages - which are optimised - keep disappearing from Google, despite the fact they still show as indexed. I've rendered the pages and everything appears fine within Google Webmaster Tools. At a bit of a loss as to why they'd drop so significantly? A few pages I could understand but they've all but been removed. Any one seen this before, and any ideas what could be causing the issue? We have a different URL structure for our new pages in that we have the quote appear in the URL. All the content (bar the quote) that you see in the new pages are unique content that we've written ourselves. Could it be that we've over optimised and Google view these pages as spam? Many thanks in advance for all your help.
Intermediate & Advanced SEO | | MichaelWhyley0 -
My site shows 503 error to Google bot, but can see the site fine. Not indexing in Google. Help
Hi, This site is not indexed on Google at all. http://www.thethreehorseshoespub.co.uk Looking into it, it seems to be giving a 503 error to the google bot. I can see the site I have checked source code Checked robots Did have a sitemap param. but removed it for testing GWMT is showing 'unreachable' if I submit a site map or fetch Any ideas on how to remove this error? Many thanks in advance
Intermediate & Advanced SEO | | SolveWebMedia0 -
How can I make sure Google is crawling a link from an iframe (video)?
Do they crawl backlinks from an iframe example from a Youtube video embedded in a blog post? TIA!
Intermediate & Advanced SEO | | zpm20140 -
Pages are Indexed but not Cached by Google. Why?
Here's an example: I get a 404 error for this: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all But a search for qjamba restaurant coupons gives a clear result as does this: site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all What is going on? How can this page be indexed but not in the Google cache? I should make clear that the page is not showing up with any kind of error in webmaster tools, and Google has been crawling pages just fine. This particular page was fetched by Google yesterday with no problems, and even crawled again twice today by Google Yet, no cache.
Intermediate & Advanced SEO | | friendoffood2 -
Proper 301 in Place but Old Site Still Indexed In Google
So i have stumbled across an interesting issue with a new SEO client. They just recently launched a new website and implemented a proper 301 redirect strategy at the page level for the new website domain. What is interesting is that the new website is now indexed in Google BUT the old website domain is also still indexed in Google? I even checked the Google Cached date and it shows the new website with a cache date of today. The redirect strategy has been in place for about 30 days. Any thoughts or suggestions on how to get the old domain un-indexed in Google and get all authority passed to the new website?
Intermediate & Advanced SEO | | kchandler0 -
"Jump to" Links in Google, how do you get them?
I have just seen yoast.com results in Google and noticed that nearly all the indexed pages show a "Jump to" link So instead of showing the full URL under the title tag, it shows these type of links yoast.com › SEO
Intermediate & Advanced SEO | | JohnPeters
yoast.com › Social Media
yoast.com › Analytics With the SEO, Social Media and Analytics all being clickable. How has he achieved this? And is it something to try and incorporate in my sites?0 -
How Google treat internal links with rel="nofollow"?
Today, I was reading about NoFollow on Wikipedia. Following statement is over my head and not able to understand with proper manner. "Google states that their engine takes "nofollow" literally and does not "follow" the link at all. However, experiments conducted by SEOs show conflicting results. These studies reveal that Google does follow the link, but does not index the linked-to page, unless it was in Google's index already for other reasons (such as other, non-nofollow links that point to the page)." It's all about indexing and ranking for specific keywords for hyperlink text during external links. I aware about that section. It may not generate in relevant result during any keyword on Google web search. But, what about internal links? I have defined rel="nofollow" attribute on too many internal links. I have archive blog post of Randfish with same subject. I read following question over there. Q. Does Google recommend the use of nofollow internally as a positive method for controlling the flow of internal link love? [In 2007] A: Yes – webmasters can feel free to use nofollow internally to help tell Googlebot which pages they want to receive link juice from other pages
Intermediate & Advanced SEO | | CommercePundit
_
(Matt's precise words were: The nofollow attribute is just a mechanism that gives webmasters the ability to modify PageRank flow at link-level granularity. Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt'ed out), but nofollow on individual links is simpler for some folks to use. There's no stigma to using nofollow, even on your own internal links; for Google, nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level.) Matt has given excellent answer on following question. [In 2011] Q: Should internal links use rel="nofollow"? A:Matt said: "I don't know how to make it more concrete than that." I use nofollow for each internal link that points to an internal page that has the meta name="robots" content="noindex" tag. Why should I waste Googlebot's ressources and those of my server if in the end the target must not be indexed? As far as I can say and since years, this does not cause any problems at all. For internal page anchors (links with the hash mark in front like "#top", the answer is "no", of course. I am still using nofollow attributes on my website. So, what is current trend? Will it require to use nofollow attribute for internal pages?0