Why are bit.ly links being indexed and ranked by Google?
-
I did a quick search for "site:bit.ly" and it returns more than 10 million results.
Given that bit.ly links are 301 redirects, why are they being indexed in Google and ranked according to their destination?
I'm working on a similar project to bit.ly and I want to make sure I don't run into the same problem.
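Since the goal is to build a similar service, here is a minimal sketch of the redirect side of a shortener, using only Python's standard-library WSGI interface. The `SHORT_LINKS` table, the slug `abc123`, and the example.com destination are hypothetical placeholders. The point is simply one hop, a 301 status, and a `Location` header that points straight at the final URL.

```python
import html

# Hypothetical slug -> destination table; a real service would use a database.
SHORT_LINKS = {
    "abc123": "https://example.com/some/long/article",
}

def shortener_app(environ, start_response):
    """WSGI app: answer /<slug> with one 301 to the stored destination, else 404."""
    slug = environ.get("PATH_INFO", "").lstrip("/")
    target = SHORT_LINKS.get(slug)
    if target is None:
        body = b"Not found"
        start_response("404 Not Found", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    # Single permanent redirect straight to the final URL (no intermediate hops).
    # The tiny HTML note is optional; clients that follow redirects use the
    # status code and the Location header.
    body = f'<a href="{html.escape(target)}">moved here</a>'.encode("utf-8")
    start_response("301 Moved Permanently", [
        ("Location", target),
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, run `wsgiref.simple_server.make_server("127.0.0.1", 8000, shortener_app).serve_forever()`. Then `curl -I http://127.0.0.1:8000/abc123` should show the 301 and its `Location` header.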
-
Given that Chrome and most header checkers (even older ones) process the 301s correctly, I don't think a minor header difference would throw off Google's crawlers. They have to handle a lot of variation.
I suspect it's more likely that either:
(a) There was a technical problem the last time they crawled (which would be impossible to see now, if it had been fixed).
(b) Some other signal is overwhelming or negating the 301, such as massive direct links, canonicals, social signals, etc. That can be hard to measure.
I don't think it's worth getting hung up on the particulars of Bit.ly's index. I suspect many of these issues are unique to them. I also expect problems will expand with scale. What works for hundreds of pages may not work for millions, and Google isn't always great at massive-scale redirects.
-
Here's something more interesting.
Bitly vs tiny.cc
I used http://web-sniffer.net/ to grab the headers of both. With bitly links, I see an HTTP 301 response followed by a "Content" section, but with tiny.cc links I only see the redirect header.
Two links I'm testing:
Bitly response:
Content (0.11 KiB)
<title>bit.ly</title> <a href="https://twitter.com/KPLU">moved here</a>
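The difference web-sniffer shows is in the response body, not in the redirect itself. The HTTP specification notes that a 301 response's payload usually contains a short hypertext note with a link to the new URL, which is what the "moved here" link is. A header-only 301 is equally valid. The sketch below parses two illustrative raw responses, constructed by hand rather than captured from bitly or tiny.cc, to show that the status code and `Location` header are the same in both. Only the body differs.

```python
def parse_raw_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status_code, headers, body).

    Header names are lower-cased. This is a teaching sketch, not a full
    parser (no chunked encoding, no repeated headers).
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body

# Illustrative responses: a 301 with a small HTML note, and a header-only 301.
WITH_BODY = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: https://twitter.com/KPLU\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b'<title>bit.ly</title> <a href="https://twitter.com/KPLU">moved here</a>'
)
HEADER_ONLY = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: https://twitter.com/KPLU\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)

for label, raw in [("with body", WITH_BODY), ("header only", HEADER_ONLY)]:
    status, headers, body = parse_raw_response(raw)
    print(label, status, headers["location"], len(body))
```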
-
I was getting 301->403 on SEO Book's header checker (http://tools.seobook.com/server-header-checker/), but I'm not seeing it on some other tools. Not worth getting hung up on, since it's 1 in 70M.
-
I wonder why you're seeing a 403; I still see a 200.
http://www.wlns.com/story/24958963/police-id-adrian-woman-killed-in-us-127-crash
200: HTTP/1.1 200 OK
- Server IP Address: 192.80.13.72
- ntCoent-Length: 60250
- Content-Type: text/html; charset=utf-8
- Server: Microsoft-IIS/6.0
- WN: IIS27
- P3P: CP="CAO ADMa DEVa TAIa CONi OUR OTRi IND PHY ONL UNI COM NAV INT DEM PRE"
- X-Powered-By: ASP.NET
- X-AspNet-Version: 4.0.30319
- wn_vars: CACHE_DB
- Content-Encoding: gzip
- Content-Length: 13213
- Cache-Control: private, max-age=264
- Expires: Wed, 19 Mar 2014 21:38:36 GMT
- Date: Wed, 19 Mar 2014 21:34:12 GMT
- Connection: keep-alive
- Vary: Accept-Encoding
-
I show the second one (bit.ly/O6QkSI) redirecting to a 403.
Unfortunately, these are only anecdotes, and there's almost no way we could analyze the pattern across 70M indexed pages without a massive audit (and Bitly's cooperation). I don't see anything inherently wrong with their setup, but if you saw a jump that big (10M to 70M), it's definitely possible that something temporarily went wrong. In that case, it could take months for Google to clear out the index.
-
I looked at all 3 redirects and they all showed a single 301 redirect to a 200 destination for me. Do you recall which one was a 403?
Looking at my original comment in the question, last month bit.ly had 10M results and now I'm seeing 70M results, which means there was a (relatively) huge increase in indexed shortlinks.
I also see 1000+ results for "mz.cm", which isn't too surprising, since mz.cm is just a CNAME to the bitly platform.
I found another active URL shortener, http://scr.im/, and I only saw the destination pages being indexed by Google, not the short links. I wonder if the indexing is particular to bitly and/or the IP subnet behind bitly links.
I looked at another one, bit.do, and their shortlinks are being indexed. Back to square one.
-
One of those 301s goes to a 403, which is probably thwarting Google, but the other two seem like standard pages. Honestly, it's tough to do anything but speculate. It may be that so many people are linking to or sharing the short version that Google is choosing to ignore the redirect for ranking purposes (they don't honor signals as often as we like to think). It could simply be that some of them were created fairly recently and haven't been processed correctly yet. It could be that these URLs got indexed when the target page was having problems (bad headers, downtime, etc.), and Google hasn't recrawled and refreshed those URLs.
I noticed that a lot of our "mz.cm" URLs (Moz's Bitly-powered short domain) seem to be indexed. In our case, it looks like we're chaining two 301s (because we made the domain move last year). It may be that something as small as that chain could throw off the crawlers, especially for links that aren't recrawled very often. I suspect that shortener URLs often get a big burst of activity and crawls early on (since that's the nature of social sharing) but then don't get refreshed very often.
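To check chains like the 301 -> 403 and 301 -> 301 -> 200 cases described in this thread without relying on any one online header checker, you can trace each hop yourself. Below is a rough sketch using Python's `http.client`, which does not follow redirects on its own. It sends HEAD requests, so a server that handles HEAD differently from GET could report different codes than a browser sees. The `fetch` parameter is an assumption added for testability: passing a fake function lets you exercise the logic offline.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def head_fetch(url):
    """Send one HEAD request without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def trace_redirects(url, fetch=head_fetch, max_hops=10):
    """Follow redirects hop by hop and return a list of (url, status) pairs."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops

def summarize(hops):
    """Render a chain like '301 -> 301 -> 200' and flag chained or non-200 endings."""
    codes = " -> ".join(str(status) for _, status in hops)
    redirects = sum(1 for _, status in hops if status in REDIRECT_CODES)
    notes = []
    if redirects > 1:
        notes.append("chained redirects")
    if hops and hops[-1][1] != 200:
        notes.append(f"final status {hops[-1][1]}")
    return codes, notes
```

For example, `summarize(trace_redirects("http://bit.ly/O6QkSI"))` would report the chain for the short link mentioned earlier. The result depends on what the live servers return at the time you run it.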
Ultimately, on the scale of Bit.ly, a lot can happen. It may be that 70M URLs is barely a drop in the bucket for Bit.ly as well.
-
I spot checked a few and I noticed some are only single 301 redirects.
And looking at the results for site:bit.ly, some even have breadcrumbs, ironically enough.
Here are a few examples:
bit.ly/M5onJO
None of these should be indexed, but for some reason they are.
Presently I see 70M pages indexed for "bit.ly".
I see almost 600,000 results for "bitly.com".
-
It looks like bit.ly is chaining two 301s: the first one goes to feedproxy.google.com (FeedProxy is like AdSense for feeds, I think), and then the second 301 goes to the destination site. I suspect this intermediary may be part of the problem.
-
I wasn't sure about this one, but I found this on readwrite.com:
"Bit.ly serves up links to Calais and gets back a list of the keywords and concepts that the linked-to pages are actually about. Think of it as machine-performed auto tagging with subject keywords. This structured data is much more interesting than the mere presence of search terms in a full text search."
Perhaps this structured data is submitted to Google? Any other ideas?