Why are bit.ly links being indexed and ranked by Google?
-
I did a quick search for "site:bit.ly" and it returns more than 10 million results.
Given that bit.ly links are 301 redirects, why are they being indexed in Google and ranked according to their destination?
I'm working on a similar project to bit.ly and I want to make sure I don't run into the same problem.
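For reference, the mechanism in question is simple: a shortener answers every short-link request with a 301 status and a Location header pointing at the destination. Here's a minimal sketch in Python — the slug table and URLs are hypothetical, not bit.ly's actual data or code:

```python
# Minimal sketch of a shortener's redirect handler. The slug table
# and target URL are hypothetical stand-ins.

SLUGS = {"abc123": "https://example.com/article"}

def redirect_response(slug):
    """Return (status, headers, body) for a short-link lookup."""
    target = SLUGS.get(slug)
    if target is None:
        return 404, {}, b"Not found"
    # A 301 tells crawlers the short URL has permanently moved to the
    # target, so the short URL itself shouldn't stay in the index.
    body = b'<title>bit.ly</title><a href="%s">moved here</a>' % target.encode()
    return 301, {"Location": target, "Content-Type": "text/html"}, body
```

In principle, a crawler that honors the 301 should index only the target URL — which is exactly why the site:bit.ly results are surprising.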
-
Given that Chrome and most header checkers (even older ones) process the 301s correctly, I don't think a minor header difference would throw off Google's crawlers. They have to handle a lot.
I suspect it's more likely that either:
(a) There was a technical problem the last time they crawled (which would be impossible to see now, if it had been fixed).
(b) Some other signal is overwhelming or negating the 301 - such as massive direct links, canonicals, social, etc. That can be hard to measure.
I don't think it's worth getting hung up on the particulars of Bit.ly's index. I suspect many of these issues are unique to them. I also expect problems will expand with scale. What works for hundreds of pages may not work for millions, and Google isn't always great at massive-scale redirects.
-
Here's something more interesting.
Bitly vs tiny.cc
I used http://web-sniffer.net/ to grab the headers of both. With bit.ly links, I see an HTTP 301 response header followed by a small "Content" body, but with tiny.cc links I only see the redirect header.
Two links I'm testing:
Bitly response:
Content (0.11 KiB):
<title>bit.ly</title> <a href="https://twitter.com/KPLU">moved here</a>
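If you'd rather not depend on web-sniffer, the same first-hop check can be scripted. A sketch that parses a raw HTTP response (captured with any client that has redirect-following disabled) and pulls out the status, Location header, and whether a body was sent alongside the redirect — the sample response below mirrors the bit.ly output above:

```python
# Sketch: inspect the first hop of a redirect, i.e. exactly what a
# crawler sees before following anything.

def first_hop(raw_response: str):
    """Return (status_code, location, has_body) from a raw HTTP response."""
    head, _, body = raw_response.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status = int(lines[0].split()[1])  # "HTTP/1.1 301 Moved Permanently" -> 301
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status, location, bool(body.strip())

# The bit.ly case above: a 301 *with* a small HTML body attached.
raw = ("HTTP/1.1 301 Moved Permanently\r\n"
       "Location: https://twitter.com/KPLU\r\n"
       "Content-Length: 110\r\n"
       "\r\n"
       '<title>bit.ly</title><a href="https://twitter.com/KPLU">moved here</a>')
```

Running `first_hop(raw)` on the sample gives `(301, "https://twitter.com/KPLU", True)` — a body on a 301 is unusual but legal, and it's what web-sniffer was reporting as "Content".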
-
I was getting 301->403 on SEO Book's header checker (http://tools.seobook.com/server-header-checker/), but I'm not seeing it on some other tools. Not worth getting hung up on, since it's 1 in 70M.
-
I wonder why you're seeing a 403; I still see a 200.
http://www.wlns.com/story/24958963/police-id-adrian-woman-killed-in-us-127-crash
200: HTTP/1.1 200 OK
- Server IP Address: 192.80.13.72
- ntCoent-Length: 60250
- Content-Type: text/html; charset=utf-8
- Server: Microsoft-IIS/6.0
- WN: IIS27
- P3P: CP="CAO ADMa DEVa TAIa CONi OUR OTRi IND PHY ONL UNI COM NAV INT DEM PRE"
- X-Powered-By: ASP.NET
- X-AspNet-Version: 4.0.30319
- wn_vars: CACHE_DB
- Content-Encoding: gzip
- Content-Length: 13213
- Cache-Control: private, max-age=264
- Expires: Wed, 19 Mar 2014 21:38:36 GMT
- Date: Wed, 19 Mar 2014 21:34:12 GMT
- Connection: keep-alive
- Vary: Accept-Encoding
-
I show the second one (bit.ly/O6QkSI) redirecting to a 403.
Unfortunately, these are only anecdotes, and there's almost no way we could analyze the pattern across 70M indexed pages without a massive audit (and Bitly's cooperation). I don't see anything inherently wrong with their setup, and if you noticed that big of a jump (10M - 70M), it's definitely possible that something temporarily went wrong. In that case, it could take months for Google to clear out the index.
-
I looked at all 3 redirects and they all showed a single 301 redirect to a 200 destination for me. Do you recall which one was a 403?
Looking at my original comment in the question, last month bit.ly had 10M results and now I'm seeing 70M, which means there was a [relatively] huge increase in indexed shortlinks.
I also see 1000+ results for "mz.cm", which doesn't seem too strange, since mz.cm is just a CNAME to the bitly platform.
I found another URL shortener with some activity, http://scr.im/, and I only saw the destination pages being indexed by Google, not the short links. I wonder if the indexing is particular to bitly and/or the IP subnet behind bitly links.
I looked at another one, bit.do, and their shortlinks are being indexed too. Back to square one.
-
One of those 301s leads to a 403, which is probably thwarting Google, but the other two seem like standard pages. Honestly, it's tough to do anything but speculate. It may be that so many people are linking to or sharing the short version that Google is choosing to ignore the redirect for ranking purposes (they don't honor signals as often as we like to think). It could simply be that some of them are fairly freshly created and haven't been processed correctly yet. It could be that these URLs got indexed when the target page was having problems (bad headers, down-time, etc.), and Google hasn't recrawled and refreshed those URLs.
I noticed that a lot of our "mz.cm" URLs (Moz's Bitly-powered short domain) seem to be indexed. In our case, it looks like we're chaining two 301s (because we moved the domain last year). It may be that something as small as that chain could throw off the crawlers, especially for links that aren't recrawled very often. I suspect that shortener URLs often get a big burst of activity and crawls early on (since that's the nature of social sharing) but then don't get refreshed very often.
Ultimately, on the scale of Bit.ly, a lot can happen. It may be that 70M URLs is barely a drop in the bucket for Bit.ly as well.
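If you're building your own shortener, one sanity check suggested by the chained-301 theory above is to walk each redirect chain and flag anything longer than a single hop. A sketch — the hop table here is hypothetical, standing in for live HEAD requests:

```python
# Sketch: walk a redirect chain and count 301 hops. Multi-hop chains
# (like the mz.cm -> old moz.com -> new moz.com case described above)
# may be recrawled less reliably than a single clean hop. The HOPS
# table is hypothetical; in practice each lookup would be an HTTP
# HEAD request with redirect-following disabled.

HOPS = {
    "http://short.example/x1": (301, "http://site.example/old-path"),
    "http://site.example/old-path": (301, "http://site.example/new-path"),
    "http://site.example/new-path": (200, None),  # final destination
}

def redirect_chain(url, max_hops=10):
    """Follow HOPS from url; return the list of (url, status) visited."""
    chain = []
    while len(chain) < max_hops:
        status, target = HOPS.get(url, (None, None))
        chain.append((url, status))
        if status != 301 or target is None:
            break
        url = target
    return chain
```

For the table above, `redirect_chain("http://short.example/x1")` visits three URLs with two 301 hops before the final 200 — the kind of chain you'd want to collapse to a single redirect.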
-
I spot checked a few and I noticed some are only single 301 redirects.
And looking at the results for site:bit.ly, some even have breadcrumbs, ironically enough.
Here are a few examples:
bit.ly/M5onJO
None of these should be indexed, but for some reason they are.
Presently I see 70M pages indexed for "bit.ly"
I see almost 600,000 results for "bitly.com"
-
It looks like bit.ly is chaining two 301s: the first one goes to feedproxy.google.com (FeedProxy is like AdSense for feeds, I think), and then the second 301 goes to the destination site. I suspect this intermediary may be part of the problem.
-
I wasn't sure on this one, but found this on readwrite.com.
"Bit.ly serves up links to Calais and gets back a list of the keywords and concepts that the linked-to pages are actually about. Think of it as machine-performed auto tagging with subject keywords. This structured data is much more interesting than the mere presence of search terms in a full text search."
Perhaps this structured data is submitted to Google? Any other ideas?