What might make Bing.bot find a URL that looks like this on our site?
-
I have been doing something Richard Baxter recently suggested and reviewing our server logs.
I have found an oddity that hopefully some of you smart Mozzers can help me figure out.
Here is the line from the server log (there are many more like this):
157.55.32.166 - - [04/Mar/2013:08:00:59 -0800] "GET /StoreFront/category/www.ccisolutions.com/StoreFront/category/shure-se-earphones HTTP/1.1" 200 94133 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
See how the www.ccisolutions.com appears after /StoreFront/category/ ? We used to see weird URLs reported in GWT that looked like this, but ever since we fixed our canonical tags to be absolute instead of relative URLs, they no longer appeared in our Webmaster Tools reports.
However, it seems there is still a problem. Where/how could Bingbot be seeing URLs configured this way? Could it be a server issue, or is it most likely a data problem?
Thanks in advance!
Dana
P.S. Could this be resulting from our massive use of relative URLs all over the site?
-
Hi Streamline,
I thought I would circle back and update everyone as to what I found. You were correct about mal-formed URLs being the culprit of this problem. We have many isolated incidences of URLs for internal links that are missing the "/" at the beginning of a relative URL. There are inconsistencies on the relative URLs all over the site. It's certainly an example of one of many problems that can be caused by using relative rather than absolute URLs.
Since we are in the process of completely re-doing the site and moving to a new platform, it's something we can definitely work to get right during the transition.
Thanks again to you, Daniel and Keri for jumping in with answers.
Dana
-
Thanks to you both Daniel and Streamline.
I believe the problem may have to do with our .htaccess file. I am obtaining a copy of it now.
-
Thanks Keri. That's very helpful. I will do that.
-
Hi Dana,
I agree with Streamline, there will be a hidden issue in you site that it attempting to connect to an under formed link (a URL missing 'http://'). Given there is a number of them in one day I will guess this is happening in a templated page.
Have a look at;
It renders as a page.
The best course of action would be resolve it at the source. If you can pinpoint when this issue is due to occur next, have your developer get each page to append it's URL into the log at the beginning of the page. Then you should be able to determine where the issue is occurring. I am hoping you well see a discernible pattern.
Worse case scenario, possibly a canonical will work, OR create a REGEX redirect to handle this URL pattern in htaccess...
Hope this helps,
Dan
-
Dana, you might also want to contact Bing at https://support.discoverbing.com/eform.aspx?productKey=bingwebmaster&ct=eformts&scrx=1. I sent a quick note on Twitter to Duane Forrester and that's the URL he provided.
-
Can you tell from which page Bing is trying to access these URLs? And it only happened on the 4th and not on any other day? Could it be an issue with the sitemap on that day?
I'm looking at your site now and the page http://www.ccisolutions.com/StoreFront/category/www.ccisolutions.com/StoreFront/category/shure-se-earphones is returning a 200 response code to me, not a 404 code. The key is to figure out how Bing discovered the URL in the first place...
-
While this is certainly a possibility, I'm not sure it's the cause of the problem. If this were the case, wouldn't it most likely cause a 404 error, instead of rendering the proper page (albeit with a very funky URL) and a 200 status code?
The other thing making me think it's not just a poorly constructed link on the site is that there are over 100 of these in the server log, from just one day.
Thoughts?
-
I'm willing to bet that on some page of your site, there is a link pointing to www.ccisolutions.com/StoreFront/category/shure-se-earphones which is missing the "http://" at the beginning. So if Bing or a user tried to click on that link, they would be directed to /StoreFront/category/www.ccisolutions.com/StoreFront/category/shure-se-earphones instead of the correct link.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Bing search results - Site links
My site links in Bing search results are pulling through the footer text instead of the meta description (see image). Is there any way of controlling this? 2L2VusT
Technical SEO | | RWesley0 -
Https Cached Site
Hi there, I recently switch my site to a new ecommerce platform which hosts the SSL certificate on their end so my site no longer has the HTTPS status unless a user is going through the checkout. Google has cached the HTTPS version of the site so in search it comes up sometimes which leads to a nasty warning that the site may not be what they are looking for. Is there a way to tell google NOT to look at the https version of the site anymore? Thanks! Bianca
Technical SEO | | TheBatesMillStore0 -
AJAX and Bing Indexation
Hello. I've been going back and forth with Bing technical support regarding a crawling issue on our website (which I have to say is pretty helpful - you do get a personal, thoughtful response pretty quickly from Bing). Currently our website is set with a java redirect to send users/crawlers to an AJAX version of our website. For example, they come into - mysite.com/category..and get redirected to mysite.com/category#!category. This is to provide an AJAX search overlay which improves UEx. We are finding that Bing gets 'hung up' on these AJAX pages, despite AJAX protocol being in place. They say that if the AJAX redirect is removed, they would index and crawl the non-AJAX url correctly - at which point our indexation would (theoretically) improve. I'm wondering if it's possible (or advisable) to direct the robots to crawl the non-AJAX version, while users get the AJAX version. I'm assuming that it's the classic - the bots want to see exactly what the users see - but I wanted to post here for some feedback. The reality of the situation is the AJAX overlay is in place and our rankings in Bing have plummeted as a result.
Technical SEO | | Blenny0 -
Variables in URLS?
How much do variables in URLs hurt indexing of that page? I'm worried that with this huge string of variables that the pages won't get indexed. Here's what I think we should have: http://adomainname.com/New/Local/State/City/Make/Model/ Here's the current URL:http://adomainname.com/New/Local/MN/Bayport/Jeep/Liberty?curPage=1&pageResultSize=50&orderDir=DESC&orderBy=ModifiedDate&conditionId=1&makeId=7&modelId=141&stateProvinceName=Minnesota&mc=1
Technical SEO | | CFSSEO0 -
Site maintenance and crawling
Hey all, Rarely, but sometimes we require to take down our site for server maintenance, upgrades or various other system/network reasons. More often than not these downtimes are avoidable and we can redirect or eliminate the client side downtime. We have a 'down for maintenance - be back soon' page that is client facing. ANd outages are often no more than an hour tops. My question is, if the site is crawled by Bing/Google at the time of site being down, what is the best way of ensuring the indexed links are not refreshed with this maintenance content? (ie: this is what the pages look like now, so this is what the SE will index). I was thinking that add a no crawl to the robots.txt for the period of downtime and remove it once back up, but will this potentially affect results as well?
Technical SEO | | Daylan1 -
I am Posting an article on my site and another site has asked to use the same article - Is this a duplicate content issue with google if i am the creator of the content and will it penalize our sites - or one more than the other??
I operate an ecommerce site for outdoor gear and was invited to guest post on a popular blog (not my site) for a trip i had been on. I wrote the aritcle for them and i also will post this same article on my website. Is this a dup content problem with google? and or the other site? Any Help. Also if i wanted to post this same article to 1 or 2 other blogs as long as they link back to me as the author of the article
Technical SEO | | isle_surf0 -
Trailing Slashes In Url use Canonical Url or 301 Redirect?
I was thinking of using 301 redirects for trailing slahes to no trailing slashes for my urls. EG: www.url.com/page1/ 301 redirect to www.url.com/page1 Already got a redirect for non-www to www already. Just wondering in my case would it be best to continue using htacces for the trailing slash redirect or just go with Canonical URLs?
Technical SEO | | upick-1623910 -
Should I create mini-sites with keyword rich domain names pointing to my main site?
Hi, I'm new to seomoz (and seo in general) and loving it so far. My main domain name is more of a brandname than a search engine friendly list of keywords. I rank well for some keywords I optimized for, and less so for the more competitive keywords. I was wondering if making one page minisites hosted on keyword rich domain names could help in this respect? What I want to do is just have a single page with a few paragraphs of content and links to the main site. I am not looking for links to boost the main site, just for the minisites to do better for several keywords. Will this help? Is this ok, or against some Google policy? Can this hurt the main site rankings? Thank you! **Edit: **I noticed that sites ranking above me on the first page for some keywords have much less on-page elements than my page, have about the same domain trust and also very little inbound links. The only factor I can see is the exact match of keywords in the domain name.
Technical SEO | | Eladla1