What might make Bing.bot find a URL that looks like this on our site?
-
I have been doing something Richard Baxter recently suggested and reviewing our server logs.
I have found an oddity that hopefully some of you smart Mozzers can help me figure out.
Here is the line from the server log (there are many more like this):
157.55.32.166 - - [04/Mar/2013:08:00:59 -0800] "GET /StoreFront/category/www.ccisolutions.com/StoreFront/category/shure-se-earphones HTTP/1.1" 200 94133 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
See how the www.ccisolutions.com appears after /StoreFront/category/ ? We used to see weird URLs reported in GWT that looked like this, but ever since we fixed our canonical tags to be absolute instead of relative URLs, they no longer appeared in our Webmaster Tools reports.
However, it seems there is still a problem. Where/how could Bingbot be seeing URLs configured this way? Could it be a server issue, or is it most likely a data problem?
Thanks in advance!
Dana
P.S. Could this be resulting from our massive use of relative URLs all over the site?
-
Hi Streamline,
I thought I would circle back and update everyone as to what I found. You were correct about mal-formed URLs being the culprit of this problem. We have many isolated incidences of URLs for internal links that are missing the "/" at the beginning of a relative URL. There are inconsistencies on the relative URLs all over the site. It's certainly an example of one of many problems that can be caused by using relative rather than absolute URLs.
Since we are in the process of completely re-doing the site and moving to a new platform, it's something we can definitely work to get right during the transition.
Thanks again to you, Daniel and Keri for jumping in with answers.
Dana
-
Thanks to you both Daniel and Streamline.
I believe the problem may have to do with our .htaccess file. I am obtaining a copy of it now.
-
Thanks Keri. That's very helpful. I will do that.
-
Hi Dana,
I agree with Streamline, there will be a hidden issue in you site that it attempting to connect to an under formed link (a URL missing 'http://'). Given there is a number of them in one day I will guess this is happening in a templated page.
Have a look at;
It renders as a page.
The best course of action would be resolve it at the source. If you can pinpoint when this issue is due to occur next, have your developer get each page to append it's URL into the log at the beginning of the page. Then you should be able to determine where the issue is occurring. I am hoping you well see a discernible pattern.
Worse case scenario, possibly a canonical will work, OR create a REGEX redirect to handle this URL pattern in htaccess...
Hope this helps,
Dan
-
Dana, you might also want to contact Bing at https://support.discoverbing.com/eform.aspx?productKey=bingwebmaster&ct=eformts&scrx=1. I sent a quick note on Twitter to Duane Forrester and that's the URL he provided.
-
Can you tell from which page Bing is trying to access these URLs? And it only happened on the 4th and not on any other day? Could it be an issue with the sitemap on that day?
I'm looking at your site now and the page http://www.ccisolutions.com/StoreFront/category/www.ccisolutions.com/StoreFront/category/shure-se-earphones is returning a 200 response code to me, not a 404 code. The key is to figure out how Bing discovered the URL in the first place...
-
While this is certainly a possibility, I'm not sure it's the cause of the problem. If this were the case, wouldn't it most likely cause a 404 error, instead of rendering the proper page (albeit with a very funky URL) and a 200 status code?
The other thing making me think it's not just a poorly constructed link on the site is that there are over 100 of these in the server log, from just one day.
Thoughts?
-
I'm willing to bet that on some page of your site, there is a link pointing to www.ccisolutions.com/StoreFront/category/shure-se-earphones which is missing the "http://" at the beginning. So if Bing or a user tried to click on that link, they would be directed to /StoreFront/category/www.ccisolutions.com/StoreFront/category/shure-se-earphones instead of the correct link.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why does a site that is worse than mine by every objective measure I can find, keep outranking me in search?
I’ve been working on educating myself about SEO all day, again. All-Star Telescope up in Canada. We have a competitor that consistently ranks #1 and I don't get it. Their site is full of duplicate content (straight copy and paste from the manufacturer site). They don't have any meaningful blog or video content to add relevance or value to their site. We have higher page authority, higher domain authority, and they keyword analyzer in moz says that our page is higher quality than the the competitors page. Our site is slow, but theirs is slower. I can’t find a single metric on any tool (ubbersuggest, Moz, ahrefs, semrush) that says Telescopes Canada is a better site, or has a better NexStar 8SE product page (a popular telescope). Here’s the link to Telescope Canada’s page for their Celestron 8SE: https://telescopescanada.ca/products/celestron-nexstar-8se-computerized-telescope-11069?_pos=1&_sid=f0aa91cc2&_ss=r Here’s a link to the Celestron 8SE page from the manufacturer website: https://www.celestron.com/products/nexstar-8se-computerized-telescope?_pos=1&_sid=56abdabd4&_ss=r#description Telescopes Canada has just copied and pasted. There is no original content aside from adding the shipping and return policy to the tab, and having some options for selecting accessories on the page. Here is our page: https://all-startelescope.com/products/celestron-nexstar-8se Our titles are good, our metadata is good (but I don’t think that’s been a serious ranking factor for about ten years). The text is original, it’s relevant, we have healthy internal links to the page. We have invensted in some excellent blog content, we’re adding new products to the website so that we rank for more keywords. All of those things are helping, but I fundamentally don’t understand why Telescopes Canada is #1 almost across the board on every key product in our market. There is something that I’m not seeing here, something that isn't being captured by the tools that I have. Is it simple the fact that they get more traffic? Is that why some people go and buy traffic? Can you see any metric, any tool in your toolbox that indicates why they rank at the top, or even higher than we do for in these search terms specific to that product: Celestron NexStar 8SE
Technical SEO | | nkennett
NexStar 8SE
Celestron NexStar 8SE Canada
NexStar 8SE Canada We've worked with two highly ranked SEO's to try and figure this out, one in Canada, and one in the USA. I haven't seen a confidence inspiring answer from either of them. Posting on a forum is a bit of an act of desperation, I'll continue to work the problem, but it's discouraging to see the leader in my industry look like he's just phoning it in with his website.1 -
I'm struggling to understand (and fix) why I'm getting a 404 error. The URL includes this "%5Bnull%20id=43484%5D" but I cannot find that anywhere in the referring URL. Does anyone know why please? Thanks
Can you help with how to fix this 404 error please? It appears that I have a redirect from one page to the other, although the referring page URL works, but it appears to be linking to another URL with this code at the end of the the URL - %5Bnull%20id=43484%5D that I'm struggling to find and fix. Thanks
Technical SEO | | Nichole.wynter20200 -
URL Format
Often we have web platforms that have a default URL structure that looks something like this www.widgetcompany.co.uk/widget-gallery/coloured-widgets/red-widgets This format is quite well structured but would it just be more effective to be www.widgetcompany.co.uk/red-widgets? I realise that it may depend on a lot of factors but generally is it better to have the shorter URL if targeting the key phrase "red widgets" One thing, it certainly looks a bit keyword stuffy with all those "widgets"
Technical SEO | | vital_hike0 -
Why would I suddenly start seeing a spike in hits from particular bots (specifically rogerbot, google, bing, and yahoo)?
We have seen consistent network traffic over the past month, then starting yesterday, huge spikes in hits (hits as in crawls to pages causing an increase in megabytes downloaded) started coming in from Rogerbot, Google, Bing, and Yahoo. A specific example from Rogerbot is as follows: rogerbot/1.1+(http://moz.com/help/guides/search-overview/crawl-diagnostics#more-help,+rogerbot-crawler+pr2-crawler-104@moz.com) Useragent from the bot IP address: 54.226.73.52 Domain / hostname: ec2-54-226-73-52.compute-1.amazonaws.com Physical location: United States flag United States, VA, Ashburn We've have thought about doing a crawl-delay to prevent these bots from hitting us so hard, but that still doesn't help us answer why this even started in the first place. Any clue on what may be going on here?
Technical SEO | | eTundra0 -
Site hacked, but can't find the code
Discovered some really odd words ranking for us in WMT. Looked further and found pages like this www.pdnseek.com/wll/canadian-24-hour-pharmacy. When you click it it redirects to the home page. The developers can't find /wll anywhere on the site. The pages are indexed and cached. Looked at the back links in moz and found many backlinks to our site from other sites using URLs like this. The host says there is nothing on the server, but where else could it be. We've run virus scans, nothing, looked through source code, nothing. Anyone with some idea? www.pdnseek.com is the URL
Technical SEO | | Britewave0 -
What changes do i need to make to my site to get into google news
Hi, when we had the old design, we were in google news but then when we upgraded our site, we had a major problem which forced us to have to redesign our site. Since then we have not been included in google news and we would like to get back in. We only want to be in google news for the following page http://www.in2town.co.uk/Latest-News-Headlines But for some reason, no matter what we do we keep getting knocked back. I would love to know what we should be doing to get into google news and see what the problems are. We have moved to a bigger dedicated server to increase speed so i know it is not that. Any help would be great Also is there an alternative to google news that i can get our site into to generate traffic and to get our news stories straight out to people Hi, Thank you for your note. We appreciate your interest in sharing your content with us. However, when we reviewed your site, we found that we cannot include it in Google News at this time. We have certain guidelines in place regarding the quality of sites which are included in Google News. Please feel free to review these guidelines at the following link: http://www.google.com/support/news_pub/bin/answer.py?hl=en&answer=40787 We know it can be frustrating to not have more information about this but we appreciate your efforts and understanding. We will log your site for future consideration. Please keep in mind that we will be unlikely to review your site for at least 60 days following this email. Thanks for your understanding and your continued interest in Google News. Regards,
Technical SEO | | ClaireH-184886
The Google News Team0 -
Why is this site ranking so good?
Site in question: http://bit.ly/aBvVbm Our main competitor in the UK seems to be ranking extremely good for the keyword "jigsaw puzzles" even though their linking profile doesn't seem all that great? They mainly have site-wide links on 2 of their other ecommerce sites which seem to be given them their ranking power as this equals to 100's of links. Does sitewide links on 2 sites really give this much ranking power or am I missing something?
Technical SEO | | Tonyy30 -
What to do with extremely high number of URLs on your site?
Here is the situation: The site has tons of business and personal profiles, the information needed to be categorized as such directories were created in an attempt to keep the URL structure clean - so for example: www.abc.com/product/um/name-here/city-name/state/lastname:3458765 Each profile has a unique ID#, and for some reason there needed to be a category for a user in this case /um/ stands for user name. Webmaster tool steps to resolve state to use an rel=canonical which can be done for that directory /um/ but I am concerned about the bot not being able to find the other pages beyond that directory, like the profile name, city, state associated. So I guess my ultimate question is if I use rel=canonical will the rest of the content not get crawled or indexed as well?
Technical SEO | | TLO0