When rogerbot tried to crawl my site it gets a 404\. Why?
-
When rogerbot tries to craw my site it tries http://website.com. My website then tries to redirect to http://www.website.com and is throwing a 404 and ends up not getting crawled. It also throws a 404 when trying to read my robots.txt file for some reason. We allow rogerbot user agent so unsure whats happening here. Is there something weird going on when trying to access my site without the 'www' that is causing the 404? Any insight is helpful here.
Thanks,
-
Hey Dan,
So that's the problem. Our site is up and i can manually navigate to anything including the robots.txt file. I've done this multiple times throughout the day and different days as well and manually triggered different Moz crawls at different times so i've ruled out an outage.
-
The robots.txt 404 could be a temporary outage, but it's a bit hard to tell without being able to see the actual site and robots.txt. Try checking the site is up, and you can access the robots.txt then requesting a new Moz crawl...
I do have one client who insists on blocking everything and then allowing specific crawlers, and allowing rogerbot seems to have worked fine to date.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Hi! I first wrote an article on my medium blog but am now launching my site. a) how can I get a canonical tag on medium without importing and b) any issue with claiming blog is original when medium was posted first?
Hi! As above, I wrote this article on my medium blog but am now launching my site, UnderstandingJiuJitsu.com. I have the post saved as a draft because I don't want to get pinged by google. a) how can I get a canonical tag on medium without importing and b) any issue with claiming the UJJ.com post is original when medium was posted first? Thanks and health, Elliott
Technical SEO | | OpenMat0 -
White listing a site
A new clients site is blocked by a lot of Firewalls. And I can't work out why, the content is family friendly they sell nursery equipment. I've run it through the Google checker and there is no malicious software found on the site. Can anyone tell me what I need to do to get this site unblocked? The url is http://knuma.co.uk/
Technical SEO | | Marketing_Optimist0 -
Google stopped crawling my site. Everybody is stumped.
This has stumped the Wordpress staff and people in the Google Webmasters forum. We are in Google News (have been for years), and so new posts are crawled immediately. On Feb 17-18 Crawl Stats dropped 85%, and new posts were no longer indexed (not appearing on News or search). Data highlighter attempts return "This URL could not be found in Google's index." No manual actions by Google. No changes to the website; no custom CSS. No Site Errors or new URL errors. No sitemap problems (resubmitting didn't help). We're on wordpress.com, so no odd code. We can see the robot.txt file. Other search engines can see us, as can social media websites. Older posts still index, but loss of News is a big hit. Also, I think overall Google referrals are dropping. We can Fetch the URL for a new post, and many hours later it appears on Google and News, and we can then use Data Highlighter. It's now 6 days and no recovery. Everybody is stumped. Any ideas? I just joined, so this might be the wrong venue. If so, apologies.
Technical SEO | | Editor-FabiusMaximus_Website0 -
Then why my site is not ranking
My website's DA and PAs are good compare with my competitors. Then why my site is not ranking.
Technical SEO | | Somanathan0 -
Redirecting 404
Hi. I'm working on a wordpress site, which got some old deleted pages indexed and now shows a 404 (also in the results) As these old pages earlier got content and probably also some links pointing towards it, what would then be best practice to do? Should i make an 301 redirect? Make the 404 noindex?
Technical SEO | | Mickelp0 -
404 Errors - How to get rid of them?
Hi, I am starting an SEO job on an academic site that has been completely redone. The SEOMoz crawl detected three 404 Errors to pages that cannot be found anywhere on either Joomla or the server. What can I do to solve this? Thanks!!
Technical SEO | | michalseo0 -
How to turn WP site into Ecom site?
I have a couple of old wordpress sites that are old affiliate blogs. I currently sell products that are on amazon on these sites and sell quite a bit of volume. I have found a source and can afford the inventory to replace Amazon with my own product. So the dilema is how to turn these wordpress sites into ecommerce sites. The thing I am worried most about is that each site gets about 100-200 visitors a day for great buying keywords. I obviously don't want to lose my rankings. What are the options of turning a wordpress site into a store. I am not interested in plugins or some of the other solutions that make the store look very cheap and I would assume horribly convert. If you have inner pages ranking for keywords how does that work? Do the post pages become product pages? So to sum up I guess I am asking, what are the options that are of the higher quality that will also help me keep my rankings? Thanks
Technical SEO | | PEnterprises0 -
Crawl report showing only 1 crawled page
Hi, I´m really new to this and have just setup some Campaigns. I have setup a Campaign for the root domain: portaldeldiablo.com.uy which returned only 2 crawled pages.. As this page had a 301 redirect from the non-www to the www version, I deleted this Campaign and setup a new one for www.portaldeldiablo.com.uy which returned only 1 crawled page.. I really don´t know why is my website not being crawled..Thanks in advance for your help.
Technical SEO | | ceci27100