Crawl Diagnostics Error Spike
-
With the last crawl update to one of my sites there was a huge spike in reported errors. The errors jumped by 16,659, the majority of which fall under the duplicate title and duplicate content categories.
When I look at the specific issues, it seems that the crawler is crawling a ton of blank pages on the site's blog through pagination.
The odd thing is that the site has not been updated in a while, and prior to this crawl on Jun 4th there were no reports of these blank pages.
Is this something that can be an error on the crawler side of things?
Any suggestions on next steps would be greatly appreciated. I'm adding an image of the error spike.
-
This would be another issue. I would need to look at the code to give you more insight, but off the bat I assume this is an issue with mislabeling the rel=next and rel=prev tags. They can be tricky to work with in a broad-based update because they are intended to refer to specific pages. If you do not have the end page labeled, Google says:
"When implemented incorrectly, such as omitting an expected rel="prev" or rel="next" designation in the series, we'll continue to index the page(s), and rely on our own heuristics to understand your content."
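As an illustration of what a correctly labeled series looks like (the example.com URLs here are placeholders, not your actual blog URLs), each middle page declares both neighbors, and the end pages omit the link that has no target:

```html
<!-- On a middle page such as /Blog/?page=2: -->
<link rel="prev" href="http://www.example.com/Blog/?page=1" />
<link rel="next" href="http://www.example.com/Blog/?page=3" />

<!-- On the last page of the series, rel="next" is omitted entirely -->
<!-- so the series has a clear end: -->
<link rel="prev" href="http://www.example.com/Blog/?page=9" />
```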
I would look into this first. If the answer is still elusive, the next option would probably be to get a different set of eyes on the code to catch any minor oversights you may have missed.
-
One last thing:
It seems that I have a game plan for addressing this issue, but as I think about it, one thing has me concerned about the way Roger crawled the site.
The site has maybe 100 articles total, which would account for ?Page=10, but what I'm seeing is errors on ?Page=104. When you look at that page, it's blank. Where is Roger coming up with that parameter?
Do you think this is a Roger issue or something else?
-
Makes sense
-
Unless you have some super secret page buried somewhere deep in your site that you can ONLY get to from those pages, it wouldn't make sense to have crawlers follow the links. All that will happen is they land on the next page, scrape it as far as the noindex tag, and move on. They won't index it, and this just wastes your site's bandwidth and slows everything else down. If it's a noindex, it should usually be a nofollow as well, unless you are looking to track conversions or have some other specific content only navigable through those pages.
-
Hey Jake,
What's your opinion on using "nofollow" vs. "follow" on the pages I'm blocking from indexing? Is there a reason to prevent crawlers from following the links on these pages?
-
Cool, glad we could help!
If you want to clean up your code and are applying this site-wide, I would recommend the "none" tag. It accounts for both:
noindex, nofollow
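For reference, a minimal sketch of the two equivalent forms as they would appear in a page's head:

```html
<!-- The "none" shorthand... -->
<meta name="robots" content="none" />

<!-- ...is equivalent to the explicit pair: -->
<meta name="robots" content="noindex, nofollow" />
```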
-
Thank you again for the input. The goal here is to provide accurate reporting and ensure that the site conforms to the search engines' requirements.
Currently the "?page=" parameter is not blocked through the noindex tag; it sounds like this may be the issue.
I will update the code to address that and see what kind of results we get with the next update. I think this is best addressed at the code level, rather than in the robots.txt.
Thanks
Thanks
-
Roger crawls like Googlebot and takes his hints from the robots.txt file, so whatever Roger is seeing is usually what the other spiders are seeing as well. From time to time I have encountered slight glitches in the SEOmoz crawler as they change and update their algorithm.
When it comes down to it, Google examines a link profile through a microscope akin to the Large Hadron Collider, whereas we have to examine it through a magnifying glass from 1935.
The wonderful people here at SEOmoz are always trying to give us a better view, but it is still imperfect. I would say if all else fails and this report continues to show errors in Moz, then get your client reports directly from Webmaster Tools.
-
**How do I tell Roger not to crawl these blank pages?**
An easy solution is to block Roger in robots.txt:
User-agent: rogerbot
Disallow: [enter pages you do not wish to be crawled]
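As a concrete sketch based on the paths discussed in this thread (the /Blog/?page= pattern is an assumption about your URL structure, so adjust it to match):

```
# Applies only to SEOmoz's crawler; other bots ignore this group
User-agent: rogerbot
Disallow: /Blog/?page=
```

This blocks only Roger. Googlebot and Bingbot would need their own Disallow rules if you want the same behavior there.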
But a better solution would be to fix the root problem. If your only goal is to provide clean reporting to your client the above will work. If your goal is to ensure your site is crawled correctly by Google/Bing, then Jake's suggestion will work. You can help Google and Bing understand your site by telling them how to handle parameters.
I would prefer to fix the root issue though. Do the pages which are being reported as duplicate content have the "noindex" tag on them? If so, you can report the issue to the moz help desk (help@seomoz.org) so they can investigate the problem.
-
Hey Jake,
Thanks for your feedback. I did make some changes to the code (posted in the reply to Jamie). I'll take a closer look at the Webmaster Tools to make sure things are OK on that end.
FYI: The rel=prev / rel=next tags are implemented.
I added code to insert the noindex tag on pages that are accessed through:
- /Blog/?tag=
- /Blog/category/
- /Blog/archive.aspx
As a secondary concern: with Roger now reporting all these issues in SEOmoz, and with me providing these reports to my clients, having 16k errors is not a good PR thing. How do I tell Roger not to crawl these blank pages?
-
It looks like Roger found his way into your variable URLs!
This could definitely cause a problem if the engine crawlers are seeing this path as well. Have you made any changes to the code on your site or the URL structure lately?
Regardless, you might want to examine your Webmaster Tools for both Google and Bing.
For Google, you will want to check the blocked URLs under the Health menu. This will give you information on which pages are and are not blocked. If you notice that the Head Match term you are looking to exclude is not listed, make sure that you add the term to the robots.txt file on your site. Other fixes for this include canonicalisation tagging or the implementation of the rel=prev / rel=next tags. There are a few other ways that are more complicated, and I recommend avoiding them unless absolutely necessary.
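As a sketch of the canonicalisation option mentioned above (the example.com URL is a placeholder for whichever version of the page you want indexed):

```html
<!-- In the <head> of a parameterized URL such as /Blog/?page=2&sort=date, -->
<!-- point crawlers at the preferred version so variants aren't counted -->
<!-- as duplicates: -->
<link rel="canonical" href="http://www.example.com/Blog/?page=2" />
```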
But good news, everyone! Google has a few ways to go about fixing the indexation.
Bing is a little different but just as easy. In the Bing Webmaster Tools, under the Index tab, there is a tool called URL Normalization where you can tell the crawlers to exclude a portion of the query string without changing anything in your database. It also automatically finds and suggests query parameters for normalization. This is a recent change for Bing and could account for the sudden jump in warnings.
I hope this helps and you keep being awesome!
-
Hey Jamie,
In an effort to block crawling of pages on the blog that are essentially duplicating content, I added code (on 4/16) to insert the noindex tag on pages that are accessed through:
/Blog/?tag=
/Blog/category/
/Blog/archive.aspx
I did not do this for
/Blog/?page=
There were no changes to the robots.txt
There were no updates to canonical tag
There were no updates to pagination
Thanks for your prompt reply
-
Can you share what changes have been made to the site? A few ways this can happen are:
- a change to the robots.txt file
- a change to your site's template, either removing a canonical tag or a noindex tag, or altering your pagination in any way, such as modifying paginated titles
- resolving an onsite issue which had prevented crawling of these pages
-