Dynamic URL pages in Crawl Diagnostics
-
The crawl diagnostic has found errors for pages that do not exist within the site. These pages do not appear in the SERPs and are seemingly dynamic URL pages.
Most of the URLs that appear are formatted http://mysite.com/keyword,%20_keyword_,%20key_word_/ which appear as dynamic URLs for potential search phrases within the site.
The other popular variety among these pages have a URL format of http://mysite.com/tag/keyword/filename.xml?sort=filter which are only generated by a filter utility on the site.
These pages comprise about 90% of 401 errors, duplicate page content/title, overly-dynamic URL, missing meta decription tag, etc. Many of the same pages appear for multiple errors/warnings/notices categories.
So, why are these pages being received into the crawl test? and how to I stop it to gauge for a better analysis of my site via SEOmoz?
-
I am having a similar issue. I am getting hit with 404 errors for pages that do not exist anymore of have been fixed. How do I get these to stop showing up?
-
I am having a similar issue. I am getting hit with 403 errors for pages that do not exist anymore of have been fixed. How do I get these to stop showing up?
-
Based on what has happened from time to time on our sites, my guess will be that it is caused by a widget or plug in on your CMS in some way interacting with the Bot. You are likely being crawled on these urls by Google (and producing 404's) as well and it is not likely it is just Roger bot picking it up. There is a lot on the GWMT forums regarding this with a myriad of suggested fixes: mod rewrite, http 410 for 404, etc.
One fix used by many is if your site has relative links you can do full out urls. If you have a ton of pages this might be a bit more of a pain. (Our clients typically have smaller sites so not too much of a problem).
If you are using WordPress (or another CMS that can utilize Extra Options Plug In) it is stated in the forums that the 404's can be stopped by:
In Extra Options plugin: I checked off all of the below options,, the last two do the job.. read about the nonindex nonfollow where appropriate,,, in that plugin,, this could be the answer.
Make meta descriptions from excerpts
Make home meta description from taglineAdd noindex where appropriate
Add nofollow where appropriateAnother option is to insure you have no
There are plenty of bright coders on the moz who can pitch in here and be more eloquent,
Hope this helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page with "Missing Title Tag" isn't a page
Hello, I am going through the various errors that the Moz Pro Crawl report and some non-existent pages keep coming up in the report. For example, one error category is "Missing Title Tag" with one page identified. But this page http://www.immigroup.com/news/“http%3A/crs.yorku.ca”?page=2 isn't real. It would have been a 404 were there not a redirect for everything that is /news/gobbledygook to /news. So my question is: when moz (or GA for that matter) identifies these pages as "real" and having errors, do I need to take this seriously? And what do I do about it? Thanks! George
Moz Pro | | canadageorge0 -
Canonical URLs all show trailing slash on main site pages - using Yoast SEO for Wordpress - how to correct
We are using Yoast for a number of our sites. We use naked domain as the canonical. I have noticed in the header tags that all our sites show the canonical URLs as having a trailing slash: Example: http;//foxspizzajc.com, when I look at the source code, it shows the canonical as http;//foxspizzajc.com/ Of course, it is much more likely that all sites that link to us will not use the trailing slash - so preferably we do not want that to be the canonical - among other reasons. Does this need to be fixed so the trailing slash is removed? I cannot see how to do this in Yoast SEO or in Permalinks structure for Wordpress. Sorry for my ignorance. Thanks for any help.
Moz Pro | | Adam_RushHour_Marketing1 -
Magento Dynamic Pages Being Indexed
Hi there, I have about 50k Moz medium priority errors in my Crawl Diagnostic report. The bulk of them are classified as "Temporary Redirect" problems. Then if you drill into those further, I can see that the problem urls all kinda are center around: mysite.com/catalogsearch/result.. mysite.com/wishlist.. mysite.com/catalog.. Is this something I should disallow in my Robstxt file? And if so how specific do I get with it.. Disallow /catalogsearch/result/?q= Will listing the /catalogsearch be enough to cover anything after it? thanks
Moz Pro | | Shop-Sq0 -
No Crawl data in dashboard
For the second straight week, I have had no crawl data in my dashboard. It seems like the crawler erased all my results in the pro dashboard. Is there a way to manually recrawl my site, since I will have to wait another week to see if it comes back to earth? Thanks
Moz Pro | | bedwards0 -
I've got quite a few "Duplicate Page Title" Errors in my Crawl Diagnostics for my Wordpress Blog
Title says it all, is this an issue? The pages seem to be set up properly with Rel=Canonical so should i just ignore the duplicate page title erros in my Crawl Diagnostics dashboard? Thanks
Moz Pro | | SheffieldMarketing0 -
SEOMOZ Crawl Test
Guys I really have an issue that i know have but cannot see if that makes sense. Basically 3 months ago i did a site wide 301 from economyleasinguk.co.uk to www.economy-car-leasing.co.uk Every thing looks good get all the correct header responses , all canonicals work perfectly , Google webmaster tools is updated fetch as google bot shows the old site is 301 I tried the seomoz crawl test today on the old domain and got this message Oh no! Looks like the page you were trying to access is temporarily down which at first thought ok because the site was not there it wont do it on an old 301 domain, however i tried it on a domain i know has just been 301'd and i got this message The URL http://www.site1.com/ redirects to http://site2.com/. Do you want to crawl http://site2.com/ instead?
Moz Pro | | kellymandingo
Would you like to:
Continue with www.site1.com
Continue with site2.com I really do not know what to do, its either the redirect script is missing something however its doing what it should or the server is a problem but again its doing what it should so why would SEOMOZ not be able to crawl the old URL like it example site above. Now the strange thing is Open Site Explorer does see the 301 and asks if i want to check the new URL instead Ps the redirect is done using PHP redirect which i am asking him to change to a htaccess as its now on a apache server and was wondering if this could be an issue, all pages go to correct pages as requested Thanks in Advance1 -
Why do pages with canonical urls show in my report as a "Duplicate Page Title"?
eg: Page One
Moz Pro | | DPSSeomonkey
<title>Page one</title>
No canonical url Page Two
<title>Page one</title> Page two is counted as being a page with a duplicate page title.
Shouldn't it be excluded?0 -
Can someone explain why I have been seeing an increase in the number of Linking Page URLs in OSE that link directly to downloads?
Ever since the last couple Linkscape updates when doing competitive back link analysis I have noticed a large increase in the number of URLs of Linking Pages in OSE that result in an immediate file download. The majority of the time these downloads are not common files ie PDF, DOC files. For example, these were all in a competitors back link profile: http://download.unesp.br/linux/debian/pool/main/i/isc-dhcp/isc-dhcp-relay-dbg_4.1.1-P1-17_ia64.deb http://snow.fmi.fi/data/20090210_eurasia_sd_025grid.mat http://www.rose-hulman.edu/class/me/HTML/ES204_0708_S/working model examples/Le25 mad hatter.wm?a=p&id=145880&g=5&p=sia&date=iso&o=ajgrep These are just a few I came across for a single competitor. Is this sketchy black hat SEO, some sort of error, actual links, or something else? Any information on this subject would be helpful. Thank you.
Moz Pro | | Gyi0