Where does the crawler find the urls?
-
The SEOmoz crawler has found a number of 500 error pages, 404s and so on, which is very useful.
However, some of the URLs are in weird or broken formats that we don't recognise and nobody remembers ever using. They are not weird enough to imply hacking, but they suggest something is broken in the CMS.
Is there any way to find out where the crawler found these URLs? I can patch up and redirect the end result as best I can, but I would prefer to plug the leak.
thanks
-
If you export the crawl diagnostics to a CSV, we do have this information in the last column.
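If the export is large, a short script can group the broken URLs by where they were found. This is only a rough sketch: it assumes the crawled URL is in the first column and takes the referring page from the last column as described above. The header names and sample rows are made up for illustration, so check them against your own export.

```python
import csv
import io
from collections import defaultdict


def referrers_by_url(csv_file, url_column=0):
    """Map each crawled URL to the set of values found in the last column."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    found = defaultdict(set)
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed row
        url = row[url_column].strip()
        referrer = row[-1].strip()  # assumption: last column holds the referring page
        if referrer:
            found[url].add(referrer)
    return dict(found)


# Hypothetical sample data; a real export will have many more columns.
sample = io.StringIO(
    "URL,HTTP Status Code,Referrer\n"
    "http://example.com/bad//page,404,http://example.com/news\n"
    "http://example.com/bad//page,404,http://example.com/about\n"
    "http://example.com/ok,200,\n"
)
for url, refs in referrers_by_url(sample).items():
    print(url, "<-", sorted(refs))
```

For a real file, pass `open("export.csv", newline="", encoding="utf-8")` instead of the sample (the file name here is a placeholder).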
-
Thanks for the tips. It is a little frustrating that the information I need has passed through SEOmoz's system, but I guess they don't have the inclination or resources to show us the info.
Xenu reckons it can handle 1m URLs; we are in the position of not really knowing how many pages our site has!
-
You can pop the links into the free Xenu Link Sleuth* - after you've done a crawl just right-click on the URL you're interested in and click 'URL Properties' - you'll see any inlinks it finds listed there. Depending on the size of your site, it could take a while for the crawl to complete.
You could try the link: property in Google first, though it won't be as thorough as Xenu.
*If you haven't seen it before, don't worry about how the Xenu website looks - the software is kosher - as recommended by many SEOmoz staff. Screaming Frog is a paid alternative (with a limited free version).
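For anyone wondering what an "inlink" lookup boils down to, here is a minimal sketch of the idea. It is not how Xenu or Screaming Frog work internally; it just assumes you already have the HTML of some pages in a dict (the `pages` data and the example.com URLs are hypothetical) and reports which of them link to a given target URL.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from the <a href> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop #fragments.
                absolute = urljoin(self.base_url, value)
                self.links.add(urldefrag(absolute).url)


def find_inlinks(pages, target):
    """Return the URLs of pages whose links resolve to `target`."""
    inlinks = []
    for page_url, html in pages.items():
        collector = LinkCollector(page_url)
        collector.feed(html)
        if target in collector.links:
            inlinks.append(page_url)
    return sorted(inlinks)


# Hypothetical pages: one has a relative link that resolves to the broken URL.
pages = {
    "http://example.com/a/": '<a href="../broken/">old link</a>',
    "http://example.com/b/": '<a href="/ok/">fine</a>',
}
print(find_inlinks(pages, "http://example.com/broken/"))
```

Relative links like the one above are a common source of odd URLs, which is why the sketch resolves every href against the page it was found on before comparing.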
Related Questions
-
/essions/essions keeps appending to 1 url on our website
Moz keeps giving us an error showing URL too long, when I investigate the offending url, I get this in the crawl. We can't work out what /essions is or why it's appending to the end of the url. Is this a Moz or website issue?
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/
https://www.mywebsite/singita-lebombo-lodge/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/essions/
Moz Pro | | NickWillWright0 -
Where do you find good SEO analysts nowadays?
We are looking for talent and having a hard time finding staff with good qualification and passion. Any suggestions or feedback would be awesome. We are located in South Florida. Thanks Antoine
Moz Pro | | adupont650 -
How to find websites that are using our content
I'm trying to figure out how I can use SEOmoz to find all the websites that are using our content.
Moz Pro | | Showhow20 -
Rogerbot crawls my site and causes error as it uses urls that don't exist
Whenever rogerbot comes back to my site for a crawl, it seems to want to crawl URLs that don't exist, which causes errors to be reported. Example:
The correct URL is as follows: /vw-baywindow/cab_door_slide_door_tailgate_engine_lid_parts/cab_door_seals/genuine_vw_brazil_cab_door_rubber_68-79_10330/
But it seems to want to crawl the following: /vw-baywindow/cab_door_slide_door_tailgate_engine_lid_parts/cab_door_seals/genuine_vw_brazil_cab_door_rubber_68-79_10330/?id=10330
This format doesn't exist anywhere and never has, so I have no idea where it's getting this URL format from.
The user agent details I get are as follows:
IP ADDRESS: 107.22.107.114
USER AGENT: rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+pr1-crawler-17@moz.com)
Moz Pro | | spiralsites0 -
Which URL to use for my campaign?
My outdated CMS redirects from a clean root domain URL to an ugly URL. Should I use the ugly one to start my campaign? When you type in www.site.com, it redirects to www.site.com/site/c.leJRIROrEpH/b.5699537/k.BEF4/Home.htm. Should I use the URL it redirects to when starting a campaign?
Moz Pro | | mstanwyck0 -
Does SEOmoz know about duplicate URLs blocked in robots.txt?
Hi there: Just a newbie question... I found some duplicated URLs in the "SEOmoz Crawl diagnostic reports" that should not be there. They are intended to be blocked by the site's robots.txt file. Here is an example URL (Joomla + VirtueMart structure): http://www.domain.com/component/users/?view=registration and here is the blocking content in the robots.txt file:
User-agent: *
Disallow: /components/
My questions are: Will this kind of duplicated URL error be removed from the error list automatically in the future? Should I remember which errors should not really be in the error list? What is the best way to handle this kind of error?
Thanks and best regards, Franky
Moz Pro | | Viada0 -
In Site Explorer My Blog.URL.com Shows "No Data Available for this URL"
Why when I use http://www.opensiteexplorer.org and I'm researching our Blog.URL.com's does the tool say "No Data Available for this URL"? Example: http://www.opensiteexplorer.org/links?site=blog.centurypayments.com
Moz Pro | | cfield_splashmedia.com0 -
In OpenSiteExplorer - how do I find out which inbound links were lost?
http://www.opensiteexplorer.org/www.homefinder.com/a!links The # of inbound root domains is ~3,100 and has remained relatively flat for a while yet we've been acquiring 1000's of new root domains that link to us. That said, we want to get some clarity on why the count has remained flat. It would be helpful if we could see which domains have stopped linking to us from one OSE index to the next. Is it really the case that we're losing links as fast as we're gaining them or is something else going on?
Moz Pro | | homefinder1