Where does the crawler find the URLs?
-
The SEOmoz crawler has found a number of 500 error pages, 404s, etc., which is very useful.
However, some of the URLs are in weird or broken formats that we don't recognise and nobody remembers ever using - not weird enough to imply hacking, but enough to suggest something is broken in the CMS.
Is there any way to find out where the crawler found these URLs? I can patch up and redirect the end result as best I can, but I would prefer to plug the leak.
thanks
-
If you export the crawl diagnostics to a CSV, we do have this information in the last column.
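As a rough sketch of how that export could be used, the Python below groups crawled URLs by the page they were found on, so the pages leaking the most broken links stand out. It takes the referring page from the last column, as described above; treating the first column as the crawled URL is an assumption, and the `sample` data and the `group_by_referrer` name are made up for illustration, so check the header row of your own export.

```python
import csv
import io
from collections import defaultdict


def group_by_referrer(csv_text, url_column=0):
    """Map each referring page (last column) to the URLs found on it.

    url_column=0 is an assumption about the export layout; adjust it
    to match the header row of your own file.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    found_on = defaultdict(list)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        url, referrer = row[url_column], row[-1]
        found_on[referrer].append(url)
    return dict(found_on)


# Hypothetical sample data; a real export has many more columns.
sample = """URL,Status,Referrer
http://example.com/old-page,404,http://example.com/news
http://example.com/cgi/broken,500,http://example.com/news
http://example.com/missing,404,http://example.com/about
"""

for referrer, urls in group_by_referrer(sample).items():
    print(referrer, len(urls))
```

To run it on a real file, pass in the file contents, for example `group_by_referrer(open("crawl.csv", newline="", encoding="utf-8").read())`, where `crawl.csv` stands in for whatever name the export was saved under.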
-
Thanks for the tips. It is a little frustrating that the information I need has passed through SEOmoz's system, but I guess they don't have the inclination or resources to show us the info.
Xenu reckons it can handle 1m URLs, but we are in the position of not really knowing how many pages our site has!
-
You can pop the links into the free Xenu Link Sleuth* - after you've done a crawl just right-click on the URL you're interested in and click 'URL Properties' - you'll see any inlinks it finds listed there. Depending on the size of your site, it could take a while for the crawl to complete.
You could try the link: operator in Google first, though it won't be as thorough as Xenu.
*If you haven't seen it before, don't worry about how the Xenu website looks - the software is kosher - as recommended by many SEOmoz staff. Screaming Frog is a paid alternative (with a limited free version).
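For anyone curious what an inlink listing boils down to, here is a minimal Python sketch of the idea: collect every link on every page, then flip the map so each target URL lists the pages that point at it. This is not how Xenu or Screaming Frog are implemented; the `pages` dictionary stands in for HTML a crawler would have fetched, and `LinkCollector` and `build_inlinks` are names made up for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def build_inlinks(pages):
    """pages maps page URL -> HTML; returns target URL -> sorted linking pages."""
    inlinks = defaultdict(set)
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links against the page they appear on.
            inlinks[urljoin(page_url, href)].add(page_url)
    return {target: sorted(sources) for target, sources in inlinks.items()}


# Hypothetical pages standing in for fetched HTML.
pages = {
    "http://example.com/": '<a href="/about">About</a> <a href="/old-page">Old</a>',
    "http://example.com/about": '<a href="/old-page">Old again</a>',
}

print(build_inlinks(pages)["http://example.com/old-page"])
```

Looking up a broken URL in the result gives the pages to fix, which is the same information the 'URL Properties' dialog shows.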
Related Questions
-
Non-English URL = nonsense symbols?
Hey there, I have a Greek content website and some of the URLs are in Greek (I did this for a better SEO score). When I use the analyze page issues tool and type in the Greek URL, it doesn't find it (for example, if I type "www.euroulakia.com/πως-να-βγαλω-λεφτα" it displays "Sorry! We weren't able to find that page when we crawled your site"), BUT when I just copy-paste it from the URL, Moz finds it. However, when I copy-paste, the URL changes the Greek characters to nonsense symbols (for example, the same URL above becomes: http://www.euroulakia.com/πως-να-βγαλω-λεφτα). As you can see, the URL is written with nonsense symbols. My question is whether Google sees these nonsense symbols as well, instead of the Greek characters. I am using Joomla and I have the Search Engine Friendly URLs and Unicode Aliases settings set to yes. Can anyone please help me with this, because I have a feeling that something is wrong here. Thanks in advance
Moz Pro | | tsalatzi0 -
MOZ Toolbar 3.0: Can't Find the Meta Description On My Page, Why?
Hey, The new MOZ toolbar is unable to identify the meta description on any of the pages for my domain www.1099pro.com. Can anyone see a reason why that would be? The tag should be correct on all pages. Thanks! -Mike
Moz Pro | | Stew2220 -
No (seomoz) crawler report since 7th may !!
Hi Moz team, I added a site of fewer than 200 pages to the "seomoz crawler" tool on May 7. It is now May 15 and the tool still displays "crawl in progress." Is there a problem with this tool? This is embarrassing ... Thank you for your reply. David France
Moz Pro | | DavidEichholtzer0 -
How to set the crawler or reports to ignore
I have a mobile version of a site with a URL string that disables the mobile view on smartphones (view full site), string is like this example.com/page-name.html?mobile=off I need the seomoz pro reports or crawler to ignore it because the crawler visits both versions of the site then reports them as duplicate content. Is there a setting page I haven't visited yet that will set this?
Moz Pro | | Str82u0 -
Can't find duplicate page content
Hi all. I'm trying to create a report to list all of my site's duplicate content that SEOmoz says we have. However when I click on the link it just shows me the title and description of the page. I don't know what the other page is that has duplicate content or what the duplicate content is. Where do I find this information? Thanks in advance!
Moz Pro | | Info12340 -
Can I specify a url for a keyword in the rank checker tool?
Hello! I'm new to SEOmoz and excited to learn the system. I created a campaign and added keywords, but I'm not clear on how the SEOmoz campaign rankings tool works. As an example, one of my keywords, 'cigar cutters', is reporting at position 20 for the URL http://www.cheaphumidors.com/c_guillotine-cutters.html. However, I think it would be better to target that keyword at http://www.cheaphumidors.com/c_cutters.html, as a search for 'cigar cutters' could encompass a guillotine cutter, a punch cutter or cigar scissors. Is there any way to assign http://www.cheaphumidors.com/c_cutters.html to the term 'cigar cutters' in the campaign ranking report? Brian
Moz Pro | | davesabot0 -
SEOmoz crawler bug?
I just noticed that a few of my campaigns show only 1 page crawled. Can someone tell me what this is? Of 5 campaigns, 2 have only one page crawled, and one of those is an online shop with over 2000 products 🙂
Moz Pro | | mosaicpro0 -
How to remove Duplicate content due to url parameters from SEOMoz Crawl Diagnostics
Hello all, I'm currently getting back over 8000 crawl errors for duplicate content pages. It's a Joomla site with VirtueMart, and 95% of the errors are for parameters in the URL that the customer can use to filter products. Google is handling them fine under Webmaster Tools parameters, but it's pretty hard to find the other duplicate content issues in SEOmoz with all of these in the way. All of the problem parameters start with ?product_type_ Should I try to use robots.txt to stop them from being crawled, and if so, what would be the best way to include them in robots.txt? Any help greatly appreciated.
Moz Pro | | dfeg0