Where does the crawler find the URLs?
-
The SEOmoz crawler has found a number of 500 error pages, 404s, etc., which is very useful.
However, some of the URLs are in weird or broken formats that we don't recognise and nobody remembers ever using - not weird enough to imply hacking, but enough to suggest something is broken in the CMS.
Is there any way to find out where the crawler found these URLs? I can patch up and redirect the end result as best I can, but I would prefer to plug the leak.
Thanks
-
If you export the crawl diagnostics to a CSV, we do have this information in the last column.
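As a rough illustration of using that export, the sketch below groups crawled URLs by the value in the last column of each row, so that one page leaking many broken links stands out. It assumes the crawled URL is in the first column and that the file has a header row; the sample column names and URLs are made up for illustration and are not the actual export layout.

```python
import csv
import io
from collections import defaultdict


def group_by_referrer(csv_text, url_column=0):
    """Map each value in the last column (assumed to be the referring page)
    to the list of crawled URLs found on it.

    url_column is an assumption: adjust it to wherever the crawled URL
    actually sits in your export.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    by_referrer = defaultdict(list)
    for row in reader:
        if not row:  # ignore blank lines
            continue
        referrer = row[-1].strip()
        by_referrer[referrer].append(row[url_column].strip())
    return dict(by_referrer)


# Hypothetical sample data standing in for the exported file.
sample = """URL,HTTP Status Code,Referrer
http://example.com/products/undefined,404,http://example.com/index
http://example.com/%7Bid%7D,500,http://example.com/index
http://example.com/old,404,http://example.com/about
"""

for referrer, urls in group_by_referrer(sample).items():
    print(referrer, len(urls), urls)
```

For a real export, read the file's contents (for example with `open(path, newline="").read()`) and pass the text in place of `sample`.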
-
Thanks for the tips. It is a little frustrating that the information I need has passed through SEOmoz's system, but I guess they don't have the inclination or resources to show us the info.
Xenu reckons it can handle 1m URLs; we are in the position of not really knowing how many pages our site has!
-
You can pop the links into the free Xenu Link Sleuth*. After you've done a crawl, just right-click on the URL you're interested in and click 'URL Properties' - you'll see any inlinks it finds listed there. Depending on the size of your site, it could take a while for the crawl to complete.
You could try the link: operator in Google first, though it won't be as thorough as Xenu.
*If you haven't seen it before, don't worry about how the Xenu website looks - the software is kosher, and it is recommended by many SEOmoz staff. Screaming Frog is a paid alternative (with a limited free version).
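To make the idea of an inlink lookup concrete, here is a minimal sketch of the general technique: scan the HTML of pages you have already fetched and report which ones contain a link that resolves to the broken URL. This is not how Xenu or Screaming Frog work internally; `find_inlinks`, the `pages` dictionary and the example URLs are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the absolute targets of all <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop #fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def find_inlinks(pages, target_url):
    """Return the URLs of pages whose HTML links to target_url.

    pages maps page URL -> HTML source (already downloaded by some crawl).
    """
    inlinks = []
    for page_url, html in pages.items():
        collector = _LinkCollector(page_url)
        collector.feed(html)
        if target_url in collector.links:
            inlinks.append(page_url)
    return sorted(inlinks)


# Hypothetical pages standing in for a crawled site.
pages = {
    "http://example.com/": '<a href="/products/undefined">Widget</a>',
    "http://example.com/about": '<a href="contact">Contact</a>',
    "http://example.com/blog/": '<a href="http://example.com/products/undefined#top">W</a>',
}
print(find_inlinks(pages, "http://example.com/products/undefined"))
```

The pages it reports are the templates or content entries to inspect in the CMS for the broken link.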