Crawlers crawl weird long urls
-
I started a crawl for the first time and I get many errors, but the weird part is that the crawler keeps following long, duplicated URLs that don't exist.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.html
What can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website.
-
Answer from Screaming Frog!
The reason the SEO Spider is crawling these URLs is incorrect relative linking on the site, starting from the login URL.
It actually happens when the spider crawls the login page, http://www.website.com/login?returnurl=%2F, which leads to http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then to this /home/ subdirectory URL, http://www.website.com/Home/ctl/page/dogs.aspx, which links to http://www.website.com/Home/ctl/page/page/dogs.aspx, and so on. This is the path to the incorrect relative linking (attached for you).
To stop this, you can correct the incorrect relative linking or, easier, simply exclude the login page.
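To illustrate how a single relative link can produce this nesting, here is a minimal sketch using Python's standard `urljoin`. The href value `page/dogs.aspx` is an assumption for illustration (the thread does not show the actual markup), and `resolve_chain` is a hypothetical helper, not part of any crawler.

```python
from urllib.parse import urljoin

def resolve_chain(start_url, href, depth):
    """Resolve the same href against each resulting page, as a crawler would."""
    urls = [start_url]
    for _ in range(depth):
        # A relative href is resolved against the directory of the current page.
        urls.append(urljoin(urls[-1], href))
    return urls

page = "http://www.website.com/Home/ctl/page/dogs.aspx"

# Assumed broken link: relative href without a leading slash.
relative = resolve_chain(page, "page/dogs.aspx", 3)

# Root-relative href: always resolves to the same URL, so no nesting.
absolute = resolve_chain(page, "/Home/ctl/page/dogs.aspx", 3)

for url in relative:
    print(url)
```

Under this assumption, every crawl step adds one more `/page/` segment, while the root-relative version keeps pointing at the original page.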
-
Wow, big mistakes were made on Home.
Maybe because of the .aspx extension? All pages have SEO-friendly URLs.
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which I think should be
-
OK, I did a quick Screaming Frog crawl and I think I have an idea: you just have to follow the breadcrumbs.
In your example you said "In Links 9". You need to find out which pages those are and follow them back to the point of origin, as I think it's just one bad link that causes this nested link effect.
e.g.
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
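Following the trail can be sketched roughly like this, assuming you have exported the crawler's inlinks into a mapping from each URL to the pages that link to it. The `inlinks` data is partly hypothetical: the first and last entries come from this thread, the middle one is invented to complete the chain. Both function names are illustrative only.

```python
from urllib.parse import urlsplit

def has_repeated_segment(url):
    """True if two consecutive path segments are identical (e.g. /OverOdin/OverOdin/)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(a == b for a, b in zip(segments, segments[1:]))

def trace_to_source(url, inlinks):
    """Walk referrers backwards until reaching a page whose own URL is not nested."""
    seen = set()
    current = url
    while has_repeated_segment(current) and current not in seen:
        seen.add(current)
        referrers = inlinks.get(current, [])
        if not referrers:
            break
        # Prefer the referrer with the shortest path: one level closer to the origin.
        current = min(referrers, key=lambda r: len(urlsplit(r).path))
    return current

base = "http://www.odin-groep.nl/Home/ctl/"
nested = base + "OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx"
inlinks = {
    nested: [base + "OverOdin/OverOdin/OverOdin/StationtoStation.aspx"],
    # Hypothetical entry, added only to complete the example chain.
    base + "OverOdin/OverOdin/OverOdin/StationtoStation.aspx": [
        base + "OverOdin/OverOdin/LesscherIT.aspx"
    ],
    base + "OverOdin/OverOdin/LesscherIT.aspx": [base + "OverOdin/ReindersICT.aspx"],
}

source = trace_to_source(nested, inlinks)
print(source)
```

The page returned is the first one with a clean URL that still links into the nested chain, so that is where to inspect the HTML for the bad relative link.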
-
Every link, except the homepage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 |
-
Which URL(s) is/are causing problems?
-
Please feel free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link. As long as you can verify that it directs to the right page.
But curious to see what caused the problem
-
I think Screaming Frog will tell you the page where it found the weird URL; then you can check the source and find out what's producing that link.
-
That is a good one! It's true that I have pages linking to themselves in the same way. I will remove all links of that kind first and crawl again. I'll keep you posted!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com, but the plugin took the base URL and appended what I told it to, so it went to domain.com/domain.com. Perhaps you made a similar mistake.
Maybe you can send me the URL and I can take a look at it?