Crawlers crawl weird long URLs
-
I started a crawl for the first time and I get many errors, but the weird thing is that the crawler follows long, duplicated URLs that do not exist.
For example (to be clear):
There is a page: www.website.com/dogs/dog.html
but then the crawler continues with:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.html
What can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website.
-
Answer from Screaming Frog!
The reason the SEO Spider is crawling these URLs is incorrect relative linking on the site, starting from the login URL.
It happens when the spider crawls the login page, http://www.website.com/login?returnurl=%2F, which leads to http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then to the /home/ subdirectory URL http://www.website.com/Home/ctl/page/dogs.aspx, which links to http://www.website.com/Home/ctl/page/page/dogs.aspx, and so on. This is the path to the incorrect relative linking (attached for you).
To stop this, you can correct the incorrect relative linking or, easier, simply exclude the login page.
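To illustrate the mechanism, here is a minimal Python sketch of how a relative link without a leading slash gets resolved against the page it sits on. It uses the placeholder URLs from the question; the helper name `follow_relative_link` and the exact href value are assumptions for illustration, not taken from the actual site.

```python
from urllib.parse import urljoin


def follow_relative_link(start_url: str, href: str, hops: int) -> list[str]:
    """Resolve the same relative href again and again, the way a crawler
    does when every generated page contains the same link."""
    urls = [start_url]
    for _ in range(hops):
        # The href is resolved against the directory of the current page.
        urls.append(urljoin(urls[-1], href))
    return urls


# Assumed href: a path with no leading slash, so it is document-relative.
for url in follow_relative_link(
    "http://www.website.com/dogs/dog.html", "dogs/dog.html", 3
):
    print(url)

# A root-relative href (leading slash) resolves to the same URL from any page.
print(urljoin("http://www.website.com/dogs/dogs/dog.html", "/dogs/dog.html"))
```

Each hop adds one more `/dogs/` level, which matches the pattern in the crawl. With a leading slash (or an absolute URL) the link always resolves to the same page, so the nesting stops.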
-
Wow, big mistakes were made on Home.
Maybe because of the .aspx extension? All pages have SEO-friendly URLs.
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
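If you export the crawled URLs, a rough way to spot this nesting effect is to look for the same path segment twice in a row. This is only a sketch: the function name `has_repeated_segment` is made up here, and a legitimate URL could repeat a segment on purpose, so treat hits as candidates to check by hand.

```python
from urllib.parse import urlsplit


def has_repeated_segment(url: str) -> bool:
    """Return True if two consecutive path segments are identical."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(a == b for a, b in zip(segments, segments[1:]))


crawled = [
    "http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx",
    "http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx",
]
for url in crawled:
    print(has_repeated_segment(url), url)
```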
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which I think should be
-
OK, I did a quick Screaming Frog crawl and I think I have an idea: you just have to follow the breadcrumbs.
You mentioned "In Links 9" in your example. You need to find out what those pages are and follow them back to the point of origin, as I think it's just one bad link that causes this nested link effect.
e.g.
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail until you find the source of the problem.
-
Every link, except the homepage itself.
-
I can't see any source.
The pages look like this:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 |
-
Which URL(s) is/are causing problems?
-
Please feel free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link, as long as you can verify that it points to the right page.
But I'm curious to see what caused the problem.
-
I think Screaming Frog will tell you the page on which it found the weird URL. Then you can check that page's source and find out what's producing the link.
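Once you have the source of that page, a small script can list the links that are document-relative (no scheme, no host, no leading slash), since those are the ones that can nest. A minimal sketch using only the Python standard library; the class name and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RelativeLinkFinder(HTMLParser):
    """Collect <a href> values that are resolved against the current page."""

    def __init__(self) -> None:
        super().__init__()
        self.relative_hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(href)
        # No scheme, no host, and a path without a leading slash.
        if (not parts.scheme and not parts.netloc
                and parts.path and not parts.path.startswith("/")):
            self.relative_hrefs.append(href)


# Hypothetical page source with three kinds of links.
sample_html = (
    '<a href="/contact.aspx">Contact</a>'
    '<a href="page/dogs.aspx">Dogs</a>'
    '<a href="http://www.website.com/">Home</a>'
)
finder = RelativeLinkFinder()
finder.feed(sample_html)
print(finder.relative_hrefs)
```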
-
That is a good one! It's true that I have the same kind of link, from the page to itself. I will remove all links of that kind first and crawl again. I'll keep you posted!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com. The plugin combined the base URL with what I gave it, so it went to domain.com/domain.com. Perhaps you made a similar mistake.
Maybe you can send me the URL and I can take a look at it?
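The redirection mistake above can be reproduced with standard URL resolution. This Python sketch is not the plugin's code (the plugin is not named here); it only shows that a target without a scheme is treated as a relative path, assuming the plugin resolves targets the way `urljoin` does.

```python
from urllib.parse import urljoin

base = "http://domain.com/"

# Without a scheme, "domain.com" is just a relative path segment.
print(urljoin(base, "domain.com"))

# With the scheme included, the target is absolute and replaces the base.
print(urljoin(base, "http://domain.com"))
```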