Crawlers crawl weird long urls
-
I started a crawl for the first time and got many errors, but the weird part is that the crawler follows duplicate, long, non-existent URLs.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.html
What can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website.
-
Answer from Screaming Frog!
The reason the SEO Spider is crawling these URLs is incorrect relative linking on the site, starting from the login URL.
When the spider crawls the login page, http://www.website.com/login?returnurl=%2F, it is led to http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then to the /home/ subdirectory URL http://www.website.com/Home/ctl/page/dogs.aspx, which links to http://www.website.com/Home/ctl/page/page/dogs.aspx, and so on. This is the path to the incorrect relative linking (attached for you). To stop this, you can correct the relative linking or, more easily, simply exclude the login page.
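A minimal sketch of why this happens, using Python's standard `urllib.parse.urljoin`, which resolves links the way a crawler does. The `href` value is an assumption for illustration (the site's actual markup was not posted): a link without a leading slash is resolved against the directory of the page it sits on, so every crawled page produces a deeper URL.

```python
from urllib.parse import urljoin

# Hypothetical relative href (no leading slash), assumed for illustration.
href = "dogs/dog.html"

page = "http://www.website.com/dogs/dog.html"
crawled = [page]
for _ in range(3):
    # Resolve the href against the page it was found on, as a crawler would.
    page = urljoin(page, href)
    crawled.append(page)

for url in crawled:
    print(url)

# A root-relative href (leading slash) resolves to the same URL from any depth.
fixed = urljoin(crawled[-1], "/dogs/dog.html")
print(fixed)
```

Writing internal links root-relative (or absolute) is one way to break the loop, since the result no longer depends on the current page's path.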
-
Wow, big mistakes are made on Home.
Maybe because of the .aspx extension? All pages have SEO-friendly URLs.
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
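To spot this nesting in a crawl export, one option is to flag URLs whose path repeats the same segment twice in a row. This is only a rough sketch: the helper name `has_repeated_segment` is made up for illustration, and the check is a heuristic that would also flag any legitimate path with a repeated folder name.

```python
from urllib.parse import urlsplit

def has_repeated_segment(url: str) -> bool:
    """Return True if two consecutive path segments are identical."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(a == b for a, b in zip(segments, segments[1:]))

# The two URLs mentioned in this thread.
nested = "http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx"
normal = "http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx"

print(has_repeated_segment(nested))
print(has_repeated_segment(normal))
```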
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which i think should be
-
OK, I did a quick Screaming Frog crawl and I think I have an idea: you just have to follow the breadcrumbs.
You said in your example "In Links 9"; you need to find out what those pages are and follow them back to the point of origin, as I think it's just one bad link that causes this nested link effect.
e.g.
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
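Following the trail can be sketched as a walk back through the "In Links" data until a page with a normal URL is reached. The `inlinks` map below is illustrative: two entries come from URLs quoted in this thread, the middle one is a hypothetical link added only to connect them, and the function names are made up for this example.

```python
from urllib.parse import urlsplit

def is_nested(url):
    """True if the path repeats the same segment twice in a row."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(a == b for a, b in zip(segments, segments[1:]))

def trace_to_source(url, inlinks):
    """Follow inlinks backwards until reaching a page whose URL is not nested."""
    seen = set()
    while is_nested(url) and url not in seen:
        seen.add(url)  # guard against link cycles
        parents = inlinks.get(url)
        if not parents:
            break  # no inlink data for this URL; stop here
        # Assume the shortest linking URL is the one closest to the origin.
        url = min(parents, key=len)
    return url

base = "http://www.odin-groep.nl/Home/ctl/"
inlinks = {
    # From the thread.
    base + "OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx":
        [base + "OverOdin/OverOdin/OverOdin/StationtoStation.aspx"],
    # Hypothetical intermediate link, assumed for illustration.
    base + "OverOdin/OverOdin/OverOdin/StationtoStation.aspx":
        [base + "OverOdin/OverOdin/HeutinkICT.aspx"],
    # From the thread.
    base + "OverOdin/OverOdin/HeutinkICT.aspx":
        [base + "OverOdin/ReindersICT.aspx"],
}

start = base + "OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx"
source = trace_to_source(start, inlinks)
print(source)
```

The page returned is the first one with a normal URL, which is where the faulty relative link should be visible in the HTML source.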
-
Every link, except the homepage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 |
-
Which URL(s) is/are causing problems?
-
Please feel free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link. As long as you can verify that it directs to the right page.
But curious to see what caused the problem
-
I think Screaming Frog will tell you the page on which it found the weird URL; then you can check the source and find out what's producing that link.
-
That is a good one! It's true that I have the same link pointing to the page itself. I will remove all links of that kind first and crawl again. I'll keep you posted!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com. The plugin took the base URL and appended what I had entered, so it went to domain.com/domain.com. Perhaps you made a similar mistake.
Maybe you can send me the URL and I can take a look at it?
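The same effect can be reproduced with standard URL resolution. This is a sketch of the general rule, not of what that particular plugin does internally (that is an assumption): a target without a scheme is treated as a relative path and appended to the base.

```python
from urllib.parse import urljoin

base = "http://domain.com/"

# No scheme: treated as a relative path under the base.
wrong = urljoin(base, "domain.com")

# Full URL with scheme: used as-is.
right = urljoin(base, "http://domain.com/")

print(wrong)
print(right)
```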