Crawl reveals hundreds of URLs with multiple URLs in the URL string
-
The latest crawl of my site revealed hundreds of duplicate page content and duplicate page title errors. When I looked, they came from a large number of URLs that have other URLs appended to the end.
For example:
http://www.test-site.com/page1.html/page14.html
or
http://www.test-site.com/page4.html/page12.html/page16.html
Some of them go on for a hundred characters.
I am totally stymied, as are the people at my ISP and the person who talked to me on the phone from SEOMoz.
Does anyone know what's going on?
Thanks so much for any help you can offer!
Jean
-
I couldn't find the exact problem you describe above, but I downloaded XENU and crawled your site, and I strongly suggest you hire an SEO expert to help clean up your site.
You have URLs like this:
and thousands of other strange and duplicate URLs.
-
-
Please share your site and I will help analyze.
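In the meantime, if the crawl can be exported as a plain list of URLs, a short script can flag the ones that have extra pages stacked onto the path, like the examples in the question. This is only a rough sketch: it assumes every page on the site ends in ".html" (as the examples do), and the function names are made up for illustration.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# Assumption: real pages end in ".html", as in the examples from the question.
PAGE_SUFFIX = ".html"


def stacked_page_segments(url: str) -> List[str]:
    """Return the path segments of `url` that look like page files."""
    path = urlsplit(url).path
    return [seg for seg in path.split("/") if seg.lower().endswith(PAGE_SUFFIX)]


def is_stacked_url(url: str) -> bool:
    """True when more than one page-like segment appears in the path."""
    return len(stacked_page_segments(url)) > 1


def filter_stacked(urls: Iterable[str]) -> List[str]:
    """Keep only the URLs that have pages appended to other pages."""
    return [u for u in urls if is_stacked_url(u)]


if __name__ == "__main__":
    crawl = [
        "http://www.test-site.com/page1.html",
        "http://www.test-site.com/page1.html/page14.html",
        "http://www.test-site.com/page4.html/page12.html/page16.html",
    ]
    for url in filter_stacked(crawl):
        print(url, stacked_page_segments(url))
```

Looking at which page comes first in each flagged URL may help narrow down which templates or links produce them.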
-
I forgot to say - this is the SEOMoz crawl.
Related Questions
-
Unsolved URL dynamic structure issue for new global site where I will redirect multiple well-working sites.
Dear all, We are working on a new platform called https://www.piktalent.com, where we aim to redirect many smaller sites of ours that have quite a lot of SEO traffic related to internships. Our previous sites include www.spain-internship.com, www.europe-internship.com and similar ones (around 9 in total). Our idea is to smoothly redirect the sites, bit by bit, to this new platform, which is a custom-made site in Python and Node, much more scalable, and intended to support an app and grow into a bigger platform. For the new site, we decided to create 3 areas for the main content: piktalent.com/opportunities (all the vacancies), piktalent.com/internships and piktalent.com/jobs, so we can categorize the different types of pages we have. The problem comes when the site generates the different static landings and dynamic searches. We have static landing pages like www.piktalent.com/internships/madrid, but the site also dynamically generates www.piktalent.com/opportunities?search=madrid. Most searches generate that type of URL, following the dynamic search structure rather than the structure domain name / type of vacancy / city / name of the vacancy. I have been thinking of 2 potential solutions: either applying canonicals, or adding the suffix in Webmaster Tools as noindex. What do you think is the right approach? I am worried about potential duplicate content and conflicts between the static content and the dynamic content. My CTO insists that the dynamic URLs have to be like that, but I am not 100% sure. Can someone provide input on this? Is there a way to block the dynamic URLs generated? Does anyone have similar experience? Regards,
Technical SEO | | Jose_jimenez0 -
Capitals URLs to Non Capitals...
Hi, I am working on a website which has both capitalized and lowercase URLs, which will generate duplicate content, and I know it is better to use all lowercase. The problem is that the page authority is better for the capitalized versions, and I was wondering: will it negatively impact SEO if we 301 redirect the uppercase URLs to their lowercase counterparts? Thanks.
Technical SEO | | J_Sinclair0 -
Should the date be included in news URLs
My website is not a news or magazine site, but we do have a news section updated 2-3 times a week with industry related news. We are working on a new structure for the URLs.
Technical SEO | | theLotter
Should the date be included in the URL? From this Google article, I understand that as long as we submit a news sitemap it doesn't matter whether or not numbers are included in the URL, correct? https://support.google.com/news/publisher/answer/68323?topic=116650 -
%20 URL accessible, does this matter?
I have a rewrite on the CMS I work on. If someone creates a page on the website and uses spaces in the name, the CMS automatically replaces the spaces with hyphens. I noticed this morning that the %20 URLs are accessible but not indexed at all. Only the hyphenated URLs are indexed. Could this cause duplicate content or penalties? I know best practice is to have only ONE URL for a page, but somehow the developer can't redirect the %20 URLs to the hyphenated URLs. Opinions?
Technical SEO | | DROIDSTERS0 -
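The mapping described in that question, from a %20 URL to its hyphenated counterpart, can be sketched in a few lines. This is an illustration only: the function names and the example.com URLs are hypothetical, and it assumes the CMS turns each space into a single hyphen and changes nothing else in the path.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def hyphenated_target(url: str) -> str:
    """Return the URL with encoded spaces in the path replaced by hyphens."""
    parts = urlsplit(url)
    decoded_path = unquote(parts.path)           # "%20" becomes " "
    new_path = quote(decoded_path.replace(" ", "-"))
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


def needs_redirect(url: str) -> bool:
    """True when the requested URL differs from its hyphenated form."""
    return hyphenated_target(url) != url


if __name__ == "__main__":
    requested = "http://example.com/my%20new%20page"  # hypothetical URL
    if needs_redirect(requested):
        print("301 ->", hyphenated_target(requested))
```

A check like this could run wherever requests are handled and issue a 301 to the hyphenated URL, so that only one version of each page stays reachable.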
GWT, URL Parameters, and Magento
I'm getting into the URL parameters in Google Webmaster Tools and I was just wondering if anyone that uses Magento has used this functionality to make sure filter pages aren't being indexed. Basically, I know what the different parameters (manufacturer, price, etc.) are doing to the content: narrowing. I was just wondering what you choose after you tell Google what the parameter's function is. For narrowing, it gives the following options under "Which URLs with this parameter should Googlebot crawl?": "Let Googlebot decide" (Default), "Every URL" (the page content changes for each value), "Only URLs with value" (may hide content from Googlebot), and "No URLs". I'm not sure which one I want. Something tells me probably "No URLs", as this content isn't something a user will see unless they filter the results (and, therefore, should not come through on a search to this page). However, the page content does change for each value. I want to make sure I don't exclude the wrong thing and end up with a bunch of pages disappearing from Google. Any help with this is greatly appreciated!
Technical SEO | | Marketing.SCG0 -
/out/ URLs in GWMTs
I have recently been seeing some URLs come up as 404s in GWMT for a client. They look like this: http://client-url/out/www.linkedin.com/company/client-linkedin-name /out/client-url/sub-directory/postname/ We thought they might have something to do with the social plugins, but they are all over the place and they are sometimes for internal pages on the site. Has anyone run into these and know why they are happening?
Technical SEO | | DragonSearch0 -
Automatic redirect to external urls
Hi all, I'm developing a dynamic QR code service. The service works in the following way: you create an account with an associated QR code pointing to a URL like:
Technical SEO | | raulo79
- http://domain.me/username The user can change the target of this URL. They can:
- point to an external URL (their website, for example)
- point to a vCard download page
- point to a mobile-ready webpage (no redirection in this case). When someone visits http://domain.me/username, my company logo is displayed and we redirect the visitor with: header("Refresh: 5;URL=http://userdomain.tld"); Google is indexing many users' URLs. This is good for users pointing to the mobile-ready webpage, since there is no redirection in that case, but Google is also indexing many URLs that redirect to an external URL, and I don't know how to avoid this. I can't use header('Location: http://www.example.com/'); because I need to display our logo before the visitor is redirected. How can I make this Google-friendly? Sorry for my English, I hope you can understand the problem. Best regards.
Mauro.0 -
How to increase the crawl rate?
Hello, our site was hosted in North America and Google was crawling it reasonably fast. Since our traffic is mostly from India, we moved it to India, and now Google crawls it terribly slowly. Is there any way to fix the crawl rate? (We have already increased the crawl rate in GWT.)
Technical SEO | | greyniumseo0