/$1 URL Showing Up
-
Whenever I crawl my site with any kind of bot or sitemap generator, it comes up with /$1 versions of my URLs. For example:
It gives me hdiconference.com & hdiconference.com/$1 and hdiconference.com/purchases & hdiconference.com/purchases/$1
Then I get warnings saying that it's duplicate content. Here's the problem: I can't find these /$1 URLs anywhere. Even when I type them in, I get a 404 error. I don't know what they are, where they came from, and I can't find them when I scour my code.
So, I'm trying to figure out where the crawlers are picking this up. Where are these things? If sitemap generators and other site crawlers are seeing them, I have to assume that Googlebot is seeing them as well.
Any help? My developers are at a loss as well.
-
Perfect. Thanks for the help, guys!
-
If you can't find them, you could put a disallow in your robots.txt file to keep them from being crawled.
-
I had a similar issue and found it was due to (in the case of a Moz Pro crawl at least) the bot crawling a JS command in the head. One of the commands included an anchor tag that the crawler read as a real link rather than as part of the JavaScript command. Check your JS files and inline scripts. It might be in there somewhere.
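If it helps, here is a rough Python sketch of how you might hunt for this. It assumes the culprit is an anchor tag with a literal `$1` in its href sitting inside a script (for example a regex replacement template), which is only a guess about this site. The function name `find_placeholder_links` and the sample snippet are made up for illustration. It also shows how a crawler would resolve a relative `$1` against the page URL, which would explain why every page seems to get its own /$1 version.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...' whose value contains the literal text "$1".
PLACEHOLDER_HREF = re.compile(
    r"""href\s*=\s*(["'])([^"']*\$1[^"']*)\1""", re.IGNORECASE
)

def find_placeholder_links(source, base_url):
    """Return the absolute URLs a naive crawler would build from "$1" hrefs.

    source   -- raw text of an HTML page or JS file
    base_url -- URL of the page the text was served from
    """
    return [
        urljoin(base_url, match.group(2))
        for match in PLACEHOLDER_HREF.finditer(source)
    ]

# Hypothetical script line: the anchor is only a replacement template,
# but a crawler scanning the raw text can mistake it for a real link.
sample_js = "out = text.replace(pattern, '<a href=\"$1\">more</a>');"

for page in ("http://hdiconference.com/", "http://hdiconference.com/purchases/"):
    print(page, "->", find_placeholder_links(sample_js, page))
```

You could run the same function over each JS file and page template in your code base. Any hit is a candidate for the phantom link. Note that the regex only catches quoted href values, so it would miss hrefs assembled by string concatenation.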