Restricted by robots.txt and soft bounce issues (related).
-
In our Webmaster Tools we have roughly 35K URLs that are restricted by robots.txt and roughly 1,200 soft 404s. We can't seem to figure out how to properly resolve these URLs so that they no longer show up this way. Our traffic from SEO has taken a major hit over the last 2 weeks because of this.
Any help?
Thanks, Libby
-
**These are duplicate URLs that we can't figure out how they are getting created.**
I want to be sure we are talking about the same thing here. When I hear "duplicate URL", I think of multiple URLs that point to the same web page. Depending on how your site is set up, it is possible to have many different URLs point to the same web page. Possible examples are:
www.mydomain.com/tennis-rackets
www.mydomain.com/tennis-rackets/
mydomain.com/tennis-rackets?sort=asc
Above are three examples of URLs that can all lead to the same page. You can have dozens of URLs that all lead to a page with identical content. How these issues get resolved depends on how they were created.
The best tool to help you figure this out is your crawl report. Use the SEOmoz crawl tool, then examine the crawl report. It can be a bit overwhelming at first, but you can narrow things down very quickly if you use Excel.
Select the header row for your data (it begins with the URL field), then select Data > Filter > AutoFilter from the menu. Start by looking at fields such as "Duplicate Page Content", "URLs with duplicate content", etc. Simply choose YES in the drop-down menu to filter for that particular data. This will help you uncover the source of these issues.
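If you'd rather script that filter than click through Excel, here is a minimal Python sketch of the same idea. It assumes the crawl report has been exported as CSV with a "URL" column and a "Duplicate Page Content" column holding YES/NO; the sample rows are made up for illustration, and a real export has many more columns (and may have summary rows above the header that you would need to skip first).

```python
import csv
import io

# Made-up sample of a crawl report exported as CSV (real exports have many more columns).
SAMPLE_REPORT = """URL,Duplicate Page Content,Title
http://www.mydomain.com/tennis-rackets,YES,Tennis Rackets
http://www.mydomain.com/tennis-rackets/,YES,Tennis Rackets
http://www.mydomain.com/contact,NO,Contact Us
"""


def flagged_urls(report_text, column="Duplicate Page Content"):
    """Return the URLs whose `column` value is YES, like choosing YES in an AutoFilter."""
    reader = csv.DictReader(io.StringIO(report_text))
    return [
        row["URL"]
        for row in reader
        if (row.get(column) or "").strip().upper() == "YES"
    ]


if __name__ == "__main__":
    for url in flagged_urls(SAMPLE_REPORT):
        print(url)
```

For a real export you would read the file instead, e.g. `with open("crawl_report.csv", newline="") as f: urls = flagged_urls(f.read())`, where the filename is just a placeholder. Passing a different `column` lets you filter on any other YES/NO field in the same way.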
The URLs in my example should all be 301'd or canonicalized to the primary page to resolve the duplication issue.
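To make "the primary page" concrete, here is a hedged Python sketch that maps each variant to one canonical URL. The rules are assumptions chosen to fit the three example URLs (prefer the www host, drop trailing slashes, ignore the `sort` parameter); the right rules for your site depend on which parameters actually change page content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed site rules for this sketch; adjust them to match your own site.
BARE_HOST = "mydomain.com"
PREFERRED_HOST = "www.mydomain.com"   # assume the www version is the primary one
IGNORED_PARAMS = {"sort"}             # assume sorting never changes page content


def canonical_url(url):
    """Map a URL variant to the one URL it should be 301'd or canonicalized to."""
    if "://" not in url:
        url = "http://" + url  # the example URLs in the post have no scheme
    scheme, netloc, path, query, _fragment = urlsplit(url)
    netloc = netloc.lower()
    if netloc in (BARE_HOST, PREFERRED_HOST):
        netloc = PREFERRED_HOST
    # Drop trailing slashes, but keep the root path "/".
    path = path.rstrip("/") or "/"
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))


variants = [
    "www.mydomain.com/tennis-rackets",
    "www.mydomain.com/tennis-rackets/",
    "mydomain.com/tennis-rackets?sort=asc",
]
for v in variants:
    print(v, "->", canonical_url(v))  # all three map to the same URL
```

Once you have that mapping, any request whose URL differs from its canonical form can be answered with a 301 to the canonical URL, or the page can declare it with `<link rel="canonical" href="...">`.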
-
Well, part of the problem is that these are duplicate URLs and we can't figure out how they are getting created. They were supposed to resolve to our 404 page... Should we remove them all?
-
Hi Libby.
How do you intend to resolve these URLs? Ideally you would remove your robots.txt entries and restrict the pages with a robots meta tag such as "noindex, follow", or whatever is appropriate. (While a URL is blocked in robots.txt, Google cannot crawl it, so it never sees the meta tag.) Any links to 404 pages should be updated or removed.
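A minimal sketch of that advice, written as a hypothetical Python WSGI app (the page list and paths are invented for illustration): pages you want kept out of the index are still served but carry a robots meta tag of "noindex, follow", and unknown URLs get the friendly error page with a genuine 404 status. A "not found" page served with 200 OK is what Google reports as a soft 404.

```python
import html

# Hypothetical page inventory; a real site would look pages up in its CMS.
# Each entry maps a path to (title, keep_out_of_index).
PAGES = {
    "/tennis-rackets": ("Tennis Rackets", False),
    "/tennis-rackets/print": ("Tennis Rackets (print view)", True),
}

NOT_FOUND_HTML = (
    "<html><head><title>Page not found</title></head>"
    "<body><h1>Sorry, that page does not exist.</h1></body></html>"
)


def app(environ, start_response):
    """Minimal WSGI app: real 404s for unknown paths, noindex where requested."""
    page = PAGES.get(environ.get("PATH_INFO", "/"))
    if page is None:
        # Friendly error page *with* a 404 status; sending it as
        # "200 OK" is what produces a soft 404.
        status, body = "404 Not Found", NOT_FOUND_HTML
    else:
        title, keep_out_of_index = page
        robots = ('<meta name="robots" content="noindex, follow">'
                  if keep_out_of_index else "")
        safe_title = html.escape(title)
        body = (f"<html><head><title>{safe_title}</title>{robots}</head>"
                f"<body><h1>{safe_title}</h1></body></html>")
        status = "200 OK"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

You can try it locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and check the status codes your browser or a crawler receives.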
What further direction do you seek?
Related Questions
-
PageSpeed Insights DNS Issue
Hi, is anyone else having problems with Google's PageSpeed tool? I am trying to benchmark a couple of my sites, but according to Google my sites are not loading. They work when I run them through the test at one point, but if I try again, say 15 minutes later, I get the following error message: "An error has occured DNS error while resolving DOMAIN. Check the spelling of the host, and ensure that the page is accessible from the public Internet. You may refresh to try again. If the problem persists, please visit the PageSpeed Insights mailing list for support." This isn't much of an issue for testing page speed, but I am concerned that if Google is getting this error on the speed test, it will also get the error when trying to crawl and index the pages. I can confirm the sites are up and running. The sites are pointed at the server via A records and haven't been changed for many weeks, so it cannot be a DNS updating issue. I'm at a loss to explain it. Any advice would be most welcome. Thanks.
Technical SEO | daedriccarl
-
Home page canonical issues
Hi, I've noticed I can access/view a client's site's home page using the following URL variations:
http://example.com/
http://example.com/index.html
http://www.example.com/
http://www.example.com/index.html
There's been no preference set in Google WMT, but Google has indexed and features this URL: http://example.com/. However, just to complicate matters, the vast majority of external links point to the 'www' version. Obviously I would like to tidy this up, so I asked the client's web development company if they can place 301 redirects on the domains we no longer want to work. I received this reply, but I'm not sure whether it takes care of the duplicate issue:
"Understand what you're saying, but this shouldn't be an issue regarding SEO. Essentially all the domains listed are linking to the same index.html page hosted at 1 location."
My question is: do I need to place 301 redirects on the domains we don't want to work? And do I stick with the 'non www' version Google has indexed and try to change the external links so they point to it, or go with the 'www' version and set it as the preferred domain in Google WMT? My technical knowledge in this area is limited, so any help would be most appreciated.
Regards,
Simon.
Technical SEO | simon-145328
-
Http & https canonicalization issues
Howdyho, I'm SEOing a daily deals site that mostly runs on HTTPS (only the home page is on HTTP). I'm wondering what to do about canonicalization. IMO it would be easiest to run all pages on HTTPS, but the few resources I can find are not very clear. For instance, this Youmoz blog post claims that HTTPS is only for humans, not for bots! That doesn't really apply anymore, right?
Technical SEO | zeepartner
-
Duplicate Homepage issue
SEOMOZ says my site has two homepages:
www.mysite.com
www.mysite.com/
When you go to "www.mysite.com/", the URL changes to "www.mysite.com". Why is this happening, and what can I do about it?
Technical SEO | LucasF
-
Duplicate Content Issue with
Hello fellow Moz'rs! I'll get straight to the point here. The issue, which is shown in the attached image, is that every URL ending in /blog/category/name has a duplicate page at /blog/category/name/?p=contactus. It's also worth noting that the ?p=contactus pages are not in the SERPs, but they were crawled by SEOMoz, and they are live and duplicate. We are using Pinnacle Cart. Is there a way to stop the crawlers from crawling the ?p=contactus pages? Thank you all and happy rankings, James
Technical SEO | JamesPiper
-
Warnings for blocked by meta-robots/meta robots Nofollow... how to resolve?
Hello, I see hundreds of "blocked by meta-robots/meta robots nofollow" notices, and they appear to be linked to the comments on my site, which I assume I would not want crawled. Is that the case, and are these notices actually a positive thing? If these notices can be harmful to my SEO, please advise how to clear them up. Thanks, Talia
Technical SEO | M80Marketing
-
Canonical Issues with Wordpress
Hi all, I have just started using WordPress SEO by Yoast and am still having a hard time correcting the canonical issues for all posts with .html at the end. The plugin allows you to add a '/' to the end for canonical purposes, but only for pages, not posts. What is the best way in WordPress to change my post URLs from .html/ to .html? I really don't want to go to the hassle of adding a 301 redirect for each URL in my .htaccess. I hate the .html, but if it is going to stay, how can I make sure the .html/ link juice gets passed back to those URLs? Many thanks!
Technical SEO | RunningInTheRain