Restricted by robots.txt and soft 404 issues (related).
-
In our Webmaster Tools we have roughly 35K URLs restricted by robots.txt and roughly 1,200 soft 404s. We can't figure out how to resolve these URLs so that they no longer show up this way. Our SEO traffic has taken a major hit over the last two weeks because of this.
Any help?
Thanks, Libby
-
**These are duplicate URLs that we can't figure out how they are getting created.**
I want to be sure we are talking about the same thing here. When I hear "duplicate URL" I am thinking of multiple URLs which point to the same web page. Depending on how your site is set up it is possible to have many different URLs point to the same web page. Possible examples are:
www.mydomain.com/tennis-rackets
www.mydomain.com/tennis-rackets/
mydomain.com/tennis-rackets?sort=asc
Above are three examples of URLs which can all lead to the same page. You can have dozens of URLs all lead to a page with identical content. How these issues get resolved depends upon how they were created.
The best tool to help you figure this out is your crawl report. Run the SEOmoz crawl tool, then examine the report. It can be a bit overwhelming at first, but you can narrow things down quickly in Excel.
Select the header row of your data (it begins with the URL field), then choose Data > Filter > AutoFilter from the menu. Start with fields such as "Duplicate Page Content" and "URLs with duplicate content", choosing YES in the drop-down menu to filter on that column. This will help you uncover the source of these issues.
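If Excel feels cumbersome, the same filtering can be scripted. Here's a minimal sketch in Python, assuming the crawl report is exported as a CSV with "URL" and "Duplicate Page Content" columns (the column names are an assumption — check your own export's headers):

```python
import csv

def duplicate_urls(report_path):
    """Return the URLs flagged as duplicate content in a crawl-report CSV.

    Assumes columns named "URL" and "Duplicate Page Content"; adjust
    the names to match your actual export.
    """
    with open(report_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return [
            row["URL"]
            for row in reader
            if row.get("Duplicate Page Content", "").strip().lower() in ("yes", "true")
        ]

# Example usage (path is hypothetical):
# for url in duplicate_urls("crawl_report.csv"):
#     print(url)
```

Sorting or grouping the resulting list by URL pattern (trailing slashes, query parameters, missing `www.`) usually makes the source of the duplicates obvious.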
The URLs in my example should all be 301'd or canonicalized to the primary page to resolve the duplication issue.
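As a rough sketch of what that could look like on an Apache server, assuming `www.mydomain.com` without a trailing slash is your preferred form (these `.htaccess` rules are illustrative — test them on a staging copy before deploying):

```apache
# Hypothetical .htaccess rules; adapt to your preferred URL form.
RewriteEngine On

# 301 the bare domain to www.
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]

# 301 trailing-slash URLs to the slashless form (skip real directories).
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [R=301,L]
```

For variants you can't redirect (such as sort parameters), a canonical tag in the page's `<head>` points engines at the primary URL instead:

```html
<link rel="canonical" href="http://www.mydomain.com/tennis-rackets" />
```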
-
Well, part of the problem is that these are duplicate URLs and we can't figure out how they are getting created. They were supposed to resolve to our 404 page... Should we remove them all?
-
Hi Libby.
How do you intend to resolve these URLs? Ideally you would remove the robots.txt entries and instead restrict the pages with a robots meta tag such as "noindex, follow" (or whatever is appropriate for each page). Any links pointing to 404 pages should be updated or removed.
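For pages you want crawled but kept out of the index, the meta tag approach looks like this (a sketch — it goes in the `<head>` of each page you want excluded, and the page must not also be blocked in robots.txt, or crawlers will never see the tag):

```html
<!-- Keeps the page out of the index while still letting crawlers follow its links -->
<meta name="robots" content="noindex, follow">
```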
What further direction do you seek?