Restricted by robots.txt and soft bounce issues (related).
-
In our Webmaster Tools account we have roughly 35K URLs that are restricted by robots.txt, and we also have about 1,200 soft 404s. We can't seem to figure out how to properly resolve these URLs so that they no longer show up this way. Our traffic from SEO has taken a major hit over the last 2 weeks because of this.
Any help?
Thanks, Libby
-
**These are duplicate URLs that we can't figure out how they are getting created.**
I want to be sure we are talking about the same thing here. When I hear "duplicate URL" I am thinking of multiple URLs which point to the same web page. Depending on how your site is set up it is possible to have many different URLs point to the same web page. Possible examples are:
www.mydomain.com/tennis-rackets
www.mydomain.com/tennis-rackets/
mydomain.com/tennis-rackets?sort=asc
Above are three examples of URLs which can all lead to the same page. You can have dozens of URLs all lead to a page with identical content. How these issues get resolved depends upon how they were created.
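If it helps to see the idea concretely, here is a minimal Python sketch that groups URL variants like the three above under one comparison key. The `normalize` and `group_duplicates` helpers and the golf-clubs URL are made up for illustration, and dropping the query string assumes parameters such as `sort` do not change the page content, which may not hold on every site.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparison key: no scheme, no www, no query, no trailing slash."""
    # urlsplit only recognises the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Assumption: the query string (e.g. ?sort=asc) does not change the content.
    path = parts.path.rstrip("/") or "/"
    return host + path


def group_duplicates(urls):
    """Return only the keys that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


urls = [
    "www.mydomain.com/tennis-rackets",
    "www.mydomain.com/tennis-rackets/",
    "mydomain.com/tennis-rackets?sort=asc",
    "www.mydomain.com/golf-clubs",  # hypothetical page with no duplicates
]
duplicates = group_duplicates(urls)
for key, members in duplicates.items():
    print(key, members)
```

Running something like this over the list of URLs from a crawl gives a rough picture of which pages have several addresses, which is usually the first clue to how the extra URLs are being generated.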
The best tool to help you figure this out is your crawl report. Use the SEOmoz crawl tool, then examine the crawl report. It can be a bit overwhelming at first, but you can narrow things down real fast if you use Excel.
Select the header row for your data (begins with the URL field), then select Data > Filter > Auto Filter from the menu. Then start by looking at fields such as "Duplicate Page Content", "URLs with duplicate content", etc. Simply choose YES in the drop down menu to filter for that particular data. This will help you uncover the source of these issues.
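If you would rather script the same filter than click through Excel, a rough Python equivalent might look like the following. It assumes the crawl report is exported as CSV with a "URL" column and a "Duplicate Page Content" column holding YES/NO values; the sample rows are invented, and your export may use different headers.

```python
import csv
import io


def rows_flagged(csv_text: str, column: str, flag: str = "YES"):
    """Return the URL of every row whose `column` equals `flag` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["URL"]
        for row in reader
        if (row.get(column) or "").strip().upper() == flag
    ]


# Invented sample data standing in for a real crawl export.
sample = """URL,Duplicate Page Content
http://www.mydomain.com/tennis-rackets,YES
http://www.mydomain.com/about,NO
http://www.mydomain.com/tennis-rackets/,YES
"""

flagged = rows_flagged(sample, "Duplicate Page Content")
for url in flagged:
    print(url)
```

For a real file, read its contents with `open(path, newline="").read()` and pass that in place of `sample`.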
The URLs in my example should all be 301'd or canonicalized to the primary page to resolve the duplication issue.
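As a sketch of the canonical option, the Python snippet below builds the `<link rel="canonical">` tag that each variant could carry. It assumes `http://www.mydomain.com` is the primary host, that every variant lives on that one site, and that the slash-less path without a query string is the primary form; `primary_url` and `canonical_tag` are hypothetical helper names.

```python
from html import escape
from urllib.parse import urlsplit

PRIMARY_SCHEME = "http"            # assumption: the site is served over http
PRIMARY_HOST = "www.mydomain.com"  # assumption: the www host is the primary one


def primary_url(url: str) -> str:
    """Map any variant on this site to its primary URL (no query, no trailing slash)."""
    if "://" not in url:
        url = f"{PRIMARY_SCHEME}://{url}"
    path = urlsplit(url).path.rstrip("/") or "/"
    return f"{PRIMARY_SCHEME}://{PRIMARY_HOST}{path}"


def canonical_tag(url: str) -> str:
    """Build the tag to place in the <head> of the page served at `url`."""
    return f'<link rel="canonical" href="{escape(primary_url(url), quote=True)}">'


tag = canonical_tag("mydomain.com/tennis-rackets?sort=asc")
print(tag)
```

The same `primary_url` value is also what a 301 rule would redirect to. A 301 tends to suit pure address variants such as the missing www or the trailing slash, while a canonical tag tends to suit parameter variants that still need to load for visitors.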
-
Well, part of the problem is these are duplicate URLs that we can't figure out how they are getting created. They were supposed to resolve to our 404 page... Should we remove them all?
-
Hi Libby.
How do you intend to resolve these URLs? Ideally you would remove your robots.txt entries and restrict the pages with meta tags such as "noindex, follow" or whatever is appropriate. Any links to 404 pages should be updated or removed.
What further direction do you seek?
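To see which URLs a given robots.txt actually blocks, a small Python check with the standard library's `urllib.robotparser` can help. The rules and URLs below are made up for illustration; substitute your own robots.txt text and a sample of the restricted URLs from Webmaster Tools.

```python
from urllib.robotparser import RobotFileParser

# The meta tag mentioned above. A crawler can only see it on pages that
# robots.txt does not block, which is why the robots.txt entry has to go first.
NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'


def blocked_urls(robots_txt: str, urls, agent: str = "*"):
    """Return the URLs that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical rules and URLs, not taken from any real site.
robots_txt = """User-agent: *
Disallow: /search/
"""
urls = [
    "http://www.mydomain.com/search/page/8/",
    "http://www.mydomain.com/tennis-rackets",
]

blocked = blocked_urls(robots_txt, urls)
for url in blocked:
    print("blocked:", url)
```

Note that `urllib.robotparser` follows the basic robots.txt rules and may not match Google's handling of wildcards exactly, so treat its answer as a first pass rather than the final word.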