Restricted by robots.txt and soft bounce issues (related).
-
In our Webmaster Tools we have 35K (ish) URLs that are restricted by robots.txt and also have 1200 (ish) soft 404s. We can't seem to figure out how to properly resolve these URLs so that they no longer show up this way. Our traffic from SEO has taken a major hit over the last 2 weeks because of this.
Any help?
Thanks, Libby
-
**These are duplicate URLs that we can't figure out how they are getting created.**
I want to be sure we are talking about the same thing here. When I hear "duplicate URL" I am thinking of multiple URLs which point to the same web page. Depending on how your site is set up it is possible to have many different URLs point to the same web page. Possible examples are:
www.mydomain.com/tennis-rackets
www.mydomain.com/tennis-rackets/
mydomain.com/tennis-rackets?sort=asc
Above are three examples of URLs which can all lead to the same page. You can have dozens of URLs all lead to a page with identical content. How these issues get resolved depends upon how they were created.
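To make the idea concrete, here is a minimal Python sketch that groups URLs differing only by the "www." prefix, a trailing slash, or a query string. The function names `normalize` and `group_duplicates` are made up for illustration, and the rules are an assumption about what counts as "the same page" on your site; some sites really do serve different content for different query strings.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path, ignoring scheme, 'www.', trailing slash and query."""
    # urlsplit only recognises the host when the URL has a scheme or a leading "//".
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat "/tennis-rackets" and "/tennis-rackets/" as the same path.
    path = parts.path.rstrip("/") or "/"
    return host + path


def group_duplicates(urls):
    """Return {normalized key: [original URLs]} for keys with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


examples = [
    "www.mydomain.com/tennis-rackets",
    "www.mydomain.com/tennis-rackets/",
    "mydomain.com/tennis-rackets?sort=asc",
]
duplicates = group_duplicates(examples)
```

Running this over a list of crawled URLs gives you one bucket per page, which makes it easier to spot the pattern (sort parameters, trailing slashes, and so on) that is generating the extra URLs.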
The best tool to help you figure this out is your crawl report. Use the SEOmoz crawl tool, then examine the crawl report. It can be a bit overwhelming at first, but you can narrow things down real fast if you use Excel.
Select the header row for your data (begins with the URL field), then select Data > Filter > Auto Filter from the menu. Then start by looking at fields such as "Duplicate Page Content", "URLs with duplicate content", etc. Simply choose YES in the drop down menu to filter for that particular data. This will help you uncover the source of these issues.
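If you would rather not do this in Excel, the same filter can be sketched in a few lines of Python. This assumes the crawl report is exported as CSV with a "URL" column and a "Duplicate Page Content" column containing YES for flagged rows; check the header row of your own export, since the exact column names may differ. `flagged_urls` is a made-up helper name.

```python
import csv


def flagged_urls(csv_file, column="Duplicate Page Content", url_column="URL"):
    """Return the URLs of rows where the given column is YES.

    csv_file is any open text file (or file-like object) holding the crawl report.
    Column names are assumptions; adjust them to match your export.
    """
    reader = csv.DictReader(csv_file)
    return [
        row[url_column]
        for row in reader
        if (row.get(column) or "").strip().upper() == "YES"
    ]


# Hypothetical usage with a file on disk:
# with open("crawl_report.csv", newline="", encoding="utf-8") as f:
#     for url in flagged_urls(f):
#         print(url)
```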
The URLs in my example should all be 301'd or canonicalized to the primary page to resolve the duplication issue.
-
Well, part of the problem is these are duplicate URLs that we can't figure out how they are getting created. They were supposed to resolve to our 404 page... Should we remove them all?
-
Hi Libby.
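How do you intend to resolve these URLs? Ideally you would remove your robots.txt entries and restrict the pages with meta tags such as "noindex, follow" or whatever is appropriate. A crawler cannot read a meta tag on a page that robots.txt stops it from fetching, so the block has to come off first. Any links to 404 pages should be updated or removed.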
What further direction do you seek?
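If it helps, here is a rough Python sketch for checking which of your URLs a given set of robots.txt rules would block, using the standard library's `urllib.robotparser`. The rules and URLs below are invented for illustration (I don't know what is in your robots.txt), and `blocked_urls` is a made-up helper name. Note that the standard-library parser does simple prefix matching and may not treat wildcard patterns the way Google does.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_lines, urls, agent="*"):
    """Return the URLs that the given robots.txt lines disallow for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical rules and URLs, for illustration only.
rules = [
    "User-agent: *",
    "Disallow: /search/",
]
candidates = [
    "http://www.mydomain.com/search/results?q=rackets",
    "http://www.mydomain.com/tennis-rackets",
]
blocked = blocked_urls(rules, candidates)
```

Any URL that shows up in `blocked` is one where a "noindex" meta tag would never be seen until the matching Disallow line is removed.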