Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I set up a disallow in the robots.txt for catalog search results?
-
When the crawl diagnostics came back for my site its showing around 3,000 pages of duplicate content. Almost all of them are of the catalog search results page. I also did a site search on Google and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ sub folder, but I'm not sure if this will have any negative effect?
-
One step at a time = long term success. I wish you the best with it Jordan.
-
Thanks Alan, you are right this site has quite a long way to go. The first crawl was just finished and I notice that the most errors were due to dupe content so I decided I would try and tackle that first. Thank you for all the pointers, I will be taking a look at all those as soon as I can.
-
Totally agree with Alan, it can cause circular navigation problems for crawlers too.
-
Jordan,
Others might have a different view, however that's exactly what I recommend to clients. but only if you've got other html link based ways for bots to get to all the content in a direct manner, and have a good sitemap.xml file to reinforce that.
I am happy to see that you have a sound overall site architecture, however I see no robots.txt file at your root so I'm not sure what's up with that. Also your sitemap.xml file only has 43 URLs in it. that's a problem not because google can't find content by other means, it's just that I've found Google likes that reinforcement, and Bing especially does a better job discovering content with a proper sitemap.xml submitted through their webmaster system (they're less efficient at discovering content by other means).
I'd also suggest you have a big push ahead in dealing with near-duplicate content.
For example:
http://www.durafaucet.com/mk850-orb.html
http://www.durafaucet.com/kitchen-faucets/mk850.html
Sure, these are unique products. Except there's already so little unique content on either page that the common content compounded by the site-wide replication of top, sidebar and footer content means the total weight of uniqueness is on the very minor end of the spectrum.
And then there's the issue of a complete lack of inbound link authority - OpenSiteExplorer.org might be wrong, but currently shows almost no inbound links. Not only will you need inbound links to the home page, but also to as many inner pages as is realistic in terms of implementation capabilities go. This is especially true for category level pages. (including a variety of inbound link anchor text - brand, domain, keyword phrase and generic text).
So if you don't address those type of issues, removing all the dupes that show up in search now won't result in as much long-term value as you'll need.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Search console says 'sitemap is blocked by robots?
Google Search console is telling me "Sitemap contains URLs which are blocked by robots.txt." I don't understand why my sitemap is being blocked? My robots.txt look like this: User-Agent: *
Technical SEO | | Extima-Christian
Disallow: Sitemap: http://www.website.com/sitemap_index.xml It's a WordPress site, with Yoast SEO installed. Is anyone else having this issue with Google Search console? Does anyone know how I can fix this issue?1 -
Duplicate title while setting canonical tag.
Hi Moz Fan, My websites - https://finance.rabbit.co.th/ has run financial service, So our main keywords is about "Insurance" in Thai, But today I have an issues regarding to carnonical tag. We have a link that containing by https://finance.rabbit.co.th/car-insurance?showForm=1&brand_id=9&model_id=18&car_submodel_id=30&ci_source_id=rabbit.co.th&car_year=2014 and setting canonical to this url - https://finance.rabbit.co.th/car-insurance within 5,000 items. But in this case I have an warning by site audit tools as Duplicate Page Title (Canonical), So is that possible to drop our ranking. What should we do, setting No-Index, No-Follow for all URL that begin with ? or keep them like that.
Technical SEO | | ASKHANUMANTHAILAND0 -
Robots.txt on subdomains
Hi guys! I keep reading conflicting information on this and it's left me a little unsure. Am I right in thinking that a website with a subdomain of shop.sitetitle.com will share the same robots.txt file as the root domain?
Technical SEO | | Whittie0 -
How to set up internal linking with subcategories?
I'm building a new website and am setting up internal link structure with subcategories and hoping to do so with best Seo practices in mind. When linking to a subcategory's main page, would I make the internal link www.xxx.com/fishing/ or www.xxx.com/fishing/index.html or does it matter? I'm just trying to avoid duplicate content I guess, if Google saw each page as a separate page. Any other cautions when using subdirectories in my navigation?
Technical SEO | | wplodge0 -
Why are Google search results different if you are log'd into Google or not?
I get different results when I'm log'd into my Google account associated with my website than if I'm not. The same country is occurring. So how can I rely on the google results I'm seeing? For instance my site is page 1 with the improvements I made based on SEOMOZ if I'm log'd in. Yet I'm not on the first 25 pages if I'm not logged in.
Technical SEO | | Romana0 -
How to handle (internal) search result pages?
Hi Mozers, I'm not quite sure what the best way is to handle internal search pages. In this case it's for an ecommerce website with about 8.000+ products and search pages currently look like: example.com/search.php?search=QUERY+HERE. I'm leaning towards making them follow, noindex. Since pages like this can be easily abused for duplicate content and because I'd rather have the category pages ranked. How would you handle this?
Technical SEO | | Qon0 -
Invisible robots.txt?
So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!
Technical SEO | | joshcanhelp0 -
Different Results in Chrome, Firefox and IE?
I clear the cache and log out from any accounts and I still get different results for the same keyword if I use different browsers. Any idea whats going on? And which browser would have my true ranking?
Technical SEO | | musillawfirm0