Big problem with my new crawl report
-
I am the owner of a small OpenCart online store. I installed this extension: http://www.opencart.com/index.php?route=extension/extension/info&extension_id=6182&filter_search=seo. Today my new crawl report is awful. Errors are up by 520 (30 before), warnings are up by 1000 (120 before), and notices are up by 8000 (1000 before). I noticed that the problem is with search: there is a lot of duplicate content, and only in search. What should I do?
-
Thank you again Alan.
Typo fixed.
-
I use the Bing search API.
By the way, you want to change from GET to POST, not the other way around.
-
Alan,
Thank you for the great advice. If one has enough control over the eCommerce system, or the internal site search product, to change from GET to POST so these pages act more like real dynamically generated "search pages" than an infinite number of "landing pages", I think that is a fantastic solution. It would keep merchandisers and others from linking to those pages, because we all know that they will continue to do it even if the SEO pleads on hands and knees for them to stop.
However, I have found that most eCommerce businesses (from small mom-and-pop shops to Fortune 500 companies) do not have the ability to do this because the internal site search functionality they use is out of their hands. Site search vendors like Endeca and Celebros, which serve enterprise eCommerce businesses, don't typically hand over the keys to the client.
If you know any site search vendors or solutions that allow one to do this it would make a great contribution to this thread if you could share a few of them. I'd definitely look into recommending them in the future!
Thanks again!
-
The problem with PR leaks is that they scale. If you are losing 10% and you then earn some quality links, 10% of their value will be wasted; every effort you make in the future will be discounted by 10%.
There are ways to fix all these problems. For example, I would make search use POST rather than GET, so that links to search pages cannot be made and therefore search pages will not get indexed.
We work so hard to get good links, so why waste them when you get them?
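The GET versus POST difference can be sketched as follows. This is only an illustration, not OpenCart code: the host www.example.com and the tag value "shoes" are made up, and the parameter name filter_tag is borrowed from the URLs reported in this thread. No request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical store URL; the parameter name is taken from the thread.
SEARCH_URL = "http://www.example.com/product/search"
params = {"filter_tag": "shoes"}

# GET: the query is part of the URL, so every search has its own linkable address.
get_request = Request(SEARCH_URL + "?" + urlencode(params), method="GET")

# POST: the query travels in the request body; the URL is the same for every search.
post_request = Request(
    SEARCH_URL,
    data=urlencode(params).encode("ascii"),
    method="POST",
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

With POST there is one URL for all searches, so there is no per-query address for anyone to link to or for a crawler to discover.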
-
I have tried different methods to fix this. First-hand experience tells me that oftentimes it is better to just block the paths (assuming there is better navigation on the site) from being crawled or indexed using robots.txt than to use a noindex,follow tag in order to save the pagerank you're sending via internal links. It is very easy for Google to get bogged down crawling around in the internal search results area.
Unless there are lots of links to search pages from top pages on the site, or a big list of search page links from every page (sitewide footer, for example) I really don't think the waste of internal pagerank is noticeable in the rankings, or worth salvaging if it risks sending spiders into a maze or a trap.
Yes, best practice is not to link to pages that you are blocking. In the real world, though, search pages can be very useful to visitors, and merchandisers who don't have the ability to create more targeted sub-sub-sub categories will often use them, and link to them on the site, as landing pages for promotional purposes (emails, PPC, sales...).
Everyone has their own strategies, and all we can do is make recommendations based on our own experience and knowledge. Thanks for helping out with this question Alan. Feel free to elaborate so Anastas has more input to help guide his decision.
-
as long as no one is linking to the search pages including internal links.
-
Hello Anastas,
I agree that you should block the search folder from being indexed. I'm going to assume that nobody is linking to your search pages and that you have other paths (e.g. SEO-friendly navigation, sitemaps...) for search engines to use to access your products.
I don't understand why you have formatted the disallow statement that way, however. Unless I'm missing something (and I could be, since I don't know what your site is), you only need to do this:
Disallow: /product/search*
And of course after doing this you should test it in GWT to make sure that A: You are blocking the pages you want to block, such as search pages with lots of parameters, and B: You are NOT blocking other pages you don't want to block, such as product pages. Here is more info on where to find the testing tool in GWT if you don't know: http://productforums.google.com/forum/#!topic/webmasters/tbikAxJiIZ4
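A rough local version of checks A and B can be sketched in Python before going to GWT. This is a minimal sketch, assuming the wildcard behavior Google documents for robots.txt ("*" matches any run of characters, a trailing "$" anchors the end of the path, everything else is a prefix match). The function name rule_matches, the /index.php?route=... URL form, and the /category1/product-name path are illustrative assumptions, not taken from the actual site, and the sketch is not a substitute for the GWT test.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern matches a URL path.

    Assumes Google-style wildcards: '*' matches any run of characters,
    a trailing '$' anchors the end, and the rest is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches from the start of the path (prefix semantics).
    return re.match(regex, path) is not None


rules = [
    "/product/search*",                      # rule suggested in this reply
    "/*?route=product/search",               # rule already in the robots.txt
    "/*?route=product/search&filter_tag=/",  # rule proposed in the question
]
paths = [
    "/product/search&filter_tag=XXXXXX",                  # URL form reported in the thread
    "/index.php?route=product/search&filter_tag=XXXXXX",  # assumed non-rewritten form
    "/category1/product-name",                            # hypothetical product page
]

for rule in rules:
    for path in paths:
        print(rule, "|", path, "| blocked:", rule_matches(rule, path))
```

Under these assumptions, the short rule blocks the reported search URL and leaves the product path alone, while the rule ending in filter_tag=/ matches neither search URL because it requires a literal slash right after the equals sign.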
Let us know how it goes. Good luck.
-
Please I need help
-
I am using OpenCart. I don't know what to do. Before I had 50 errors; now there are more than 500 after installing this plug-in. The plug-in removed the previous errors, but now there are many different errors. I have 2 options:
1. Remove the plug-in
2. Do something about the new errors. The new errors are only because of search: I have duplicate page content because when you type PRODUCT NAME in the search box, the result has the same content as www.mydomain.com/category1/PRODUCT NAME
Maybe this plug-in removed the canonical URLs in search, or I don't know what.
In robots.txt there is row:
Disallow: /*?route=product/search
The duplicate content is mydomain.com/product/search&filter_tag=XXXXXX
Instead of XXXXX there are many paths.
I decided to add another row in robots.txt:
Disallow: /*?route=product/search&filter_tag=/
Do you think it is correct, or should I remove the plug-in?
I hope you understand what is the problem.
-
When you noindex a page, any links pointing to that page pour link juice away from your indexed pages. You should never noindex pages, IMO.
I assume you are using a CMS or some sort of plug-in; this is a common cost when you do so. CMSs create very untidy code, which is not good for SEO.
-
The urls are: /product/search&filter_tag=%D0%B1%D0%B8%D0%B6%D1%83%D1%82%D0%B0
After the = there are a lot of combinations. Is it correct to put this in robots.txt?
Disallow: /*?route=product/search&filter_tag=/
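To see what the part after the = actually contains, the sample URL above can be decoded with Python's standard library. This is a small sketch; the variable names are only for illustration.

```python
from urllib.parse import unquote

# Sample path copied from the URL quoted above.
path = "/product/search&filter_tag=%D0%B1%D0%B8%D0%B6%D1%83%D1%82%D0%B0"

# Split off everything after "filter_tag=".
prefix, _, encoded_tag = path.partition("filter_tag=")

# Percent-encoded bytes are UTF-8 text; unquote() decodes them.
tag = unquote(encoded_tag)

print("prefix:", prefix)
print("tag:", tag)
print("value starts with a slash:", encoded_tag.startswith("/"))
```

The value is percent-encoded tag text, so it can be almost anything, and in this sample it does not begin with a slash. A rule that ends in filter_tag=/ therefore depends on a character these URLs do not appear to contain.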
-
Should I disallow search (in robots.txt)?