Magneto site with many pages
-
just finsihed scan to a magento site.
off course I am getting thousand of pages that are dynamic.
search pages and other.
checking with site command on Google I see 154,000 results
which pages it is recommended to block?
some people are talking about blocking the search pages and some actually talking about allowing them?
any answer on this?
Thanks
-
There was no effect on the rankings, no significant ups or downs. But now when I do the site:www.domain.com command in Google, I see just the pages that I want Google to index.
It can only help in the long run I guess.
-
Hi
sorry for responding late.
what was the affect on your results when u did block all those pages?
Thanks
-
yes my thought is blocking through robot.txt
-
I've had similar problems with a few Magento sites. This is a standard list I use in my robots.txt files (below.) I hope it helps.
You don't have to include all the Magento folders like 'app' 'lib' 'var' and 'admin' t etc hey are just there to be thorough.
I think you'll get the idea. I've brought the number of indexed pages down from half a million to just a few thousand using these.
Disallow: /*? Disallow: /*.js$ Disallow: /*.css$ Disallow: /404/ Disallow: /admin/ Disallow: /api/ Disallow: /app/ Disallow: /catalog/category/view/ Disallow: /catalog/product/view/ Disallow: /catalog/product_compare/ Disallow: /catalogsearch/ Disallow: /catalogsearch/advanced/ Disallow: /catalogsearch/term/ Disallow: /catalogsearch/term/popular/ Disallow: /cgi-bin/ Disallow: /checkout/ Disallow: /checkout/cart/ Disallow: /contacts/ Disallow: /contacts/index/ Disallow: /contacts/index/post/ Disallow: /customer/ Disallow: /customer/account/ Disallow: /customer/account/login/ Disallow: /downloader/ Disallow: /install/ Disallow: /js/ Disallow: /lib/ Disallow: /magento/ Disallow: /newsletter/ Disallow: /pkginfo/ Disallow: /private/ Disallow: /poll/ Disallow: /report/ Disallow: /review/ Disallow: /sendfriend/ Disallow: /skin/ Disallow: /tag/ Disallow: /var/ Disallow: /wishlist/
-
Hi there! When you say block, do you mean through your robots.txt?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages.
Hello, My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages. I have contacted my theme company but not sure what could have done this. Any ideas? The original posts/pages are still correct and working it just looks like it did duplicates and added void(0 to the end of each post/page. Questions: There is no way to undo this correct? Do I have to do a redirect on each of these? Will this hurt my rankings and domain authority? Any suggestions would be appreciated. Thanks, Wade
Intermediate & Advanced SEO | | neverenoughmusic.com0 -
301 migration - Indexed Pages rising on old site
Hello, We did a 301 redirect from site a to site b back in March. I would check on a daily basis on the index count using query "site:sitename" The past couple of days, the old domain (that was 301 redirected) indexed pages has been rising which is really concerning. We did a 301 redirect back in march 2016, and the indexed count went from 400k pages down to 78k. However, the past 3 days it went from 78k to 89,500. And I'm worried that the number is going to continue to rise. My question - What would you do to investigate / how to investigate this issue? Would it be screaming frog and look at redirects? Or is this a unique scenario that I'd have to do other steps/procedures?
Intermediate & Advanced SEO | | ggpaul5620 -
Something happened within the last 2 weeks on our WordPress-hosted site that created "duplicates" by counting www.company.com/example and company.com/example (without the 'www.') as separate pages. Any idea what could have happened, and how to fix it?
Our website is running through WordPress. We've been running Moz for over a month now. Only recently, within the past 2 weeks, have we been alerted to over 100 duplicate pages. It appears something happened that created a duplicate of every single page on our site; "www.company.com/example" and "company.com/example." Again, according to our MOZ, this is a recent issue. I'm almost certain that prior to a couple of weeks ago, there existed both forms of the URL that directed to the same page without be counting as a duplicate. Thanks for you help!
Intermediate & Advanced SEO | | wzimmer0 -
301'd an important, ranking page to the wrong new page, any recourse?
Our 1,300 page site conversion from static html to Wordpress platform went flawlessly with the exception of 1 significant issue....an old, important, highly ranking page was 301 redirected to the wrong corresponding new page. The page it was redirected to is about a similar product, but not the same. This was an oversight that slipped through. It was brought to my attention when I noticed this new page was still holding the old page's rankings but the bounce rate skyrocketed (clearly because the content on the wrong new page was not relevant). Once identified, we cleaned up the redirect. My fear is that all the juice built up on the old .html page that ranked well has now permanently been passed to an irrelevant, insignificant page. -Is there any way to clean up this mistake? -Is there anything I can do to assist Google in associating the correct 'new' page with correct 'old' page after the wrong redirect was initially set-up? -Am I going to have to start from scratch with the new page in terms of trust, backlinks, etc. since google already noted the redirect? Thanks!
Intermediate & Advanced SEO | | seagreen0 -
Investigating Google's treatment of different pages on our site - canonicals, addresses, and more.
Hey all - I hesitate to ask this question, but have spent weeks trying to figure it out to no avail. We are a real estate company and many of our building pages do not show up for a given address. I first thought maybe google did not like us, but we show up well for certain keywords 3rd for Houston office space and dallas office space, etc. We have decent DA and inbound links, but for some reason we do not show up for addresses. An example, 44 Wall St or 44 Wall St office space, we are no where to be found. Our title and description should allow us to easily picked up, but after scrolling through 15 pages (with a ton of non relevant results), we do not show up. This happens quite a bit. I have checked we are being crawled by looking at 44 Wall St TheSquareFoot and checking the cause. We have individual listing pages (with the same titles and descriptions) inside the buildings, but use canonical tags to let google know that these are related and want the building pages to be dominant. I have worked though quite a few tests and can not come up with a reason. If we were just page 7 and never moved it would be one thing, but since we do not show up at all, it almost seems like google is punishing us. My hope is there is one thing that we are doing wrong that is easily fixed. I realize in an ideal world we would have shorter URLs and other nits and nats, but this feels like something that would help us go from page 3 to page 1, not prevent us from ranking at all. Any thoughts or helpful comments would be greatly appreciated. http://www.thesquarefoot.com/buildings/ny/new-york/10005/lower-manhattan/44-wall-st/44-wall-street We do show up one page 1 for this building - http://www.thesquarefoot.com/buildings/ny/new-york/10036/midtown/1501-broadway, but is the exception. I have tried investigating any differences, but am quite baffled.
Intermediate & Advanced SEO | | AtticusBerg10 -
Rel canonical on every page, pointing to home page
I've just started working with a client and have been surprised to find that every page of their site (using Concrete5 CMS) has a rel=canonical pointing to their home page. I'm feeling really dumb, because this seems like a fatal flaw which would keep Google from ranking any page other than the home page... but when I look at Google Analytics, Content > Site Content > Landing Pages, using Secondary Dimension = Source, it seems that Google is delivering users to numerous pages on their site. Can anyone help me out?! Thanks very much!!
Intermediate & Advanced SEO | | measurableROI0 -
Our Site's Content on a Third Party Site--Best Practices?
One of our clients wants to use about 200 of our articles on their site, and they're hoping to get some SEO benefit from using this content. I know standard best practices is to canonicalize their pages to our pages, but then they wouldn't get any benefit--since a canonical tag will effectively de-index the content from their site. Our thoughts so far: add a paragraph of original content to our content link to our site as the original source (to help mitigate the risk of our site getting hit by any penalties) What are your thoughts on this? Do you think adding a paragraph of original content will matter much? Do you think our site will be free of penalty since we were the first place to publish the content and there will be a link back to our site? They are really pushing for not using a canonical--so this isn't an option. What would you do?
Intermediate & Advanced SEO | | nicole.healthline1 -
What causes internal pages to have a page rank of 0 if the home page is PR 5?
The home page PageRank is 5 but every single internal page is PR 0. Things I know I need to address each page has 300 links (Menu problem). Each article has 2-3 duplicates caused from the CMS working on this now. Has anyone else had this problem before? What things should I look out for to fix this issue. All internal linking is follow there is no page rank sculpting happening on the pages.
Intermediate & Advanced SEO | | SEOBrent0