New CMS system - 100,000 old urls - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used 3 different CMS platforms on our current domain. As expected, this has left us with lots of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all of the pertinent, PageRank-bearing older URLs to their new counterparts. However, according to Google Webmaster Tools' 'Not Found' report, there are literally over 100,000 additional URLs out there it's trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from looking for some of these older directories? Currently, we allow everything, only using page-level robots meta tags to disallow where necessary.
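For illustration, blocking a retired CMS directory in robots.txt only takes a couple of lines (the directory names below are hypothetical, not from this site):

```
User-agent: *
# Block crawling of directories left over from the retired CMS platforms
Disallow: /old-cms/
Disallow: /legacy-articles/
```

One caveat worth knowing: Disallow stops crawling, not indexing, so URLs that are already in the index can linger there for a while after being blocked.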
Thanks!
-
Great stuff. Thanks again for your advice; much appreciated!
-
It can be really tough to gauge the impact. It depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem, and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
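One low-tech way to double-check that is to triage the 'Not Found' export before blocking anything wholesale. A minimal Python sketch, assuming a hand-built redirect map; the paths below are made-up examples, not URLs from this thread:

```python
# Sketch: split a crawl-errors export into "should have been redirected"
# vs. "intentional dead end". REDIRECT_MAP and the sample rows are
# hypothetical examples for illustration.

REDIRECT_MAP = {
    "/old-cms/widgets.asp": "/products/widgets/",
}

def triage(rows):
    """Bucket 404s by whether a new home exists for the old URL."""
    missing_redirect, intentional = [], []
    for url, status in rows:
        if status != 404:
            continue  # only triaging Not Found errors here
        if url in REDIRECT_MAP:
            missing_redirect.append((url, REDIRECT_MAP[url]))
        else:
            intentional.append(url)
    return missing_redirect, intentional

sample = [
    ("/old-cms/widgets.asp", 404),    # has a known new home -> needs a 301
    ("/old-cms/junk-page.asp", 404),  # genuinely gone
]
need_301, ok_to_drop = triage(sample)
print(need_301)
print(ok_to_drop)
```

Anything that lands in the first bucket deserves a 301 before it is written off; the rest can safely be left to fall out of the index.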
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using robots.txt to block pages that have already been indexed. It tends to be unreliable at large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best tool you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may head off the problems caused by Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
-
Absolutely. Not Founds and no-content pages are a concern, and cleaning them up will help your ranking.
-
Thanks a lot! I should have been a little more specific. My exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
That's a loaded question without knowing exactly what you are doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "Not Founds".
Then you can slowly pick away at the issue and figure out whether some of the "Not Founds" really do have content and are just being sent to the wrong place.
On a recent project we had over 200,000 additional URLs "not found". We stopped the bleeding, and then slowly, over the course of a month, spending a couple of hours a week, we found another 5,000 pages of real content that we redirected correctly and then removed from the robots.txt block.
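If the leftover URLs follow a predictable pattern, pattern-based 301s can clean up whole groups at once instead of one rule per URL. A sketch using Apache's mod_alias (the paths here are hypothetical):

```apache
# Send an entire legacy directory to its new home, preserving the slug
RedirectMatch 301 ^/old-cms/articles/(.*)$ /blog/$1
```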
Good luck.