Help recover lost traffic (70%) from robots.txt error.
-
Our site is a company information site with 15 million indexed pages (mostly company profiles). Recently we had an issue with a server that we replaced, and in the process we mistakenly copied the robots.txt block from the staging server to a live server. By the time we realized the error, we had lost 2/3 of our indexed pages and a comparable amount of traffic. Apparently the error took place on 4/7/19 and was corrected two weeks later. Approximately a week ago we submitted new sitemaps to Google and asked them to validate the fix. Given that close to 10 million pages need to be validated, we have not seen any meaningful change so far.
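For reference, a staging robots.txt that blocks all crawling looks something like the snippet below (an assumed reconstruction for illustration, not our actual file), alongside the corrected version:

```
# Staging robots.txt (presumably what was copied to production) - blocks all crawlers
User-agent: *
Disallow: /

# Corrected production robots.txt - allows everything to be crawled
User-agent: *
Disallow:
```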
Will we ever get this traffic back? How long will it take? Any assistance will be greatly appreciated.
On another note, these indexed pages were never migrated to SSL for fear of losing traffic. If we have already lost the traffic and/or if it is going to take a long time to recover, should we migrate these pages to SSL?
Thanks,
-
Firstly, I would definitely take the opportunity to switch to SSL. A migration to SSL shouldn't be something to worry about if you set up your redirects properly, and given that most of your pages aren't currently indexed anyway, it is even less risky.
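On Apache, the redirect side of such a migration is a minimal sketch like the following (assuming mod_rewrite is enabled; exact rules vary by server setup):

```
# Send all HTTP traffic to HTTPS with a permanent (301) redirect
# so that link equity is passed to the new URLs
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```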
You will eventually get the traffic back; as for how long, it's very difficult to say.
I would concentrate on crawlability: make sure your site structure makes sense and that you aren't linking to any 404s (or worse). Given the size of your site, that wouldn't be a bad exercise anyway; a dedicated crawler is the practical tool at that scale, but the sketch below shows the basic check.
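A toy sketch of the broken-link check itself, in Python (the start URL is a placeholder; in practice you would crawl recursively and throttle requests):

```python
import requests
from urllib.parse import urljoin, urlparse
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def report_broken_links(page_url):
    """Fetch one page and print internal links that return 4xx/5xx."""
    html = requests.get(page_url, timeout=10).text
    extractor = LinkExtractor()
    extractor.feed(html)
    site = urlparse(page_url).netloc
    for href in set(extractor.links):
        url = urljoin(page_url, href)
        if urlparse(url).netloc != site:
            continue  # only audit internal links
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
        if status >= 400:
            print(f"{status}  {url}  (linked from {page_url})")

report_broken_links("https://www.example.com/")  # placeholder start page
```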
From your description of your pages, I'm not sure there is any "importance hierarchy", so my suggestion may not help, but you could make use of Google's Indexing API to submit pages for crawling. Unfortunately, you can only submit in batches of up to 100 URLs and you are limited to 200 requests a day. You could, of course, prioritise or cherry-pick some important pages and "hub" pages, if such things exist within your site, and then start working through those.
Following the recent Google blunder where they deindexed huge swathes of the web and, in the short term, the only way to get pages back into the index was to resubmit them, someone has provided a tool to interact with the API, which you can find here: https://github.com/steve-journey-further/google-indexing-api-bulk
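As a rough illustration of what such a tool does under the hood, here is a minimal Python sketch against the Indexing API (assuming you have a service account JSON key with the API enabled; the key path and urls.txt file are placeholders):

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/indexing"]

# Placeholder paths - substitute your own service account key and URL list
credentials = service_account.Credentials.from_service_account_file(
    "service_account.json", scopes=SCOPES)
service = build("indexing", "v3", credentials=credentials)

with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# The default quota is roughly 200 publish requests per day, so stay under it
for url in urls[:200]:
    response = service.urlNotifications().publish(
        body={"url": url, "type": "URL_UPDATED"}).execute()
    print(response)
```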