Crawl and Indexation Error - Googlebot can't/doesn't access specific folders on microsites
-
Hi,
My first time posting here, I am just looking for some feedback on a indexation issue we have with a client and any feedback on possible next steps or items I may have overlooked.
To give some background, our client operates a website for the core band and a also a number of microsites based on specific business units, so you have corewebsite.com along with bu1.corewebsite.com, bu2.corewebsite.com.
The content structure isn't ideal, as each microsite follows a structure of bu1.corewebsite.com/bu1/home.aspx, bu2.corewebsite.com/bu2/home.aspx and so on.
In addition to this each microsite has duplicate folders from the other microsites so bu1.corewebsite.com has indexable folders bu1.corewebsite.com/bu1/home.aspx but also bu1.corewebsite.com/bu2/home.aspx the same with bu2.corewebsite.com has bu2.corewebsite.com/bu2/home.aspx but also bu2.corewebsite.com/bu1/home.aspx. Therre are 5 different business units so you have this duplicate content scenario for all microsites.
This situation is being addressed in the medium term development roadmap and will be rectified in the next iteration of the site but that is still a ways out.
The issue
About 6 weeks ago we noticed a drop off in search rankings for two of our microsites (bu1.corewebsite.com and bu2.corewebsite.com) over a period of 2-3 weeks pretty much all our terms dropped out of the rankings and search visibility dropped to essentially 0.I can see that pages from the websites are still indexed but oddly it is the duplicate content pages so (bu1.corewebsite.com/bu3/home.aspx or (bu1.corewebsite.com/bu4/home.aspx is still indexed, similiarly on the bu2.corewebsite microsite bu2.corewebsite.com/bu3/home.aspx and bu4.corewebsite.com/bu3/home.aspx are indexed but no pages from the BU1 or BU2 content directories seem to be indexed under their own microsites.
Logging into webmaster tools I can see there is a "Google couldn't crawl your site because we were unable to access your site's robots.txt file." This was a bit odd as there was no robots.txt in the root directory but I got some weird results when I checked the BU1/BU2 microsites in technicalseo.com robots text tool.
Also due to the fact that there is a redirect from bu1.corewebsite.com/ to bu1.corewebsite.com/bu4.aspx I thought maybe there could be something there so consequently we removed the redirect and added a basic robots to the root directory for both microsites.
After this we saw a small pickup in site visibility, a few terms pop into our Moz campaign rankings but drop out again pretty quickly. Also the error message in GSC persisted.
Steps taken so far after that
- In Google Search Console, I confirmed there are no manual actions against the microsites.
- Confirmed there is no instances of noindex on any of the pages for BU1/BU2
- A number of the main links from the root domain to microsite BU1/BU2 have a rel="noopener noreferrer" attribute but we looked into this and found it has no impact on indexation
- Looking into this issue we saw some people had similar issues when using Cloudflare but our client doesn't use this service
- Using a response redirect header tool checker, we noticed a timeout when trying to mimic googlebot accessing the site
- Following on from point 5 we got a hold of a week of server logs from the client and I can see Googlebot successfully pinging the site and not getting 500 response codes from the server...but couldn't see any instance of it trying to index microsite BU1/BU2 content
So it seems to me that the issue could be something server side but I'm at a bit of a loss of next steps to take.
Any advice at all is much appreciated!
-
Hello ImpericMedia,
If you can share the site with me (private message is OK) I'll look into it. If you don't want to do that, here are some things I would look at:
1. If you have verified that the Robots.txt file is not blocking the pages you want indexed, and the pages are still not indexed (or indexed with a message about the Robots.txt file) you should check for a Robots Noindex meta tag on the page. If the source code looks strange you may have to use the Chrome Inspect tool to see the fully rendered page.
2. If there are no blocking robots meta tags on the page you should check the HTTP response for an X-Robots header.
3. If there is no X-Robots header, it's probably because of the duplicate content and spammy(seeming) subdomain setup.
Sorry about the wait. If you include the site URL it will get other community member's curious enough to check it out next time.
I hope this helps. If not, feel free to message me.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Fix Google Index error
I changed my blog URL structure Can Someone please let me how to solve this?
Intermediate & Advanced SEO | | Michael.Leonard0 -
Why isn't my site being indexed by Google?
Our domain was originally pointing to a Squarespace site that went live in March. In June, the site was rebuilt in WordPress and is currently hosted with WPEngine. Oddly, the site is being indexed by Bing and Yahoo, but is not indexed at all in Google i.e. site:example.com yields nothing. As far as I know, the site has never been indexed by Google, neither before nor after the switch. What gives? A few things to note: I am not "discouraging search engines" in WordPress Robots.txt is fine - I'm not blocking anything that shouldn't be blocked A sitemap has been submitted via Google Webmaster Tools and I have "fetched as Google" and submitted for indexing - No errors I've entered both the www and non-www in WMT and chose a preferred There are several incoming links to the site, some from popular domains The content on the site is pretty standard and crawlable, including several blog posts I have linked up the account to a Google+ page
Intermediate & Advanced SEO | | jtollaMOT0 -
Does Google still don't index Hashtag Links ? No chance to get a Search Result that leads directly to a section of a page? or to one of numeras Hashtag Pages in a single HTML page?
Does Google still don't index Hashtag Links ? No chance to get a Search Result that leads directly to a section of a page? or to one of numeras Hashtag Pages in a single HTML page? If I have 4 or 5 different hashtag link section pages , consolidated into one HTML Page, no chance to get one of the Hashtag Pages to appear as a search result? like, if under one Single Page Travel Guide I have two essential sections: #Attractions #Visa no chance to direct search queries for Visa directly to the Hashtag Link Section of #Visa? Thanks for any help
Intermediate & Advanced SEO | | Muhammad_Jabali0 -
What NAP format do I use if the USPS can't even find my client's address?
My client has a site already listed on Google+Local under "5208 N 1st St". He has some other NAPs, e.g., YellowPages, under "5208 N First Street". The USPS finds neither of these, nor any variation that I can possibly think of! Which is better? Do I just take the one that Google has accepted and make all the others like it as best I can? And doesn't it matter that the USPS doesn't even recognize the thing? Or no? Local SEO wizards, thanks in advance for your guidance!
Intermediate & Advanced SEO | | rayvensoft0 -
E-Commerce site - How do I geo-target towns/cities/states if there aren't any store locations?
Site = e-commerce Products = clothing (no apparel can be location specific like sports gear where you can do the location specific team gear (NBA, NFL, etc)) Problems = a. no store front b. I don't want to do any sitewides (footers, sidebars, etc) because of the penguin update Question = How do you geo-target these category pages and product pages? Ideas = a. reviews with clients locations b. blog posts with clients images wearing apparel and location description and keywords that also links back to that category or be it product page (images geo- targeted, tags, and description) c. ? Thanks in advance!
Intermediate & Advanced SEO | | Cyclone0 -
SEO and marketing for a company that doesn't want to promote their primary website
Hi All! One of my new clients is in a semi-grey-hat industry, and is in perpetual danger of having their real websites (of which they have several), blocked by the Chinese firewall (which is where their target market is). So their idea is to use neutral sites to write information (Squidoo, article site, maybe a stand-alone WP site with a few pages) and promote those pages. The idea being that China is less likely to block those sites, and then the link to the actual website from those pages could always be changed if China blocks the website listed. I'm a little dubious as to how feasible this is - how do you promote a Squidoo page? Or an article on an article site for semi-competitive keywords? Besides on-page SEO (which may not be enough), is there anything you can really do post-Penguin? If anyone has any ideas as to the above - or as to how else to effectively market sites when you can't market the site and brand directly, I'd be very happy to hear. Thanks!
Intermediate & Advanced SEO | | debi_zyx0 -
Sitemaps / Google Indexing / Submitted
We just submitted a new sitemap to google for our new rails app - http://www.thesquarefoot.com/sitemap.xml Which has over 1,400 pages, however Google is only seeing 114. About 1,200 are in the listings folder / 250 blog posts / and 15 landing pages. Any help would be appreciated! Aron sitemap.png
Intermediate & Advanced SEO | | TheSquareFoot0 -
How can I change my website's content on specific pages without affecting ranking for specific keywords?
My client's website (www.nursevillage.com) content has not been touched for 4 years and we are currently ranking #1 for "per diem nursing". They do not want to make any changes to the site in fear that it might decrease our rankings. We want to try to use utilize that keyword ranking on specific pages (www.nursevillage.com/nv/content/careeroptions/perdiem.jsp ) ranking for "per diem nursing" and try redirecting traffic or placing some banners and links on that page to specific pages or other sites related to "per diem nursing" jobs so we can get nurses to apply to our new nursing jobs. Any advice on why "per diem nursing" is ranking so high for us and what we can change on the site without messing up our ranking would be greatly appreciated. Thanks
Intermediate & Advanced SEO | | ryanperea1000