Crawl and Indexation Error - Googlebot can't/doesn't access specific folders on microsites
-
Hi,
This is my first time posting here. I'm looking for feedback on an indexation issue we have with a client, including possible next steps or anything I may have overlooked.
To give some background, our client operates a website for the core brand and also a number of microsites for specific business units, so you have corewebsite.com along with bu1.corewebsite.com, bu2.corewebsite.com.
The content structure isn't ideal, as each microsite follows a structure of bu1.corewebsite.com/bu1/home.aspx, bu2.corewebsite.com/bu2/home.aspx and so on.
In addition, each microsite contains duplicate folders from the other microsites. For example, bu1.corewebsite.com has indexable pages in its own folder (bu1.corewebsite.com/bu1/home.aspx) but also bu1.corewebsite.com/bu2/home.aspx; likewise, bu2.corewebsite.com has bu2.corewebsite.com/bu2/home.aspx but also bu2.corewebsite.com/bu1/home.aspx. There are 5 different business units, so this duplicate content scenario applies to all the microsites.
This situation is being addressed in the medium-term development roadmap and will be rectified in the next iteration of the site, but that is still a way off.
The issue
About 6 weeks ago we noticed a drop-off in search rankings for two of our microsites (bu1.corewebsite.com and bu2.corewebsite.com). Over a period of 2-3 weeks, pretty much all our terms dropped out of the rankings and search visibility fell to essentially 0. I can see that pages from these websites are still indexed but, oddly, they are the duplicate content pages: bu1.corewebsite.com/bu3/home.aspx and bu1.corewebsite.com/bu4/home.aspx, for example, are still indexed. Similarly, on the bu2.corewebsite.com microsite, bu2.corewebsite.com/bu3/home.aspx and bu2.corewebsite.com/bu4/home.aspx are indexed, but no pages from the BU1 or BU2 content directories seem to be indexed under their own microsites.
Logging into Webmaster Tools, I can see the message "Google couldn't crawl your site because we were unable to access your site's robots.txt file." This was a bit odd, as there was no robots.txt in the root directory, but I got some weird results when I checked the BU1/BU2 microsites with the technicalseo.com robots.txt testing tool.
Also, because there is a redirect from bu1.corewebsite.com/ to bu1.corewebsite.com/bu4.aspx, I thought there could be something there, so we removed the redirect and added a basic robots.txt file to the root directory of both microsites.
After this we saw a small pickup in site visibility, with a few terms appearing in our Moz campaign rankings but dropping out again fairly quickly. The error message in GSC also persisted.
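To put numbers on the pattern described above, the indexed URLs can be split into pages served from their own business-unit folder and cross-folder duplicates. A minimal Python sketch, assuming the `<bu>.corewebsite.com/<bu>/page.aspx` layout from the post and a plain list of indexed URLs (for example an export from Search Console); the function names are hypothetical:

```python
from collections import Counter
from urllib.parse import urlsplit


def classify_indexed_url(url: str) -> str:
    """Label a URL as 'own-folder', 'cross-folder duplicate' or 'other'.

    Assumes the <bu>.corewebsite.com/<bu>/page.aspx layout described in the
    post: a page counts as 'own-folder' when the subdomain label matches the
    first folder in the path.
    """
    # urlsplit only recognises the host when '//' precedes it.
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    segments = [s.lower() for s in parts.path.split("/") if s]
    # Skip the root domain (with or without www) and URLs with no folder,
    # such as /bu4.aspx.
    if len(labels) < 3 or labels[0] == "www" or len(segments) < 2:
        return "other"
    return "own-folder" if segments[0] == labels[0] else "cross-folder duplicate"


def summarise(urls):
    """Count how many indexed URLs fall into each category."""
    return Counter(classify_indexed_url(u) for u in urls)
```

Running `summarise` per microsite gives a quick before/after measure once the folder duplication is fixed.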
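One detail that may explain the Search Console message: according to Google's robots.txt documentation, a missing file that returns a 4xx (other than 429) is treated as "no restrictions", while a 5xx, a 429 or a timeout is treated as a server error, and Google pauses crawling while it persists. So the message suggests the robots.txt request was failing rather than simply returning 404, which would fit the timeout seen when mimicking Googlebot. A rough Python sketch for checking this; the function names are hypothetical, the host is a placeholder, and sending Googlebot's user agent only reproduces rules keyed on the user agent, not firewall rules keyed on Google's IP ranges:

```python
import socket
import urllib.error
import urllib.request

# Googlebot's desktop user-agent string. Sending it only reproduces rules
# keyed on the user agent, not firewall rules keyed on Google's IP ranges.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")


def interpret_robots_status(status):
    """Summarise (roughly) how Google treats a robots.txt fetch result.

    `status` is the final HTTP status code, or None for a timeout or
    network error, which Google handles like a server error.
    """
    if status is None or status == 429 or 500 <= status <= 599:
        return "unreachable: Google pauses crawling while this persists"
    if 200 <= status <= 299:
        return "ok: the rules in the file are applied"
    if 400 <= status <= 499:
        return "missing: crawled as if there were no robots.txt"
    return "other: inspect the response manually"


def fetch_robots_status(host, timeout=10.0):
    """Request https://<host>/robots.txt as Googlebot; return status or None."""
    request = urllib.request.Request(f"https://{host}/robots.txt",
                                     headers={"User-Agent": GOOGLEBOT_UA})
    try:
        # urlopen follows redirects, so this is the status after any hops.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # 4xx/5xx arrive as exceptions
        return err.code
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        return None
```

For example, `interpret_robots_status(fetch_robots_status("bu1.corewebsite.com"))`, repeated a few times and from more than one network, shows whether the failure is intermittent.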
Steps taken since then
- In Google Search Console, I confirmed there are no manual actions against the microsites.
- Confirmed there are no instances of noindex on any of the BU1/BU2 pages
- A number of the main links from the root domain to the BU1/BU2 microsites have a rel="noopener noreferrer" attribute, but we looked into this and found it has no impact on indexation
- Looking into this issue, we saw that some people had similar issues when using Cloudflare, but our client doesn't use this service
- Using an HTTP response/redirect header checker, we noticed a timeout when trying to mimic Googlebot accessing the site
- Following on from the previous point, we got hold of a week of server logs from the client. I can see Googlebot successfully accessing the site and not getting 500 response codes from the server, but I couldn't see any instance of it trying to crawl the BU1/BU2 microsite content
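To make the log check above more systematic, Googlebot requests can be counted by host, top-level folder and status code, which shows at a glance whether /bu1/ and /bu2/ URLs are requested at all. A minimal Python sketch, assuming IIS-style W3C extended logs (plausible for an .aspx site, but check your server): column names come from the `#Fields:` directive, and the optional `cs-host` column is used when present. The function name is hypothetical. Because the user-agent string can be spoofed, Google recommends confirming genuine Googlebot hits with a reverse DNS lookup (googlebot.com or google.com) followed by a forward lookup; that step is not included here.

```python
from collections import Counter


def googlebot_hits_by_folder(lines):
    """Count Googlebot requests per (host, top-level folder, status code).

    Assumes IIS-style W3C extended logs: a '#Fields:' directive names the
    space-separated columns, and spaces inside the user agent appear as '+'.
    The host column ('cs-host') is optional and reported as '-' if absent.
    """
    fields = []
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blank lines, or no header seen yet
        row = dict(zip(fields, line.split()))
        if "googlebot" not in row.get("cs(User-Agent)", "").lower():
            continue
        segments = [s for s in row.get("cs-uri-stem", "/").split("/") if s]
        # '/bu1/home.aspx' -> 'bu1'; files at the root such as '/robots.txt'
        # are grouped under '(root)'.
        folder = segments[0].lower() if len(segments) > 1 else "(root)"
        counts[(row.get("cs-host", "-").lower(), folder,
                row.get("sc-status", "?"))] += 1
    return counts
```

It only needs an iterable of lines, so an open log file can be passed straight in; sorting the result makes gaps in the BU1/BU2 folders easy to spot.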
So it seems to me that the issue could be something server-side, but I'm at a bit of a loss as to next steps.
Any advice at all is much appreciated!
-
Hello ImpericMedia,
If you can share the site with me (private message is OK) I'll look into it. If you don't want to do that, here are some things I would look at:
1. If you have verified that the robots.txt file is not blocking the pages you want indexed, and the pages are still not indexed (or are indexed with a message about the robots.txt file), you should check for a robots noindex meta tag on the page. If the source code looks strange, you may have to use the Chrome Inspect tool to see the fully rendered page.
2. If there are no blocking robots meta tags on the page, you should check the HTTP response for an X-Robots-Tag header.
3. If there is no X-Robots-Tag header, the problem is probably the duplicate content and the spammy-seeming subdomain setup.
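Steps 1 and 2 can be checked together. A minimal Python sketch, with hypothetical function names, that looks for a `noindex` (or `none`) directive in robots/googlebot meta tags and in X-Robots-Tag headers. It parses whatever HTML it is given, so for pages whose tags are injected by JavaScript you would feed it the rendered DOM (for example copied from Chrome's Inspect tool) rather than the raw source. The headers can come from any client that exposes name/value pairs, such as `response.headers.items()` on a `urllib.request.urlopen` response.

```python
import re
from html.parser import HTMLParser


def has_noindex(value):
    """True if a robots directive string contains 'noindex' or 'none'."""
    # Split on commas, whitespace and colons so 'googlebot: noindex' works.
    tokens = re.split(r"[\s,:]+", value.lower())
    return "noindex" in tokens or "none" in tokens


class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attributes.get("content", ""))


def noindex_sources(html, headers):
    """Report where a noindex directive appears: 'meta', 'header', both or neither.

    `headers` is an iterable of (name, value) pairs so that repeated
    X-Robots-Tag headers are all checked.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    found = []
    if any(has_noindex(d) for d in parser.directives):
        found.append("meta")
    if any(name.lower() == "x-robots-tag" and has_noindex(value)
           for name, value in headers):
        found.append("header")
    return found
```

An empty result for both the raw and rendered HTML points back to step 3.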
Sorry about the wait. If you include the site URL, it will make other community members curious enough to check it out next time.
I hope this helps. If not, feel free to message me.
Related Questions
-
ScreamingFrog won't crawl my site.
Hey guys, my site is Netspiren.dk, and when I use a tool like Screaming Frog or Integrity, it only crawls my homepage and menus, not product pages. Examples:
A menu: http://www.netspiren.dk/pl/Helse-Kosttilskud-Blandingsolie_57699.aspx
A product: http://www.netspiren.dk/pi/All-Omega-3-6-9-180-kapsler_1412956_57699.aspx
Is it because the products are being loaded in JavaScript? What's your recommendation? All the best,
Fred.
Intermediate & Advanced SEO | FrederikTrovatten22
-
Can I disavow links on a 301'd website?
So we are performing link removal for a client on his old website (A), which is being 301 redirected to his new website (B). We have identified toxic links on site A and are removing them; once complete, we will undo the current 301, confirm a new GWT account for website A, and then submit the disavow report. We would then like to reapply the 301 redirect to site B while we wait for Google to process the disavow report, the logic being that we can retain some current rankings on site B while the disavow is processed on site A. Has anyone had experience with this method? I foresee some potential issues here but am interested to hear from others on this. Thanks!
Intermediate & Advanced SEO | SEOdub
-
Can't find X-Robots tag!
Hi all. I've been checking out http://www.unthankbooks.com/ as it seems to have some indexing problems. I ran a server header check and got a 200 response. However, it also shows the following: X-Robots-Tag: noindex, nofollow. It's not in the page HTML though. Could it be being picked up from somewhere else?
Intermediate & Advanced SEO | Blink-SEO
-
Should you give all the posts in a forum a unique description? Or leave it empty so Google can make one from the crawled keywords?
Making all descriptions for all forum posts unique is a hell of a job. One option is to take the first 165 characters and automatically turn them into the page's meta description. If Google thinks the meta description is not suitable for the search query, Google will create its own description. In this case all the meta descriptions are unique, as the Google guidelines want. What will Google make of it if we delete the meta description tag so that Google writes all the descriptions itself?
Intermediate & Advanced SEO | Zanox
-
Tool that can retrieve my site's URLs
Hi, I'm looking for a tool that can retrieve my site's URLs. I am not talking about href, Open Explorer, Majestic etc. I have a list of 1000 site URLs where my site name is mentioned, and next to each URL I query I want to get the exact URL of my site. For example, http://moz.com/community is a URL I have, and if this page mentions my site name, I need the complete URL captured. Is there any software or tool that can do this? I definitely used one before that got me this info, but now I don't remember it. Thanks
Intermediate & Advanced SEO | mtthompsons
-
Privacy Policy & T&C's SEO related question
With AdWords, Google sometimes requests a Privacy Policy and T&Cs for an ad to be approved. Silly question, I know, but do you think Google looks for pages like these to identify more genuine websites for organic search? Thanks
Intermediate & Advanced SEO | activitysuper
-
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example, Disallow: /images/ is blocked just fine. However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: can I put a wildcard in the middle of the string, e.g. /en/*/images/, or do I need to list out every single country for every language in the robots file? Does anyone know of any workarounds?
Intermediate & Advanced SEO | IHSwebsite
-
Should I 301 poorly worded URLs which are indexed and driving traffic?
Hi, I'm working on our site's structure and SEO at present and wondering when the benefit I may get from a well-written URL, i.e. ourDomain / keyword or keyphrase .html, would outweigh the downturn in traffic I may see from 301 redirecting an existing, less well-structured but indexed URL. We have a number of odd-looking URLs, i.e. ourDomain / ourDomain_keyword_92.html, alongside some others that have a keyword followed by 20 underscores in a long line. My concern is that although I would like to have a keyword or key phrase sitting on its own in a well-targeted URL string, I don't want to mess too much with pages that are driving, say, 2% or 3% of our traffic just because my OCD has kicked in. Some further advice on strategies I could use would be great. My current thinking is that if a page is performing well, then I should leave the URL alone. Then, if I'm not 100% happy with the keyword or phrase it is targeting, I could build another page to handle the new keyword/phrase, with the aim of that page moving up the rankings and eventually taking over from where the other page left off. Any advice is much appreciated, Guy
Intermediate & Advanced SEO | guycampbell
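On the robots.txt wildcard question above: Google's robots.txt matching supports `*` anywhere in a rule's path and `$` to anchor the end, so a single rule such as `Disallow: /en/*/images/` should cover every country directory under /en/ (and, because `*` also matches slashes, deeper paths like /en/uk/archive/images/ too). The Python sketch below is a simplified model of that matching for testing rules against sample paths; it ignores `Allow` rules, Google's longest-match precedence and percent-encoding, and the function names are hypothetical.

```python
import re


def robots_rule_to_regex(rule):
    """Translate a Google-style robots.txt path rule into a compiled regex.

    '*' matches any run of characters (slashes included) and a trailing '$'
    anchors the end of the path; everything else is literal. re.match is
    used below, so rules always match from the start of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """True if any Disallow rule matches (Allow rules are not modelled).

    An empty Disallow value blocks nothing, so it is skipped.
    """
    return any(rule and robots_rule_to_regex(rule).match(path)
               for rule in disallow_rules)
```

For example, `is_disallowed("/en/uk/images/logo.png", ["/en/*/images/"])` is true, and `/*/*/images/` would extend the rule to every language as well.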