Crawl and Indexation Error - Googlebot can't/doesn't access specific folders on microsites
-
Hi,
This is my first time posting here. I am looking for feedback on an indexation issue we have with a client, and on possible next steps or items I may have overlooked.
To give some background, our client operates a website for the core brand and also a number of microsites based on specific business units, so you have corewebsite.com along with bu1.corewebsite.com, bu2.corewebsite.com and so on.
The content structure isn't ideal, as each microsite follows a structure of bu1.corewebsite.com/bu1/home.aspx, bu2.corewebsite.com/bu2/home.aspx and so on.
In addition to this, each microsite has duplicate folders from the other microsites. So bu1.corewebsite.com has the indexable folder bu1.corewebsite.com/bu1/home.aspx but also bu1.corewebsite.com/bu2/home.aspx, and likewise bu2.corewebsite.com has bu2.corewebsite.com/bu2/home.aspx but also bu2.corewebsite.com/bu1/home.aspx. There are 5 different business units, so you have this duplicate content scenario for all microsites.
This situation is being addressed in the medium term development roadmap and will be rectified in the next iteration of the site but that is still a ways out.
The issue
About 6 weeks ago we noticed a drop-off in search rankings for two of our microsites (bu1.corewebsite.com and bu2.corewebsite.com). Over a period of 2-3 weeks pretty much all our terms dropped out of the rankings and search visibility dropped to essentially 0. I can see that pages from the websites are still indexed, but oddly it is the duplicate content pages: bu1.corewebsite.com/bu3/home.aspx or bu1.corewebsite.com/bu4/home.aspx is still indexed, and similarly on the bu2.corewebsite.com microsite, bu2.corewebsite.com/bu3/home.aspx and bu4.corewebsite.com/bu3/home.aspx are indexed. No pages from the BU1 or BU2 content directories seem to be indexed under their own microsites.
Logging into webmaster tools, I can see the message "Google couldn't crawl your site because we were unable to access your site's robots.txt file." This was a bit odd, as there was no robots.txt in the root directory, but I got some weird results when I checked the BU1/BU2 microsites in the technicalseo.com robots.txt tool.
Also, because there is a redirect from bu1.corewebsite.com/ to bu1.corewebsite.com/bu4.aspx, I thought there could be something there, so we removed the redirect and added a basic robots.txt to the root directory for both microsites.
After this we saw a small pickup in site visibility: a few terms popped into our Moz campaign rankings but dropped out again pretty quickly. The error message in GSC also persisted.
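For anyone wanting to reproduce the robots.txt check offline, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents, the `https://` scheme and the helper name `allowed_for_googlebot` are assumptions for illustration; the post does not say what the basic robots.txt actually contains.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "basic" robots.txt that allows everything.
# The real file contents are not given in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def allowed_for_googlebot(robots_txt: str, urls: list[str]) -> dict[str, bool]:
    """Return, for each URL, whether the given rules let Googlebot fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch("Googlebot", url) for url in urls}


# URLs follow the structure described in the post; the scheme is assumed.
URLS = [
    "https://bu1.corewebsite.com/bu1/home.aspx",
    "https://bu1.corewebsite.com/bu2/home.aspx",
]

if __name__ == "__main__":
    for url, ok in allowed_for_googlebot(ROBOTS_TXT, URLS).items():
        print(url, "allowed" if ok else "blocked")
```

This only evaluates the rules in the file. It says nothing about whether the server actually returns the file to Googlebot, which is what the GSC message is about, so the HTTP status of /robots.txt on each microsite still needs checking separately.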
Steps taken since then
- In Google Search Console, I confirmed there are no manual actions against the microsites.
- Confirmed there are no instances of noindex on any of the pages for BU1/BU2
- A number of the main links from the root domain to microsites BU1/BU2 have a rel="noopener noreferrer" attribute, but we looked into this and found it has no impact on indexation
- Looking into this issue, we saw some people had similar issues when using Cloudflare, but our client doesn't use this service
- Using a redirect and response header checker tool, we noticed a timeout when trying to mimic Googlebot accessing the site
- Following on from the previous point, we got hold of a week of server logs from the client. I can see Googlebot successfully pinging the site and not getting 500 response codes from the server, but I couldn't see any instance of it trying to crawl microsite BU1/BU2 content
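The log check in the last bullet can be sketched roughly as below. This assumes the logs are in the Apache/Nginx "combined" format; an .aspx site may well be on IIS, whose W3C logs would need a different pattern. The sample lines, IP addresses and the function name `googlebot_hits` are made up for illustration and are not taken from the client's logs.

```python
import re
from collections import Counter

# Assumes "combined" access-log format; IIS W3C logs need a different pattern.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, prefixes=("/bu1/", "/bu2/", "/robots.txt")):
    """Count Googlebot requests per (path prefix, status code) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        path = match["path"].lower()  # folder names appear in mixed case
        for prefix in prefixes:
            if path.startswith(prefix):
                counts[(prefix, match["status"])] += 1
    return counts


# Made-up sample lines, only to show the expected input shape.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'66.249.66.1 - - [01/Jan/2020:00:00:01 +0000] "GET /bu3/home.aspx HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [01/Jan/2020:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 503 0 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [01/Jan/2020:00:00:03 +0000] "GET /bu1/home.aspx HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (prefix, status), n in sorted(googlebot_hits(SAMPLE).items()):
        print(prefix, status, n)
```

Including /robots.txt in the prefixes is deliberate: if Googlebot's requests for that file fail or time out, that would line up with the GSC message. The user-agent string can be spoofed, so a match here is only a first filter and not proof the request came from Google.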
So it seems to me that the issue could be something server side, but I'm at a bit of a loss as to what next steps to take.
Any advice at all is much appreciated!
-
Hello ImpericMedia,
If you can share the site with me (private message is OK) I'll look into it. If you don't want to do that, here are some things I would look at:
1. If you have verified that the robots.txt file is not blocking the pages you want indexed, and the pages are still not indexed (or are indexed with a message about the robots.txt file), you should check for a robots noindex meta tag on the page. If the source code looks strange, you may have to use the Chrome Inspect tool to see the fully rendered page.
2. If there are no blocking robots meta tags on the page, you should check the HTTP response for an X-Robots-Tag header.
3. If there is no X-Robots-Tag header, it's probably because of the duplicate content and spammy-seeming subdomain setup.
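Checks 1 and 2 can be scripted once you have a page's response headers and HTML in hand. The following is a rough sketch using only the Python standard library; the function and class names are hypothetical, and it inspects the raw HTML only, so a tag injected by JavaScript would still need the rendered-page check from step 1.

```python
import re
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", ""))


def _blocks_indexing(value: str) -> bool:
    """True if a directive string contains noindex (or none, which implies it)."""
    tokens = re.split(r"[,\s:]+", value.lower())
    return "noindex" in tokens or "none" in tokens


def noindex_sources(headers: dict[str, str], html: str) -> list[str]:
    """Return which of 'header' and 'meta' carry a noindex directive."""
    found = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and _blocks_indexing(value):
            found.append("header")
            break
    parser = RobotsMetaParser()
    parser.feed(html)
    if any(_blocks_indexing(d) for d in parser.directives):
        found.append("meta")
    return found


if __name__ == "__main__":
    print(noindex_sources({"X-Robots-Tag": "noindex"}, "<html></html>"))
```

An empty result for the affected pages would rule out the first two causes and point back toward item 3 or a server-side problem.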
Sorry about the wait. If you include the site URL next time, it will get other community members curious enough to check it out.
I hope this helps. If not, feel free to message me.