Will a robots.txt 'disallow' of a directory keep Google from seeing 301 redirects for pages/files within the directory?
-
Hi, I have a client that had thousands of dynamic php pages indexed by Google that shouldn't have been. He has since blocked these php pages via a robots.txt disallow. Unfortunately, many of those php pages were linked to multiple times by high-quality sites (instead of the static URLs) before he put up the php 'disallow'.
If we create 301 redirects for some of these php URLs that are still showing high-value backlinks and send them to the correct static URLs, will Google even see these 301 redirects and pass link value to the proper static URLs? Or will the robots.txt keep Google away, so that we lose all these high-quality backlinks? I guess the same question applies if we use the canonical tag instead of the 301: will the robots.txt keep Google from seeing the canonical tags on the php pages?
Thanks very much,
V
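For readers mapping out the plan described in the question, here is a minimal sketch of what the server side of such a redirect does: a legacy php URL is looked up in a table and answered with a 301 status plus a Location header naming its static counterpart. The `REDIRECTS` table, the `redirect_for` helper and all URLs are hypothetical; on an Apache server this would normally be written as .htaccess rules rather than Python.

```python
from urllib.parse import urlsplit

# Hypothetical map from legacy dynamic URLs (path + query string) to the
# static URLs that should inherit their backlinks.
REDIRECTS = {
    "/product.php?id=17": "/products/blue-widget",
    "/product.php?id=42": "/products/red-widget",
}

def redirect_for(request_url):
    """Return (status, headers) for a mapped legacy URL, or None if unmapped."""
    parts = urlsplit(request_url)
    key = parts.path + ("?" + parts.query if parts.query else "")
    target = REDIRECTS.get(key)
    if target is None:
        return None
    # 301 = Moved Permanently; the Location header tells the client where to go.
    return 301, {"Location": target}

print(redirect_for("https://www.example.com/product.php?id=17"))
print(redirect_for("https://www.example.com/product.php?id=99"))
```

Only URLs listed in the table are redirected; anything else falls through to the normal response.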
-
No problem
-
Hello Dmitrii,
Yes, that clarifies things perfectly. Thanks very much for your explanation. And I missed this particular WBF, so I will give it a close look as well.
Thanks again for your quick help.
-
Hello, my friend.
You should understand exactly how .htaccess 301 redirects work. They are server-side commands/operations. So when bots request a page, they wait for the server's response. In the case of a 301, the response is "Don't go here, go there." They may also get a response from robots.txt saying "you're not allowed to look at the contents of this file/directory"; however, this will not prevent the server response. That's why you sometimes see indexed pages with a "blocked by robots.txt" message: they are indexed even though they are blocked.
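One detail worth checking when testing this: a crawler that honours robots.txt evaluates the rules before it sends a request for a URL, so for a disallowed URL it does not fetch the status code (301 or otherwise) at all. A minimal sketch using Python's standard `urllib.robotparser` module; the robots.txt content, the `/dynamic/` directory, the `may_request` helper and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring the client's setup: the directory
# holding the dynamic php pages is disallowed for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dynamic/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_request(url, agent="Googlebot"):
    """True if a crawler that honours robots.txt is allowed to request url."""
    return parser.can_fetch(agent, url)

for url in ("https://www.example.com/dynamic/page.php?id=17",
            "https://www.example.com/static/page.html"):
    # This check happens before any HTTP request for the URL is sent.
    print(url, may_request(url))
```

Because the rule is under `User-agent: *`, the same answer comes back whatever crawler name is passed in.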
Now, in the case of canonical links you are correct: since the canonical tag is IN the content of the page, bots won't be able to read it, and therefore won't be told that there is a canonical page.
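To show where the canonical hint actually lives, here is a minimal sketch with Python's standard `html.parser`: the `<link rel="canonical">` element can only be read from the HTML a crawler has downloaded. The `CanonicalFinder` class, the `find_canonical` helper and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, not attribute values.
        if tag != "link" or self.canonical is not None:
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attrs.get("href")

def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical

page = """<html><head>
<link rel="canonical" href="https://www.example.com/products/blue-widget">
</head><body>...</body></html>"""
print(find_canonical(page))
```

The input here is page content, which is exactly what a crawler never receives for a URL it is not allowed to request.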
There is a recent WBF on this subject - https://moz.com/blog/controlling-search-engine-crawlers-for-better-indexation-and-rankings-whiteboard-friday
Hope this clarifies some things.
Related Questions
-
Crawl solutions for landing pages that don't contain a robots.txt file?
My site (www.nomader.com) is currently built on Instapage, which does not offer the ability to add a robots.txt file. I plan to migrate to a Shopify site in the coming months, but for now the Instapage site is my primary website. In the interim, would you suggest that I manually request a Google crawl through the Search Console tool? If so, how often? Any other suggestions for countering this meta noindex issue?
Technical SEO | | Nomader1 -
Google indexing staging / development site that is redirected...
Hi Moz fans! Please help. We had acme.stagingdomain.com while a site was in development; when it went live, it redirected (302) to acmeprofessionalservices.com (real names redacted!). There are no known external links to the staging site, although the staging site URL has been emailed from Google Apps (!!!). We have now found that the staging site is in the index even though it redirects to the proper public site, and some (but not all) of its pages are in the index too. They all redirect to the proper public site when visited. It is convenient for the team to have a redirect from the staging site to the new one, since Chrome etc. remember frequently visited sites, so it would be a shame to lose that. Yes, these pages can be removed using Webmaster Tools.
But how did they get in the index to start with? And if we're building a new site and a customer has an existing site, is there a danger of duplicate content penalties etc. caused by the staging site? We had a similar incident recently when a PDF that was not linked anywhere on the site appeared in the index. The link had been emailed through Google Apps and visited in Chrome, but that was it. So, 3 questions: Why is the staging site still in the index despite the redirects? How did it get in the index in the first place? Will the new staging site affect the rank of the existing site, e.g. through duplicate content penalties?
Technical SEO | | mozroadjan
To 301 redirect or not...
Hi guys, I'd like to get your opinion on this. We currently have two sites: site A is the old one, with PA 44 and DA 33. Site B is the new one that is going to replace site A; it currently has PA 37 and DA 24. Our plan is to shut down site A and 301 redirect all of its pages to the relevant pages on site B. Currently we have some links in place on site A pointing to site B for a couple of keywords, which seems to be working great for our rankings. Now I'm wondering whether keeping these backlinks from A to B is a good option, or whether I will pass more link juice by redirecting everything. (P.S. Both are e-commerce sites, hosted and registered with different companies.)
Technical SEO | | Immanuel0 -
Robots.txt: crawling URLs we don't want it to
Hello, we run a number of websites, and underneath them we have testing websites (subdomains); on those sites we have a robots.txt disallowing everything. When I logged into Moz this morning, I could see that the Moz spider had crawled our test sites even though we have said not to. Does anyone have any ideas how we can stop this from happening?
Technical SEO | | ShearingsGroup0 -
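As a quick sanity check on the file itself, this sketch uses Python's `urllib.robotparser` to confirm that a `User-agent: *` / `Disallow: /` file blocks whatever crawler name is asked about. The subdomain and path are hypothetical, and "rogerbot" is assumed here to be the user-agent token of Moz's crawler. The file only works if it is served from the root of each test subdomain (e.g. `https://test.example.com/robots.txt`), and it is advisory: it tells compliant crawlers what not to request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at the root of a test subdomain,
# blocking every crawler from every path.
TEST_ROBOTS = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(TEST_ROBOTS)

# "rogerbot" is an assumed name for Moz's crawler; any name gets the same answer.
for agent in ("Googlebot", "bingbot", "rogerbot"):
    print(agent, parser.can_fetch(agent, "https://test.example.com/any/page"))
```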
Robots.txt
Hello everyone, the problem I'm having is not knowing where to put the robots.txt file on our server. We have our main domain (company.com) with a robots.txt file in the root of the site, but we also have our blog (company.com/blog), where we're trying to disallow certain directories from being crawled for SEO purposes. Would the blog in the subdirectory still need its own robots.txt, or can I reference the blog directories I don't want crawled in the root robots.txt file? Thanks for your insight on this matter.
Technical SEO | | BailHotline0 -
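For context on the blog question: under the Robots Exclusion Protocol, crawlers look for robots.txt only at the top-level path of each host (scheme, host and port), so a file at company.com/blog/robots.txt would not be read as robots.txt; rules for the blog go in the root file with paths starting with /blog/. A minimal sketch with Python's standard library; the `robots_txt_url` helper and the /blog/tag/ and /blog/archive/ directories are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

def robots_txt_url(page_url):
    """Return the robots.txt location that governs page_url's host."""
    parts = urlsplit(page_url)
    # Scheme and host (including any port) are kept; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical root robots.txt that also covers directories under /blog/.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /blog/tag/",
    "Disallow: /blog/archive/",
])

print(robots_txt_url("https://company.com/blog/2015/some-post"))
# https://company.com/robots.txt
print(rules.can_fetch("Googlebot", "https://company.com/blog/tag/seo"))
print(rules.can_fetch("Googlebot", "https://company.com/blog/2015/some-post"))
```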
Will rankings for my micro site rank better if I 301 redirect it to my main site?
This is my first time asking, so I will try to be as clear as possible. OK, I have a micro site that is an exact-match domain; the domain is about 3-4 years old and ranks very well for several search terms. The main two terms it ranks for are like this: "houses for rent in XXXXX" and "XXXXX homes for rent" (XXXXX equals a city name). The issue is that this site has no backlinks and zero advanced SEO; I only did basic optimization when I set the site up. Even the site structure and URL structure are not good.
The only page I have ever seen rank is the root URL. But with all that, the site does really well, in the top 1-2 results for key search terms. Now, I have a main site that is a very big site, which has been steadily climbing in the search results every month, with great backlinks, optimized for the city and all.
It currently ranks on the second page for the search terms listed above. What I want to do is 301 redirect this microsite to the city page on my main site, which is much better optimized for the key city terms.
The 301 redirect would point this root domain (mymicrosite.com) to my city page, which looks like this: www.mymaindomain.com/city/XXXXXXX. If I do this, will Google rank my main site's city page as well as it ranks this microsite with zero links, SEO, etc.? What happens if it does not? Will I be able to turn off the 301 redirect and keep the microsite rankings? My main reason for wanting this is that I want this city page to rank well, and I only want to optimize one site instead of both. Any help would be great!
Technical SEO | | Robbie8299
Is robots.txt a must-have for 150 page well-structured site?
By looking in my logs I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that allows the bots to index the whole site (which they are doing). There are no secret areas I don't want the bots to find (the secret areas are behind a Login so the bots won't see them). I have used rel=nofollow for internal links that point to my Login page. Is there any reason to include a generic robots.txt file that contains "user-agent: *"? I have a minor reason: to stop getting 404 errors and clean up my error logs so I can find other issues that may exist. But I'm wondering if not having a robots.txt file is the same as some default blank file (or 1-line file giving all bots all access)?
Technical SEO | | scanlin0 -
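On the last point: under the Robots Exclusion Protocol (RFC 9309), a robots.txt request that returns a 4xx status such as 404 means the crawler may access everything, so having no file behaves like an allow-all file; the main visible difference is the 404 entries in the logs. A sketch with Python's `urllib.robotparser` comparing an empty file with the common two-line allow-all file; the `allowed` helper and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, url, agent="Googlebot"):
    """Parse the given robots.txt lines and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)

url = "https://www.example.com/any-page"
print(allowed([], url))                              # empty file: no rules at all
print(allowed(["User-agent: *", "Disallow:"], url))  # explicit allow-all file
```

An empty `Disallow:` value disallows nothing, which is why the two-line file is the usual way to say "everything is allowed".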
Switching ecommerce CMSs - best way to write URL 301s and sub pages?
Hey guys, what a headache I've been going through the last few days trying to make sure my upcoming move is near-perfect. Right now all my URLs are written like this: /page-name (all lowercase, exact, no forward slash at the end). In the new CMS they will be written like this: /Page-Name/ (with a forward slash at the end). When I generate an XML sitemap in the new e-commerce CMS, it lists the category pages with a forward slash at the end, just like they show up throughout the CMS. This seems sloppy to me, but I have no control over it. Is this OK for SEO? I'm worried my PR 4, well-built e-commerce website is going to lose value to small (but potentially large) errors like this. If this is indeed not good practice, is there a resource about not using the forward slash at the end of URLs in sitemaps that I can present to the community at the platform? They are usually really quick to make fixes if something is not up to standards. Thanks in advance, -First Time Ecommerce Platform Transition Guy
Technical SEO | | Hyrule0
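Because URL paths are case-sensitive and a trailing slash makes a different URL, /page-name and /Page-Name/ are two distinct URLs, so each old URL needs its own 301 to the new form. A minimal sketch for generating that map, assuming (hypothetically) that the new CMS capitalizes each hyphen-separated word and appends a slash; in practice the new URLs should be exported from the CMS rather than derived by a rule like this. The `new_path` and `redirect_rules` helpers are illustrative names.

```python
def new_path(old_path):
    """Map '/page-name' to the assumed new-CMS form '/Page-Name/'."""
    slug = old_path.strip("/")
    # Assumed rule: capitalize each hyphen-separated word.
    return "/" + "-".join(word.capitalize() for word in slug.split("-")) + "/"

def redirect_rules(old_paths):
    """Build an old -> new 301 map, skipping paths that would not change."""
    rules = {}
    for old in old_paths:
        new = new_path(old)
        if new != old:
            rules[old] = new
    return rules

print(redirect_rules(["/page-name", "/blue-widgets"]))
```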