Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Crawl solutions for landing pages that don't contain a robots.txt file?
-
My site (www.nomader.com) is currently built on Instapage, which does not offer the ability to add a robots.txt file. I plan to migrate to a Shopify site in the coming months, but for now the Instapage site is my primary website. In the interim, would you suggest that I manually request a Google crawl through the Search Console tool? If so, how often? Any other suggestions for countering this Meta Noindex issue?
-
No problem, Tom. Thanks for the additional info; that is helpful to know.
-
Bryan,
I'm glad that you found what you were looking for.
I must have missed the part about it being 100% Instapage. When you said CMS, I thought you meant something else alongside Instapage; I think of Instapage as landing pages, not a CMS.
I want to help, so: you asked about Google Search Console and how often you need to request that Google index your site.
First, make sure
you have 5 properties in Google Search Console:
your domain, http://www., http://, https://www. & https://
- nomader.com
- https://www.nomader.com
- https://nomader.com
- http://www.nomader.com
- http://nomader.com
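As a rough sketch (assuming the bare hostname nomader.com and Python 3.9+), the five entries above are the Domain property plus every scheme and www/non-www combination as URL-prefix properties. Note that a Domain property, verified through DNS, already covers all protocols and subdomains, so the four URL-prefix variants are optional once it is verified. `search_console_properties` is a hypothetical helper, not part of any Google tool.

```python
from itertools import product


def search_console_properties(domain: str) -> list[str]:
    """Hypothetical helper: the Domain property plus four URL-prefix variants."""
    # A Domain property is just the bare hostname, with no scheme.
    properties = [domain]
    # URL-prefix properties: every scheme paired with the www and non-www host.
    for scheme, host in product(("https", "http"), (f"www.{domain}", domain)):
        properties.append(f"{scheme}://{host}/")
    return properties


for prop in search_console_properties("nomader.com"):
    print(prop)
```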
You should not have to request indexing once your pages are in Google's index. There is no set schedule on which you need to request it.
Use Search Console's index reports to see whether you need to make a request, and look for notifications.
Times you should request a Google crawl: when adding new unlinked pages, when making big changes to your site, when adding pages without an XML sitemap, or when fixing problems / testing.
I want to help, so, as you said you're going to be using Shopify:
Just before you go live on Shopify, you should make an XML sitemap of the Instapage site you're running now.
You can do it for free using https://www.screamingfrog.co.uk/seo-spider/
Name it /sitemap_ip.xml or /sitemap2.xml and upload it to Shopify,
and make sure it's not the same name as your Shopify XML sitemap (/sitemap.xml) so the two will work together.
Submit /sitemap_ip.xml to Search Console, then add the Shopify /sitemap.xml.
You can run multiple XML sitemaps as long as they do not overlap.
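One way to check the no-overlap rule before submitting both files is to compare their `<loc>` entries. A minimal sketch using Python's standard-library ElementTree; the two sitemap strings (including the /products/example URL) are made-up sample data, not the real files.

```python
import xml.etree.ElementTree as ET

# Namespace used by <urlset> files that follow the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def overlapping_urls(sitemap_a: str, sitemap_b: str) -> set[str]:
    """URLs present in both sitemaps; an empty set means no overlap."""
    return sitemap_urls(sitemap_a) & sitemap_urls(sitemap_b)


# Made-up sample data standing in for /sitemap_ip.xml and Shopify's /sitemap.xml.
instapage_map = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.nomader.com/help</loc></url>
  <url><loc>https://www.nomader.com/wholesale</loc></url>
</urlset>"""
shopify_map = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.nomader.com/products/example</loc></url>
  <url><loc>https://www.nomader.com/help</loc></url>
</urlset>"""

print(overlapping_urls(instapage_map, shopify_map))
```

In this sample, /help appears in both files, so you would drop it from one of them.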
Just remember: never add non-200 pages (404s, 300s), nofollow or noindex pages, or redirects to an XML sitemap. Screaming Frog will ask whether you want to include them when you're making the sitemap.
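The filtering rule above can be sketched like this, assuming you already have crawl results (URL, HTTP status, and whether the page carries a noindex tag), for example from a crawler export. `CrawledPage` and `build_sitemap` are hypothetical names, /old-page is an invented redirect, and only the non-200 and noindex parts of the rule are modelled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class CrawledPage:
    url: str
    status: int    # HTTP status code returned when the page was crawled
    noindex: bool  # True if the page carries a meta robots "noindex" directive


def build_sitemap(pages: list[CrawledPage]) -> str:
    """Build a <urlset> that lists only 200-status, indexable pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        # Skip 404s, 3xx redirects and any other non-200 response,
        # plus pages that ask search engines not to index them.
        if page.status != 200 or page.noindex:
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    CrawledPage("https://www.nomader.com/", 200, False),
    CrawledPage("https://www.nomader.com/help", 200, False),
    CrawledPage("https://www.nomader.com/donations", 200, True),
    CrawledPage("https://www.nomader.com/old-page", 301, False),
]
print(build_sitemap(pages))
```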
Shopify will make its own XML sitemaps, and having the current site as a second XML sitemap will help make sure your change of platform does not hurt the Instapage part of the Shopify site.
https://support.google.com/webmasters/answer/34592?hl=en
Know that adding an XML sitemap is a smart move.
I hope that was of help; I'm sorry I missed what you meant.
Respectfully,
Tom
-
Thanks so much for your thoughtful, detailed response. That answers my question.
-
Bryan,
If I understand your intent, you want your pages indexed. I see that your site has 5 pages indexed (/, /help, /influencers, /wholesale, /co-brand). And that you have some other pages (e.g. /donations), which are not indexed, but these have "noindex" tags explicitly in their HEAD sections.
Not having a robots.txt file is equal to having a robots.txt file with a directive to allow crawling of all pages. This is per http://www.robotstxt.org/orig.html, where they say "The presence of an empty "/robots.txt" file has no explicit associated semantics, it will be treated as if it was not present, i.e. all robots will consider themselves welcome."
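Python's standard-library `urllib.robotparser` applies the same convention, which makes it easy to see the difference between having no rules and having an explicit rule. This is Python's parser, not Googlebot's, so treat it as an illustration; the Disallow rule for /donations is hypothetical (that page actually uses a noindex tag).

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_lines: list[str], url: str, agent: str = "Googlebot") -> bool:
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also records that rules were loaded
    return parser.can_fetch(agent, url)


# No rules at all behaves like a missing or empty robots.txt: everything is allowed.
print(crawl_allowed([], "https://www.nomader.com/donations"))  # True

# A hypothetical explicit rule blocks crawling of that path only.
rules = ["User-agent: *", "Disallow: /donations"]
print(crawl_allowed(rules, "https://www.nomader.com/donations"))  # False
print(crawl_allowed(rules, "https://www.nomader.com/help"))  # True
```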
So, if you have no robots.txt file, the search engine will feel free to crawl everything it discovers, and then whether or not it indexes those pages will be guided by presence or absence of NOINDEX tags in your HEAD sections. From a quick browse of your site and its indexed pages, this seems to be working properly.
Note that I'm referencing a distinction between "crawling" and "indexing". The robots.txt file provides directives for crawling (i.e. accessing discovered pages and discovering the pages linked from them), whereas the meta robots tags in the head provide directives for indexing (i.e. including the discovered pages in the search index and displaying those as results to searchers). In this context, absence of a robots.txt file simply allows the search engine to crawl all of your content, discover all linked pages, and then rely on the meta robots directives in those pages for any guidance on whether or not to index the pages it finds.
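To check the indexing side, you can look for the meta robots tag in a page's HTML. A minimal sketch with Python's built-in `html.parser`; it only inspects `<meta name="robots">` and `<meta name="googlebot">` tags (not an `X-Robots-Tag` HTTP header, which can carry the same directive), and treats `none` as implying `noindex`. `MetaRobotsParser` and `is_noindex` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html: str) -> bool:
    """True if the page asks not to be indexed ("noindex", or "none")."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return bool({"noindex", "none"} & parser.directives)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex(page))  # True
```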
As for a sitemap, while they are helpful for monitoring indexation, and also provide help to search engines to discover all desired pages, in your case it doesn't look especially necessary. Again, I only took a quick look, but it seems you have your key pages all linked from your home page, and you have meta directives in pages you wish to keep out of the index. And you have a very small number of pages. So, it looks like you are meeting your crawl and indexation desires.
-
Hi Tom,
Unfortunately, Instapage is a proprietary CMS that does not currently support robots.txt or sitemaps. Instapage is primarily built for landing pages, not full websites, so that's their reasoning for not adding SEO support for basics like robots.txt and sitemaps.
Thanks anyway for your help.
Best,
-Bryan
-
Hi,
So I see the problem now:
https://www.nomader.com/robots.txt
does not exist. Upload a robots.txt file to the root of your server, or to the specific place your developer and/or CMS / hosting company recommends. I could not figure out what type of CMS you're using, if you're using one.
Make a robots.txt file using one of these:
http://tools.seobook.com/robots-txt/generator/
https://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/exportrobots.php
https://moz.com/learn/seo/robotstxt
It will look like this:
User-Agent: *
Disallow:
Sitemap: https://www.nomader.com/sitemap.xml
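If you do get a place to host the file, a sketch like this renders the allow-all file above and sanity-checks it with Python's standard-library parser. `build_robots_txt` is a hypothetical helper, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallow: list[str], sitemaps: list[str]) -> str:
    """Render a robots.txt for all user agents with the given rules and sitemaps."""
    lines = ["User-agent: *"]
    # An empty "Disallow:" line means nothing is disallowed.
    lines += [f"Disallow: {path}" for path in disallow] or ["Disallow:"]
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt([], ["https://www.nomader.com/sitemap.xml"])
print(robots_txt)

# Sanity-check the result with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Googlebot", "https://www.nomader.com/help"))  # True
print(parser.site_maps())  # ['https://www.nomader.com/sitemap.xml']
```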
It looks like you're using Java for your website?
https://builtwith.com/detailed/nomader.com
I am guessing you're not using a subdomain to host the landing pages?
If you are using a subdomain, you would have to create a robots.txt file for that, but from everything I can see you're using your regular domain, so you would simply create these files. I'm in a car on a cell phone, so I only did a quick check to see if you have an XML sitemap file, but I do think you do:
https://www.nomader.com/sitemap.xml
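On the subdomain point: crawlers read robots.txt only from the root of each scheme and host (including any port), so every subdomain needs its own file. A small sketch that derives the governing robots.txt URL for a page; `robots_txt_url` is a hypothetical helper and pages.nomader.com is a made-up subdomain used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs crawling of page_url.

    robots.txt is read only from the root of each scheme + host (+ port),
    so a subdomain needs its own file.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.nomader.com/influencers"))
print(robots_txt_url("https://pages.nomader.com/promo"))
```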
You can purchase a tool called Screaming Frog SEO Spider. If your site is over 500 pages you will need to pay for it (approximately $200), but you will be able to create a wonderful sitemap. You can also create an XML sitemap by googling "XML sitemap generators". However, I would recommend Screaming Frog because you can separate the images, and it's a very good tool to have.
Because you will need to generate a new sitemap whenever you update your site or add landing pages, this will be done using Screaming Frog and the file uploaded to the same place on the server, unless you can create a dynamic sitemap using whatever website infrastructure you're using.
Here are the directions to add your site to Google Search Console / Google Webmaster Tools:
https://support.google.com/webmasters/answer/34592?hl=en
If you need any help with any of this, please do not hesitate to ask; I am more than happy to help. You can also generate a sitemap in the old version of Google Webmaster Tools / Google Search Console.
Hope this helps,
Tom
-
Thanks for the reply, Thomas. Where do you see that my site has the robots.txt file? As far as I can tell, it is missing. Instapage does not offer robots.txt as I mentioned in my post. Here's a community help page of theirs where this question was asked and answered: https://help.instapage.com/hc/en-us/community/posts/213622968-Sitemap-and-Robotx-txt
So in the absence of having a robots.txt file, I guess the only way to counter this is to manually request a fetch/index from Google console? How often do you recommend I do this?
-
You don't need to worry about Instapage & robots.txt; your site has the robots.txt & Instapage is not set to noindex.
So yes, use Google Search Console to fetch / index the pages. It's very easy if you read the help information I posted below:
https://help.instapage.com/hc/en-us#
hope that helps,
Tom
-
If you cannot turn off "Meta Noindex", you cannot fix it with robots.txt. I suggest you contact the developer of the Instapage landing pages app. If it's locked to noindex as you said, that is the only way of countering a Meta Noindex issue that is pre-coded by the company.
I will look into this for you. I bet that you can change it, but not via robots.txt. I will update this in the morning for you.
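To illustrate why robots.txt cannot counter a Meta Noindex, here is a deliberately simplified model (hypothetical `index_outcome` helper, not how Google actually decides): a Disallow rule only stops the crawler from fetching the page, so it never even sees the noindex tag, and with no rule the tag is read and obeyed. Only removing the noindex tag changes the outcome. In practice a disallowed URL can still be indexed without its content if other pages link to it.

```python
from urllib.robotparser import RobotFileParser


def index_outcome(robots_lines: list[str], url: str, page_has_noindex: bool) -> str:
    """Simplified model of how robots.txt and a meta noindex tag interact."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch("Googlebot", url):
        # The crawler never downloads the page, so it never sees the meta tag.
        return "blocked from crawling; meta directives unseen"
    if page_has_noindex:
        # Crawled, the noindex tag was read, and the page stays out of the index.
        return "crawled but not indexed (noindex)"
    return "crawlable and indexable"


url = "https://www.nomader.com/donations"
print(index_outcome([], url, page_has_noindex=True))
print(index_outcome(["User-agent: *", "Disallow: /"], url, page_has_noindex=True))
print(index_outcome([], url, page_has_noindex=False))
```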
All the best,
Tom