How Google Carwler Cached Orphan pages and directory?
-
I have website www.test.com
I have made some changes in live website and upload it to "demo" directory (which is recently created) for client approval.
Now, my demo link will be www.test.com/demo/
I am not doing any type of link building or any activity which pass referral link to www.test.com/demo/
Then how Google crawler find it and cached some pages or entire directory?
Thanks
-
Try putting the URL into Google and see if you find any pages linking to it.
I knew a company that created a test site that was a copy of a live site (made with a specific hosted CMS). Didn't exclude the test site in robots because "we all know we won't link to it so it'll be ok". Site got indexed, and it was because a person at the company was having problems with the implementation of the test site, went to the help forum (which person didn't think would be indexed) and posted the URL to the test site.
I found the above by just putting in the URL of the test site into Google, and I saw the post in the help desk. You might try the same to see if somehow there is a rogue link.
-
Is google crawling our mails?
Is it possible?
-
Yup, correct.
I was certain I'd replied to this
Anyway, you ever notice how the ads in gmail are always relevant to the content of your emails? Google are totally reading them
-
The <conspiracy hat="">side of things was him commenting that Google is sometimes accused of processing everything in Gmail and could have possibly pulled your link to the demo directory from that.</conspiracy>
-
Hi Barry,
Yes, We were used Gmail for reporting.
Is it make any sense??
-
<conspiracy-hat></conspiracy-hat>
Did either you or your client use gmail when you sent him the demo link?
Regardless, Dan's advice to noindex and block the directory from spiders is the future when doing development work.
-
Hi JoelHit,
NO, There is not any single refferal link to "Demo" directory from entire website and also from third party websites.
I am aware about Google Crawling and Indexing Systems.
Thanks.
-
Hi Thetjo,
I know about it.
My question is that how Google Crawl it without any referral link?
Thanks.
-
Hi Dan,
No, i am not exclude "demo" directory from robots.txt for any search engine.
I am not using wordpress its simple stattic HTML website (Not using any type of CMS).
-
Did this actually happen or are we talking about a hypothetical situation here? It could be that there is a link to the demo directory you've overlooked? Has the /demo folder perhaps been used in the past and there were still old links to it?
As a meta-solution to this problem: prevent crawlers and nosy people from accessing the content by adding a .htpasswd login to the area used for client approval.
-
Did you block the /demo/ directory in your robots.txt file? This is step number one to try and ensure they don't get crawled. Also, are you using wordpress? If so, wordpress automatically pings search engines when you add a post and if you use the common sitemap plugin, when it creates the sitemap it submits it automatically to Google, so that's another way Google could have found it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Category Page as Shopping Aggregator Page
Hi, I have been reviewing the info from Google on structured data for products and started to ponder.
Intermediate & Advanced SEO | | Alexcox6
https://developers.google.com/search/docs/data-types/products Here is the scenario.
You have a Category Page and it lists 8 products, each products shows an image, price and review rating. As the individual products pages are already marked up they display Rich Snippets in the serps.
I wonder how do we get the rich snippets for the category page. Now Google suggest a markup for shopping aggregator pages that lists a single product, along with information about different sellers offering that product but nothing for categories. My ponder is this, Can we use the shopping aggregator markup for category pages to achieve the coveted rich results (from and to price, average reviews)? Keen to hear from anyone who has had any thoughts on the matter or had already tried this.0 -
Should I use noindex or robots to remove pages from the Google index?
I have a Magento site and just realized we have about 800 review pages indexed. The /review directory is disallowed in robots.txt but the pages are still indexed. From my understanding robots means it will not crawl the pages BUT if the pages are still indexed if they are linked from somewhere else. I can add the noindex tag to the review pages but they wont be crawled. https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html Should I remove the robots.txt and add the noindex? Or just add the noindex to what I already have?
Intermediate & Advanced SEO | | Tylerj0 -
After adding a ssl certificate to my site I encountered problems with duplicate pages and page titles
Hey everyone! After adding a ssl certificate to my site it seems that every page on my site has duplicated it's self. I think that is because it has combined the www.domainname.com and domainname.com. I would really hate to add a rel canonical to every page to solve this issue. I am sure there is another way but I am not sure how to do it. Has anyone else ran into this problem and if so how did you solve it? Thanks and any and all ideas are very appreciated.
Intermediate & Advanced SEO | | LovingatYourBest0 -
Why would one of our section pages NOT be indexed by Google?
One of our higher traffic section pages is not being indexed by Google. The products that reside on this section page ARE indexed by Google and are on page 1. So why wouldn't the section page be even listed and indexed? The meta title is accurate, meta description is good. I haven't received any notices in Webmaster Tools. Is there a way to check to see if OTHER pages might also not be indexed? What should a small ecom site do to see about getting it listed? SOS in Modesto. Ron
Intermediate & Advanced SEO | | yatesandcojewelers0 -
Create different pages with keyword variations VS. Add keyword variations in 1 page
For searches involving keywords like "lessons", "courses", "classes" I see frequently pages in the top rankings which do not contain the search term in the title tag, despite these terms being quite competitive. It seems that when searching for "classes", google detects that pages about "courses" may be just as relevant. What do you recommend? option 1: creating 10 pages optimized on 10 different keyword variations, each with a significant part of unique content or option 2: one page and dropping throughout the page 10 keyword variations in body and headlines Given that keywords are all synonyms and website has already high domain authority in the niche. thanks
Intermediate & Advanced SEO | | lcourse0 -
Dynamic pages - ecommerce product pages
Hi guys, Before I dive into my question, let me give you some background.. I manage an ecommerce site and we're got thousands of product pages. The pages contain dynamic blocks and information in these blocks are fed by another system. So in a nutshell, our product team enters the data in a software and boom, the information is generated in these page blocks. But that's not all, these pages then redirect to a duplicate version with a custom URL. This is cached and this is what the end user sees. This was done to speed up load, rather than the system generate a dynamic page on the fly, the cache page is loaded and the user sees it super fast. Another benefit happened as well, after going live with the cached pages, they started getting indexed and ranking in Google. The problem is that, the redirect to the duplicate cached page isn't a permanent one, it's a meta refresh, a 302 that happens in a second. So yeah, I've got 302s kicking about. The development team can set up 301 but then there won't be any caching, pages will just load dynamically. Google records pages that are cached but does it cache a dynamic page though? Without a cached page, I'm wondering if I would drop in traffic. The view source might just show a list of dynamic blocks, no content! How would you tackle this? I've already setup canonical tags on the cached pages but removing cache.. Thanks
Intermediate & Advanced SEO | | Bio-RadAbs0 -
Amount of pages indexed for classified (number of pages for the same query)
I've notice that classified usually has a lots of pages indexed and that's because for each query/kw they index the first 100 results pages, normally they have 10 results per page. As an example imagine the site www.classified.com, for the query/kw "house for rent new york" there is the page www.classified.com/houses/house-for-rent-new-york and the "index" is set for the first 100 SERP pages, so www.classified.com/houses/house-for-rent-new-york www.classified.com/houses/house-for-rent-new-york-1 www.classified.com/houses/house-for-rent-new-york-2 ...and so on. Wouldn't it better to index only the 1st result page? I mean in the first 100 pages lots of ads are very similar so why should Google be happy by indexing lots of similar pages? Could Google penalyze this behaviour? What's your suggestions? Many tahnks in advance for your help.
Intermediate & Advanced SEO | | nuroa-2467120 -
Getting a site to rank in both google.com and google.co.uk
I have a client who runs a yacht delivery company. He gets business from the US and the UK but due to the nature of his business, he isn't really based anywhere except in the middle of the ocean somewhere! His site is hosted in the US, and it's a .com. I haven't set any geographical targeting in webmaster tools either. We're starting to get some rankings in google US, but very little in google UK. It's a small site anyway, and he'd prefer not to have too much content on the site saying he's UK based as he's not really based anywhere. Any ideas on how best to approach this?
Intermediate & Advanced SEO | | PerchDigital0