How Google Carwler Cached Orphan pages and directory?
-
I have website www.test.com
I have made some changes in live website and upload it to "demo" directory (which is recently created) for client approval.
Now, my demo link will be www.test.com/demo/
I am not doing any type of link building or any activity which pass referral link to www.test.com/demo/
Then how Google crawler find it and cached some pages or entire directory?
Thanks
-
Try putting the URL into Google and see if you find any pages linking to it.
I knew a company that created a test site that was a copy of a live site (made with a specific hosted CMS). Didn't exclude the test site in robots because "we all know we won't link to it so it'll be ok". Site got indexed, and it was because a person at the company was having problems with the implementation of the test site, went to the help forum (which person didn't think would be indexed) and posted the URL to the test site.
I found the above by just putting in the URL of the test site into Google, and I saw the post in the help desk. You might try the same to see if somehow there is a rogue link.
-
Is google crawling our mails?
Is it possible?
-
Yup, correct.
I was certain I'd replied to this
Anyway, you ever notice how the ads in gmail are always relevant to the content of your emails? Google are totally reading them
-
The <conspiracy hat="">side of things was him commenting that Google is sometimes accused of processing everything in Gmail and could have possibly pulled your link to the demo directory from that.</conspiracy>
-
Hi Barry,
Yes, We were used Gmail for reporting.
Is it make any sense??
-
<conspiracy-hat></conspiracy-hat>
Did either you or your client use gmail when you sent him the demo link?
Regardless, Dan's advice to noindex and block the directory from spiders is the future when doing development work.
-
Hi JoelHit,
NO, There is not any single refferal link to "Demo" directory from entire website and also from third party websites.
I am aware about Google Crawling and Indexing Systems.
Thanks.
-
Hi Thetjo,
I know about it.
My question is that how Google Crawl it without any referral link?
Thanks.
-
Hi Dan,
No, i am not exclude "demo" directory from robots.txt for any search engine.
I am not using wordpress its simple stattic HTML website (Not using any type of CMS).
-
Did this actually happen or are we talking about a hypothetical situation here? It could be that there is a link to the demo directory you've overlooked? Has the /demo folder perhaps been used in the past and there were still old links to it?
As a meta-solution to this problem: prevent crawlers and nosy people from accessing the content by adding a .htpasswd login to the area used for client approval.
-
Did you block the /demo/ directory in your robots.txt file? This is step number one to try and ensure they don't get crawled. Also, are you using wordpress? If so, wordpress automatically pings search engines when you add a post and if you use the common sitemap plugin, when it creates the sitemap it submits it automatically to Google, so that's another way Google could have found it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Alternate page with proper canonical tag Status: Excluded in Google webmaster tools.
In Google Webmaster Tools, I have a coverage issue. I am getting this error message: Alternate page with proper canonical tag Status: Excluded. It gives the below blog post page as an example. Any idea how to resolve? At one time, I was using handl utm grabber, but the plugin is deactivated on my website. https://www.savacations.com/turrialba-costa-ricas-garden-city/?utm_source=deleted&utm_medium=deleted&utm_term=deleted&utm_content=deleted&utm_campaign=deleted&gclid=deleted5.
Intermediate & Advanced SEO | | Alancito0 -
Should I worry about rendering problems of my pages in google search console fetch as google?
Some elements are not properly shown when I preview our pages in search console (fetch as google), e.g.
Intermediate & Advanced SEO | | lcourse
google maps, css tables etc. and some parts are not showing up since we load them asynchroneously for best page speed. Is this something should pay attention to and try to fix?0 -
Google Indexing & Caching Some Other Domain In Place of Mine-Lost Ranking -Sucuri.net Found nothing
Again I am facing same Problem with another wordpress blog. Google has suddenly started to Cache a different domain in place of mine & caching my domain in place of that domain. Here is an example page of my site which is wrongly cached on google, same thing happening with many other pages as well - http://goo.gl/57uluq That duplicate site ( protestage.xyz) is showing fully copied from my client's site but showing all pages as 404 now but on google cache its showing my sites. site:protestage.xyz showing all pages of my site only but when we try to open any page its showing 404 error My site has been scanned by sucuri.net Senior Support for any malware & there is none, they scanned all files, database etc & there is no malware found on my site. As per Sucuri.net Senior Support It's a known Google bug. Sometimes they incorrectly identify the original and the duplicate URLs, which results in messed ranking and query results. As you can see, the "protestage.xyz" site was hacked, not yours. And the hackers created "copies" of your pages on that hacked site. And this is why they do it - the "copy" (doorway) redirects websearchers to a third-party site [http://www.unmaskparasites.com/security-report/?page=protestage.xyz](http://www.unmaskparasites.com/security-report/?page=protestage.xyz) It was not the only site they hacked, so they placed many links to that "copy" from other sites. As a result Google desided that that copy might actually be the original, not the duplicate. So they basically hijacked some of your pages in search results for some queries that don't include your site domain. Nonetheless your site still does quite well and outperform the spammers. For example in this query: [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 But overall, I think both the Google bug and the spammy duplicates have the negative effect on your site. We see such hacks every now and then (both sides: the hacked sites and the copied sites) and here's what you can do in this situation: It's not a hack of your site, so you should focus on preventing copying the pages: 1\. Contact the protestage.xyz site and tell them that their site is hacked and that and show the hacked pages. [https://www.google.com/search?q=](https://www.google.com/search?q=)%22We+offer+personalized+sweatshirts%22%2C+every+bride#q=%22GenF20+Plus+Review+Worth+Reading+If+You+are+Planning+to+Buy+It%22 Hopefully they clean their site up and your site will have the unique content again. Here's their email flang.juliette@yandex.com 2\. You might want to send one more complain to their hosting provider (OVH.NET) abuse@ovh.net, and explain that the site they host stole content from your site (show the evidence) and that you suspect the the site is hacked. 3\. Try blocking IPs of the Aruba hosting (real visitors don't use server IPs) on your site. This well prevent that site from copying your site content (if they do it via a script on the same server). I currently see that sites using these two IP address: 149.202.120.102\. I think it would be safe to block anything that begins with 149.202 This .htaccess snippet should help (you might want to test it) #-------------- Order Deny,Allow Deny from 149.202.120.102 #-------------- 4\. Use rel=canonical to tell Google that your pages are the original ones. [https://support.google.com/webmasters/answer/139066?hl=en](https://support.google.com/webmasters/answer/139066?hl=en) It won't help much if the hackers still copy your pages because they usually replace your rel=canonical with their, so Google can' decide which one is real. But without the rel=canonical, hackers have more chances to hijack your search results especially if they use rel=canonical and you don't. I should admit that this process may be quite long. Google will not return your previous ranking overnight even if you manage to shut down the malicious copies of your pages on the hacked site. Their indexes would still have some mixed signals (side effects of the black hat SEO campaign) and it may take weeks before things normalize. The same thing is correct for the opposite situation. The traffic wasn't lost right after hackers created the duplicates on other sites. The effect build up with time as Google collects more and more signals. Plus sometimes they run scheduled spam/duplicate cleanups of their index. It's really hard to tell what was the last drop since we don't have access to Google internals. However, in practice, if you see some significant changes in Google search results, it's not because of something you just did. In most cases, it's because of something that Google observed for some period of time. Kindly help me if we can actually do anything to get the site indexed properly again, PS it happened with this site earlier as well & that time I had to change Domain to get rid of this problem after I could not find any solution after months & now it happened again. Looking forward for possible solution Ankit
Intermediate & Advanced SEO | | killthebillion0 -
Google Search Analytics How to Get Search Keywords for a Page?
How do I get the keywords coming into a page on the new Google Webmaster Tools Search Analytics? Used to be there in the old version. You would just view your most popular urls and when you expanded the urls you would see the terms driving the traffic. How do I see the most popular keyword queries for a given page in the new tool? Alternatively can I still use the old tool somehow?
Intermediate & Advanced SEO | | K-WINTER0 -
Will I lose traffic from Google for re-directing a page?
I’m currently planning to a retire a discontinued product and put a 301 redirect to a related product (although not identical). The thing is, I’m still getting significant traffic from people searching for the old product by name. Would Google send this traffic to the new pages via the re-direct? Is Google likely to display the new page in place of the old page for similar queries or will it serve other content? I’d like to answer this question so that I can decide between the two following approaches: 1) Retiring the old page immediately and putting a 301 redirect to the new related pages. This will have the advantage of transferring the value of any link signals / referring traffic. Traffic will also land on the new pages directly without having to click through from another page. We would have a dynamic message telling users that the old product had been retired depending on whether they had visited out site before. 2) Keep the old product pages temporarily so that we don’t lose the traffic from the search engines. We would then change the old pages to advise users that the old product was now retired, but that we have other products that might solve their problems. When this organic traffic decreases over time, then we will proceed with the re-direct as above. I am worried though that the old product pages might outrank the new product pages. I’d really appreciate some advice with this. I’ve been reading lots of articles, but it seems like there are different opinions on this. I understand that I will lose between 10% - 15% of page rank as per the Matt Cutts video.
Intermediate & Advanced SEO | | RG_SEO0 -
Orphan My Home Page
I want to orphan a home page on a site that I own so that the start page becomes site.com/home (or whatever) as opposed to site.com/. I need to accomplish this without associating the former with the latter...meaning no 301. Since this will not be a temporary move, 302 does not seem to work either. And even if I could use it, I don't want to credit / with anything from /home. Is there any way to default the Apache handler to /home without rewriting the URL? Or is there any other solution? The bottom line is, at the end of the day, I need Google to forget about / and anything associated with it, without interrupting the user experience when they request /. Thanks in advance.
Intermediate & Advanced SEO | | NTGproducts0 -
Rel canonical on every page, pointing to home page
I've just started working with a client and have been surprised to find that every page of their site (using Concrete5 CMS) has a rel=canonical pointing to their home page. I'm feeling really dumb, because this seems like a fatal flaw which would keep Google from ranking any page other than the home page... but when I look at Google Analytics, Content > Site Content > Landing Pages, using Secondary Dimension = Source, it seems that Google is delivering users to numerous pages on their site. Can anyone help me out?! Thanks very much!!
Intermediate & Advanced SEO | | measurableROI0 -
Will Google Visit Non-Canonicalized Page Again and Return Its Page's Original Ranking?
I have 2 questions about canonicalization. 1. Will Google ever visit Page A again if after it has been canonicalized to Page B? 2. If Google will still visit Page A and found that it is not canonicalizing to Page B already, will the original rankings and traffic of Page A returned to the way before it's canonicalized? Thanks.
Intermediate & Advanced SEO | | globalsources.com0