Crawling password-protected sites such as dev or staging areas to check them before going live?
-
Hi
I've instructed clients to password-protect their dev areas so they don't get crawled and indexed, but how do we set up the Moz crawl software so that we can crawl these sites for a final check of any issues before going live?
Is there an option I haven't seen for adding logins/passwords so the crawl software can get access?
Cheers
Dan
-
OK, thanks Chiaryn.
Is the actual name of the Moz crawler (to allow in robots.txt) simply rogerbot, or are there any other characters etc.?
Also, is it not the case that even when a site is blocked by robots.txt, Google can still crawl/index it once the password is removed? I think I read a few comments somewhere on Moz saying that can still happen somehow.
Please advise ASAP.
Many Thanks
Dan
-
Hey Dan,
Unfortunately, our crawler is not able to access password-protected content on your site. If you create a staging subdomain that is not password protected, you could use the robots.txt file to allow rogerbot and block other crawlers. I'm afraid our crawler will not crawl anything that a normal search engine crawler would not be able to reach, so we cannot crawl password-protected pages.
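As a rough sketch of the robots.txt idea above (an illustration, not an official Moz configuration): the rules below allow the user agent rogerbot and disallow every other crawler, and Python's standard urllib.robotparser is used to check how a compliant crawler would read them. The hostname staging.example.com and the helper name is_allowed are hypothetical, made up for this example. Keep in mind that robots.txt is only a request that well-behaved crawlers honour, so it is not a substitute for a password.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for a staging subdomain that is NOT password protected.
# An empty "Disallow:" means nothing is disallowed for that user agent.
STAGING_ROBOTS_TXT = """\
User-agent: rogerbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = STAGING_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # staging.example.com is a placeholder hostname for illustration only.
    page = "http://staging.example.com/some-page"
    for agent in ("rogerbot", "Googlebot"):
        print(agent, is_allowed(agent, page))
```

Running the check before pointing a crawl at the staging subdomain is a quick way to confirm the file says what you intended.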
I hope this helps.
Chiaryn
-
I don't suppose either of you are able to help at all with this related question:
http://moz.com/community/q/site-crawl-errors-download-list-of-all-urls
-
Hi Andy
Screaming Frog does have a password access feature, for your info. I have just tried it.
All Best
Dan
-
Thanks Matt
I have got Screaming Frog and can confirm that it has a password access feature, but I really want Moz to be able to access the site too. I would have thought they should have this option somewhere. Are you saying Moz crawls have more info than SF (re 'Moz level' analysis)?
Dev sites are better password protected than blocked via robots.txt, aren't they, I think?
Cheers
Dan
-
Hi Dan
I was about to ask the exact same question, so will keep an eye out for an answer.
I hope it is possible, but I couldn't work it out.
-
I don't know if there's a way to do this in Moz, but you could always get Screaming Frog and tell it to ignore robots.txt. That will definitely crawl it. You can check titles, descriptions, canonicals, H1s, etc. that way. It doesn't give the Moz-level analysis, but it's a start that definitely works. You can also see if you have parameter issues that way.