Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Best way to block a search engine from crawling a link?
-
If we have one page on our site that is is only linked to by one other page, what is the best way to block crawler access to that page?
I know we could set the link to "nofollow" and that would prevent the crawler from passing any authority, and we can set the page to "noindex" to prevent it from appearing in search results, but what is the best way to prevent the crawler from accessing that one link?
-
Hi there,
I'm assuming you are trying to do pagerank sculpting (or something related..) - which was made a little more tough in recent years. I'll base my answer around this assumption, so feel free to correct me if this isn't the case.
There are several methods to make a link uncrawlable:
- AJAX - Googlebot will not read any calls through AJAX. If you can load your link through an external call, it would be completely hidden.
- Javascript - Obfuscate links with Javascript that masks the link. You can do any number of solutions here, including using tags with a title of your URL, which upon clicking, goes that that URL. Simple and effective.
- Redirects - I haven't tested this last idea, and it may not work. You might be able to redirect to another page in your website, which is then set to not be indexed. Then redirect to the intended page through a query string. In theory it should work, but obviously not as good as the previous methods I described.
Let me know if you have questions. I'd be glad to help further.
Cheers!
-
Noindex/nofollow should be good enough, but if you want to be sure it doesn't get indexed, you could can also include <meta name="robots" content="NOINDEX, NOFOLLOW"> in the head section of the page to be blocked. You can also exclude the page in your robots.txt file. </meta name="robots">
You can find a simple robots.txt generator in Google Webmaster Tools if you need to block particular pages or directories. The robots.txt file should be in the root directory of your site and look something like this:
User-agent: * Disallow: /file-you-want-to-hide.html
You can also request removal of specific URLs in Webmaster Tools if it has already been indexed.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change, no ratings or product reviews so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
New Subdomain & Best Way To Index
We have an ecommerce site, we'll say at https://example.com. We have created a series of brand new landing pages, mainly for PPC and Social at https://sub.example.com, but would also like for these to get indexed. These are built on Unbounce so there is an easy option to simply uncheck the box that says "block page from search engines", however I am trying to speed up this process but also do this the best/correct way. I've read a lot about how we should build landing pages as a sub-directory, but one of the main issues we are dealing with is long page load time on https://example.com, so I wanted a kind of fresh start. I was thinking a potential solution to index these quickly/correctly was to make a redirect such as https://example.com/forward-1 -> https:sub.example.com/forward-1 then submit https://example.com/forward-1 to Search Console but I am not sure if that will even work. Another possible solution was to put some of the subdomain links accessed on the root domain say right on the pages or in the navigation. Also, will I definitely be hurt by 'starting over' with a new website? Even though my MozBar on my subdomain https://sub.example.com has the same domain authority (DA) as the root domain https://example.com? Recommendations and steps to be taken are welcome!
Intermediate & Advanced SEO | | Markbwc0 -
What is best practice for "Sorting" URLs to prevent indexing and for best link juice ?
We are now introducing 5 links in all our category pages for different sorting options of category listings.
Intermediate & Advanced SEO | | lcourse
The site has about 100.000 pages and with this change the number of URLs may go up to over 350.000 pages.
Until now google is indexing well our site but I would like to prevent the "sorting URLS" leading to less complete crawling of our core pages, especially since we are planning further huge expansion of pages soon. Apart from blocking the paramter in the search console (which did not really work well for me in the past to prevent indexing) what do you suggest to minimize indexing of these URLs also taking into consideration link juice optimization? On a technical level the sorting is implemented in a way that the whole page is reloaded, for which may be better options as well.0 -
Crawled page count in Search console
Hi Guys, I'm working on a project (premium-hookahs.nl) where I stumble upon a situation I can’t address. Attached is a screenshot of the crawled pages in Search Console. History: Doing to technical difficulties this webshop didn’t always no index filterpages resulting in thousands of duplicated pages. In reality this webshops has less than 1000 individual pages. At this point we took the following steps to result this: Noindex filterpages. Exclude those filterspages in Search Console and robots.txt. Canonical the filterpages to the relevant categoriepages. This however didn’t result in Google crawling less pages. Although the implementation wasn’t always sound (technical problems during updates) I’m sure this setup has been the same for the last two weeks. Personally I expected a drop of crawled pages but they are still sky high. Can’t imagine Google visits this site 40 times a day. To complicate the situation: We’re running an experiment to gain positions on around 250 long term searches. A few filters will be indexed (size, color, number of hoses and flavors) and three of them can be combined. This results in around 250 extra pages. Meta titles, descriptions, h1 and texts are unique as well. Questions: - Excluding in robots.txt should result in Google not crawling those pages right? - Is this number of crawled pages normal for a website with around 1000 unique pages? - What am I missing? BxlESTT
Intermediate & Advanced SEO | | Bob_van_Biezen0 -
How can I make sure Google is crawling a link from an iframe (video)?
Do they crawl backlinks from an iframe example from a Youtube video embedded in a blog post? TIA!
Intermediate & Advanced SEO | | zpm20140 -
What is the best way to get anchor text cloud in line?
So I am working on a website, and it has been doing seo with keyword links for a a few years. The first branded terms comes in a 7% in 10th in the list on Ahefs. The keyword terms are upwards of 14%. What is the best way to get this back in line? It would take several months to build keyword branded terms to make any difference - but it is doable. I could try link removal, but less than 10% seem to actually get removed -- which won't make a difference. The disavow file doesn't really seem to do anything either. What are your suggestions?
Intermediate & Advanced SEO | | netviper0 -
Best way to get pages indexed fast?
Any suggestion on best ways to get new sites pages indexed? Was thinking getting high pr inbound links on fiverr but always a little risky right? Thanks for your opinions.
Intermediate & Advanced SEO | | mweidner27820 -
Best possible linking on site with 100K indexed pages
Hello All, First of all I would like to thank everybody here for sharing such great knowledge with such amazing and heartfelt passion.It really is good to see. Thank you. My story / question: I recently sold a site with more than 100k pages indexed in Google. I was allowed to keep links on the site.These links being actual anchor text links on both the home page as well on the 100k news articles. On top of that, my site syndicates its rss feed (Just links and titles, no content) to this page. However, the new owner made a mess, and now the site could possibly be seen as bad linking to my site. Google tells me within webmasters that this particular site gives me more than 400K backlinks. I have NEVER received one single notice from Google that I have bad links. That first. But, I was worried that this page could have been the reason why MY site tanked as bad as it did. It's the only source linking so massive to me. Just a few days ago, I got in contact with the new site owner. And he has taken my offer to help him 'better' his site. Although getting the site up to date for him is my main purpose, since I am there, I will also put effort in to optimizing the links back to my site. My question: What would be the best to do for my 'most SEO gain' out of this? The site is a news paper type of site, catering for news within the exact niche my site is trying to rank. Difference being, his is a news site, mine is not. It is commercial. Once I fix his site, there will be regular news updates all within the niche we both are in. Regularly as in several times per day. It's news. In the niche. Should I leave my rss feed in the side bars of all the content? Should I leave an achor text link on the sidebar (on all news etc.) If so: there can be just one keyword... 407K pages linking with just 1 kw?? Should I keep it to just one link on the home page? I would love to hear what you guys think. (My domain is from 2001. Like a quality wine. However, still tanked like a submarine.) ALL SEO reports I got here are now Grade A. The site is finally fully optimized. Truly nice to have that confirmation. Now I hope someone will be able to tell me what is best to do, in order to get the most SEO gain out of this for my site. Thank you.
Intermediate & Advanced SEO | | richardo24hr0