Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content (many posts will still be possible to view), we have locked both new posts and new replies.
How to allow bots to crawl all but WP-content
- 
					
					
					
					
 Hello, I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
 User-agent: *
 Disallow: /wp-admin/
 Disallow: /wp-includes/
 Disallow: /wp-content/
 User-agent: GoogleBot
 Allow: /
 User-agent: GoogleBot-Mobile
 Allow: /
 User-agent: GoogleBot-Image
 Allow: /
 User-agent: Bingbot
 Allow: /
 User-agent: Slurp
 Allow: /
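 The worry about the * group conflicting with the named groups can be checked locally. What follows is a minimal sketch using Python's standard urllib.robotparser, which applies the group that names a crawler and falls back to the * group only when no group names it (the same selection rule that RFC 9309 describes). The sample path and the "SomeOtherBot" agent are hypothetical, and real crawlers may differ in detail from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/

User-agent: GoogleBot
Allow: /

User-agent: GoogleBot-Mobile
Allow: /

User-agent: GoogleBot-Image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical path, used only for illustration.
path = "/wp-content/uploads/2017/05/report.txt"

# A bot with its own group is judged by that group alone.
named = parser.can_fetch("Googlebot", path)
# A bot that is not listed by name falls back to the * group.
unnamed = parser.can_fetch("SomeOtherBot", path)

print("Googlebot allowed:", named)
print("SomeOtherBot allowed:", unnamed)
```

 In this sketch the named bots are not bound by the * disallows, because Allow: / is the only rule in their own group; the * rules apply only to bots that are not listed by name.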
- 
					
					
					
					
 Thank you for the help, Gaston! 
- 
					
					
					
					
 Yep, with that you are allowing every file ending with those extensions.
- 
					
					
					
					
 Can I do so with:
 Allow: *.jpg
 Allow: *.png
- 
					
					
					
					
 Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress. These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable. Best 
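 The narrower goal described here (keeping the .txt files under /wp-content/uploads/ out of the crawl without touching images) could be expressed as a single wildcard rule rather than a block on the whole folder. The following is a minimal sketch, assuming the crawler supports the * and $ pattern characters described in RFC 9309. The rule, the helper names and the sample paths are all hypothetical, and the regex translation only approximates how a crawler matches paths.

```python
import re

# Hypothetical rule: the path pattern of a line such as
#   Disallow: /wp-content/uploads/*.txt$
RULE = "/wp-content/uploads/*.txt$"


def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a regex.

    '*' stands for any run of characters and a trailing '$' anchors the
    end of the path; without '$' the pattern is a prefix match.
    """
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, rule=RULE):
    """Return True when the path matches the Disallow pattern."""
    return rule_to_regex(rule).match(path) is not None


# Hypothetical paths, used only for illustration.
for sample in (
    "/wp-content/uploads/2017/05/plugin-dump.txt",
    "/wp-content/uploads/2017/05/photo.png",
    "/wp-content/themes/site/style.css",
):
    print(sample, "blocked" if is_blocked(sample) else "crawlable")
```

 With a rule of this shape, only the .txt files under the uploads folder match; images, CSS and JS elsewhere in /wp-content/ are left alone.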
- 
					
					
					
					
 Oh god, my mistake!
 I'm deeply sorry. Yes, this configuration will block images that follow that folder structure! I'll correct myself.
 Thanks for pointing it out!
- 
					
					
					
					
 Gaston, Thanks for the fast reply! My images folder does follow that format, which is what worries me, as we are blocking the wp-content folder. Thanks!
- 
					
					
					
					
 Hi Tom, Correction: I first wrote that this config would allow images to be crawled. No, this config will block images from being crawled, as long as your WordPress uses the default folder for images: /wp-content/uploads/year/month/image-name.png How can you find out where your images are stored? Super easy: go to a page where you can see an image, right-click it and choose "copy link address". That link will show you the folder structure. Hope it helps.
 Best of luck.
 GR
- 
					
					
					
					
 Hi Gaston, I just wanted to follow up with one last question, if possible. Would this still allow my images and PDFs to be crawled and indexed? Thanks!
- 
					
					
					
					
 Awesome. Thanks, Gaston! 
- 
					
					
					
					
 Yes, it does, as I said earlier. Copy and paste that code into the robots.txt tester in any of your Search Console properties and test it with a file name such as name.css or testing.js.
 Check the image I've attached. Hope it helps.
 Best of luck
 GR
- 
					
					
					
					
 Thank you for the response. I'm still a little uncertain: does the version you wrote allow the bots to crawl the CSS and JS as well? Best
- 
					
					
					
					
 Hi Tom! That robots.txt config is pretty redundant.
 To achieve what you want, try this:
 User-agent: *
 Disallow: /wp-admin/
 Disallow: /wp-includes/
 Disallow: /wp-content/
 Allow: *.js
 Allow: *.css
 Just 3 things to note here:
 1- That User-agent: * and those disallows block every bot from crawling what's in those folders.
 2- When blocking /wp-content/ you are also blocking the /themes/ folder, which contains the .js and .css files. Blocking those files prevents Googlebot from rendering the page correctly, so it sees the page differently from how a normal user would.
 3- Those Allow: / lines don't prevent the disallow.
 To try that configuration, you can use the robots.txt tester in Search Console, just under the Crawl menu. Remember that by default Google considers that you are not blocking anything.
 More info here: The web robots.txt page
 Hope it helps.
 Best of luck.
 GR
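 Whether the Allow lines actually win over Disallow: /wp-content/ depends on how a crawler ranks conflicting rules. The sketch below assumes the precedence rule described in RFC 9309: the longest matching pattern wins, and Allow wins a tie. It is not any crawler's actual implementation. The helper names and sample paths are hypothetical, and patterns that start with * are accepted leniently here.

```python
import re

# (is_allow, pattern) pairs taken from the config suggested above.
RULES = [
    (False, "/wp-admin/"),
    (False, "/wp-includes/"),
    (False, "/wp-content/"),
    (True, "*.js"),
    (True, "*.css"),
]


def matches(pattern, path):
    """'*' is any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match(body + ("$" if anchored else ""), path) is not None


def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best = (-1, True)  # (pattern length, is_allow)
    for allow, pattern in rules:
        if matches(pattern, path):
            # Tuples compare by length first, then True > False for ties.
            best = max(best, (len(pattern), allow))
    return best[1]


# Hypothetical theme stylesheet path, used only for illustration.
css = "/wp-content/themes/site/style.css"
print("short Allow: *.css ->", is_allowed(css))

# Hypothetical longer pattern that outranks Disallow: /wp-content/.
longer = RULES + [(True, "/wp-content/*.css")]
print("longer Allow: /wp-content/*.css ->", is_allowed(css, longer))
```

 Under this assumption the short *.css pattern is outranked by the longer Disallow: /wp-content/, while a longer pattern such as /wp-content/*.css is not. Crawlers may rank rules differently, so it is worth confirming the result with a robots.txt tester.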