Blocking out specific URLs with robots.txt
-
I've been trying to block out a few URLs using robots.txt, but I can't seem to get the specific one I'm trying to block. Here is an example.
I'm trying to block
but not block
It seems if it setup my robots.txt as so..
Disallow: /cats
It's blocking both urls. When I crawl the site with screaming flog, that Disallow is causing both urls to be blocked. How can I set up my robots.txt to specifically block /cats? I thought it was by doing it the way I was, but that doesn't seem to solve it.
Any help is much appreciated, thanks in advance.
-
Do not play with Robots as it may block out series of pages and folders out of index
Correct command as stated by Lesley is /cats/ . Refer official documentation
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
-
You can either use /cats/ or /cats/* that should just block the cats folder and not the other folder. Note the first use is the preferred one.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt was set to disallow for 14 days
We updated our website and accidentally overwrote our robots file with a version that prevented crawling ( "Disallow: /") We realized the issue 14 days later and replaced after our organic visits began to drop significantly and we quickly replace the robots file with the correct version to begin crawling again. With the impact to our organic visits, we have a few and any help would be greatly appreciated - Will the site get back to its original status/ranking ? If so .. how long would that take? Is there anything we can do to speed up the process ? Thanks
Intermediate & Advanced SEO | | jc42540 -
URL Structure for geo location for specific page
On hackerearth.com/challenges page, there is an option to select languages. This option is in the footer. Once you select the language the url changes. Ex - if we select French, the URL changes to hackereath.com/fr/challenges. In case we decide to change the URL of this page with Geo, what should be the URL structure which accommodates languages as well. My research says that it would good to keep the url like domainname.com/page/language.
Intermediate & Advanced SEO | | Rajnish_HE0 -
Specific KW question...
Hi, I have this site: http://www.aerlawgroup.com. It's ranking very well overall for all targeted KWs. However, I have seen a drop for one main KW: "Los Angeles criminal defense attorney." It currently ranks #8 (it used to be as high as #2). What's interesting is that for similar (yet slightly less competitive KWs, he ranks much better - "Los Angeles Criminal Defense Lawyer." I'm not trying to be greedy with rankings, but I would love feedback and/or tips regarding any issues that could be contributing to this drop. Thanks.
Intermediate & Advanced SEO | | mrodriguez14400 -
Using folder blocked by robots.txt before uploaded to indexed folder - is that OK?
I have a folder "testing" within my domain which is a folder added to the robots.txt. My web developers use that folder "testing" when we are creating new content before uploading to an indexed folder. So the content is uploaded to the "testing" folder at first (which is blocked by robots.txt) and later uploaded to an indexed folder, yet permanently keeping the content in the "testing" folder. Actually, my entire website's content is located within the "testing" - so same URL structure for all pages as indexed pages, except it starts with the "testing/" folder. Question: even though the "testing" folder will not be indexed by search engines, is there a chance search engines notice that the content is at first uploaded to the "testing" folder and therefore the indexed folder is not guaranteed to get the content credit, since search engines see the content in the "testing" folder, despite the "testing" folder being blocked by robots.txt? Would it be better that I password protecting this "testing" folder? Thx
Intermediate & Advanced SEO | | khi50 -
Image URL Change Catastrophe
We have a site with over 3mm pages indexed, and an XML sitemap with over 12mm images (312k indexed at peak). Last week our traffic dropped off a cliff. The only major change we made to the site in that time period was adding a DNS record for all of our images that moved them from a SoftLayer Object Storage domain to a subdomain of our site. The old URLs still work, but we changed all the links from across our site to the new subdomain. The big mistake we made was that we didn't update our XML sitemap to the new URLs until almost a week after the switch (totally forgot that they were served from a process with a different config file). We believe this was the cause of the issue because: The pages that dropped in traffic were the ones where the images moved, while other pages stayed more or less the same. We have some sections of our property where the images are, and have always been, hosted by Amazon and their rankings didn't crater. Same with pages that do not have images in the XML sitemap (like list pages). There wasn't a change in geographic breakdown of our traffic, which we looked at because the timing was around the same time as Pigeon. There were no warnings or messages in Webmaster Tools, to indicate a manual action around something unrelated. The number of images indexed in our sitemap according Webmaster Tools dropped from 312k to 10k over the past week. The gap between the change and the drop was 5 days. It takes Google >10 to crawl our entire site, so the timing seems plausible. Of course, it could be something totally unrelated and just coincidence, but we can't come up with any other plausible theory that makes sense given the timing and pages affected. The XML sitemap was updated last Thursday, and we resubmitted it to Google, but still no real change. Anyone had a similar experience? Any way to expedite the climb back to normal traffic levels? Screen%20Shot%202014-07-29%20at%203.38.34%20PM.png
Intermediate & Advanced SEO | | wantering0 -
URL stucture like Zappos?
Hi, My site structure looks like this. domainname.com/nl/holidayhouses/villa-costa
Intermediate & Advanced SEO | | remcozwaan
domainname.com/nl/apartments/apartment-caifem ect. I just went to zappos to research the site and het notice me that zappos.com has no directories. If i implement this my structure looks like this. domainname.com/nl/holidayhouse-villa-costa
domainname.com/nl/apartments-apartment-caifem Is this a better approach? Ciao, Remco0 -
Why are new pages not being indexed, and old pages (now in robots.txt) remain in the index?
I currently have a site that was recently restructured, causing much of its content to be reposted, creating new URL's for each page. To avoid duplicates, all of the existing pages were added to the robots file. That said, it has now been over a week - I know Google has recrawled the site - and when I search for term X, it is stil the old page that is ranking, with the new one nowhere to be seen. I'm assuming it's a cached version, but why are so many of the old pages still appearing in the index? Furthermore, all "tags" pages (it's a Q&A site, like this one) were also added to the robots a few months ago, yet I think they are all still appearing in the index. Anyone got any ideas about why this is happening, and how I can get my new pages indexed?
Intermediate & Advanced SEO | | corp08030 -
My URLs are a mess!
Hi all, I am having some SEO done on my website and I have been asked to tidy up my URLs. They show the word 'brand' or 'item' and an ID number in every one. http://www.societyboardshop.co.uk/brand/Girl-Skateboards/153/ http://www.societyboardshop.co.uk/item/Girl%20Skateboards%20Guy%20Mariano%20OG%20Guy%20Skateboards/898/ My developer says that we cannot remove these words as they 'form part of a routing table' for each url. How do I fix these URLs? Many thanks in advance. Paul.
Intermediate & Advanced SEO | | Paul530