Robots.txt anomaly
-
Hi,
I'm monitoring a site thats had a new design relaunch and new robots.txt added.
Over the period of a week (since launch) webmaster tools has shown a steadily increasing number of blocked urls (now at 14).
In the robots.txt file though theres only 12 lines with the disallow command, could this be occurring because a line in the command could refer to more than one page/url ? They all look like single urls for example:
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themesetc, etc
And is it normal for webmaster tools reporting of robots.txt blocked urls to steadily increase in number over time, as opposed to being identified straight away ?
Thanks in advance for any help/advice/clarity why this may be happening ?
Cheers
Dan
-
many thanks for that Dan !
-
As far as I thought, the important thing is that your feed shows up in feed readers. Can you subscribe to and view your RSS feed in a variety of different feed readers?
Yes, so long as the ? is utilized only in ways in which would result in duplicate content, or content that would not be desirable to crawl, it will have that effect.
-Dan
-
Many Thanks for your comments Dan !
So it doesnt matter that the feeds not going to be crawled, dont we want feeds to be crawled usually?
Blocking anything with a ? is surely good then isnt it since prevents all the dupe content etc one gets from search results ?
Yes my clients webmaster set it up
-
Hi Dan
I see no reason to disallow the feed like that by default, unless there is some reason I don't know about. But it won't harm anything either.
The second part blocks any URL which begins with a ? (question mark). This would block anything that has a parameter in the URL - most commonly a search word, pagination, filtering settings etc.
As far as I'm aware this is not going to be damaging to the site, but it's not the default setting. Did someone set it up that way for you?
My robots.txt shows the default WordPress settings: http://www.evolvingseo.com/robots.txt
-
Hi Dan
Yes please find below, please can you also confirm if the bottom 2 lines refer to blocking internal search results ?:
Disallow: /feed
Disallow: */feedDisallow: /?
Disallow: /*?Many Thanks
Dan
-
Hi Dan
Can you share the exact line disallowing RSS?
Thanks!
-Dan
-
sorry 1 more question, i see that the webmaster has disallowed the feeds in the robots.txt file is this normal/desirable, i would have thought one would want rss feeds crawled by Google ?
-
nice 1 cheers Jesse !
-
Your assumption is correct. The disallows you listed are directories, not pages. Therefore, anything within the Plugins folder will be disallowed, same with the cache and themes folder.
So you may have multiple files (and I'm sure you do) within each of those folders.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to stop robots.txt restricting access to sitemap?
I'm working on a site right now and having an issue with the robots.txt file restricting access to the sitemap - with no web dev to help, I'm wondering how I can fix the issue myself? The robots.txt page shows User-agent: * Disallow: / And then sitemap: with the correct sitemap link
Technical SEO | | Ad-Rank0 -
Do I have a robots.txt problem?
I have the little yellow exclamation point under my robots.txt fetch as you can see here- http://imgur.com/wuWdtvO This version shows no errors or warnings- http://imgur.com/uqbmbug Under the tester I can currently see the latest version. This site hasn't changed URLs recently, and we haven't made any changes to the robots.txt file for two years. This problem just started in the last month. Should I worry?
Technical SEO | | EcommerceSite0 -
Why is robots.txt blocking URL's in sitemap?
Hi Folks, Any ideas why Google Webmaster Tools is indicating that my robots.txt is blocking URL's linked in my sitemap.xml, when in fact it isn't? I have checked the current robots.txt declarations and they are fine and I've also tested it in the 'robots.txt Tester' tool, which indicates for the URL's it's suggesting are blocked in the sitemap, in fact work fine. Is this a temporary issue that will be resolved over a few days or should I be concerned. I have recently removed the declaration from the robots.txt that would have been blocking them and then uploaded a new updated sitemap.xml. I'm assuming this issue is due to some sort of crossover. Thanks Gaz
Technical SEO | | PurpleGriffon0 -
Two META Robots tags on a page - which will win?
Hi, Does anybody know which meta-robots tag will "win" if there is more than one on a page? The situation:
Technical SEO | | jmueller
our CMS is not very flexible and so we have segments of META-Tags on the page that originate from templates.
Now any author can add any meta-tag from within his article-editor.
The logic delivering the pages does not care if there might be more than one meta-robots tag present (one from template, one from within the article). Now we could end up with something like this: Which one will be regarded by google & co?
First?
Last?
None? Thanks a lot,
Jan0 -
Block or remove pages using a robots.txt
I want to use robots.txt to prevent googlebot access the specific folder on the server, Please tell me if the syntax below is correct User-Agent: Googlebot Disallow: /folder/ I want to use robots.txt to prevent google image index the images of my website , Please tell me if the syntax below is correct User-agent: Googlebot-Image Disallow: /
Technical SEO | | semer0 -
Wordpress Robots.txt Sitemap submission?
Alright, my question comes directly from this article by SEOmoz http://www.seomoz.org/learn-seo/r... Yes, I have submitted the sitemap to google, bing's webmaster tools and and I want to add the location of our site's sitemaps and does it mean that I erase everything in the robots.txt right now and replace it with? <code>User-agent: * Disallow: Sitemap: http://www.example.com/none-standard-location/sitemap.xml</code> <code>???</code> because Wordpress comes with some default disallows like wp-admin, trackback, plugins. I have also read this, but was wondering if this is the correct way to add sitemap on Wordpress Robots.txt. [http://www.seomoz.org/q/removing-...](http://www.seomoz.org/q/removing-robots-txt-on-wordpress-site-problem) I am using Multisite with Yoast plugin so I have more than one sitemap.xml to submit Do I erase everything in Robots.txt and replace it with how SEOmoz recommended? hmm that sounds not right. like <code> <code>
Technical SEO | | joony2008
<code>User-agent: *
Disallow: </code> Sitemap: http://www.example.com/sitemap_index.xml</code> <code>``` Sitemap: http://www.example.com/sub/sitemap_index.xml ```</code> <code>?????????</code> ```</code>0 -
Restricted by robots.txt does this cause problems?
I have restricted around 1,500 links which are links to retailers website and links that affiliate links accorsing to webmaster tools Is this the right approach as I thought it would affect the link juice? or should I take the no follow out of the restricted by robots.txt file
Technical SEO | | ocelot0 -
Robots.txt
Hi there, My question relates to the robots.txt file. This statement: /*/trackback Would this block domain.com/trackback and domain.com/fred/trackback ? Peter
Technical SEO | | PeterM220