Robots.txt anomaly
-
Hi,
I'm monitoring a site thats had a new design relaunch and new robots.txt added.
Over the period of a week (since launch) webmaster tools has shown a steadily increasing number of blocked urls (now at 14).
In the robots.txt file though theres only 12 lines with the disallow command, could this be occurring because a line in the command could refer to more than one page/url ? They all look like single urls for example:
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themesetc, etc
And is it normal for webmaster tools reporting of robots.txt blocked urls to steadily increase in number over time, as opposed to being identified straight away ?
Thanks in advance for any help/advice/clarity why this may be happening ?
Cheers
Dan
-
many thanks for that Dan !
-
As far as I thought, the important thing is that your feed shows up in feed readers. Can you subscribe to and view your RSS feed in a variety of different feed readers?
Yes, so long as the ? is utilized only in ways in which would result in duplicate content, or content that would not be desirable to crawl, it will have that effect.
-Dan
-
Many Thanks for your comments Dan !
So it doesnt matter that the feeds not going to be crawled, dont we want feeds to be crawled usually?
Blocking anything with a ? is surely good then isnt it since prevents all the dupe content etc one gets from search results ?
Yes my clients webmaster set it up
-
Hi Dan
I see no reason to disallow the feed like that by default, unless there is some reason I don't know about. But it won't harm anything either.
The second part blocks any URL which begins with a ? (question mark). This would block anything that has a parameter in the URL - most commonly a search word, pagination, filtering settings etc.
As far as I'm aware this is not going to be damaging to the site, but it's not the default setting. Did someone set it up that way for you?
My robots.txt shows the default WordPress settings: http://www.evolvingseo.com/robots.txt
-
Hi Dan
Yes please find below, please can you also confirm if the bottom 2 lines refer to blocking internal search results ?:
Disallow: /feed
Disallow: */feedDisallow: /?
Disallow: /*?Many Thanks
Dan
-
Hi Dan
Can you share the exact line disallowing RSS?
Thanks!
-Dan
-
sorry 1 more question, i see that the webmaster has disallowed the feeds in the robots.txt file is this normal/desirable, i would have thought one would want rss feeds crawled by Google ?
-
nice 1 cheers Jesse !
-
Your assumption is correct. The disallows you listed are directories, not pages. Therefore, anything within the Plugins folder will be disallowed, same with the cache and themes folder.
So you may have multiple files (and I'm sure you do) within each of those folders.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No: 'noindex' detected in 'robots' meta tag
I'm getting an error in Search Console that pages on my site show No: 'noindex' detected in 'robots' meta tag. However, when I inspect the pages html, it does not show noindex. In fact, it shows index, follow. Majority of pages show the error and are not indexed by Google...Not sure why this is happening. Unfortunately I can't post images on here but I've linked some url's below. The page below in search console shows the error above... https://mixeddigitaleduconsulting.com/ As does this one. https://mixeddigitaleduconsulting.com/independent-school-marketing-communications/ However, this page does not have the error and is indexed by Google. The meta robots tag looks identical. https://mixeddigitaleduconsulting.com/blog/leadership-team/jill-goodman/ Any and all help is appreciated.
Technical SEO | | Sean_White_Consult0 -
No: 'noindex' detected in 'robots' meta tag
Pages on my site show No: 'noindex' detected in 'robots' meta tag. However, when I inspect the pages html, it does not show noindex. In fact, it shows index, follow. Majority of pages show the error and are not indexed by Google...Not sure why this is happening. The page below in search console shows the error above...
Technical SEO | | Sean_White_Consult0 -
Should a login page for a payroll / timekeeping comp[any be no follow for robots.txt?
I am managing a Timekeeping/Payroll company. My question is about the customer login page. Would this typically be nofollow for robots?
Technical SEO | | donsilvernail0 -
Robots File
For some reason the robots file on this site: http://rushhour.net.au/robots.txt Is giving this in Google: <cite class="_Rm">www.rushhour.net.au/bootcamp.html</cite>A description for this result is not available because of this site's robots.txtLearn moreCan anyone tell me why please?thanks.
Technical SEO | | SuitsAdmin0 -
Disallow: /search/ in robots but soft 404s are still showing in GWT and Google search?
Hi guys, I've already added the following syntax in robots.txt to prevent search engines in crawling dynamic pages produce by my website's search feature: Disallow: /search/. But soft 404s are still showing in Google Webmaster Tools. Do I need to wait(it's been almost a week since I've added the following syntax in my robots.txt)? Thanks, JC
Technical SEO | | esiow20130 -
Restricted by robots.txt does this cause problems?
I have restricted around 1,500 links which are links to retailers website and links that affiliate links accorsing to webmaster tools Is this the right approach as I thought it would affect the link juice? or should I take the no follow out of the restricted by robots.txt file
Technical SEO | | ocelot0 -
Invisible robots.txt?
So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!
Technical SEO | | joshcanhelp0 -
Is robots.txt a must-have for 150 page well-structured site?
By looking in my logs I see dozens of 404 errors each day from different bots trying to load robots.txt. I have a small site (150 pages) with clean navigation that allows the bots to index the whole site (which they are doing). There are no secret areas I don't want the bots to find (the secret areas are behind a Login so the bots won't see them). I have used rel=nofollow for internal links that point to my Login page. Is there any reason to include a generic robots.txt file that contains "user-agent: *"? I have a minor reason: to stop getting 404 errors and clean up my error logs so I can find other issues that may exist. But I'm wondering if not having a robots.txt file is the same as some default blank file (or 1-line file giving all bots all access)?
Technical SEO | | scanlin0