A few misc Webmaster tools questions & Robots.txt etc
-
Hi
I have a few general questions about robots.txt and GWT (Google Webmaster Tools):
1) In the robots.txt file, what do the lines below block? Internal search?
Disallow: /?
Disallow: /*?
2) Also, the site's feeds are blocked in robots.txt. Why would you want to block a site's feeds?
3) What's the best way to deal with the following:
- an old removed page that's returning a 500 response code
- a soft 404 for an old removed page that has no current replacement
- old removed pages returning a 404
The old pages didn't have any authority or inbound links, so is it best/OK to simply create a URL removal request in GWT?
Cheers
Dan
-
Many Thanks Stufroguk !!
-
-
It depends on whether Google had indexed these 'empty' pages. You need to check. Remember that every page is also given page authority. As best practice, redirect them before removing them. You can get Google to fetch the pages in GWT so that the crawlers follow the redirect. Then remove them.
-
Your old pages: fetch them in GWT, then remove them if you already have the 301s set up. Once Google has indexed the new pages, you know the link juice has passed and you can remove the old ones.
The blocking is used as a backup.
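Before fetching or removing anything, it can help to confirm what each old URL actually returns. Below is a minimal Python sketch, assuming only the standard library; the helper names `fetch_status` and `describe_status` and the example URL are hypothetical, and the labels are just a rough reading of the status codes mentioned in this thread, not a rule from Google.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Send a HEAD request WITHOUT following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def describe_status(status):
    """Rough label for the response codes discussed in the thread."""
    if status in (301, 308):
        return "permanent redirect"
    if status in (302, 303, 307):
        return "temporary redirect"
    if status == 200:
        # A removed page that still answers 200 is what gets reported as a soft 404.
        return "ok (check for soft 404 if the page was removed)"
    if status in (404, 410):
        return "not found / gone"
    if 500 <= status < 600:
        return "server error"
    return "other"


# Example usage (hypothetical URL, not run here because it needs network access):
# status, location = fetch_status("http://www.example.com/old-page")
# print(status, describe_status(status), location)
```

An old URL that has a replacement should come back as a permanent redirect with the new address in `Location`; a 500 on a removed page is usually worth fixing at the server so it returns a 404 or 410 instead.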
-
-
Thanks Stufroguk,
1) Does this still apply if the pages had no content? They were just overview pages/folders without any copy, links or authority, which is why I think it's OK to just remove the URLs without 301'ing them.
2) I do have other old content pages that I have 301'd to new replacements, but I hadn't planned to do anything else with them. You're saying that after 2 weeks I should nofollow or block them? Won't that stop the link equity passing?
Cheers
Dan
-
To manage old pages, it's best practice to simply 301 redirect them, leave them for a couple of weeks, then tag them with nofollow and/or block them with robots.txt. That way you've passed on the link equity. Then you can remove them in GWT.
In answer to 1: yes. But not all search engines read the "*" wildcard in file names. You might need to tinker with this a bit.
Use this to help: http://tool.motoricerca.info/robots-checker.phtml
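To make question 1 concrete, here is a minimal Python sketch of how the two Disallow patterns behave. It assumes Google-style matching, where `*` stands for any run of characters, a trailing `$` anchors the end, and a rule otherwise matches as a prefix of the path plus query string. The function name `rule_matches` and the sample paths are made up for illustration; crawlers that don't support wildcards will treat the `*` differently, as noted above.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches a URL path (+ query).

    Assumption: Google-style rules. '*' matches any sequence of characters,
    a trailing '$' anchors the end of the URL, and everything else is a
    literal prefix match from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None


sample_paths = [
    "/?s=shoes",                  # query string on the home page
    "/shop/?page=2",              # query string on a folder
    "/shop/item.html?color=red",  # query string on a page
    "/shop/item.html",            # no query string
]

for path in sample_paths:
    print(path, rule_matches("/?", path), rule_matches("/*?", path))
```

Under these assumptions, `Disallow: /?` only blocks URLs where the query string hangs directly off the site root (for example a `/?s=` search URL), while `Disallow: /*?` blocks any URL containing a `?` anywhere.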