A few misc Webmaster tools questions & Robots.txt etc
-
Hi
I have a few general questions re robots.txt & GWT:
1) In the robots.txt file, what do the lines below block - internal search?
Disallow: /?
Disallow: /*?
2) Also, the site's feeds are blocked in robots.txt - why would you want to block a site's feeds?
3) What's the best way to deal with the below:
- an old removed page that's returning a 500 response code?
- a soft 404 for an old removed page that has no current replacement
- old removed pages returning a 404
The old pages didn't have any authority or inbound links, so is it best/OK to simply create a URL removal request in GWT?
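For reference, a rough sketch (Python with the requests library; the URLs are placeholders) of how the old URLs could be audited to see what each one actually returns, including a likely soft 404 where a removed page still serves a 200:

```python
import requests

# Placeholder list of old, removed URLs to audit
old_urls = [
    "https://www.example.com/old-overview-page/",
    "https://www.example.com/removed-article/",
]

for url in old_urls:
    try:
        # Don't follow redirects so a 301/302 is reported as-is
        resp = requests.get(url, allow_redirects=False, timeout=10)
    except requests.RequestException as exc:
        print(f"{url} -> request failed: {exc}")
        continue

    if resp.status_code in (301, 302):
        print(f"{url} -> {resp.status_code} redirect to {resp.headers.get('Location')}")
    elif resp.status_code == 200:
        # A removed page still returning 200 is a likely soft 404
        print(f"{url} -> 200 (removed page still serving content: possible soft 404)")
    else:
        print(f"{url} -> {resp.status_code}")  # e.g. 404, 410, 500
```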
Cheers
Dan
-
Many thanks, Stufroguk!!
-
It depends on whether Google has indexed these 'empty' pages - you need to check. Remember that every page is also given page authority. Best practice is to redirect them before removing them. You can get Google to fetch the pages in GWT so that the crawlers follow the redirect, then remove them.
-
Your old pages - fetch them in GWT, then remove them if you already have the 301s set up. Once Google has indexed the new pages, you know the link juice has passed and you can remove them.
The blocking is used as a backup.
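As an illustration of that step, a minimal sketch (Python with the requests library; the old-to-new mapping is hypothetical) for confirming each old URL 301s to its intended replacement before you remove it:

```python
import requests

# Hypothetical mapping of removed URLs to their 301 targets
redirect_map = {
    "https://www.example.com/old-overview/": "https://www.example.com/new-overview/",
    "https://www.example.com/old-article/":  "https://www.example.com/new-article/",
}

for old_url, expected_target in redirect_map.items():
    # Don't follow redirects so we can inspect the first response directly
    resp = requests.get(old_url, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "")
    if resp.status_code == 301 and location == expected_target:
        print(f"OK   {old_url} -> {location}")
    else:
        print(f"FIX  {old_url}: got {resp.status_code} -> {location or 'no Location header'}")
```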
-
Thanks Stufroguk,
1) Does this still apply if the pages had no content? They were just overview pages/folders without any copy, links or authority, which is why I think it's OK to just remove the URLs without 301'ing them.
2) I do have other old content pages that I have 301'd to new replacements, but I hadn't planned to do anything else with them. Are you saying that after 2 weeks I should nofollow or block them? Won't that stop the link equity passing?
Cheers
Dan
-
To manage old pages, it's best practice to simply 301 redirect them, leave them for a couple of weeks, then tag them with nofollow and/or block them with robots.txt. That way you've passed on the link equity. Then you can remove them from GWT.
In answer to 1: yes, but not all search engines read the "*" wildcard in file names. You might need to tinker with this a bit.
Use this to help: http://tool.motoricerca.info/robots-checker.phtml
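To make the wildcard behaviour concrete, here's a small sketch (plain Python; it assumes Google-style matching where "*" matches any run of characters and "$" anchors the end, and the test paths are made up) showing what the two Disallow lines from the question would and wouldn't block:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt matching: '*' matches any run of
    characters, '$' anchors the end, everything else is literal,
    and rules are matched from the start of the path."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

rules = ["/?", "/*?"]  # the two Disallow lines from the question

test_paths = [
    "/",                     # homepage
    "/?s=blue+widgets",      # internal search result
    "/widgets/",             # normal page
    "/widgets/?sort=price",  # parameterised version of a page
]

for path in test_paths:
    hits = [r for r in rules if rule_matches(r, path)]
    print(f"{path:25} -> {'blocked by ' + ' and '.join(hits) if hits else 'allowed'}")
```

Run against those made-up paths, "/?" only catches query strings hanging directly off the root (typical internal search URLs), while "/*?" catches any URL containing a "?" anywhere, i.e. all parameterised URLs.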