Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Using the Google Remove URL Tool to remove https pages
-
I have found a way to get a list of 'some' of my 180,000+ garbage URLs now, and I'm going through the tedious task of using the URL removal tool to put them in one at a time. Between that and my robots.txt file and the URL Parameters, I'm hoping to see some change each week.
I have noticed when I put URL's starting with https:// in to the removal tool, it adds the http:// main URL at the front.
For example, I add to the removal tool:-
https://www.mydomain.com/blah.html?search_garbage_url_addition
On the confirmation page, the URL actually shows as:-
http://www.mydomain.com/https://www.mydomain.com/blah.html?search_garbage_url_addition
I don't want to accidentally remove my main URL or cause problems. Is this the right way this should look?
AND PART 2 OF MY QUESTION
If you see the search description in Google for a page you want removed that says the following in the SERP results, should I still go to the trouble of putting in the removal request?
www.domain.com/url.html?xsearch_...
A description for this result is not available because of this site's robots.txt – learn more.
-
Thanks so much for taking the time to respond.
I think I will add the https to WMT and remove them that way.
I will take a look through the .htaccess file and the creation of the ssl robots file. A while back, it seemed that Google was indexing a lot of my site as https and then the dropped it and went mainly back to http. I will get that sorted to make it clear.
-
Hi there
I'll start with question 2 first as it's a bit easier to answer. Robots.txt blocks the crawling of a page, but not necessarily indexing. Of course, if the page cannot be crawled it will be deindexed eventually anyway, but if you're getting that description for one of your URLs, Google has not been able to access it and will stop trying to. So that is usually enough, although if you want to remove it as well, you can by all means.
For question 1 - GWT is a bit awkward in the sense that it treats http and https versions of your site as different webmaster properties. Furthermore, if you want to remove a URL on your site, it will always prefix it with the http/https version of your site, no matter how you enter it.
If you added another WMT property that was https://www.yourdomain.com - you would be able to manage that domain as well and thus you would be able to remove any URLs under that prefix.
Incidentally, if you want to block all HTTPS pages from being accessed, you can do that with a special instruction in your htaccess file and robots txt. You can instruct the Googlebot and other bots to read a specific robots.txt file if they visit an HTTPS URL. To do that, you would first add this to your htaccess file:
RewriteCond %{HTTPS} ^on$
RewriteCond %{REQUEST_URI} ^/robots.txt$
RewriteRule ^(.*)$ /robots_ssl.txt [L]This command basically says "if the URL has https, read the robots_ssl.txt file". You then upload a file called robots_ssl.txt to your root domain. In the txt file you just add:
User-agent: *
Disallow: /So now, if a bot reaches an https URL, it has to read the robots_ssl.txt file and upon reading that, they are denied access. That would prevent all of your https URLs from being indexed.
That might be useful to you, but if you go ahead and use it please take care to backup all your files in case anything goes wrong - your htaccess file is very important!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Not Indexing Pages (Wordpress)
Hello, recently I started noticing that google is not indexing our new pages or our new blog posts. We are simply getting a "Discovered - Currently Not Indexed" message on all new pages. When I click "Request Indexing" is takes a few days, but eventually it does get indexed and is on Google. This is very strange, as our website has been around since the late 90's and the quality of the new content is neither duplicate nor "low quality". We started noticing this happening around February. We also do not have many pages - maybe 500 maximum? I have looked at all the obvious answers (allowing for indexing, etc.), but just can't seem to pinpoint a reason why. Has anyone had this happen recently? It is getting very annoying having to manually go in and request indexing for every page and makes me think there may be some underlying issues with the website that should be fixed.
Technical SEO | | Hasanovic1 -
What happens when you replace a page with a new version that has the same URL?
a new page template was created the plan is to publish the new page (which has the same URL as before) to web and delete the old page that has the URL , will that have an SEO implications ?
Technical SEO | | lina_digital1 -
Category URL Pagination where URLs don't change between pages
Hello, I am working on an e-commerce site where there are categories with multiple pages. In order to avoid pagination issues I was thinking of using rel=next and rel=prev and cannonical tags. I noticed a site where the URL doesn't change between pages, so whether you're on page 1,2, or 3 of the same category, the URL doesn't change. Would this be a cleaner way of dealing with pagination?
Technical SEO | | whiteonlySEO0 -
What punctuation can you use in meta tags? Are there any Google does not like?
So I know you can use dashes and | in meta tags, but can anyone tell me what other punctuation you can use? Also, it'd be great to know what punctuation you can't use. Thanks!
Technical SEO | | Trevorneo1 -
How to Stop Google from Indexing Old Pages
We moved from a .php site to a java site on April 10th. It's almost 2 months later and Google continues to crawl old pages that no longer exist (225,430 Not Found Errors to be exact). These pages no longer exist on the site and there are no internal or external links pointing to these pages. Google has crawled the site since the go live, but continues to try and crawl these pages. What are my next steps?
Technical SEO | | rhoadesjohn0 -
How long does it take for Google for deindexing pages?
Hi mozzers, We just launched a mobile website(parallel) and realized that it created many duplicate content with desktop URLs. I decided to add name="robots" content="No index, No follow" /> to the entire mobile site. My only concern is that I am still seeing the mobile site indexed when it's been almost a week I added these tags. Does anyone know how long it takes google to deindex your content? Thanks
Technical SEO | | Ideas-Money-Art0 -
Landing Page URL Structure
We are finally setting up landing pages to support our PPC campaigns. There has been some debate internally about the URL structure. Originally we were planning on URL's like: domain.com /california /florida /ny I would prefer to have the URL's for each state inside a "state" folder like: domain.com /state /california /florida /ny I like having the folders and pages for each state under a parent folder to keep the root folder as clean as possible. Having a folder or file for each state in the root will be very messy. Before you scream URL rewriting :-). Our current site is still running under Classic ASP which doesn't support URL rewriting. We have tried to use HeliconTech's ISAPI rewrite module for IIS but had to remove it because of too many configuration issues. Next year when our coding to MVC is complete we will use URL rewriting. So the question for now: Is there any advantage or disadvantage to one URL structure over the other?
Technical SEO | | briankb0 -
Does Google pass link juice a page receives if the URL parameter specifies content and has the Crawl setting in Webmaster Tools set to NO?
The page in question receives a lot of quality traffic but is only relevant to a small percent of my users. I want to keep the link juice received from this page but I do not want it to appear in the SERPs.
Technical SEO | | surveygizmo0