Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Blocked URL parameters can still be crawled and indexed by google?
-
Hy guys,
I have two questions and one might be a dumb question but there it goes. I just want to be sure that I understand:
IF I tell webmaster tools to ignore an URL Parameter, will google still index and rank my url?
IS it ok if I don't append in the url structure the brand filter?, will I still rank for that brand?
Thanks,
PS: ok 3 questions :)...
-
If you want to permanently remove URLs from the index, this is the basic process:
Have your developer implement NoIndex, Follow to all pages that have the URL parameter you want removed. For example, if the URL contains categoryFilter= (like above), then add the NoIndex, Follow tag to the of the page. Do this for all URL paramters you want removed from the index.
Make sure Google is allowed to crawl those pages. If they are blocked by robots.txt or told not to crawl them via Google Webmaster Tools, Google will not be able to see the newly implement NoIndex, Follow tag.
Then, give it some time and wait. It may take Google a long time to crawl all of these paramtered URLs again. Fallout of the index might be slow.
Once the URLs are gone, consider blocking the crawling of them via robots.txt or in GWT parameter handling.
-
Hi Anthony,
What if we are trying to permanently remove e-commerce website URL's that have multiple parameters from (Google) index. How would we apply noindex to all these URL's with parameters??
The aim is to recrawl and rebuild the index of the whole website using appropriate robots, canonical's & meta-tags, rather than using GWT.
Many thanks
-
Parameter handling in Google Webmaster Tools won't get a URL out of the index if it is already indexed.
You need to use the NoIndex robots meta tag in the of your page. Once you add this tag, be sure you are allowing Google to crawl the page. Make sure it is Not blocked via robots.txt or with Parameter handling.
Once the pages have left the index, you can block them from being crawled.
-
If you want a page or url not crawled then you should use the robots.txt file and robots meta tags. Then, in WMT, make sure those same pages are actually not being crawled
Hope that answers your question
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can I still monitor noindex, nofollow pages with Google Analytics?
I have a private/login site where all pages are noindex, nofollow. Can I still monitor external site links with Google Analytics?
Technical SEO | | jasmine.silver0 -
Is there a way to get a list of all pages of your website that are indexed in Google?
I am trying to put together a comprehensive list of all pages that are indexed in Google and have differing opinions on how to do this.
Technical SEO | | SpodekandCo0 -
How can I stop a tracking link from being indexed while still passing link equity?
I have a marketing campaign landing page and it uses a tracking URL to track clicks. The tracking links look something like this: http://this-is-the-origin-url.com/clkn/http/destination-url.com/ The problem is that Google is indexing these links as pages in the SERPs. Of course when they get indexed and then clicked, they show a 400 error because the /clkn/ link doesn't represent an actual page with content on it. The tracking link is set up to instantly 301 redirect to http://destination-url.com. Right now my dev team has blocked these links from crawlers by adding Disallow: /clkn/ in the robots.txt file, however, this blocks the flow of link equity to the destination page. How can I stop these links from being indexed without blocking the flow of link equity to the destination URL?
Technical SEO | | UnbounceVan0 -
Removed Subdomain Sites Still in Google Index
Hey guys, I've got kind of a strange situation going on and I can't seem to find it addressed anywhere. I have a site that at one point had several development sites set up at subdomains. Those sites have since launched on their own domains, but the subdomain sites are still showing up in the Google index. However, if you look at the cached version of pages on these non-existent subdomains, it lists the NEW url, not the dev one in the little blurb that says "This is Google's cached version of www.correcturl.com." Clearly Google recognizes that the content resides at the new location, so how come the old pages are still in the index? Attempting to visit one of them gives a "Server Not Found" error, so they are definitely gone. This is happening to a couple of sites, one that was launched over a year ago so it doesn't appear to be a "wait and see" solution. Any suggestions would be a huge help. Thanks!!
Technical SEO | | SarahLK0 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
I accidentally blocked Google with Robots.txt. What next?
Last week I uploaded my site and forgot to remove the robots.txt file with this text: User-agent: * Disallow: / I dropped from page 11 on my main keywords to past page 50. I caught it 2-3 days later and have now fixed it. I re-imported my site map with Webmaster Tools and I also did a Fetch as Google through Webmaster Tools. I tweeted out my URL to hopefully get Google to crawl it faster too. Webmaster Tools no longer says that the site is experiencing outages, but when I look at my blocked URLs it still says 249 are blocked. That's actually gone up since I made the fix. In the Google search results, it still no longer has my page title and the description still says "A description for this result is not available because of this site's robots.txt – learn more." How will this affect me long-term? When will I recover my rankings? Is there anything else I can do? Thanks for your input! www.decalsforthewall.com
Technical SEO | | Webmaster1230 -
Google is indexing my directories
I'm sure this has been asked before, but I was looking at all of Google's results for my site and I found dozens of results for directories such as: Index of /scouting/blog/wp-includes/js/swfupload/plugins Obviously I don't want those indexed. How do I prevent Google from indexing those? Also, it only seems to be doing it with Wordpress, not any of the directories on my main site. (We have a wordpress blog, which is only a portion of the site)
Technical SEO | | UnderRugSwept0 -
How to get Google to index another page
Hi, I will try to make my question clear, although it is a bit complex. For my site the most important keyword is "Insurance" or at least the danish variation of this. My problem is that Google are'nt indexing my frontpage on this, but are indexing a subpage - www.mydomain.dk/insurance instead of www.mydomain.dk. My link bulding will be to subpages and to my main domain, but i wont be able to get that many links to www.mydomain.dk/insurance. So im interested in making my frontpage the page that is my main page for the keyword insurance, but without just blowing the traffic im getting from the subpage at the moment. Is there any solutions to do this? Thanks in advance.
Technical SEO | | Petersen110