Google indexing thousands of crazy search results with %25253
-
A few weeks ago I started seeing very strange pages indexed in GWT, and Google is now reporting over 21,000 pages (blocked by robots.txt) with weird URLs like this:
The current robots.txt looks like this:
User-agent: *
Disallow: /wp-content
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /data
Disallow: /slideshows
Disallow: /page/*/?s=
Disallow: /?s=
Disallow: /search
This website is running an up-to-date WP install with Yoast's Google Analytics and SEO plug-ins. I can't point to anything specific that happened with the site when these URLs started appearing, and they have kept appearing even after I modified the robots.txt.
What can be done to try and stop Google from creating and indexing these goofy URLs?
I see lots of sites having this issue when I search in Google, but no one seems to have a solution.
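To check which URLs the Disallow rules in the question actually cover, here is a minimal Python sketch of Google-style robots.txt matching: `*` matches any run of characters, a trailing `$` anchors the end, and each rule is otherwise a prefix match against the path plus query string. It assumes only the Disallow lines shown (no Allow rules, no percent-encoding normalization); `rule_to_regex`, `is_blocked` and the example.com URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Disallow rules from the robots.txt in the question (User-agent: *).
DISALLOW_RULES = [
    "/wp-content",
    "/wp-admin",
    "/wp-includes",
    "/data",
    "/slideshows",
    "/page/*/?s=",
    "/?s=",
    "/search",
]


def rule_to_regex(rule):
    """Translate one robots.txt path rule into a compiled, start-anchored regex.

    '*' matches any sequence of characters and a trailing '$' anchors the end
    of the URL; everything else is matched literally as a prefix.
    """
    anchored_end = rule.endswith("$")
    if anchored_end:
        rule = rule[:-1]
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


PATTERNS = [rule_to_regex(rule) for rule in DISALLOW_RULES]


def is_blocked(url):
    """Return True if any Disallow rule matches the URL's path plus query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(pattern.match(target) for pattern in PATTERNS)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain.
    for url in (
        "http://example.com/?s=foo",
        "http://example.com/page/2/?s=foo",
        "http://example.com/about/",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Because rules are prefix matches, `/data` would also block a path such as `/database`, so short rules are worth double-checking this way.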
-
As it turns out, the problem is with Yoast's Google Analytics plug-in, according to Yoast. However, he has not yet released a fix or given a date for one. So until it is fixed, you either need to live with it or switch plug-ins.
-
Hi Sha,
Well, that is a new possible lead, but unfortunately Pictage is basically worthless when it comes to any technological issues.
Hmm, is there some way I could add "noindex" tags to any link that appears on the Proof page, since those links are generated dynamically?
Thanks,
Joe
-
Hi again Joe,
After a more detailed look at your site (which has no obvious search box available to users), I was curious as to why nothing you are doing on the site seems to have any effect on the issues you are trying to resolve... and why your site is generating thousands of search queries without a search box!
That made me ask, "Do you have control of all of the content?" It appears that you are using an external service called Pictage to upload and display client portfolios.
So, are you pulling content into your site from Pictage? Is it some kind of white-label add-on to your site?
If the pages from Pictage are being generated externally, then the Yoast plugin cannot add the "noindex" tag to those pages. If that is the case, I would say you need to contact Pictage support and let them know there is a problem they need to attend to.
Hope that helps,
Sha
-
Hi Egol,
Hmm, I have never heard of that possibility.
How can I change the search results URL in a WordPress install?
Thanks.
-
Hi Sha,
I made the changes weeks ago, but more pages keep appearing, which suggests that Google is still trying to index them.
There is already an "s" parameter set in GWT, but I don't see many options on this screen. Are there some settings I'm missing?
There are also page URLs like this one; can they be blocked as well?
-
In addition to the suggestions already given, if this were my site I would change the URL of the search results page. Someone might have a robot that is tossing crap queries into your search box.
-
Hi Joe,
A couple of things:
- If you have made the change to noindex search results recently, it may take some time for the errors to disappear from GWT. If the number of pages continues to grow, then clearly the noindex is not implemented as you expect.
- You could try using the parameter handling feature in GWT to tell Googlebot to ignore all pages with the parameter in question. In your search string, the ? says "here come some parameters" and the "s" is the parameter that you want to ignore.
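As a rough illustration of that URL structure, this Python sketch uses the standard library's `urllib.parse` to check whether a URL carries the `s` parameter and to build the same URL with it removed. It only shows how the query string after `?` splits into parameters; it is not how GWT parameter handling works internally, and `has_param`/`strip_param` are hypothetical helper names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def has_param(url, name="s"):
    """True if the query string (the part after '?') contains parameter `name`."""
    query = urlsplit(url).query
    return any(key == name for key, _ in parse_qsl(query, keep_blank_values=True))


def strip_param(url, name="s"):
    """Return `url` with every `name=...` pair removed from its query string.

    Note: the remaining pairs are decoded and re-encoded, so their exact
    percent-encoding may be normalised along the way.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    url = "http://example.com/page/2/?s=foo&p=3"  # hypothetical URL
    print(has_param(url))    # True
    print(strip_param(url))  # http://example.com/page/2/?p=3
```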
Incidentally, there is definitely something funky happening with the generation of those search strings, which should be investigated and resolved as well.
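One likely ingredient of those strings: `%25` is the percent-encoding of `%` itself, so a run like `%25253` in the title suggests a value that was percent-encoded at least three times, with each pass re-encoding the previous pass's `%`. The Python sketch below reproduces this with `urllib.parse.quote`, using a colon (`%3A`) as an assumed starting character since the actual URLs were not preserved, and shows how to undo the nesting. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_repeatedly(text, times):
    """Percent-encode `text` `times` times, returning the result of each pass."""
    stages = []
    for _ in range(times):
        # safe="" leaves only letters, digits and '_.-~' unencoded,
        # so the '%' from the previous pass becomes '%25'.
        text = quote(text, safe="")
        stages.append(text)
    return stages


def fully_decode(text, max_rounds=10):
    """Repeatedly percent-decode `text` until it stops changing (or max_rounds)."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return text


if __name__ == "__main__":
    # Assumed starting character: a colon, which encodes to %3A.
    for stage in encode_repeatedly(":", 3):
        print(stage)  # %3A, then %253A, then %25253A
    print(fully_decode("%25253A"))  # :
```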
Hope that helps,
Sha
-
Yoast's WordPress SEO plug-in automatically does the following:
- RSS feeds are now always noindex, follow. No search engine should ever list an RSS feed as a result in its results pages.
- Admin, login and registration pages are always noindexed now for the same reason.
- Search result pages are now always noindex, follow.
-
This is in your own website's search, right?
I've always heard that you should use an on-page robots meta tag set to:
noindex, follow
so that all of the links on the page can be followed, but Google will not index the page itself.
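To confirm that a search results page really outputs `noindex, follow`, here is a minimal Python sketch using only the standard library's `html.parser` to collect the directives from a page's `<meta name="robots">` tag. It only inspects meta tags in the HTML you feed it (not HTTP response headers), and `RobotsMetaParser`/`robots_directives` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, follow"></head>'
    found = robots_directives(page)
    print("noindex" in found and "follow" in found)  # True
```

Self-closing tags such as `<meta ... />` also work, because `HTMLParser` routes them through `handle_starttag` by default.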
-
Related Questions
-
Google indexing is slowing down?
I have up to 20 million unique pages, and so far I've only submitted about 30k of them in my sitemap. We had a few load-related errors during Google's initial visits, and it thought some pages were duplicates, but we fixed all that. We haven't gotten a crawl-related error for 2 weeks now. Google appears to be indexing fewer and fewer URLs every time it visits. Any ideas why? I am not sure how to get all our pages indexed if it's going to operate like this... I'd love some help, thanks!
Technical SEO | RyanTheMoz
-
How to de-index a page with a search string with the structure domain.com/?"spam"
The site in question was hacked years ago. All the security scans come up clean, but SEO crawlers like SEMrush and Ahrefs still show it as an indexed page. I can even click through on it, and it takes me to the homepage with no 301. Where is the page, and how do I de-index it? domain.com/?spam There are multiple instances of this. http://www.clipular.com/c/5579083284217856.png?k=Q173VG9pkRrxBl0b5prNqIozPZI
Technical SEO | Miamirealestatetrendsguy
-
What should I do to get images indexed in Google Webmaster Tools?
My website is onlineplants.com.au. It's a shopping cart website. I have nearly 1,200 images, but none of them are indexed in Google Webmaster Tools. What should I do? Thanks
Technical SEO | Verve-Innovation
-
Google Instant results differ from results shown when pressing Enter
A client's site, www.duorol.co.uk, is top (or second if a YouTube video makes an appearance) for the term duorol if you press Enter after typing it into Google UK. Before you press Enter, though, their site is not listed in the results brought back by Instant search. It's the same behaviour in incognito mode too. Very weird, I thought. Does anyone have any ideas, please? Their site has only been live for about a month. Could that have anything to do with it?
Technical SEO | OffSightIT
-
HTTP vs HTTPS and Google crawling and indexing?
Is it true that HTTPS pages are not crawled and indexed by Google and other search engines as well as HTTP pages are?
Technical SEO | sherohass
-
How to handle (internal) search result pages?
Hi Mozers, I'm not quite sure what the best way is to handle internal search pages. In this case it's for an ecommerce website with 8,000+ products, and search pages currently look like: example.com/search.php?search=QUERY+HERE. I'm leaning towards making them follow, noindex, since pages like this can easily be abused for duplicate content and because I'd rather have the category pages ranked. How would you handle this?
Technical SEO | Qon
-
Pages not indexed by Google
We deleted all the nofollow values on our website two weeks ago, but the number of pages indexed by Google is the same as before. Do you have any explanation for this? Website: www.probikeshop.fr
Technical SEO | Probikeshop
-
Will duplicate content on an ecommerce site cause harm in search results?
First off, I'm an SEO learner, not a professional, so this question is not for any client. A new (less than a year old) ecommerce site in a particular sector is now moving into partnerships with relevant websites to be their online store. A 'store' link on the partner site will redirect to a dedicated area of the ecommerce domain (on a domain/directory path) with the partner's branding. Doing this, though, means duplicating the entire catalogue for each partner that comes on board for this scheme. So the original ecommercesite.com/categories also delivers ecommercesite.com/partner1/categories (with partner's brand identity), ecommercesite.com/partner2/categories (with partner's brand identity), and so on. Won't duplicating the product catalogue in directories cause problems in delivering effective SERPs for the original ecommerce site?
Technical SEO | BeIntermedia