My website's pages are not being indexed correctly
-
Hi,
One of our websites, a price comparison engine, is facing an indexing problem in Google.
When we check "site:mywebsite.com", there are lots of pages indexed that are not from mywebsite.com but from merchants' websites. The search results also show the merchant's page title. In some cases the title is from the merchant's site, but when the listed link is accessed it points to mywebsite.com/index. Also, the cache displays the merchant's product page as the last indexed version rather than our page.
mywebsite.com has quite a few merchants that send us their product feeds. Those products are listed with prices on the comparison page. The merchant links on the comparison page are all nofollow links, but some (not all) of the merchants' product pages are indexed under mywebsite.com, as described above, instead of mywebsite.com's product comparison page.
How can we fix the issue?
Thanks!
-
Yeah, I was thinking the same...
Interestingly, we removed the redirect page a week ago and replaced it with JavaScript redirect code. Is that a good practice?
-
Ah. Regarding #3: if you have a disallow in robots.txt, search engines won't crawl the page, so they never see the noindex. Make sure the noindex code is in place on the applicable pages, remove the disallow, and the pages should drop out of the index once they're recrawled. Getting that relationship straightened out might help with some of the other issues as well. Cheers!
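The robots.txt/noindex interaction can be modelled with Python's standard `urllib.robotparser`. This is a simplified sketch, not how Googlebot is implemented: the hypothetical `noindex_is_seen` helper assumes a crawler only reads a page's noindex tag if robots.txt lets it fetch the page, and the rule `Disallow: /redirect` is an assumed example based on the redirect URLs mentioned in this thread.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_seen(robots_txt: str, url: str, page_has_noindex: bool,
                    user_agent: str = "Googlebot") -> bool:
    """Hypothetical helper: True only if a crawler obeying robots.txt
    would fetch `url` and therefore read its noindex tag."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # Blocked by robots.txt: the page is never downloaded,
        # so a noindex tag on it is never read.
        return False
    return page_has_noindex


# Assumed robots.txt that blocks the redirect pages by path prefix.
ROBOTS_TXT = "User-agent: *\nDisallow: /redirect\n"
redirect_url = "https://mywebsite.com/redirect-50187889-0"

print(noindex_is_seen(ROBOTS_TXT, redirect_url, page_has_noindex=True))  # False
# An empty Disallow allows everything, so the same noindex becomes visible.
print(noindex_is_seen("User-agent: *\nDisallow:\n", redirect_url, page_has_noindex=True))  # True
```

In this model, the noindex can only take effect once the Disallow is lifted, which matches the order of steps described above.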
-
Thanks for the response, Ryan. We'll definitely prevent crawling of search results pages. Please check the points below too. Thanks!
- The cache shows the merchant's product page in both the full version and the text-only version.
- The title shown in the search results is also the merchant's product page title.
- On the price comparison page, visitors are sent to the merchants' websites through a redirect page. The links are nofollow, but the redirect page is indexed even though it is disallowed in robots.txt and carries a noindex tag.
- The redirect page is indexed under URLs like mywebsite.com/redirect-50187889-0.
- The comparison listing is not similar to an internal search results page, but the search results pages are crawlable.
-
No iframes are being used.
-
Thumbs up to Don's recommendation. Also, when you look at the text-only cache, what kind of page are you seeing, if any? The site: search is sometimes a little inconsistent, so you can try forcing certain pages to show up with the inurl: operator. One last caveat that comes to mind: if the comparison listing is similar to an internal search results page, Google may never list it. "Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines." (from: https://support.google.com/webmasters/answer/35769) Cheers!
-
How are your merchant prices and info being displayed on your site? Directly from your site, or using iframes?
Related Questions
-
Is my page being indexed?
To put you all in context, here is the situation: I have pages that are only accessible via an internal search tool that shows the best results for the request. Let's say I want to see the results on page 2; page 2 will have a query string in the URL like this: ?p=2&s=12&lang=1&seed=3688. We've disallowed every URL that contains a "?" in the robots.txt file, which means Google doesn't crawl pages 2, 3, 4 and so on. If a page is only accessible via page 2, do you think Google will be able to access it? The URL of the page is included in the sitemap. Thank you in advance for the help!
Technical SEO | alexrbrg
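Whether a URL is blocked depends on how the crawler matches the rule. Python's standard `urllib.robotparser` only does plain prefix matching, so here is a minimal hand-rolled matcher for Google-style `*` and `$` patterns. It is a sketch under assumptions: the rule is taken to be `Disallow: /*?` (the post doesn't quote the exact line), the paths are hypothetical, and Allow rules and rule precedence are ignored. It shows the paginated URLs being blocked while a product URL without "?" is not, so a product URL listed in the sitemap would generally still be fetchable even though Google can't reach it by crawling through page 2.

```python
import re
from typing import List


def rule_to_regex(rule_path: str) -> "re.Pattern":
    """Translate a Google-style robots.txt path pattern into a regex:
    '*' matches any run of characters, a trailing '$' anchors the end,
    everything else is literal and matched as a prefix."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_rules: List[str]) -> bool:
    # re.match anchors only at the start, which gives prefix semantics.
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules)


rules = ["/*?"]
print(is_disallowed("/search?p=2&s=12&lang=1&seed=3688", rules))  # True
print(is_disallowed("/product/blue-widget", rules))  # False
```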
What's with the redirects?
Hi there,
I have a strange issue where pages are redirecting to the homepage. Let me explain: my website is http://thedj.com.au. When I type in www.thedj.com.au/payments, it redirects to https://thedj.com.au (even though it should go to https://thedj.com.au/payments). Any idea why this is happening and how to fix it? My .htaccess file is below:
# BEGIN HTTPS Redirection Plugin
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^home.htm$ https://thedj.com.au/ [R=301,L]
RewriteRule ^photos.htm$ http://photos.thedj.com.au/ [R=301,L]
RewriteRule ^contacts.htm$ https://thedj.com.au/contact-us/ [R=301,L]
RewriteRule ^booking.htm$ https://thedj.com.au/book-dj/ [R=301,L]
RewriteRule ^downloads.htm$ https://thedj.com.au/downloads/ [R=301,L]
RewriteRule ^payonline.htm$ https://thedj.com.au/payments/ [R=301,L]
RewriteRule ^price.htm$ https://thedj.com.au/pricing/ [R=301,L]
RewriteRule ^questions.htm$ https://thedj.com.au/faq/ [R=301,L]
RewriteRule ^links.htm$ https://thedj.com.au/links/ [R=301,L]
RewriteRule ^thankyous/index.htm$ https://thedj.com.au/testimonials/ [R=301,L]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://thedj.com.au/ [L,R=301]
</IfModule>
# END HTTPS Redirection Plugin
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
RewriteCond %{HTTP_HOST} ^mrdj.net.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.mrdj.net.au$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L]
RewriteCond %{HTTP_HOST} ^mrdj.com.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.mrdj.com.au$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L]
RewriteCond %{HTTP_HOST} ^thedjs.com.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.thedjs.com.au$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L]
RewriteCond %{HTTP_HOST} ^theperthweddingdjs.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.theperthweddingdjs.com$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L]
RewriteCond %{HTTP_HOST} ^thedjs.net.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.thedjs.net.au$
RewriteRule ^/?$ "https://thedj.com.au" [R=301,L]
Technical SEO | HeadStud
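For the .htaccess above, one plausible culprit is the catch-all rule in the HTTPS plugin block: `RewriteCond %{HTTPS} off` followed by `RewriteRule ^(.*)$ https://thedj.com.au/ [L,R=301]`. The pattern captures the requested path as `$1`, but the target never uses it, so a plain-http request such as www.thedj.com.au/payments would land on the homepage. The Python below is a toy simulation of just that one rule, not mod_rewrite itself; `simulate_https_rule` is a hypothetical helper, and it assumes per-directory (.htaccess) matching, where the leading slash is stripped from the path.

```python
import re
from typing import Optional


def simulate_https_rule(path: str, https_on: bool, target: str) -> Optional[str]:
    """Toy model of:
        RewriteCond %{HTTPS} off
        RewriteRule ^(.*)$ <target> [L,R=301]
    Returns the redirect Location, or None when the condition doesn't match.
    `path` has no leading slash, as in .htaccess context.
    """
    if https_on:
        return None  # RewriteCond %{HTTPS} off fails, so the rule is skipped
    match = re.fullmatch(r"(.*)", path)  # same pattern as ^(.*)$
    # mod_rewrite substitutes $1 with the first capture group; emulate that.
    return target.replace("$1", match.group(1))


# Target as written in the posted .htaccess: the captured path is discarded.
print(simulate_https_rule("payments", False, "https://thedj.com.au/"))  # https://thedj.com.au/
# A target that reuses the capture keeps the path.
print(simulate_https_rule("payments", False, "https://thedj.com.au/$1"))  # https://thedj.com.au/payments
```

If that reading is right, changing the target to `https://thedj.com.au/$1` should preserve the path; it would be worth testing on a staging copy first.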
How can I index several systems used for my website?
My site is built on PHP, but has a help.website.com page based on a helpdesk platform. I also have a WordPress blog. So, these are three "different systems" under the same domain. When I crawl my site, neither the blog nor the help page shows up. How can I make them show up? Thanks!
Technical SEO | rodelmo88
My website pages are not crawled, what to do?
Hi all. I made some changes to the website about two weeks ago, and I would like search engines, especially Google, to crawl them. I have submitted my website to good bookmarking websites. I also used the "Fetch as Google" tool in Google Webmaster Tools and resubmitted my sitemap.xml. Still, my pages have not been crawled. What's your opinion? Thanks
Technical SEO | lucidsoftech
Noindex Pages indexed
I'm having a problem: Google is indexing my search results pages even though I have added the "noindex" meta tag. Is the best thing to block the robot from crawling that file using robots.txt?
Technical SEO | Tedred
Https-pages still in the SERP's
Hi all, my problem is the following: our self-developed CMS produces https versions of our "normal" web pages, which means duplicate content. About 6 weeks ago, our IT department put a noindex,nofollow robots meta tag on the https pages. I check the number of indexed pages once a week and still see a lot of these https pages in the Google index. I know that I may hit different data centers and that these numbers aren't 100% valid, but still... sometimes the number of indexed https pages even goes up. Any ideas or suggestions? Wait longer? Or take the time to go to Webmaster Tools and remove them from the index? Another question: for a nice query, one https page ranks No. 1. If I remove that page from the index, do you think the http page will take over the No. 1 position? Or will the ranking be lost? (It sends some nice traffic :-)) Thanks in advance 😉
Technical SEO | accessKellyOCG
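Before waiting longer or removing URLs by hand, it may be worth confirming that the tag really ships on every https variant the CMS produces. A minimal sketch using Python's standard `html.parser`: `has_noindex` is a hypothetical helper, the HTML snippets are made up, and it only inspects robots meta tags, not an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser
from typing import List


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            content = attributes.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex,nofollow"></head>'))  # True
print(has_noindex('<head><meta name="description" content="noindex"></head>'))  # False
```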
Non-Canonical Pages still Indexed. Is this normal?
I have a website that contains some products, and the old URL structure was definitely not optimal for SEO purposes. So I created new SEO-friendly URLs on my site and decided to use canonical tags to transfer all the weight of the old URLs to the new URLs and ensure that the old ones would not show up in the SERPs. The problem is that this has not quite worked. I implemented the canonical tags about a month ago, but I am still seeing the old URLs indexed in Google, and the cache date of these pages is only about a week ago. This leads me to believe that the spiders have visited the pages and seen the new canonical tags but are not following them. Is this normal behavior, and if so, can somebody explain why? I know I could have just 301 redirected these old URLs to the new ones, but the process I would need to go through to get that done is much more of a battle than just adding the canonical tags, and I felt the canonical tags would do the job. Needless to say, the client is not too happy right now and insists that I should have just used 301s. In this case the client appears to be correct, but I do not quite understand why my canonical tags did not work. Examples below:
Old page: www.awebsite.com/something/something/productid.3254235
New page: www.awebsite.com/something/something/keyword-rich-product-name
Canonical tag on both pages: <link rel="canonical" href="http://www.awebsite.com/something/something/keyword-rich-product-name"/>
Thanks, guys, for the help on this.
Technical SEO | DRSearchEngOpt
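Since the client wants 301s, here is a minimal sketch of generating them from an old-to-new URL mapping, assuming the site runs Apache with mod_alias (the post doesn't say which server is used). `redirect_lines` is a hypothetical helper, and the mapping reuses the example URLs from the post with an http:// scheme added.

```python
from typing import Dict, List
from urllib.parse import urlsplit


def redirect_lines(mapping: Dict[str, str]) -> List[str]:
    """Turn {old_url: new_url} into Apache mod_alias directives of the form
    `Redirect 301 <old-path> <new-url>` (the old side must be a URL-path)."""
    lines = []
    for old_url, new_url in mapping.items():
        old_path = urlsplit(old_url).path  # keep only the path part
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return lines


mapping = {
    "http://www.awebsite.com/something/something/productid.3254235":
        "http://www.awebsite.com/something/something/keyword-rich-product-name",
}
for line in redirect_lines(mapping):
    print(line)
```

Google treats rel=canonical as a hint it may override, whereas a 301 is a server-side response every client receives, which is likely why redirects behave more predictably here.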
404 errors on a 301'd page
I currently have a site that, when run through a site map tool (Screaming Frog or Xenu), returns a 404 error on a number of pages. The pages are indexed in Google, and when visited they do 301 to the correct page. Why would the site map tool give me a different result? Is it not reading the page correctly?
Technical SEO | EAOM