Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content (many posts will still be possible to view), we have locked both new posts and new replies.
Unnecessary pages from my blog getting indexed in Google
-
I have a blog, dapazze.com, and I have been struggling with a problem for a long time. I found that Google has indexed hundreds of replytocom links and image attachment pages from my blog.
I had to remove these pages manually using the URL removal tool. I had added "Disallow: ?replytocom" to my robots.txt, but Google ignored it. After that, I removed the parameter from my blog completely using the SEO by Yoast plugin.
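One possible reason the rule had no effect: Google's robots.txt documentation says path rules should start with `/` and are matched against the URL path (plus query string) from its first character, so `Disallow: ?replytocom` may never have matched anything, whereas a wildcard form such as `Disallow: /*?replytocom` would. The Python sketch below approximates that matching for the `*` wildcard and the `$` end anchor only; it ignores user-agent groups and Allow/Disallow precedence, and the helper names and example path are hypothetical. Keep in mind that even a matching Disallow only stops crawling; it does not remove URLs that are already indexed.

```python
import re

def rule_to_regex(rule_path: str) -> re.Pattern:
    """Approximate Google's robots.txt path matching: '*' matches any run of
    characters, a trailing '$' anchors the end, everything else is literal."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape regex metacharacters such as '?' and '.', then restore '*' as '.*'.
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(rule_path: str, url_path: str) -> bool:
    """True if the rule matches url_path (path plus query) from its first character."""
    return rule_to_regex(rule_path).match(url_path) is not None

# Hypothetical comment-reply URL on a WordPress post
url = "/my-post/?replytocom=42"
for rule in ("?replytocom", "/*?replytocom"):
    print(rule, is_disallowed(rule, url))
# ?replytocom False
# /*?replytocom True
```

Because matching always starts at the leading `/`, a rule that does not begin with `/` or `*` has nothing to anchor to.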
But now I see that Google has started indexing these links again, even though they are no longer present on my blog (I use #comment anchors instead). Google has also indexed many of my admin and plugin pages, even though they are disallowed in my robots.txt file.
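The switch to `#comment` anchors does help going forward: the part of a URL after `#` is a fragment, which the browser keeps to itself and does not send to the server, so every comment anchor on a post resolves to one fetchable URL, while every `?replytocom=` value is a distinct URL that can be crawled and indexed separately. A quick illustration with Python's standard `urllib.parse.urldefrag`; the `example.com` URLs and the `fetched_url` helper are placeholders:

```python
from urllib.parse import urldefrag

def fetched_url(url: str) -> str:
    """Return the URL as requested from the server: the #fragment is dropped
    because browsers and crawlers do not send it in the HTTP request."""
    return urldefrag(url).url

urls = [
    "https://example.com/my-post/#comment-5",
    "https://example.com/my-post/#comment-6",
    "https://example.com/my-post/?replytocom=5",
    "https://example.com/my-post/?replytocom=6",
]
for u in sorted({fetched_url(u) for u in urls}):
    print(u)
# https://example.com/my-post/
# https://example.com/my-post/?replytocom=5
# https://example.com/my-post/?replytocom=6
```

The two anchor links collapse into one URL, while each replytocom link stays separate.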
Have a look at my robots.txt file here: http://dapazze.com/robots.txt
Please help me solve this problem permanently.
-
I have the same issue too! My pages aren't indexed in Google, but the URL Parameters report in Google Webmaster Tools shows 5K errors!
Should I use the URL Parameters settings, or something else?
"Also make sure replytocom links are not blocked using robots.txt, as it will stop Google bots from crawling them, and thus your links won't get deindexed. This is one mistake which I made, and later, after removing the replytocom parameter from the robots.txt file, I was able to get most of my replytocom links deindexed." These are the blogger's warnings at http://www.shoutmeloud.com/how-to-fix-replytocom-links-issue-in-wordpress.html, where he shows how to fix this. But my problem is different: it's good that my links aren't indexed, but I don't want to take any risk. How can I avoid them in the future?
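The advice about not blocking replytocom URLs in robots.txt matches Google's documented behaviour: a noindex directive (in a `<meta name="robots">` tag or an `X-Robots-Tag` HTTP header) only takes effect if Googlebot is allowed to fetch the page and see it. The same documentation notes that a URL blocked by robots.txt can still be indexed, without its content, if other pages link to it, which may explain disallowed admin pages showing up. Below is a minimal offline checker for those two signals, assuming you have already fetched the page's HTML and response headers; `has_noindex` and the sample markup are illustrative, not part of any plugin.

```python
import re
from html.parser import HTMLParser
from typing import Optional

NOINDEX_TOKENS = {"noindex", "none"}  # "none" is shorthand for noindex, nofollow

def _tokens(value: str) -> set:
    """Split a directive string such as 'googlebot: noindex, nofollow'."""
    return {t.strip().lower() for t in re.split(r"[,:]", value) if t.strip()}

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            self.directives |= _tokens(attr_map.get("content", ""))

def has_noindex(html: str, headers: Optional[dict] = None) -> bool:
    """True if the X-Robots-Tag header or a robots meta tag asks for noindex.
    Only meaningful for crawlable URLs: Googlebot never sees either signal
    on a page that robots.txt stops it from fetching."""
    header_map = {k.lower(): v for k, v in (headers or {}).items()}
    if _tokens(header_map.get("x-robots-tag", "")) & NOINDEX_TOKENS:
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & NOINDEX_TOKENS)

print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))  # True
print(has_noindex("<p>No robots meta here</p>", {"X-Robots-Tag": "noindex"}))     # True
print(has_noindex("<p>No robots meta here</p>"))                                  # False
```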
Someone else here told me that some plugins handle this for you, and that it won't show up in your robots.txt!
I'm so confused! Please help me!
-
Actually, I had previously removed the links manually. But I am seeing them come up again, even after removing the parameter completely.
Can you please point out the problem for me?
-
Please check: the comment pages are blocked by your robots.txt file.
However, the blocked pages are now being redirected to the main landing page of each blog post.
It seems it will take a while for Google to recrawl these pages and sort out the issue.
In the meantime, could you please share some of the pages that are getting indexed by Google?
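A permanent (301) redirect from each `?replytocom=` URL to the clean post URL is a common way to consolidate them, but Google can only follow a redirect on a URL it is allowed to crawl. As a sketch, assuming the redirect target is simply the same URL with the `replytocom` parameter removed, the target could be computed like this (`strip_param` is a hypothetical helper; how the redirect is actually issued, by plugin or server config, is left out):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_param(url: str, param: str = "replytocom") -> str:
    """Return url without any `param` query parameters. Other parameters keep
    their order; the #fragment is dropped since it never reaches the server."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(strip_param("https://example.com/my-post/?replytocom=42"))
# https://example.com/my-post/
```

A URL needs a redirect only when `strip_param(url) != url`; other query parameters are kept so unrelated URLs are not collapsed into the post URL.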
Related Questions
-
Should search pages be indexed?
Hey guys, I've always believed that search pages should be noindexed, but now I'm wondering if there is an argument for indexing them. I'd appreciate any thoughts!
Technical SEO | RebekahVP
-
Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
A page on our WordPress-powered website (https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/) has been flagged in GSC with an error saying it is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results. Looking at the page code, and crawling it with Screaming Frog and Ahrefs, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to noindex the page. I have requested reindexing via GSC, but I'm concerned about why Google thinks this page was set to noindex. Can anyone help with this one? Has anyone seen this before or been hit with it recently? Any advice?
Technical SEO | d.bird
-
How can I get a photo album indexed by Google?
We have a lot of photos on our website. Unfortunately, most of them don't seem to be indexed by Google. We run a party website, and one of the things we do is take pictures at events and put them on the site. An event page with a photo album can have anywhere between 100 and 750 photos. For each photo there is a thumbnail on the page. The thumbnails are lazy loaded by showing a placeholder and loading the picture right before it comes on screen. There is no pagination or infinite scrolling. Thumbnails don't have alt text. Each thumbnail links to a picture page. This page only shows the base HTML structure (menu, etc.), the image and a close button. The image has a src attribute with the full-size image, a srcset with several sizes for responsive design, and alt text. There is no real textual content on an image page. (Note that when a user clicks on a thumbnail, the large image is loaded using JavaScript and we mimic the page change. I think it doesn't matter, but I'm unsure.) I'd like the full-size images to be indexed by Google and found with Google Image Search. Thumbnails should not be indexed (or should be ignored). Unfortunately, most pictures aren't found, or their thumbnail is shown instead. Moz is telling me that all the picture pages are duplicate content (19,521 issues), as they are all the same except for the image. The page titles aren't identical but are similar for all images in an album. Example: on the "A day at the park" event page, we have 136 pictures. A site search for "a day at the park" foto reveals only two photos from the album.
Technical SEO | jasny
-
Does Google index internal anchors as separate pages?
Hi, back in September I added a function that sets an anchor on each subheading (h[2-6]) and creates a table of contents that links to each of those anchors. These anchors did show up in the SERPs as Jump To links. Fine. Back then I also changed the canonicals to a slightly different structure, and meanwhile there was a massive increase in the number of indexed pages (WAY over the top), which has since been fixed by removing (410) a complete section of the site. However, there are still ~34,000 pages indexed, against what is really more like 4,000-plus pages (all properly canonicalised). Naturally I am wondering what Google thinks it is indexing. The number is just way off and quite inexplicable. So I was wondering: does Google save Jump To links as unique pages? Also, does anybody know any method of actually getting all the pages in the Google index? (Not the pages that actually exist, via Screaming Frog etc., but the actual pages in the index; all the methods I found sadly do not work.) Finally: does somebody have any other explanation for the discrepancy between indexed and actual pages? Thanks for your replies! Nico
Technical SEO | netzkern_AG
-
Site indexed by Google, but (almost) never gets impressions
Hi there, I have a question I haven't been able to find a reasonable answer to yet, so I'm counting on all of you. Basically, a site has all its pages indexed by Google (I verified with site:sitename.com), and it also has great, unique content. All on-page grades are A, with absolutely no negative factors at all. However, its pages get almost no impressions. Of course I didn't expect it to be on page 1, since it was launched on December 1st, but it looks like Google is ignoring it (or giving it bad scores) for some reason. The only things that could contribute to that are: domain privacy on the domain, the redirect from www to the subdomain we use (we did this because it will be a multi-language site, so we'll assign each country a subdomain), and recency (it was put online on December 1st and the domain is just a couple of months old). Or maybe it's because we blocked crawlers for a few days before the launch, exactly the few days before December 1st? What do you think? What could be the reason for this? Thanks guys!
Technical SEO | ruggero
-
How to check if an individual page is indexed by Google?
So my understanding is that you can use site: [page URL without http] to check if a page is indexed by Google, but is this 100% reliable? Just recently I've worked on a few pages that have not shown up when I've checked them using site:, but they do show up when using info: and also show their cached versions. The rest of the site and the pages above it (the URL I was checking was quite deep) are indexed just fine. What does this mean? Thank you. P.S. I do not have WMT or GA access for these sites.
Technical SEO | linklander
-
How to block text on a page from being indexed?
I would like to block spiders from indexing a block of text inside a page; however, I do not want to block the whole page with, for example, a noindex tag. I have already tried a tag like this: chocolate pudding chocolate pudding. However, this is not working in my case, a travel-related website. Thanks in advance for your support. Best regards, Gianluca
Technical SEO | CharmingGuy
-
How to Stop Google from Indexing Old Pages
We moved from a .php site to a Java site on April 10th. Almost 2 months later, Google continues to crawl old pages that no longer exist (225,430 Not Found errors, to be exact). These pages no longer exist on the site, and there are no internal or external links pointing to them. Google has crawled the site since the go-live, but it continues to try to crawl these pages. What are my next steps?
Technical SEO | rhoadesjohn