Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to find all indexed pages in Google?
-
Hi,
We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools.
How can I get a list of all pages indexed of our domain? trying to locate the duplicate content.
Doing a "site:www.mydomain.com" only returns up to 676 results...
Any ideas?
Thanks,
Ben
-
You are absolutely right. But if you think that you have duplicate content issues, then Screaming Frog can help you tease that out.
That is also why I suggested the SEOmoz tool, since it is supposed to mimick a SE spider, it can give you a pretty good idea of any issues that you might have.
Using the advanced operator of site:domain makes sense, but if there are issues there like eyepaq said, it is going to be tough sledding.
My suggestion would be to download take a closer look at what GWT is telling you. Are there duplicates there? Is your CMS auto-generating URL's? That is probably going to be your best bet IMO.
Best of luck!
-
@BJS, I would export a file from GWT and filter the results. If your URLs are in GWT, then most likely it's indexed in Google.
-
Thank you to everyone that contributed.
@Zeph and @Francisco - I do use Screaming Frog, but actually, correct me if I am wrong, but it does not show a list of pages indexed, but rather pages that exist in the site - not what Google has already indexed. Thanks anyway
What I wanted was a way of creating a list of all indexed pages in Google - not a count.
But thank you all the same!
-
Hey Zeph! Hope your company is doing great.
@Ben, screaming frog is good for this. You will need to get the paid version of it. There is a video on the site http://www.screamingfrog.co.uk/seo-spider/. Use filters to get to your real URLs.
-
Hi,
There are tools that you can use - though for close 50k pages is harder to crawl. Best bet is the Web master tools count - although is not 100% exact either.
The site:domain is a good indicator but it's generated "on the fly" but it will show you a better result if you go "deeper" and click on page 10-20 and so on.
However right now it looks like there is an issue with site:domain. for more info see: http://www.seroundtable.com/google-site-command-cluster-16829.html
Cheers.
-
Use the tool Screaming Frog to see all your pages, that should help. Also, the SEOmoz toolset has a function that will show you all duplicate content (if you are a pro subscriber).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing Of Pages As HTTPS vs HTTP
We recently updated our site to be mobile optimized. As part of the update, we had also planned on adding SSL security to the site. However, we use an iframe on a lot of our site pages from a third party vendor for real estate listings and that iframe was not SSL friendly and the vendor does not have that solution yet. So, those iframes weren't displaying the content. As a result, we had to shift gears and go back to just being http and not the new https that we were hoping for. However, google seems to have indexed a lot of our pages as https and gives a security error to any visitors. The new site was launched about a week ago and there was code in the htaccess file that was pushing to www and https. I have fixed the htaccess file to no longer have https. My questions is will google "reindex" the site once it recognizes the new htaccess commands in the next couple weeks?
Intermediate & Advanced SEO | | vikasnwu1 -
Why is Google ranking irrelevant / not preferred pages for keywords?
Over the past few months we have been chipping away at duplicate content issues. We know this is our biggest issue and is working against us. However, it is due to this client also owning the competitor site. Therefore, product merchandise and top level categories are highly similar, including a shared server. Our rank is suffering major for this, which we understand. However, as we make changes, and I track and perform test searches, the pages that Google ranks for keywords never seems to match or make sense, at all. For example, I search for "solid scrub tops" and it ranks the "print scrub tops" category. Or the "Men Clearance" page is ranking for keyword "Women Scrub Pants". Or, I will search for a specific brand, and it ranks a completely different brand. Has anyone else seen this behavior with duplicate content issues? Or is it an issue with some other penalty? At this point, our only option is to test something and see what impact it has, but it is difficult to do when keywords do not align with content.
Intermediate & Advanced SEO | | lunavista-comm0 -
Google indexing pages from chrome history ?
We have pages that are not linked from site yet they are indexed in Google. It could be possible if Google got these pages from browser. Does Google takes data from chrome?
Intermediate & Advanced SEO | | vivekrathore0 -
Can I tell Google to Ignore Parts of a Page?
Hi all, I was wondering if there was some sort of html trick that I could use to selectively tell a search engine to ignore texts on certain parts of a page. Thanks!
Intermediate & Advanced SEO | | Charles_Murdock
Charles0 -
Links from non-indexed pages
Whilst looking for link opportunities, I have noticed that the website has a few profiles from suppliers or accredited organisations. However, a search form is required to access these pages and when I type cache:"webpage.com" the page is showing up as non-indexed. These are good websites, not spammy directory sites, but is it worth trying to get Google to index the pages? If so, what is the best method to use?
Intermediate & Advanced SEO | | maxweb0 -
Why does Google add my domain as a suffix to page title in SERPS?
Hi, If I do a search in Google - for one our products on our site, our site comes up - but it would appear that google is adding our domain name as a suffix to our title in the results... Anyone else seen this? Can I do anything about it? I would prefer it not to appear. Thanks!
Intermediate & Advanced SEO | | bjs20100 -
Should pages of old news articles be indexed?
My website published about 3 news articles a day and is set up so that old news articles can be accessed through a "back" button with articles going to page 2 then page 3 then page 4, etc... as new articles push them down. The pages include a link to the article and a short snippet. I was thinking I would want Google to index the first 3 pages of articles, but after that the pages are not worthwhile. Could these pages harm me and should they be noindexed and/or added as a canonical URL to the main news page - or is leaving them as is fine because they are so deep into the site that Google won't see them, but I also won't be penalized for having week content? Thanks for the help!
Intermediate & Advanced SEO | | theLotter0 -
Google is indexing wordpress attachment pages
Hey, I have a bit of a problem/issue what is freaking me out a bit. I hope you can help me. If i do site:www.somesitename.com search in Google i see that Google is indexing my attachment pages. I want to redirect attachment URL's to parent post and stop google from indexing them. I have used different redirect plugins in hope that i can fix it myself but plugins don't work. I get a error:"too many redirects occurred trying to open www.somesitename.com/?attachment_id=1982 ". Do i need to change something in my attachment.php fail? Any idea what is causing this problem? get_header(); ?> /* Run the loop to output the attachment. * If you want to overload this in a child theme then include a file * called loop-attachment.php and that will be used instead. */ get_template_part( 'loop', 'attachment' ); ?>
Intermediate & Advanced SEO | | TauriU0