Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Indexed Pages in Google, How do I find Out?
-
Is there a way to get a list of pages that google has indexed?
Is there some software that can do this?
I do not have access to webmaster tools, so hoping there is another way to do this.
Would be great if I could also see if the indexed page is a 404 or other
Thanks for your help, sorry if its basic question
-
If you want to find all your indexed pages in Google just type: site:yourdomain.com or .co.uk or other without the www.
-
Hi John,
Hope I'm not too late to the party! When checking URL's for their cache status I suggest using Scrapebox (with proxies).
Be warned, it was created as a black-hat tool, and as such is frowned upon, but there are a number of excellent white-hat uses for it! Costs $57 one off
-
sorry to keep sending you messages but I wanted to make sure that you know SEOmoz does have a fantastic tool for what you are requesting. Please look at this link and then click on the bottom where it should says show more and I believe you will agree it does everything you've asked and more.
http://pro.seomoz.org/tools/crawl-test
Sincerely,
Thomas
does this answer your question?
-
What giving you a 100 limit?
try using Raven tools or spider mate they both have excellent free trials and allow you quite a bit of information.
-
Neil you are correct I agree with screaming frog is excellent they definitely will show you your site. Here is a link from SEOmoz associate that I believe will benefit you
http://www.seomoz.org/q/404-error-but-i-can-t-find-any-broken-links-on-the-referrer-pages
sincerely,
Thomas
-
this is what I am looking for
Thanks
Strange that there is no tool I can buy to do this in full without the 100 limit
Anyway, i will give that a go
-
can I get your sites URL? By the way this might be a better way into Google Webmaster tools
if you have a Gmail account use that if you don't just sign up using your regular e-mail.
Of course using SEOmoz via http://pro.seomoz.org/tools/crawl-test will give you a full rundown of all of your links and how they're running. Are you not seen all of them?
Another tool I have found very useful. Is website analysis as well as their midsize product from Alexia
I hope I have helped,
Tom
-
If you don't have access to Webmaster Tools, the most basic way to see which pages Google has indexed is obviously to do a site: search on Google itself - like "site:google.com" - to return pages of SERPs containing the pages from your site which Google has indexed.
Problem is, how do you get the data from those SERPs in a useful format to run through Screaming Frog or similar?
Enter Chris Le's Google Scraper for Google Docs
It will let scrape the first 100 results, then let you offset your search by 100 and get the next 100, etc.. slightly cumbersome, but it will achieve what you want to do.
Then you can crawl the URLs using Screaming Frog or another crawler.
-
just thought I might add these links these might help explain it better than I did.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1352276
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=2409443&topic=2446029&ctx=topic
http://pro.seomoz.org/tools/crawl-test
you should definitely sign up for Google Webmaster tools it is free here is a link all you need to do is add an e-mail address and password
http://support.google.com/webmasters/bin/topic.py?hl=en&topic=1724121
I hope I have been of help to you sincerely,
Thomas
-
Thanks for the reply.
I do not have access to webmaster tools and the seomoz tools do not show a great deal of the pages on my site for some reason
Majestic shows up to 100 pages. Ahrefs shows some also.
I need to compare what google has indexed and the status of the page
Does screaming frog do thiss?
-
Google Webmaster tools should supply you with this information. In addition Seomoz tools will tell you that and more. Run your website through the campaign section of seomoz you will then see any issues with your website.
You may also want to of course use Google Webmaster tools run a test as a Google bot the Google but should show you any issues you are having such is 404's or other fun things that websites do.
If you're running WordPress there are plenty of plug-ins I recommend 404 returned
sincerely,
Thomas
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can I make a list of all URLs indexed by Google?
I started working for this eCommerce site 2 months ago, and my SEO site audit revealed a massive spider trap. The site should have been 3500-ish pages, but Google has over 30K pages in its index. I'm trying to find a effective way of making a list of all URLs indexed by Google. Anyone? (I basically want to build a sitemap with all the indexed spider trap URLs, then set up 301 on those, then ping Google with the "defective" sitemap so they can see what the site really looks like and remove those URLs, shrinking the site back to around 3500 pages)
Intermediate & Advanced SEO | | Bryggselv.no0 -
Mass Removal Request from Google Index
Hi, I am trying to cleanse a news website. When this website was first made, the people that set it up copied all kinds of articles they had as a newspaper, including tests, internal communication, and drafts. This site has lots of junk, but this kind of junk was on the initial backup, aka before 1st-June-2012. So, removing all mixed content prior to that date, we can have pure articles starting June 1st, 2012! Therefore My dynamic sitemap now contains only articles with release date between 1st-June-2012 and now Any article that has release date prior to 1st-June-2012 returns a custom 404 page with "noindex" metatag, instead of the actual content of the article. The question is how I can remove from the google index all this junk as fast as possible that is not on the site anymore, but still appears in google results? I know that for individual URLs I need to request removal from this link
Intermediate & Advanced SEO | | ioannisa
https://www.google.com/webmasters/tools/removals The problem is doing this in bulk, as there are tens of thousands of URLs I want to remove. Should I put the articles back to the sitemap so the search engines crawl the sitemap and see all the 404? I believe this is very wrong. As far as I know this will cause problems because search engines will try to access non existent content that is declared as existent by the sitemap, and return errors on the webmasters tools. Should I submit a DELETED ITEMS SITEMAP using the <expires>tag? I think this is for custom search engines only, and not for the generic google search engine.
https://developers.google.com/custom-search/docs/indexing#on-demand-indexing</expires> The site unfortunatelly doesn't use any kind of "folder" hierarchy in its URLs, but instead the ugly GET params, and a kind of folder based pattern is impossible since all articles (removed junk and actual articles) are of the form:
http://www.example.com/docid=123456 So, how can I bulk remove from the google index all the junk... relatively fast?0 -
My site shows 503 error to Google bot, but can see the site fine. Not indexing in Google. Help
Hi, This site is not indexed on Google at all. http://www.thethreehorseshoespub.co.uk Looking into it, it seems to be giving a 503 error to the google bot. I can see the site I have checked source code Checked robots Did have a sitemap param. but removed it for testing GWMT is showing 'unreachable' if I submit a site map or fetch Any ideas on how to remove this error? Many thanks in advance
Intermediate & Advanced SEO | | SolveWebMedia0 -
Why is Google ranking irrelevant / not preferred pages for keywords?
Over the past few months we have been chipping away at duplicate content issues. We know this is our biggest issue and is working against us. However, it is due to this client also owning the competitor site. Therefore, product merchandise and top level categories are highly similar, including a shared server. Our rank is suffering major for this, which we understand. However, as we make changes, and I track and perform test searches, the pages that Google ranks for keywords never seems to match or make sense, at all. For example, I search for "solid scrub tops" and it ranks the "print scrub tops" category. Or the "Men Clearance" page is ranking for keyword "Women Scrub Pants". Or, I will search for a specific brand, and it ranks a completely different brand. Has anyone else seen this behavior with duplicate content issues? Or is it an issue with some other penalty? At this point, our only option is to test something and see what impact it has, but it is difficult to do when keywords do not align with content.
Intermediate & Advanced SEO | | lunavista-comm0 -
"Null" appearing as top keyword in "Content Keywords" under Google index in Google Search Console
Hi, "Null" is appearing as top keyword in Google search console > Google Index > Content Keywords for our site http://goo.gl/cKaQ4K . We do not use "null" as keyword on site. We are not able to find why Google is treating "null" as a keyword for our site. Is anyone facing such issue. Thanks & Regards
Intermediate & Advanced SEO | | vivekrathore0 -
How do you check the google cache for hashbang pages?
So we use http://webcache.googleusercontent.com/search?q=cache:x.com/#!/hashbangpage to check what googlebot has cached but when we try to use this method for hashbang pages, we get the x.com's cache... not x.com/#!/hashbangpage That actually makes sense because the hashbang is part of the homepage in that case so I get why the cache returns back the homepage. My question is - how can you actually look up the cache for hashbang page?
Intermediate & Advanced SEO | | navidash0 -
No-index pages with duplicate content?
Hello, I have an e-commerce website selling about 20 000 different products. For the most used of those products, I created unique high quality content. The content has been written by a professional player that describes how and why those are useful which is of huge interest to buyers. It would cost too much to write that high quality content for 20 000 different products, but we still have to sell them. Therefore, our idea was to no-index the products that only have the same copy-paste descriptions all other websites have. Do you think it's better to do that or to just let everything indexed normally since we might get search traffic from those pages? Thanks a lot for your help!
Intermediate & Advanced SEO | | EndeR-0 -
Malicious site pointed A-Record to my IP, Google Indexed
Hello All, I launched my site on May 1 and as it turns out, another domain was pointing it's A-Record to my IP. This site is coming up as malicious, but worst of all, it's ranking on keywords for my business objectives with my content and metadata, therefore I'm losing traffic. I've had the domain host remove the incorrect A-Record and I've submitted numerous malware reports to Google, and attempted to request removal of this site from the index. I've resubmitted my sitemap, but it seems as though this offending domain is still being indexed more thoroughly than my legitimate domain. Can anyone offer any advice? Anything would be greatly appreciated! Best regards, Doug
Intermediate & Advanced SEO | | FranGen0