Indexed Pages in Google: How Do I Find Out?
-
Is there a way to get a list of the pages that Google has indexed?
Is there some software that can do this?
I do not have access to Webmaster Tools, so I am hoping there is another way to do this.
It would be great if I could also see whether each indexed page returns a 404 or some other status.
Thanks for your help, and sorry if it's a basic question.
-
If you want to find all your indexed pages in Google, just search for site:yourdomain.com (or .co.uk, or whatever your domain ends in), without the www.
-
Hi John,
Hope I'm not too late to the party! When checking URLs for their cache status, I suggest using Scrapebox (with proxies).
Be warned: it was created as a black-hat tool, and as such is frowned upon, but there are a number of excellent white-hat uses for it! It costs $57 as a one-off payment.
-
Sorry to keep sending you messages, but I wanted to make sure you know that SEOmoz has a fantastic tool for what you are requesting. Please look at this link and then click "show more" at the bottom; I believe you will agree it does everything you've asked and more.
http://pro.seomoz.org/tools/crawl-test
Sincerely,
Thomas
Does this answer your question?
-
What is giving you a 100 limit?
Try using Raven Tools or spider mate; they both have excellent free trials and give you quite a bit of information.
-
Neil, you are correct. I agree that Screaming Frog is excellent; it will definitely show you your site. Here is a link from an SEOmoz associate that I believe will benefit you:
http://www.seomoz.org/q/404-error-but-i-can-t-find-any-broken-links-on-the-referrer-pages
Sincerely,
Thomas
-
This is what I am looking for, thanks.
Strange that there is no tool I can buy to do this in full without the 100 limit.
Anyway, I will give that a go.
-
Can I get your site's URL? By the way, this might be a better way into Google Webmaster Tools:
if you have a Gmail account, use that; if you don't, just sign up using your regular e-mail.
Of course, using SEOmoz via http://pro.seomoz.org/tools/crawl-test will give you a full rundown of all of your links and how they're running. Are you not seeing all of them?
Another tool I have found very useful is the website analysis, as well as the midsize product, from Alexia.
I hope I have helped,
Tom
-
If you don't have access to Webmaster Tools, the most basic way to see which pages Google has indexed is obviously to do a site: search on Google itself - like "site:google.com" - to return pages of SERPs containing the pages from your site which Google has indexed.
Problem is, how do you get the data from those SERPs in a useful format to run through Screaming Frog or similar?
Enter Chris Le's Google Scraper for Google Docs.
It will let you scrape the first 100 results, then offset your search by 100 to get the next 100, and so on. It is slightly cumbersome, but it will achieve what you want to do.
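To make the offsetting concrete, here is a rough Python sketch. It only builds the paginated site: search URLs; it does not fetch or parse any results. It assumes Google's `q`, `num` and `start` URL parameters, and Google may cap or ignore `num`. The function name `site_query_urls` and the `yourdomain.com` placeholder are made up for illustration.

```python
from urllib.parse import urlencode


def site_query_urls(domain, total=300, page_size=100):
    """Yield one Google search URL per page of a site: query.

    Assumption: `num` asks for page_size results per page and `start`
    is the zero-based offset of the first result on that page.
    """
    for start in range(0, total, page_size):
        params = {"q": f"site:{domain}", "num": page_size, "start": start}
        yield "https://www.google.com/search?" + urlencode(params)


# Placeholder domain; swap in your own, without the www.
for url in site_query_urls("yourdomain.com"):
    print(url)
```

Each URL covers one block of 100 results, which mirrors offsetting the scraper by 100 each time.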
Then you can crawl the URLs using Screaming Frog or another crawler.
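If you would rather script the status check than run a full crawler, something like the following minimal Python sketch could work. It is not a replacement for Screaming Frog. The function names `fetch_status` and `check_urls` are hypothetical, and it assumes the server answers HEAD requests (some do not, in which case a GET would be needed). Redirects are followed, so a redirected URL reports the status of its final destination.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        # HEAD avoids downloading the page body (assumption: the server allows it).
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "status-check-sketch"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 404 or 500 arrive as exceptions.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, timeout, refused connection, or a malformed URL.
        return None


def check_urls(urls, fetch=fetch_status):
    """Map each non-empty URL (whitespace stripped) to its status code."""
    return {url: fetch(url) for url in (u.strip() for u in urls) if url}
```

With the scraped URLs saved one per line in a file (say, a hypothetical urls.txt), `check_urls(open("urls.txt"))` returns a dictionary you can scan for 404s.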
-
Just thought I might add these links; they might explain it better than I did.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1352276
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=2409443&topic=2446029&ctx=topic
http://pro.seomoz.org/tools/crawl-test
You should definitely sign up for Google Webmaster Tools. It is free; all you need to do is add an e-mail address and password. Here is a link:
http://support.google.com/webmasters/bin/topic.py?hl=en&topic=1724121
I hope I have been of help to you.
Sincerely,
Thomas
-
Thanks for the reply.
I do not have access to Webmaster Tools, and the SEOmoz tools do not show a great deal of the pages on my site for some reason.
Majestic shows up to 100 pages. Ahrefs shows some also.
I need to compare what Google has indexed with the status of each page.
Does Screaming Frog do this?
-
Google Webmaster Tools should supply you with this information. In addition, the SEOmoz tools will tell you that and more. Run your website through the campaign section of SEOmoz and you will then see any issues with your website.
You may also want to use Google Webmaster Tools to run a test as Googlebot; Googlebot should show you any issues you are having, such as 404s or other fun things that websites do.
If you're running WordPress, there are plenty of plug-ins; I recommend 404 returned.
Sincerely,
Thomas