Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Indexed pages
-
Just started a site audit and trying to determine the number of pages on a client site and whether there are more pages being indexed than actually exist. I've used four tools and got four very different answers...
- Google Search Console: 237 indexed pages
- Google search using site command: 468 results
- MOZ site crawl: 1013 unique URLs
- Screaming Frog: 183 page titles, 187 URIs (note this is a free licence, but should cut off at 500)
Can anyone shed any light on why they differ so much? And where lies the truth?
-
Another option is if the site uses a CMS. If so, then you can create a sitemap for content pages/posts etc,.
Personally, I'm with Krzysztof Furtak on SF. Screaming Frog rocks. It'll find most pages, except perhaps Orphan pages as it wouldn't be able to find a link to crawl to discover the page.
If it's really important to get as many pages as possible, I'd do the following (I've put an Astrix (*) next to ones that some people may think are a tad extreme)
- Run a Screaming Frog crawl
- Grab a sitemap from your CMS
- Check any server-based analytics (AWSTATS etc)
- Check your access_log file & parse out URLs in there**(*)**
- site: queries, with & without www, and also using * as a subdomain (use something like Moz's toolbar to export)
- As Krzysztof suggests, Scrapebox would extract data too, but be careful scraping, you may get an IP slap.(*)
- Export crawl data from Moz & a tool such as Deep Crawl
- Throw the pages from all into Excel and de-dupe.
- Once you have a de-duped list, as an optional last step, go back to Screaming Frog and enter list mode (I have the paid version, not sure if it's possible with the free one) and run a crawl over all the de-duped URLs to get status codes etc
If you're going to do this sort of thing a fair bit - buy a Screaming Frog license, it's an awesome tool and can be useful in a multitude of situations.
-
The site: command is handy for asking Google what pages it knows about, however if Muzzmoz wants to know the number of pages on a site, you'd need more than this.
Also, re: your different ways or querying, I like to use:
site:*.domain.com - This can show other subdomains too, that may otherwise be missed
-
Ok so check with site something under 1000 pages and go to the last results page. You'll see that there'll be different number (in almost all cases).
-
I Will Always Prefer To Check Manually Using Site Command Because, site: operator, which will show us how many pages Google currently has indexed for the domain.
There Will Be Difference Between Index status in search console and current index as search console update the data after few days.
The number of indexed URLs is almost always significantly smaller than the number of crawled URLs, because Total indexed excludes URLs identified as duplicates, non-canonical or those that contain a meta no index tag.
Also, Check For Index(Preferred) Version Of Your Site
For E.g-
You can check More About this Here - https://support.google.com/webmasters/answer/2642366?hl=en
-
Hi
Most accurate number is from screaming frog (if you have less than 500 pages or paid version if more than 500).
Google indexes what it wants and if good enough to show in google index. If some pages are similar, got quality issues, blocked by robots etc then it won't show all. BTW don't think number in GSC or google index is good, check it manually because there can be 468 but in fact 200 only.
Moz can have "historical" pages that now don't exists or don't care about quality issues.
The truth is in screaming frog - most accurate number. If you used google user agent then number is the max that can appear in google index. If screaming frog user agent with turned off robots then you'll see bigger number (but google won't show it because of blocks).
If you want to check what's indexed then use tool like scrapebox. First get all urls (maybe without images if you don't care), then check indexed with sb. What's not indexed, can have some issues.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Why My site pages getting video index viewport issue?
Hello, I have been publishing a good number of blogs on my site Flooring Flow. Though, there's been an error of the video viewport on some of my articles. I have tried fixing it but the error is still showing in Google Search Console. Can anyone help me fix it out?
Technical SEO | | mitty270 -
Should I "no-index" two exact pages on Google results?
Hello everyone, I recently started a new wordpress website and created a static homepage. I noticed that on Google search results, there are two different URLs landing on same content page. I've attached an image to explain what I saw. Should I "no-index" the page url? Google url.JPG In this picture, the first result is the homepage and I try to rank for that page. The last result is landing on same content with different URL. So, should I no-index last result as shown in image?
Technical SEO | | amanda59640 -
URLs dropping from index (Crawled, currently not indexed)
I've noticed that some of our URLs have recently dropped completely out of Google's index. When carrying out a URL inspection in GSC, it comes up with 'Crawled, currently not indexed'. Strangely, I've also noticed that under referring page it says 'None detected', which is definitely not the case. I wonder if it could be something to do with the following? https://www.seroundtable.com/google-ranking-index-drop-30192.html - It seems to be a bug affecting quite a few people. Here are a few examples of the URLs that have gone missing: https://www.ihasco.co.uk/courses/detail/sexual-harassment-awareness-training https://www.ihasco.co.uk/courses/detail/conflict-resolution-training https://www.ihasco.co.uk/courses/detail/prevent-duty-training Any help here would be massively appreciated!
Technical SEO | | iHasco0 -
Pages are Indexed but not Cached by Google. Why?
Hello, We have magento 2 extensions website mageants.com since 1 years google every 15 days cached my all pages but suddenly last 15 days my websites pages not cached by google showing me 404 error so go search console check error but din't find any error so I have cached manually fetch and render but still most of pages have same 404 error example page : - https://www.mageants.com/free-gift-for-magento-2.html error :- http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&rlz=1C1CHBD_enIN803IN804&oq=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&aqs=chrome..69i57j69i58.1569j0j4&sourceid=chrome&ie=UTF-8 so have any one solutions for this issues
Technical SEO | | vikrantrathore0 -
Indexing Issue of Dynamic Pages
Hi All, I have a query for which i am struggling to find out the answer. I unable to retrieve URL using "site:" query on Google SERP. However, when i enter the direct URL or with "info:" query then a snippet appears. I am not able to understand why google is not showing URL with "site:" query. Whether the page is indexed or not? Or it's soon going to be deindexed. Secondly, I would like to mention that this is a dynamic URL. The index file which we are using to generate this URL is not available to Google Bot. For instance, There are two different URL's. http://www.abc.com/browse/ --- It's a parent page.
Technical SEO | | SameerBhatia
http://www.abc.com/browse/?q=123 --- This is the URL, generated at run time using browse index file. Google unable to crawl index file of browse page as it is unable to run independently until some value will get passed in the parameter and is not indexed by Google. Earlier the dynamic URL's were indexed and was showing up in Google for "site:" query but now it is not showing up. Can anyone help me what is happening here? Please advise. Thanks0 -
Home Page Ranking Instead of Service Pages
Hi everyone! I've noticed that many of our clients have pages addressing specific queries related to specific services on their websites, but that the Home Page is increasingly showing as the "ranking" page. For example, a plastic surgeon we work with has a page specifically talking about his breast augmentation procedure for Miami, FL but instead of THAT page showing in the search results, Google is using his home page. Noticing this across the board. Any insights? Should we still be optimizing these specific service pages? Should I be spending time trying to make sure Google ranks the page specifically addressing that query because it SHOULD perform better? Thanks for the help. Confused SEO :/, Ricky Shockley
Technical SEO | | RickyShockley0 -
Where to put Schema On Page
What part of my page should I put Schema data? Header? Footer? Also All pages? or just home page?
Technical SEO | | bozzie3114 -
Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
Hi, I come across a tough problem. I am working on an online-store website which contains the functionlaity of viewing products details in .PDF format (by the way, the website is built on Joomla CMS), now when I search my site's name in Google, the SERP simply displays my .PDF files in the first couple positions (shown in normal .PDF files format: [PDF]...)and I cannot find the normal pages there on SERP #1 unless I search the full site domain in Google. I really don't want this! Would you please tell me how to figure the problem out and solve it. I can actually remove the corresponding component (Virtuemart) that are in charge of generating the .PDF files. Now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality, I plan to regenerate a sitemap of my site and submit it to Google, will it be working for me? I really appreciate that if you could help solve this problem. Thanks very much. Sincerely SEOmoz Pro Member
Technical SEO | | fugu0