How to check if the page is indexable for SEs?
-
Hi, I'm building the extension for Chrome, which should show me the status of the indexability of the page I'm on.
So, I need to know all the methods to check if the page has the potential to be crawled and indexed by a Search Engines. I've come up with a few methods:
- Check the URL in robots.txt file (if it's not disallowed)
- Check page metas (if there are not noindex meta)
- Check if page is the same for unregistered users (for those pages only available for registered users of the site)
Are there any more methods to check if a particular page is indexable (or not closed for indexation) by Search Engines?
Thanks in advance!
-
I understand the difference between what you're doing and what Google shows, I guess I'm just not sure when I'd want to know that something could technically be indexed, but isn't?
I guess I'm not your target market! Good luck with your tool.
-
With "site:site.com" you can only see if the page is indexED, but to know if it's indexABLE you need to dig deeper. That is why I've decided to automate this process.
As I already told, this gonna be a browser extension, once you got on any page, this ext. automatically checks the page, and show the status (with color, I guess), if this page indexed, if not - it shows if its indexABLE. When I'm looking for linkbuilding resources, this little tool should help a lot
-
Ah, gotcha. Personally, I use Google itself to find out if something is indexable: if it's my own site, I can use Fetch as Google, and the robots.txt tester; if it's another site, you can search for "site:[URL]" to see if Google's indexed it.
I think this tool could be really good if you keep it as an icon and it glows or something if you've accidentally deindexed the page? Then it's helping you proactively.
Hope this helps!
Kristina
-
Actually I'm not. That's why I'm asking, to not to miss this basic stuff, so I really appreciate your advice. Thank you!
If I get your question correctly, you are asking why this extension is need for?
Well, 2 main aims:
-
When I want to check any of pages on my own websites, I just visit the page and see if it's ok with all the robots stuff. (or if it should be closed from robots, see if it really is)
-
For linkbuilding purposes. When I come to the page and see a link from it to external website and I know for sure that I can get the same link to my site, I'm asking myself, if it worth getting link from the page like this, if it's gonna be indexed. Why waste your time on getting links from pages that are closed from indexation.
-
-
Hello Peter,
First of all, thank you for the great ideas.
I don't think it's necessary to call the API, as this check references to only one URL (so no aggressiveness) , I need it to be done as fast as possible. But the idea with Structured Data - bravo!
Thanks a lot!
-
You're probably already doing this, but make sure that all of your tests are using the Googlebot user agent! That could cause different results, especially with the robots.txt check.
A sense check: what is your plugin going to offer over Google Search Console's Fetch as Google and robots.txt Tester?
-
You also can check for HTTP header results for crawling too:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tagAlso you can use some of Google services for this. Specially PageSpeed API:
https://developers.google.com/speed/docs/insights/v2/reference/Once you call this API it return JSON with list of blocked resources. It's little bit slower but i found that this is safe. Some hostings have IDS (intruder detection systems) and when some crawl them little bit aggressive they block whole IP or IP range. I know few cases when site is OK to be seen from users, but blocked from Google IP. Webmasters wasn't happy when they discover this. They call hosting few times and got "there isn't issues from our side, we didn't block anything". And 6 hours later they get "seems that another department was blocked this server for few specific IPs".
About checking for logged/nonloged users. You can use StructuredData Testing Tool. Also one call to get JSON with full HTTP response and then compare it with your result.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google suddenly indexing 1,000 fewer pages. Why?
We have a site, blog.example.org, and another site, www.example.org. The most visited pages on www.example.org were redesigned; the redesign landed May 8. I would expect this change to have some effect on organic rank and conversions. But what I see is surprising; I can't believe it's related, but I mention this just in case. Between April 30 and May 7, Google stopped indexing roughly 1,000 pages on www.example.org, and roughly 3,000 pages on blog.example.org. In both cases the number of pages that fell out of the index represents appx. 15% of the overall number of pages. What would cause Google to suddenly stop indexing thousands of pages on two different subdomains? I'm just looking for ideas to dig into; no suggestion would be too basic. FWIW, the site is localized into dozens of languages.
Intermediate & Advanced SEO | | hoosteeno0 -
Can noindexed pages accrue page authority?
My company's site has a large set of pages (tens of thousands) that have very thin or no content. They typically target a single low-competition keyword (and typically rank very well), but the pages have a very high bounce rate and are definitely hurting our domain's overall rankings via Panda (quality ranking). I'm planning on recommending we noindexed these pages temporarily, and reindex each page as resources are able to fill in content. My question is whether an individual page will be able to accrue any page authority for that target term while noindexed. We DO want to rank for all those terms, just not until we have the content to back it up. However, we're in a pretty competitive space up against domains that have been around a lot longer and have higher domain authorities. Like I said, these pages rank well right now, even with thin content. The worry is if we noindex them while we slowly build out content, will our competitors get the edge on those terms (with their subpar but continually available content)? Do you think Google will give us any credit for having had the page all along, just not always indexed?
Intermediate & Advanced SEO | | THandorf0 -
Viewing search results for 'We possibly have internal links that link to 404 pages. What is the most efficient way to check our sites internal links?
We possibly have internal links on our site that point to 404 pages as well as links that point to old pages. I need to tidy this up as efficiently as possible and would like some advice on the best way to go about this.
Intermediate & Advanced SEO | | andyheath0 -
A/B Testing - Should I add product descriptions on my category landing pages as well as on product pages and if so . how to do this to avoid duplicate content
Hi All, I recently relaunched a new design on my tool hire eCommerce website and now display my products in grid form on my category landing pages as opposed to just a list view which we previously had on the old design. My bounce rates are alot higher than they use to be and my gut instinct is telling me maybe this is wrong . I want to do some a/b testing using a list view. My question is , previously in our list views we just showed the images and pricing and had on page content on the bottom of the page. The user would click on the product image and they would then we taken to the product page which has the product description , t&c, etc etc.. If I was to do this in my a/b testing but change it so we also displayed the product descriptions as well on the category landing pages . Is there a special way to do this as in effect, we would have duplicate content as the product descriptions are also on the product page?. Does anyone have any thoughts on this as to whether its a No No from an SEO point of view ?... Heres a short url link to one of my category pages - http://goo.gl/QJv5gw Historically we use to rank well for the category landing pages and not for the product pages.Our Rankings are down , bounce rates are higher so I am trying to sort both. We have good content on pages etc. Any advice greatly appreciated as always thanks Pete
Intermediate & Advanced SEO | | PeteC120 -
Thousands of Web Pages Disappered from Google Index
The site is - http://shop.riversideexports.com We checked webmaster tools, nothing strange. Then we manually resubmitted using webmaster tools about a month ago. Now only seeing about 15 pages indexed. The rest of the sites on our network are heavily indexed and ranking really well. BUT the sites that are using a sub domain are not. Could this be a sub domain issue? If so, how? If not, what is causing this? Please advise. UPDATE: What we can also share is that the site was cleared twice in it's lifetime - all pages deleted and re-generated. The first two times we had full indexing - now this site hovers at 15 results in the index. We have many other sites in the network that have very similar attributes (such as redundant or empty meta) and none have behaved this way. The broader question is how to do we get the indexing back ?
Intermediate & Advanced SEO | | suredone0 -
Duplicate Page Title/Content Issues on Product Review Submission Pages
Hi Everyone, I'm very green to SEO. I have a Volusion-based storefront and recently decided to dedicate more time and effort into improving my online presence. Admittedly, I'm mostly a lurker in the Q&A forum but I couldn't find any pre-existing info regarding my situation. It could be out there. But again, I'm a noob... So, in my recent SEOmoz report I noticed that over 1,000 Duplicate Content Errors and Duplicate Page Title Errors have been found since my last crawl. I can see that every error is tied to a product in my inventory - specifically each product page has an option to write a review. It looks like the subsequent page where a visitor can fill out their review is the stem of the problem. All of my products are shown to have the same issue: Duplicate Page Title - Review:New Duplicate Page Content - the form is already partially filled out with the corresponding product My first question - It makes sense that a page containing a submission form would have the same title and content. But why is it being indexed, or crawled (or both for that matter) under every parameter in which it could be accessed (product A, B, C, etc)? My second question (an obvious one) - What can I do to begin to resolve this? As far as I know, I haven't touched this option included in Volusion other than to simply implement it. If I'm missing any key information, please point me in the right direction and I'll respond with any additional relevant information on my end. Many thanks in advance!
Intermediate & Advanced SEO | | DakotahW0 -
Why is a page with a noindex code being indexed?
I was looking through the pages indexed by Google (with site:www.mywebsite.com) and one of the results was a page with "noindex, follow" in the code that seems to be a page generated by blog searches. Any ideas why it seems to be indexed or how to de-index it?
Intermediate & Advanced SEO | | theLotter0 -
NOINDEX listing pages: Page 2, Page 3... etc?
Would it be beneficial to NOINDEX category listing pages except for the first page. For example on this site: http://flyawaysimulation.com/downloads/101/fsx-missions/ Has lots of pages such as Page 2, Page 3, Page 4... etc: http://www.google.com/search?q=site%3Aflyawaysimulation.com+fsx+missions Would there be any SEO benefit of NOINDEX on these pages? Of course, FOLLOW is default, so links would still be followed and juice applied. Your thoughts and suggestions are much appreciated.
Intermediate & Advanced SEO | | Peter2640