Indexing isolated webpages
-
Hi all,
We are running a classifieds website. Due to technical limitations, we will probably not be able to list or search expired ads, but the ad details page can still be viewed if you land on an expired ad from an external page (or Google search results). Our concern is: if the ad page still exists but is totally isolated from the website (i.e. not found by the site's search option and not linked from anywhere on the site), will Google remove it from the index?
Thanks, T
-
I agree with Hutch42. Isolated pages like these are what the industry calls "orphan pages". There is some good info about the subject you may want to dive into before you make your final decision.
-
You may want to be careful about keeping pages live that are basically useless to visitors. If the ads are expired and that makes people leave your site (bounce), it will hurt your entire site, not just those pages.
-
Thanks Hutch42. I actually want to keep ranking for these expired ads even though they are no longer displayed in the classified ads list, since they rank decently for some long-tail searches and I can't yet create specific landing pages with fresh content to target those searches.
-
Google will not remove a page just because the links to it are gone. Your best bet would be to set up your back end to automatically add a noindex meta tag to the <head> of ads once they expire.
Example of tag: <meta name="robots" content="noindex">
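As a rough illustration of the back-end rule suggested above, here is a minimal Python sketch. The `Ad` class, its `expires_at` field, and the `robots_meta_tag` helper are hypothetical names chosen for the example; they are not taken from any particular framework or from the original poster's site.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Ad:
    """Hypothetical ad record; only the expiry timestamp matters here."""
    title: str
    expires_at: datetime


def robots_meta_tag(ad: Ad, now: Optional[datetime] = None) -> str:
    """Return the robots meta tag to emit inside the page <head>.

    Expired ads get "noindex"; live ads get an empty string, which
    leaves the default (indexable) behaviour untouched.
    """
    now = now or datetime.now(timezone.utc)
    if ad.expires_at <= now:
        return '<meta name="robots" content="noindex">'
    return ""
```

Keep in mind that this takes expired ads out of the index, so it only fits if you no longer want them to rank.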
Related Questions
-
Index Bloat: Canonicalize, Redirect or Delete URLs?
I was doing some simple on-page recommendations for a client and realized that they have a bit of a website bloat problem. They are an ecommerce shoe store and for one product, there could be 10+ URLs. For example, this is what ONE product looks like:
example.com/products/shoename-color1
example.com/products/shoename-color2
example.com/collections/style/products/shoename-color1
example.com/collections/style/products/shoename-color2
example.com/collections/adifferentstyle/products/shoename-color1
example.com/collections/adifferentstyle/products/shoename-color2
example.com/collections/shop-latest-styles/products/shoename-color1
example.com/collections/shop-latest-styles/products/shoename-color2
example.com/collections/all/products/shoename-color1
example.com/collections/all/products/shoename-color2
...and so on... all for the same shoe. They have about 20-30 shoes altogether, and some come in 4-5 colors. This has caused some major bloat on their site and I assume some confusion for the search engine. That said, I'm trying to figure out what the best way to tackle this is from an SEO perspective. Here's where I've gotten to so far: Is it better to canonicalize all URLs, referencing back to one "main" one, delete all bloat pages and re-link everything to the main one(s), or 301 redirect the bloat URLs back to the "main" one(s)? Or is there another option that I haven't considered? Thanks!
Intermediate & Advanced SEO | | AJTSEO0 -
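To make the canonical option in the question above concrete, here is a small Python sketch that maps every collection-scoped product URL to a single "main" product URL and builds the matching rel="canonical" tag. The URL pattern is assumed from the example URLs in the question, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern, based on the example URLs in the question:
# /collections/<collection>/products/<handle>  ->  /products/<handle>
_COLLECTION_PREFIX = re.compile(r"^/collections/[^/]+(?=/products/)")


def canonical_product_url(url: str) -> str:
    """Strip the collection prefix so all variants share one URL.

    Query strings and fragments are dropped in this sketch.
    """
    parts = urlsplit(url)
    path = _COLLECTION_PREFIX.sub("", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for a product page URL."""
    return f'<link rel="canonical" href="{canonical_product_url(url)}">'
```

The same mapping could just as well drive 301 redirects; which option is better is exactly what the question asks.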
This one is complicated... canonicals, href lang tags and no index
Bear with me, this is complicated (I REALLY hope one of you comes along and says, no it isn't!)
Scenario
A client has multiple English pages, as they have a unique product offering in AUS, US, UK, NZ and also have a global site in English. Obviously there is a lot of duplicate content and they have the relevant hreflang tags set up to help Google untangle what should be ranked where. They also have rel-canonical on each page. I've set up Search Console for each of the folder structures, i.e. en-us, en-gb, en-au and so on. They have an optimised page for one of their primary keywords, which ranks nowhere for this exact keyword, but this page DOES rank for 40 similar keywords. For the exact keyword, they rank 52nd, and frustratingly, it's the homepage that ranks. We know the correct page is ranking and is indexed because Search Console tells us so and we see the exact page appear in SERPs for the other 40 keywords. When I look at the en-us site in Search Console, it tells me that the home page is not being indexed, because a rel canonical tag is prioritising an alternative page (probably the global site). However, the en-us homepage is showing up in rankings for a lot of their important keywords. The site has been live for 6 months and the optimised page for about 3 months.
Questions
1. If Search Console is saying the homepage is not ranking, how is it showing up in SERPs?
2. Why is the homepage ranking for this important keyword, when there is virtually no mention of the keyword versus the page that is almost perfect according to Moz's on-page grader?
3. Do you need hreflang tags AND rel canonical on a page?
4. How long does a new page that is optimised for a keyword take to replace (and hopefully surpass) the homepage?
5. If the US is the most important market, should we guide Google to that fact using rel-canonical?
Really appreciate your feedback, hivemind. Thanks
Intermediate & Advanced SEO | | Algorhythm_jT0 -
Fetch as Google -- Does not result in pages getting indexed
I run an exotic pet website which currently lists several species of reptiles. It has done well in SERPs for the first couple of species, but I am continuing to add new ones, and each comes with the task of getting ranked, so I need to figure out the best process. We released our 4th species, "reticulated pythons", about 2 weeks ago. I made these pages public and in Webmaster Tools did a "Fetch as Google", choosing to index the page and its child pages, for this page: http://www.morphmarket.com/c/reptiles/pythons/reticulated-pythons/index
While Google immediately indexed the index page, it did not really index the couple of dozen pages linked from it, despite me checking the option to crawl child pages. I know this in two ways. First, in Google Webmaster Tools, if I look at Search Analytics and Pages filtered by "retic", there are only 2 listed. This at least tells me it's not showing these pages to users. More directly, if I search Google for "site:morphmarket.com/c/reptiles/pythons/reticulated-pythons" there are only 7 pages indexed.
More details: I've tested at least one of these URLs with the robots checker and it is not blocked. The canonical values look right. I have not really monkeyed with Crawl URL Parameters. I do NOT have these pages listed in my sitemap, but in my experience Google didn't care a lot about that. I previously had about 100 pages there and Google didn't index some of them for more than 1 year. Google has indexed "105k" pages from my site so it is very happy to do so, apparently just not the ones I want (this large value is due to permutations of search parameters, something I think I've since improved with canonical, robots, etc). I may have some nofollow links to the same URLs but NOT on this page, so assuming nofollow has only local effects, this shouldn't matter. Any advice on what could be going wrong here?
I really want Google to index the top couple of links on this page (home, index, stores, calculator) as well as the couple dozen gene/tag links below.
Intermediate & Advanced SEO | | jplehmann0 -
Google is not indexing an updated website
We just relaunched a website that is 5 years old. We kept all the old URLs and articles, but for some reason Google is not picking up the new website https://www.navisyachts.com. In Google Webmaster Tools we can see the sitemap with over 1000 pages submitted, but it shows nothing as indexed. The site is rapidly losing traffic and positions, although from the SEO side everything looks fine to me. What can be wrong? I’ll appreciate any help. The new website is built on Joomla 3.4. We have it here at Moz, and other than some minor details it doesn't show that anything is wrong with the website. Thank you.
Intermediate & Advanced SEO | | FWC_SEO0 -
PDFs and webpages
If a website provides PDF versions of the page as a download option, should the PDF be no-indexed in your opinion? We have to offer PDF versions of the webpage as our customers want them; they are a group who will download/print the PDFs. I thought of leaving the PDFs alone as they sit in a subdomain, but the more I think about it, I should probably noindex them.
My reasons:
They sit in a subdomain, so if users have linked to them, my main domain isn't getting the rank juice.
Duplication issues: they might be affecting the rank of the existing webpages.
I can't track the PDFs as they are in a subdomain, though I can see event clicks to them from the main site.
On the flipside, I could lose out on the traffic the PDFs bring when a user loads one from an organic search, and on any link existing in the PDF. What are your experiences?
Intermediate & Advanced SEO | | Bio-RadAbs0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google:
http://www.ccisolutions.com/StoreFront/jsp/
http://www.ccisolutions.com/StoreFront/jsp/html/
http://www.ccisolutions.com/StoreFront/jsp/pdf/
How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although /jsp/html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow them, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting it from crawling and indexing the content that resides there, which is used to populate pages on our site?
Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, the file CCI-SALES-STAFF.HTML (which appears on the Directory Listing referenced above, http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML
This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff
As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example:
Disallow: /StoreFront/jsp/
Disallow: /StoreFront/jsp/html/
Disallow: /StoreFront/jsp/pdf/
Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
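One way to sanity-check proposed Disallow rules like the ones in the question above before deploying them is to run them through Python's standard-library robots.txt parser. This is only a sketch: the `User-agent: *` line is an assumption added here (the rules in the question omit it), `is_crawlable` is a hypothetical helper, and Python's prefix matching approximates Googlebot's own matching rather than reproducing it. It also checks crawl permission only; whether URLs that are already indexed drop out is a separate matter.

```python
from urllib.robotparser import RobotFileParser

# Proposed rules from the question, plus an assumed User-agent line.
# The first rule alone already covers the two subdirectories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /StoreFront/jsp/
Disallow: /StoreFront/jsp/html/
Disallow: /StoreFront/jsp/pdf/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Checking the directory-listing URL and the intended category page from the question against `ROBOTS_TXT` shows the first blocked and the second still allowed, because the rules match by path prefix.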
Google Re-Index or multiple 301 Redirects on the server?
Over a year ago we moved a site from Blogspot, which was adding dates to the URLs (e.g. blog/2012/08/10/). Additionally, we've removed category folders (/category, /tag, etc). If I add all these redirects (for the multiple date options, etc), I'm concerned it might overload the server. After talking with the server team, they suggested using something like 'BWP Google Sitemaps' on our WordPress site, which would allow Google some time to re-index our site. What do you suggest we do?
Intermediate & Advanced SEO | | seointern0 -
Problem of indexing
Hello, sorry, I'm French and my English is not necessarily correct. I have a problem with indexing in Google. Only the home page is indexed: http://bit.ly/yKP4nD. I have been looking for several days but I do not understand why. Here is what I checked:
The robots.txt file is OK
The sitemap, although it is in ASP, is valid with Google
No spam, no hidden text
I made a reconsideration request via Google Webmaster Tools and the site has no penalties
We do not have noindex
So I'm stuck and I'd like your opinion. Thank you very much, A.
Intermediate & Advanced SEO | | android_lyon0