Best practices for types of pages not to index
-
Trying to better understand best practices for when and when not to use content="noindex". Are there certain types of pages that we shouldn't want Google to index? Contact form pages, privacy policy pages, internal search pages, archive pages (using WordPress)? Any thoughts would be appreciated.
-
Certainly! When it comes to SEO (Search Engine Optimization), there are certain types of pages that you may want to prevent search engines from indexing. This can help ensure that only your most relevant and valuable content is displayed in search engine results. Here are some best practices for types of pages not to index:
Duplicate Content Pages: Avoid indexing pages with duplicate content, as search engines prefer unique content.
Use canonical tags to indicate the preferred version of a page.
Thin or Low-Quality Content Pages: Pages with little to no valuable content may harm your site's overall SEO.
Consider adding substantial content to these pages or use meta tags to prevent indexing.
Internal Search Results Pages: Exclude internal search results pages from indexing, as they may lead to a poor user experience in search results.
Use the robots.txt file to disallow crawling of these pages.
Thank You and Confirmation Pages: Pages that users see after completing a form submission or transaction may not provide significant value to search engine users.
Use the noindex meta tag to prevent indexing of thank you and confirmation pages.
Login and Account Pages: Secure pages containing login forms or user account information to prevent unauthorized access.
Use the robots.txt file to disallow crawling of these pages.
Tag and Category Pages: Depending on your content management system, tag and category pages may be automatically generated. These can sometimes result in duplicate content issues.
Use the noindex meta tag or canonical tags as appropriate.
Paginated Pages: For large sets of paginated content, consider only indexing the main paginated page and using the rel="next" and rel="prev" tags to indicate the paginated structure. Note that Google announced in 2019 that it no longer uses rel="next" and rel="prev" as an indexing signal.
Privacy Policy and Terms of Service Pages: While it's important to have these pages, they might not need to be indexed.
Use the noindex meta tag if you don't want search engines to index these legal pages.
Media Files and Non-HTML Content: Files like PDFs, images, and other non-HTML content may not need to be indexed.
Use appropriate meta tags or header directives to prevent indexing.
Test and Development Pages: Pages used for testing or development purposes should not be indexed.
Use authentication or the robots.txt file to block search engine bots from accessing these pages.
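The robots.txt suggestions in the list above can be sanity-checked before they go live. Here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the example.com host, and the is_crawlable helper are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; adjust the paths to your own site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /wp-login.php
Disallow: /staging/
"""

def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "https://example.com" + path)

if __name__ == "__main__":
    for path in ("/blog/my-post/", "/search?q=shoes", "/staging/index.html"):
        print(path, "->", "crawlable" if is_crawlable(path) else "blocked")
```

The standard-library parser does plain prefix matching, so treat the result as a rough check rather than a guarantee of how any particular search engine will interpret the file.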
Always keep in mind that SEO best practices evolve, and it's essential to stay updated with the latest recommendations from search engines. Regularly check your website's performance in search engine results and adjust your indexing strategy accordingly. -
Best practices for determining which types of pages not to index involve strategic decisions to enhance the overall performance and relevance of your website on search engines. Here are some key considerations:
Thin or Low-Quality Content:
Recommendation: Identify and exclude pages with thin or low-quality content that doesn't provide substantial value to users. Focus on creating high-quality, informative content that aligns with user intent.
Duplicate Content:
Recommendation: Avoid indexing pages with duplicate content, as it can lead to confusion for search engines and may result in lower rankings. Use canonical tags to specify the preferred version of the content.
Internal Search Result Pages:
Recommendation: Exclude internal search result pages from indexing, as they often lead to duplicate content issues. Ensure that search engines focus on the primary content pages of your site.
Archive or Staging Pages:
Recommendation: Prevent search engines from indexing archive or staging pages. Use robots.txt or meta tags to disallow indexing of such pages to maintain the integrity of your live content.
Thank You and Confirmation Pages:
Recommendation: Non-essential pages like thank you or confirmation pages for form submissions may not need indexing. Exclude these pages to avoid unnecessary clutter in search engine results.
Login or Session-Specific Pages:
Recommendation: Exclude pages that require user authentication or are session-specific. This prevents search engines from indexing content that's not meant for public access.
Paginated Pages:
Recommendation: For paginated content, consider using rel="next" and rel="prev" tags to signal the relationship between pages. This helps search engines understand the structure without indexing each individual page.
Category or Tag Pages:
Recommendation: Depending on your website structure, category or tag pages may not need indexing. Ensure that these pages don't dilute the overall relevance of your site and use noindex tags if necessary.
Privacy Policy, Terms of Service, and Legal Pages:
Recommendation: While important for compliance, legal and policy pages may not require indexing in search results. Use noindex tags for these pages, allowing them to serve their purpose without being prominent in search listings.
Dynamic URLs with Parameters:
Recommendation: Exclude dynamically generated pages with URL parameters that don't represent unique content. Use canonical tags to manage these pages; the URL Parameters tool in Google Search Console was retired in 2022 and is no longer an option.
Unnecessary Media or File Attachment Pages:
Recommendation: Media or file attachment pages may not need indexing. Use noindex tags to prevent these pages from appearing in search results while still providing access to the media itself.
Regularly audit and monitor your site's performance in search engine results to ensure that the selected pages for non-indexing align with your SEO strategy and user experience goals. Always consider the specific needs and structure of your website when implementing these best practices.
-
Here are some best practices for types of pages not to index:
Duplicate content pages. If you have multiple pages with the same or similar content, it's generally a good idea to avoid indexing all of them. This could include printer-friendly versions, alternate language versions, or slight variations of the same content.
Thin or low-quality content pages. Pages with little or no content, or pages with content that is poorly written or irrelevant to your target audience, should not be indexed.
Internal search results pages. These pages are typically not meant for users to land on directly, and they can clutter up search engine results pages (SERPs).
Privacy and policy pages. These pages are typically not relevant to search users, and they can contain sensitive information that should not be indexed.
Thank-you pages. These pages are typically displayed after a user submits a form or makes a purchase, and they are not meant for indexing.
Login and checkout pages. These pages are typically not relevant to search users, and they can contain sensitive information that should not be indexed.
Staging or test pages. These pages are not meant to be seen by the public, and they can clutter up SERPs.
Paginated pages. If your paginated pages contain the same content as your main product or category pages, you may want to consider noindexing them. This will help to avoid duplicate content issues.
You can use a number of methods to prevent pages from being indexed, including:
Robots.txt file. You can use your robots.txt file to block search engines from crawling certain pages on your website.
Noindex meta tag. You can add a noindex meta tag to the header of a page to prevent it from being indexed.
Canonical tags. You can use canonical tags to specify which version of a page is the preferred version for search engines. This can be helpful for preventing duplicate content issues.
It's important to note that noindexing pages is not always necessary. For example, if you have a blog with a lot of high-quality content, you may want to index all of your pages, even if they have similar content. However, if you have a lot of low-quality or irrelevant pages on your website, it's a good idea to noindex them to avoid harming your SEO.
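To see which of the signals described above a given page actually carries, the page's HTML can be inspected directly. The sketch below uses Python's standard html.parser module. The IndexingSignals class, the read_signals helper, and the sample markup are hypothetical names made up for this example.

```python
from html.parser import HTMLParser

class IndexingSignals(HTMLParser):
    """Collect the robots meta directive and canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.noindex = True
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

def read_signals(html):
    """Return (noindex, canonical_url) for the given HTML string."""
    parser = IndexingSignals()
    parser.feed(html)
    return parser.noindex, parser.canonical

# Sample markup, assumed for illustration.
SAMPLE = (
    '<html><head>'
    '<meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/page/">'
    '</head><body>Thanks for your order.</body></html>'
)

if __name__ == "__main__":
    print(read_signals(SAMPLE))
```

Running something like this over a list of URLs is one way to audit whether thank-you pages, archives, and similar templates are tagged the way you intended.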
-
Thanks to all of you for sending me valuable information.
-
To prevent specific types of web pages from being indexed by search engines, follow these best practices: Use the robots.txt file to disallow crawling of entire sections or directories of your website. On individual pages, use the meta robots tag to specify "noindex" or "nofollow" directives. Use the X-Robots-Tag HTTP header to communicate indexing preferences, either at the server level or for specific files such as PDFs. Password-protect pages that should be accessible only to authorized users. Implement canonical tags to indicate the preferred version of a page. Include only desired pages in your XML sitemap. Maintain a clean URL structure, and use "noindex" in the robots meta tag for dynamic or user-generated content. For pages you want completely removed from search results, return 404 or 410 HTTP status codes. Regularly monitor indexed pages using tools like Google Search Console to confirm that your indexing preferences are being honored, while considering the potential impact on SEO and user experience.
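The X-Robots-Tag header mentioned above is useful for files that cannot carry a meta tag, such as PDFs. Below is a rough sketch of how it might be attached in a Python WSGI application. The add_robots_header wrapper, the demo_app, the response_headers helper, and the list of file suffixes are all assumptions made for illustration; most sites would set this header in their web server configuration instead.

```python
# File types assumed to be kept out of the index; adjust to taste.
NOINDEX_SUFFIXES = (".pdf", ".doc", ".docx")

def add_robots_header(app):
    """Wrap a WSGI app so matching responses carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start(status, headers, exc_info=None):
            if path.lower().endswith(NOINDEX_SUFFIXES):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)
    return wrapped

def demo_app(environ, start_response):
    """Stand-in application that serves the same bytes for every path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"file contents"]

def response_headers(path):
    """Call the wrapped demo app and return its response headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    body = add_robots_header(demo_app)({"PATH_INFO": path}, start_response)
    list(body)  # consume the response body
    return captured

if __name__ == "__main__":
    print(response_headers("/files/report.pdf"))
    print(response_headers("/blog/my-post/"))
```

The same wrapper pattern works for any path rule, for example a whole /staging/ prefix, as long as the pages stay crawlable so the header can be seen.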
-
When it comes to search engine optimization (SEO), there are certain types of pages that you may consider excluding from being indexed by search engines. Here are some common examples:
Duplicate content pages: If you have multiple pages with similar or identical content, it's generally a good idea to avoid indexing all of them. This could include printer-friendly versions, alternate language versions, or slight variations of the same content.
Temporary or seasonal pages: Pages that are only relevant for a limited time, such as seasonal promotions or special event pages, may not need to be indexed. Once the event or promotion has passed, you can remove them from being indexed to prevent clutter in search engine results.
Private or internal pages: If you have pages that are intended for internal use only, such as employee login pages, private user profiles, or administrative sections, it's typically best to exclude them from indexing. This ensures that sensitive or irrelevant content doesn't appear in search results.
Thin or low-quality pages: Pages with minimal or insufficient content, such as placeholder pages, thin affiliate pages, or low-quality auto-generated content, might not provide much value to search engine users. It's generally better to improve or remove such pages rather than indexing them.
Pagination and sorting pages: Pages that only differ in sorting, filtering, or pagination functionality, such as category listings or search result pages, may not need individual indexing. In these cases, it's often recommended to use canonical tags or URL parameters to consolidate them into a single indexed page.
Remember, these are general recommendations, and the specific needs of your website may vary. It's always a good idea to consult with an SEO professional to understand what pages should or shouldn't be indexed based on your unique circumstances.
-
To prevent certain types of pages from appearing in search engine results, use methods like robots.txt, meta robots tags, or canonicalization. Common pages to exclude include duplicates, low-quality content, search result pages, login/profile pages, thank you pages, and outdated content. Be cautious when choosing which pages to exclude to avoid affecting your site's SEO and user experience.
-
Best practices for preventing indexing of certain types of pages on your website include:
Avoid indexing duplicate content pages.
Exclude pages with thin or low-quality content.
Do not index internal search results pages.
Privacy and policy pages are typically not meant for indexing.
Consider "noindexing" tag or category archive pages.
Author pages may be "noindexed" if they lack substantial content.
"Thank-you" pages after form submissions or purchases can often be excluded.
Dynamic parameters or session IDs should not be indexed.
Pagination pages can be "noindexed" if they duplicate content.
Login or registration pages often don't need indexing.
Implement these practices using "noindex" meta tags or "robots.txt" directives while being cautious not to inadvertently block essential pages. Regularly monitor indexing status through tools like Google Search Console. -
I try not to index terms and conditions pages and privacy policy. Then there's the "Thank you" pages that might have conversion tracking pixels on. I do this for a few sites.
-
I have created a new website. I am new to blogging, so I need help with indexing and noindexing. Some experts say we need to index all our pages, and a few blog posts say just the opposite.
I have a few pages like Affiliate Disclosures, Contact, About, and Policy.
Should I index them or not? -
Duplicate Content: Avoid indexing pages with duplicate or thin content to prevent SEO issues.
Private or Confidential Pages: Secure login, checkout, or admin pages should not be indexed.
Thank You Pages: Exclude pages users see after form submissions or purchases.
Low-Value Pages: Hide pages with low-quality or outdated content from indexing.
Temporary Pages: Prevent indexing of staging, test, or under-construction pages.
Pagination: Noindex paginated pages only where they duplicate content; rel="nofollow" controls whether links are followed and does not consolidate value.
Canonicalization: Set canonical tags to specify preferred URLs for similar content.
Sitemaps: Exclude non-essential pages from your sitemap to control indexing.
Non-Content Files: Don't index PDFs, images, or other non-HTML files.
Disallowed in Robots.txt: Use robots.txt to block search engines from crawling unwanted pages.
-
Certainly! Best practices for types of pages not to index are essential for optimizing your website's SEO performance. By carefully selecting which pages to exclude from search engine indexing, you can improve crawl budget allocation, enhance user experience, and maintain the quality of your site. These practices typically include:
Duplicate Content: Identifying and addressing duplicate content issues by using canonical tags to consolidate signals to search engines.
Thin Content: Evaluating and improving pages with thin or low-quality content by either updating them with relevant information or redirecting them to more pertinent pages.
Private or Internal Pages: Ensuring that private or internal pages, such as login pages or admin sections, are not indexed to prevent them from appearing in search results.
Search Result Pages: Excluding search result pages from indexing to prevent user-generated queries from appearing in SERPs.
Media Files: Preventing indexing of media files like images, videos, and PDFs, as they may not provide valuable information in search results.
-
There are a few types of pages that you may not want to index, for a variety of reasons. Here are some of the best practices for types of pages not to index:
Pages that are not relevant to your website's visitors: If a page is not relevant to the content of your website, or if it is not likely to be of interest to your visitors, then there is no reason to index it. This could include pages such as login pages, error pages, and internal pages that are only used by administrators.
Pages that are duplicate content: If a page is duplicate content of another page on your website, then there is no need to index both pages. This could include pages that are generated dynamically, such as search results pages or product pages.
Pages that are not secure: If a page is not secure, such as a page that uses HTTP instead of HTTPS, then you may not want to index it. This is because search engines may flag these pages as insecure, which could deter visitors from visiting your website.
Pages that are frequently updated: If a page is frequently updated, such as a blog page, then you may not want to index it. This is because the search engines will have to crawl the page more often, which could slow down your website.
Pages that are not mobile-friendly: If a page is not mobile-friendly, then you may not want to index it. This is because more and more people are using mobile devices to access the internet, and search engines are starting to favor mobile-friendly websites.
By following these best practices, you can ensure that your website is indexed by search engines only with the pages that are most relevant and useful to your visitors.
Here are some additional tips for deciding which pages not to index:
Consider your audience: Think about the type of content that your visitors are looking for. If a page is not relevant to their interests, then there is no need to index it.
Use your analytics: Look at your website analytics to see which pages are the most popular. These are the pages that you should focus on indexing.
Get feedback from your visitors: Ask your visitors what type of content they are looking for. This feedback can help you decide which pages to index and which pages to exclude.
By following these tips, you can make sure that your website is indexed by search engines in a way that is beneficial to your visitors.
-
Indexing decisions for web pages play a crucial role in search engine optimization (SEO) and overall website management. There are certain types of pages that you may want to prevent search engines from indexing to maintain the quality of your website's search engine results and to avoid potential SEO issues. Here are some best practices for types of pages not to index:
Thin Content Pages: Avoid indexing pages with minimal or low-quality content. Such pages can include placeholder pages, duplicate content, or pages with very little text. Thin content can harm your website's SEO.
Internal Search Result Pages: Search engines can sometimes index internal search result pages, which can lead to duplicate content issues. Use the "noindex" meta tag to prevent indexing of these pages.
Tag and Category Pages: If you have a blog or a content-heavy website, tag and category pages may contain duplicate or low-value content. Consider using the "noindex" tag for these pages.
Thank You and Confirmation Pages: Pages that users see after completing a form or making a purchase are often not useful for search engine results. Prevent these pages from being indexed to avoid cluttering search results.
Private or Confidential Pages: Pages with sensitive information or private data should never be indexed. Make sure to use proper authentication and access controls to protect these pages.
Duplicate Content Pages: If you have multiple versions of the same content (e.g., print-friendly versions, mobile versions), use canonical tags to indicate the preferred version and prevent duplicate content issues.
Session ID or URL Parameters: Pages with session IDs or excessive URL parameters can create many duplicate URLs. Use URL canonicalization techniques or robots.txt to prevent indexing of unnecessary variations.
Login Pages and Admin Sections: Prevent search engines from indexing login pages and admin sections of your website to maintain security and keep sensitive information hidden.
Temporary or Under-Construction Pages: If you're working on a page that's not ready for public viewing, use the "noindex" tag to prevent it from appearing in search results.
404 Error Pages: While 404 error pages should not be indexed, it's essential to provide a helpful 404 page that guides users to relevant content or the homepage.
Pagination Pages: For paginated content like articles split across multiple pages, it's often best to let search engines index the main content and use rel="prev" and rel="next" tags to indicate the paginated structure without indexing each page individually.
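The session ID and URL parameter item in the list above comes down to mapping many URL variations onto one preferred URL, which is the value a canonical tag would then point to. Here is a minimal sketch with Python's standard urllib.parse module. The canonical_url helper and the set of ignored parameter names are assumptions for illustration; which parameters are safe to drop depends entirely on the site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed not to change page content; adjust per site.
IGNORED_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium",
                  "utm_campaign", "sort"}

def canonical_url(url):
    """Drop ignored parameters and the fragment, and sort what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )

if __name__ == "__main__":
    print(canonical_url(
        "https://Example.com/shoes?sort=price&color=red&sessionid=abc123#top"
    ))
```

Sorting the remaining parameters means that ?color=red&size=9 and ?size=9&color=red resolve to the same canonical URL.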
Regards : Epicsprtsx
-
I want to extend my gratitude to the author for this comprehensive guide on best practices for managing indexing in SEO. As someone deeply involved in digital marketing and SEO, I find this topic to be of utmost importance. In today's fast-paced online landscape, it's crucial to make informed decisions about which pages to index and which to keep out of search engine results pages (SERPs).
One of the key takeaways from this article is the emphasis on optimizing crawl budget. Google's crawl budget is a finite resource, and ensuring that search engines allocate it wisely can significantly impact a website's overall performance. The author rightly points out that preventing the indexing of pages that don't add substantial value to users can help in this regard.
One of the practices highlighted in the article is the use of the "noindex" meta tag. This is a simple yet effective way to communicate to search engines that specific pages should not be included in their index. I appreciate the step-by-step instructions provided on how to implement this tag properly. This can be particularly helpful for those new to SEO.
Additionally, the article's discussion on using robots.txt to disallow crawling of certain pages is a valuable strategy. However, as mentioned, it's important to exercise caution when using this method to prevent accidentally blocking important pages. The emphasis on regularly monitoring the robots.txt file and conducting thorough testing is a crucial piece of advice. This shows that the author is not just focused on prevention but also on maintaining site health and visibility.
Another aspect that I found intriguing is the section on "thin content" pages. Identifying and addressing these pages is essential for maintaining the quality of a website. It's great to see practical recommendations on how to handle such pages, including updating them with relevant content or redirecting them to more relevant pages. This demonstrates a commitment to providing the best possible user experience, which is at the core of SEO success.
Furthermore, the article delves into the nuances of handling duplicate content, an issue that many SEO practitioners encounter. The explanation of canonical tags and their role in consolidating duplicate content signals to search engines is spot on. The emphasis on regular audits to identify and resolve duplicate content issues is a proactive approach that can prevent potential ranking and indexing problems down the road.
I would like to add that, in my experience, it's also important to stay updated with Google's guidelines and algorithm changes. Google's algorithms are constantly evolving, and what works today may not be as effective tomorrow. Therefore, staying informed through resources like Google Webmaster Guidelines and reputable SEO news sources is essential.
In conclusion, this article provides a wealth of practical information and strategies for managing indexing in SEO. It's evident that the author has a deep understanding of the subject matter and is committed to helping SEO professionals make informed decisions. I look forward to reading more insightful articles from this source in the future. Thank you for sharing these valuable insights.
Team marketingratis.com
-
When it comes to optimizing your website for search engines, there are certain types of pages that you may want to consider not indexing. Here are a few examples of such pages:
Duplicate content: Pages that contain identical or substantially similar content to other pages on your site or elsewhere on the web. Search engines prefer unique and original content, so it's best to avoid indexing duplicate pages.
Thin content: Pages that lack substantial content or are of low quality. These can include pages that have little to no text, primarily consisting of images, videos, or advertisements. Search engines tend to prioritize pages with valuable and informative content.
Temporary or staging pages: Pages created during the development or testing phase of your website, which may not be relevant or useful to search engine users. It's a good idea to prevent these pages from being indexed to avoid confusion or negative impacts on search engine rankings.
Private or sensitive information: Pages that include personal information, login pages, or any content that should be restricted to authorized users only. Preventing indexing of such pages can help maintain privacy and security.
To keep specific pages out of search results, you can use the HTML "noindex" meta tag, or use "robots.txt" directives to stop search engines from crawling them. Together these methods let you control which pages search engines can crawl and index.
Remember, it's essential to regularly review and update your website's indexing strategies to ensure optimal visibility and user experience.
-
The article does a great job of highlighting various scenarios where you should consider using the "noindex" meta tag or other techniques to prevent certain pages from appearing in search engine results. Whether it's duplicate content, thin or low-quality pages, internal search result pages, or sensitive information, this post provides valuable insights and actionable tips to help improve your website's SEO and user experience.
-
Pages with duplicate content, like printer-friendly versions, should be set as "noindex" to prevent confusion. Thin or low-quality content pages, such as placeholders or login screens, should also be excluded. Internal search results, tag/category pages, and user-generated content areas might be better off without indexing due to potential duplicate content or spam concerns. Thank you/confirmation pages, sensitive content, and paginated/sorted versions of content can also benefit from not being indexed.
-
@donsilvernail What should I do with pages that I've de-indexed intentionally?
Like the contact us and privacy policy pages, which I have had to noindex on my personal blog Footyware. Can I still interlink them with my other pages, such as the homepage? Please guide. -
We need to be clear on the purpose of "noindex". Search engines will still crawl the page, but in theory it will not be published in the index. Some search engines may still choose to index the page despite the noindex tag. Also, the page will still be publicly accessible on your website.
As already noted a couple of times, I would be very slow to noindex any page.
I can't think of very many applications where it would be used. The way I view it, either something is public or it's private: if it's public you probably want search engines to find it, and if it's private it should be locked away behind a username and password.
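One consequence of the point above that a noindexed page is still crawled: the tag only works if the crawler is allowed to fetch the page in the first place. If robots.txt blocks the URL, the noindex on it is never read. The sketch below illustrates that interaction using Python's standard urllib.robotparser. The effective_outcome helper, the rules, and the example.com host are hypothetical, and the returned strings are only a simplified summary of the behaviour.

```python
from urllib.robotparser import RobotFileParser

def effective_outcome(robots_txt, path, has_noindex, user_agent="Googlebot"):
    """Summarise what a crawler can learn about a page, in simplified form."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, "https://example.com" + path):
        # The page is never fetched, so its meta tags are never seen.
        return "blocked from crawling: any noindex on the page is never read"
    if has_noindex:
        return "crawled, kept out of the index"
    return "crawled, eligible for indexing"

# Hypothetical rule sets for illustration.
BLOCKING_RULES = "User-agent: *\nDisallow: /thank-you\n"
OPEN_RULES = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    print(effective_outcome(BLOCKING_RULES, "/thank-you", has_noindex=True))
    print(effective_outcome(OPEN_RULES, "/thank-you", has_noindex=True))
```

In practice this means choosing one mechanism per page: noindex for pages that may be crawled but should stay out of results, and authentication for anything genuinely private.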
-
Hi Richard,
Some archive pages in WordPress can produce significant traffic, especially if the articles that reside under the archive are informative and the tag or category you use is a good keyword and provides value. So I only noindex archives that have no real value.
Contact forms are up to you. Does the form sit on a landing page you want visitors to reach? Or is it an internal link for data collection? The determination of what should be indexed or noindexed comes down to which pages bring value to potential visitors. Many internal search pages bring no value to a user searching for your content on Google, so these could be noindexed. User archives could be noindexed, especially if the user is not an author of content on your site.
Thanks,
Don Silvernal
-
Hi there,
Really any pages that you would not want returned to a user in the SERPs. Does the site contain sensitive personal information in some sort of customer profile? If so, you would want to noindex these pages.
I would not noindex contact form pages (valuable for users to be able to find) but internal search pages would be a good candidate as well as 'thank you' pages. If you have an ecommerce website, noindexing the shopping cart would be another smart idea.
As for archive pages, I tend to handle these with a canonical tag.
Hope this helps!
Related Questions
-
Drop in traffic, spike in indexed pages
Hi, We've noticed a drop in traffic compared to the previous month and the same period last year. We've also noticed a sharp spike in indexed pages (almost doubled) as reported by Search Console. The two seem to be linked, as the drop in traffic coincides with the spike in indexed pages. The only change we made to our site during this period is that we reskinned our blog. One of these changes is that we've enabled 'normal' (not ajax) pagination. Our blog has a lot of content, and we have about 550-odd pages of posts. My question is, would this impact the number of pages indexed by Google, and if so could this negatively impact organic traffic? Many thanks, Jason
Technical SEO | | Clickmetrics0 -
What would cause a sudden drop in indexed sitemap pages?
I have made no changes to my site for awhile and on 7/14 I had a 20% drop in indexed pages from the sitemap. However my total indexed pages has stayed the same. What would cause that?
Technical SEO | | EcommerceSite0 -
Google dropping pages from SERPs even though indexed and cached. (Shift over to https suspected.)
Anybody know why pages that have previously been indexed - and that are still present in Google's cache - are now not appearing in Google SERPs? All the usual suspects - noindex, robots, duplication filter, 301s - have been ruled out. We shifted our site over from http to https last week and it appears to have started then, although we have also been playing around with our navigation structure a bit too. Here are a few examples... Example 1: Live URL: https://www.normanrecords.com/records/149002-memory-drawings-there-is-no-perfect-place Cached copy: http://webcache.googleusercontent.com/search?q=cache:https://www.normanrecords.com/records/149002-memory-drawings-there-is-no-perfect-place SERP (1): https://www.google.co.uk/search?q=memory+drawings+there+is+no+perfect+place SERP (2): https://www.google.co.uk/search?q=memory+drawings+there+is+no+perfect+place+site%3Awww.normanrecords.com Example 2: SERP: https://www.google.co.uk/search?q=deaf+center+recount+site%3Awww.normanrecords.com Live URL: https://www.normanrecords.com/records/149001-deaf-center-recount- Cached copy: http://webcache.googleusercontent.com/search?q=cache:https://www.normanrecords.com/records/149001-deaf-center-recount- These are pages that have been linked to from our homepage (Moz PA of 68) prominently for days, are present and correct in our sitemap (https://www.normanrecords.com/catalogue_sitemap.xml), have unique content, have decent on-page optimisation, etc. etc. We moved over to https on 11 Aug. There were some initial wobbles (e.g. 301s from normanrecords.com to www.normanrecords.com got caught up in a nasty loop due to the conflicting 301 from http to https) but these were quickly sorted (i.e. spotted and resolved within minutes). There have been some other changes made to the structure of the site (e.g. a reduction in the navigation options) but nothing I know of that would cause pages to drop like this. 
For the first example (Memory Drawings) we were ranking on the first page right up until this morning and have been receiving Google traffic for it ever since it was added to the site on 4 Aug. Any help very much appreciated! At the very end of my tether / understanding here... Cheers, Nathon
Technical SEO | | nathonraine0 -
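The redirect loop mentioned in the question above (bare host to www conflicting with http to https) is the kind of thing that can be avoided by working out the final destination in one step rather than chaining two separate rules. A minimal Python sketch of that idea follows. The host names are taken from the question; the function names are hypothetical, and this is an illustration of the logic only, not the site's actual server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names taken from the question; treat them as assumptions.
CANONICAL_HOST = "www.normanrecords.com"
BARE_HOST = "normanrecords.com"


def canonical_target(url: str) -> str:
    """Return the single URL a request should be 301-redirected to.

    Both rules (http -> https, bare host -> www) are applied together,
    so a request needs at most one redirect and the two rules cannot
    bounce a request back and forth.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    # An empty path is normalised to "/" so the root has one form.
    return urlunsplit(("https", host, parts.path or "/", parts.query, parts.fragment))


def needs_redirect(url: str) -> bool:
    """True if the URL is not already in its canonical form."""
    return canonical_target(url) != url
```

Because the function is idempotent (applying it to its own output changes nothing), a URL that has been redirected once should never be redirected again.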
Differing numbers of pages indexed with and without the trailing slash
I noticed today that a site: query in Google (UK) for a certain domain I'm looking at returns different numbers depending on whether or not the trailing slash is added at the end. With the trailing slash the numbers are significantly different. This is a domain with a few duplicate content issues. It seems very rare but I've managed to replicate it for a couple of other well known domains, so this is the phenomenon I'm referring to:
site:travelsupermarket.com - 16'300 results
site:travelsupermarket.com/ - 45'500 results
site:guardian.co.uk - 120'000'000 results
site:guardian.co.uk/ - 121'000'000 results
For the particular domain I'm looking at the numbers are 19'000 without the trailing slash and 800'000 with it! As mentioned, there are a few duplicate content issues at the moment that I'm trying to tidy up, but how should I interpret this? Has anyone seen this before and can advise what it could indicate? Thanks in advance for any answers.
Technical SEO | | ianmcintosh0 -
Index page
To the SEO experts, this may well seem a silly question, so I apologise in advance, as I try not to ask questions that I probably know the answer to already, but clarity is my goal.
I have numerous sites. As standard practice, through the .htaccess I will always set up non-www to www, and redirect the index page to www.mysite.com. All straightforward; I have never questioned this practice and have always been advised it's the best practice to avoid duplicate content.
Now, today, I was looking at a CMS service for a customer for their website. The website is already built and it's a static website, so the CMS integration was going to mean a full rewrite of the website. Speaking to a friend on another forum, he told me about a service called simple CMS. I had a look, and it looks perfect for the customer... I went to set it up on the client's site and here is the problem. For the CMS software to work, it MUST access the index page. Because my index page is redirected to www.mysite.com, it won't work as it can't find the index page (obviously).
I questioned this with the software company, and they informed me that it must access the index page. I explained that it won't be able to and why (because I have my index page redirected to avoid duplicate content). To my astonishment, the person there told me that duplicate content is a huge no-no with Google (that's not the astonishing part) but that it's not relevant to the index and non-index page of a website. This goes against everything I thought I knew... The person also reassured me that they have worked within the SEO area for 10 years.
As I am a subscriber to SEO MOZ and no one here has anything to gain by offering advice, is this true? Will it not be a duplicate content issue to show both an index page and a non-index page? Will search engines not view this as duplicate content? Or is this SEO expert talking bull, which I suspect, but cannot be sure.
Any advice would be greatly appreciated. It would make my life a lot easier for the customer to use this CMS software, but I would be doing it at the risk of tarnishing the work they and I have done on their ranking status. Many thanks in advance, John
Technical SEO | | Johnny4B0 -
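For readers unfamiliar with the setup described in the question above: redirecting the index page amounts to mapping any URL that ends in an index file to its directory URL. A minimal Python sketch of that mapping is below, assuming a hypothetical list of index filenames. It only illustrates the idea; it is not the poster's .htaccess rule.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed index filenames; adjust to whatever the server actually serves.
INDEX_NAMES = {"index.html", "index.htm", "index.php"}


def strip_index(url: str) -> str:
    """Map an index-file URL to its directory URL, leaving others unchanged."""
    parts = urlsplit(url)
    # Split the path into everything before the last "/" and the final segment.
    head, _, last = parts.path.rpartition("/")
    path = head + "/" if last.lower() in INDEX_NAMES else parts.path
    return urlunsplit((parts.scheme, parts.netloc, path or "/", parts.query, parts.fragment))
```

With this mapping, www.mysite.com/index.html and www.mysite.com/ resolve to the same address, which is the duplicate the redirect is meant to remove.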
Google doesn't rank the best page of our content for keywords. How to fix that?
Hello, We have a strange issue, which I think is due to legacy. Generally, we are a job board for students in France: http://jobetudiant.net (jobetudiant == studentjob in French). We rank quite well (2nd or 3rd) on "Job etudiant <city>", with the right page (the one that lists all job offers in that city). So this is great. Now, for some reason, Google systematically puts another of our pages in front of that: the page that lists the job offers in the 'region' of that city. For example, check this page. The first link is a competitor, the 3rd is the "right" link (the job offers in Annecy), but the 2nd link is the list of jobs in Haute Savoie (which is the 'departement', equivalent to a county) in which Annecy is located... that's annoying. Is there a way to indicate to Google that the 3rd page makes more sense for this search? Thanks
Technical SEO | | jgenesto0 -
Getting a citation page indexed
Howdy mozzers, I have a citation on a .govt domain with 2 links pointing to my site. The page is not indexed by Google, Bing or Yahoo. URL: http://www.familyservices.govt.nz/directory/viewprovider.htm?id=17077 I have tried getting the page indexed by building bookmark links to it. I have tweeted the URL and gotten a few re-tweets for it. But no luck. The page has got no nofollow meta tag. Other listings have been indexed by Google. Could someone please advise on means to help me get the page indexed? A strategy that I have not yet tried is submitting a sitemap that includes the external URL, as I am not sure if it is possible to include URLs that are not part of my domain. Any advice or help would be greatly appreciated. viva le SEOmoz Thanks
Technical SEO | | ihms1