Lately I have noticed Google indexing many files on the site without the .html extension
-
Hello,
Our site, while we convert, remains in HTML 4.0.
Fle names such as http://www.sample.com/samples/index.shtml are being picked up in the SERPS as http://www.sample.com/samples/ even when I use the "rel="canonical" tag and specify the full file name therein as recommended. The link to the truncated URL (http://www.sample.com/samples/) results in what MOZ shows as fewer incoming links than the full file name is shown as having incoming.
I am not sure if this is causing a loss in placement (the MOZ stats are showing a decline of late), which I have seen recently (of course, I am aware of other possible reasons, such as not being in HTML5 yet).
Any help with this would be great.
Thank you in advance
-
Can you clarify what you're concerned about for 301 redirects in terms of link juice?
301 redirects don't carry as much link juice as a direct link, but it doesn't impact correct links, just the links that, otherwise, wouldn't get link juice to your end destination at all. (Though, if your canonical is working correctly, it'll pass the same amount of link juice as a 301 redirect.)
Dr. Pete goes into this a bit more over here: https://moz.com/community/q/do-canonical-tags-pass-all-of-the-link-juice-onto-the-url-they-point-to
-
Many thanks for taking the time to respond Kristina.
-
I don't like to do redirects, as so many have warned of the consequences in terms of link juice
-
No, I don't link to the pages in question using "/" rather than the ".shtml" version of the page indexed.
-
A few external sources use the "/" version (recent linkers) I have found, but they likely only did so as they saw it displayed as such in the SERPs previously. No commercial or other affiliate sites do.
The reason I was really confused is that some pages are indexed using the "/", while others are not -- with no apparent reason I could locate. The "/" version for pages still remains on the first page for keywords, even with far less domain authorities and pages linking to them (for now!). We will be moving to another platform with a different default extension, so I wonder how that will be handled. Endless mysteries.
Thank you again for your time and suggestions,
Greg
-
-
Hmm, that doesn't seem good. It's hard to say whether this is causing the decline in your rankings, but either way, you want to make sure that you're not splitting your link equity between your / and .shtml pages. Here's what I'd do:
- If you can, 301 redirect / pages to .shtml pages. Obviously, it'd be easier if the canonical worked, but it sounds like it doesn't.
- Use ScreamingFrog or DeepCrawl to look through internal pages on your site to see if you're ever linking to the / version of pages rather than the .shtml pages. When Google chooses a different version of a URL over the canonical one, it's often because that's how it sees internal links pointing to the page. Make sure that you only have links to the .shtml version of the page.
- Use a tool like Moz or Ahrefs to find all internal links to your site. For any links that you built or have a partnership with the owners, make sure that they're linking to the .shtml version of the page. I could especially see your ad partners using / because it's a cleaner before parameters than .shtml.
After that, wait and see if Google fixes the problem.
Also worth noting: have you thought about changing your default to /? That's more common today, so you're probably getting a lot of external links with / instead of .shtml, and you'll never be able to fix that problem. If that's a possible solution, you may want to explore it.
Good luck!
Kristina
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Start a new site to get out of Google penalties?
Hey Moz, I have several questions in regards to whether I should a start a new second site to save my online presence after a series of Google penalties. The main questions being: Is this the best way to spend my time/resources? If I’m forced to jump my company over to the new site can Google see that and transfer the penalty? I plan on all new content (no link redirect, no dup content) so do I need to kill the original site? Are there any Pro’s/cons I am missing? Summary of my situation: Looking at analytics it appears I was hit with both Penguin 2.0 and 2.1, each cutting my traffic in half, despite a link remediation campaign in the summer of 2013. There was a manual penalty also imposed on the site in the fall of 2013, which was released in early 2014. With Penguin 3.0’s release at the end of 2014, the site saw a slight uptick in organic traffic, improving from essentially nothing to next to nothing. Most of the site’s issues revolved around cheap $5 links from India in the 2006-09 time frame. This link building was abandoned, and replaced with nothing but “letting them happen naturally” from 2010 through the 2013 penalties. Since 2013 we have done a small amount of quality articles on a monthly basis to promote the site, social media, and continuous link remediation. In addition the whole site has been redesigned, optimized for speed/mobile, secured, and completely rewritten. Given all of this, the site has really only recovered to page 2 and 3 of the SERPs for our key words. Even after a highly circulated piece appeared on an Authority site (97 DA) a few months ago there was zero movement. It appears we have an anvil tied around our leg until Penguin 4.0. With all of the above, and no sign of when the next penguin will be released, I ask, is it time to start investing in a new site? With no movement in 2.5 years, it’s impossible to know where my current site stands, so I don’t know what else I can do to improve it. I am considering slowly building a new site that is a high quality informational site. My thought process is it will take a year for a new site to gain any traction with Google. If by that time my main site has not recovered, I can jump to that new site, add a commercial component, and use it as a life boat for my company. If I have recovered, then I have a future asset. Thanks in advance!
Intermediate & Advanced SEO | | TheDude0 -
Google indexed wrong pages of my website.
When I google site:www.ayurjeewan.com, after 8 pages, google shows Slider and shop pages. Which I don't want to be indexed. How can I get rid of these pages?
Intermediate & Advanced SEO | | bondhoward0 -
Why is a site no longer being indexed by Google after HTTPS switch?
A client of ours recently had a new site built and made the switch to HTTPS. We made sure to redirect all of the HTTP pages to HTTPS and submitted a new sitemap to Google. GWT says the sitemap was submitted successfully but only 4 pages have been indexed where there should be over 2000. This has led to a plummet of organic traffic and we can't find the issue. Has anyone else had issues/success with doing a HTTPS switch that knows how to fix this problem?
Intermediate & Advanced SEO | | ATMOSMarketing560 -
How can Google index a page that it can't crawl completely?
I recently posted a question regarding a product page that appeared to have no content. [http://www.seomoz.org/q/why-is-ose-showing-now-data-for-this-url] What puzzles me is that this page got indexed anyway. Was it indexed based on Google knowing that there was once content on the page? Was it indexed based on the trust level of our root domain? What are your thoughts? I'm asking not only because I don't know the answer, but because I know the argument is going to be made that if Google indexed the page then it must have been crawlable...therefore we didn't really have a crawlability problem. Why Google index a page it can't crawl?
Intermediate & Advanced SEO | | danatanseo0 -
How do you achieve Google Authorship verification on a site with no clearly defined authors?
Google Authorship seems to be the current buzz topic in SEO. It seems perfect for people who write lots of articles of blog posts, but what about sites where the main focus isn't articles e.g. e-commerce sites? Can the website as a whole get verified?
Intermediate & Advanced SEO | | statman870 -
Does Google Index Videos onsite when using JQuery?
Hi, I'm showing my videos using jquery lightbox etc. This means that I do not have the normal YouTube "embedding" code onpage. Does anyone know if Google will somehow index my videos? Any solutions / ideas? Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
My Job Site is having Indexing Issues
I have 2 job sites that I am managing and working on. One of the sites has a great deal of job vacancies and expired job pages that have been indexed. This one below: http:// job search.cctc .com/cctc Jobsearch/expandedjobsearch.do This job site does not have any job pages index: http://www.cross countryallied. com/ctAlliedWebSite/ travel-nurse-jobs/job-search.jsp Why and what can I do to get the dynamic pages index and ranking? Any help tips would be much appreciated. Thanks
Intermediate & Advanced SEO | | Melia0 -
Should I prevent Google from indexing blog tag and category pages?
I am working on a website that has a regularly updated Wordpress blog and am unsure whether or not the category and tag pages should be indexable. The blog posts are often outranked by the tag and category pages and they are ultimately leaving me with a duplicate content issue. With this in mind, I assumed that the best thing to do would be to remove the tag and category pages from the index, but after speaking to someone else about the issue, I am no longer sure. I have tried researching online, but there isn't anything that provided any further information. Please can anyone with any experience of dealing with issues like this or with any knowledge of the topic help me to resolve this annoying issue. Any input will be greatly appreciated. Thanks Paul
Intermediate & Advanced SEO | | PaulRogers0