Lately I have noticed Google indexing many files on the site without the .html extension

gheh2013

Hello,

Our site remains in HTML 4.0 while we convert.

File names such as http://www.sample.com/samples/index.shtml are being picked up in the SERPs as http://www.sample.com/samples/, even though I use the rel="canonical" tag and specify the full file name in it, as recommended. Moz shows fewer incoming links to the truncated URL (http://www.sample.com/samples/) than to the full file name.
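To confirm what each URL version actually declares, here is a minimal Python sketch (standard library only). It doesn't fetch anything itself: you pass in the URL you requested and the HTML you got back, and it returns the absolute rel="canonical" target. The sample.com URLs are just the placeholders from the question, and `canonical_url` is a hypothetical helper name. Keep in mind this only shows what the tag says; Google treats a canonical as a hint and can still choose a different URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also arrive here, because
        # HTMLParser's default handle_startendtag calls handle_starttag.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonical = attr_map["href"]


def canonical_url(page_url, html):
    """Return the absolute canonical URL declared in html, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve a relative href against the URL the page was requested from.
    return urljoin(page_url, finder.canonical)


if __name__ == "__main__":
    page = ('<html><head><link rel="canonical" '
            'href="http://www.sample.com/samples/index.shtml"></head></html>')
    print(canonical_url("http://www.sample.com/samples/", page))
```

Run it against the HTML served at both the "/" and ".shtml" addresses; if both report the .shtml URL, the tag itself is set up correctly and the problem lies elsewhere (links, redirects).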

I am not sure whether this is behind the drop in rankings I have seen recently (Moz stats show a decline lately), though I am aware of other possible causes, such as not being on HTML5 yet.

Any help with this would be great.

Thank you in advance

KristinaKledzik

Can you clarify what you're concerned about for 301 redirects in terms of link juice?

301 redirects don't carry as much link juice as a direct link, but that only affects links that otherwise wouldn't pass any link juice to your end destination at all; links that already point to the correct URL aren't impacted. (Though if your canonical is working correctly, it'll pass the same amount of link juice as a 301 redirect.)

Dr. Pete goes into this a bit more over here: https://moz.com/community/q/do-canonical-tags-pass-all-of-the-link-juice-onto-the-url-they-point-to

gheh2013

Many thanks for taking the time to respond, Kristina.

I don't like to do redirects, as so many have warned of the consequences in terms of link juice.
No, I don't link to the pages in question using the "/" version; I link to the ".shtml" version of the page that is indexed.
I have found a few external sources (recent linkers) that use the "/" version, but they likely did so only because they saw it displayed that way in the SERPs. No commercial or affiliate sites do.

The reason I was really confused is that some pages are indexed using the "/" while others are not, for no reason I could find. The "/" versions of those pages still remain on the first page for their keywords, even though far fewer domains and pages link to them (for now!). We will be moving to another platform with a different default extension, so I wonder how that will be handled. Endless mysteries.

Thank you again for your time and suggestions,

Greg

KristinaKledzik

Hmm, that doesn't seem good. It's hard to say whether this is causing the decline in your rankings, but either way, you want to make sure that you're not splitting your link equity between your / and .shtml pages. Here's what I'd do:

If you can, 301 redirect / pages to .shtml pages. Obviously, it'd be easier if the canonical worked, but it sounds like it doesn't.
Use Screaming Frog or DeepCrawl to look through internal pages on your site to see if you're ever linking to the / version of pages rather than the .shtml pages. When Google chooses a different version of a URL over the canonical one, it's often because that's how it sees internal links pointing to the page. Make sure that you only have links to the .shtml version of the page.
Use a tool like Moz or Ahrefs to find all external links to your site. For any links that you built, or where you have a partnership with the site owner, make sure they're linking to the .shtml version of the page. I could especially see your ad partners using /, because a URL ending in / looks cleaner than one ending in .shtml once tracking parameters are appended.
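For a quick spot check of a single page before running a full crawl, a minimal Python sketch like the one below can flag internal links that point at the / version and show the .shtml URL they should use instead (which is also the target a 301 from the / URL would point to). It isn't a crawler like Screaming Frog or DeepCrawl; it only inspects HTML you feed it. `DEFAULT_DOC`, `slash_links`, and the www.sample.com host are assumptions for illustration; set `DEFAULT_DOC` to whatever your server actually serves as the directory index.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical default document name; match your server's directory index.
DEFAULT_DOC = "index.shtml"


class LinkCollector(HTMLParser):
    """Collects every non-empty href found on <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def slash_links(page_url, html, site_host):
    """Return (link, preferred_url) pairs for internal links whose path ends in "/".

    preferred_url is the same URL with DEFAULT_DOC appended to the path,
    keeping any query string or fragment.
    """
    collector = LinkCollector()
    collector.feed(html)
    findings = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        # Only the site's own links are in scope for this check.
        if parsed.netloc != site_host:
            continue
        if parsed.path.endswith("/"):
            preferred = parsed._replace(path=parsed.path + DEFAULT_DOC).geturl()
            findings.append((absolute, preferred))
    return findings


if __name__ == "__main__":
    html = ('<a href="/samples/">Samples</a> '
            '<a href="/samples/index.shtml">OK</a> '
            '<a href="https://example.org/">External</a>')
    for link, preferred in slash_links("http://www.sample.com/", html, "www.sample.com"):
        print(link, "->", preferred)
```

Only the first link is reported here: the second already uses the .shtml form and the third points to another host.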

After that, wait and see if Google fixes the problem.

Also worth noting: have you thought about changing your default to /? That's more common today, so you're probably getting a lot of external links with / instead of .shtml, and you'll never be able to fix that problem. If that's a possible solution, you may want to explore it.
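If you do switch the default to /, the redirects run the other way: old .shtml index URLs should 301 to the / version so links pointing at them keep passing equity. In practice you'd set this up in your web server or platform config; the Python WSGI middleware below is only a sketch of the logic, assuming (hypothetically) that every directory's default document is named index.shtml. It preserves the query string and passes every other request through untouched.

```python
# Hypothetical default document name; match your server's directory index.
DEFAULT_DOC = "index.shtml"


def redirect_to_slash(app):
    """WSGI middleware: 301 any '.../index.shtml' request to '.../'."""

    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.endswith("/" + DEFAULT_DOC):
            # Strip the file name, keeping the trailing slash.
            target = path[: -len(DEFAULT_DOC)]
            query = environ.get("QUERY_STRING", "")
            if query:
                target += "?" + query
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        # Everything else, including the "/" URLs themselves, is served normally.
        return app(environ, start_response)

    return wrapper
```

The wrapped application has to serve the / URL directly; if it internally sent / back to index.shtml as an external redirect, you'd create a redirect loop.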

Good luck!

Kristina
