Internal file extension canonicalization
-
OK, no doubt this is straightforward, but I am finding it hard to find a simple answer. Our website's internal pages have the extension .html. Trying to navigate to an internal URL without the .html extension results in a 404.
The question is: should a 301 be used to direct to the extension-less URL to future-proof? And should internal links point to the extension-less URL for the same reason?
Hopefully that makes sense, and apologies for asking what I believe has a straightforward answer.
-
As above
example/abc rewrites to example/abc.html
example/abc.html redirects to example/abc
and all internal links link to example/abc
-
Thank you for the replies.
I will try to clarify what I am getting at; apologies in advance for any naivety.
I understand homepage canonicalization; the confusion revolves around how this applies to internal pages.
Logically, I am struggling to see how internal pages are any different from a homepage in terms of the need to avoid multiple URLs, and thus an extension-less URL seemed appropriate. Not to mention the benefit of cleaner URLs that are easier to link to, remember, etc.
i.e.
example/abc
example/abc.html
example/abc.index.html
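To make the "one page, several URLs" point concrete, here is a minimal Python sketch that collapses such variants to a single extension-less path. The function name `canonical_path` is hypothetical, and it assumes the variants in question are `/abc`, `/abc/`, `/abc.html` and `/abc/index.html`; it is an illustration of the idea, not how any particular server does it.

```python
import posixpath


def canonical_path(path: str) -> str:
    """Map URL-path variants of one page to a single extension-less path.

    Hypothetical helper for illustration only.
    """
    # Treat a directory index as the directory itself: /abc/index.html -> /abc
    if posixpath.basename(path) == "index.html":
        path = posixpath.dirname(path)
    # Drop a trailing .html: /abc.html -> /abc
    if path.endswith(".html"):
        path = path[: -len(".html")]
    # Drop a trailing slash, but keep the site root as "/"
    return path.rstrip("/") or "/"


if __name__ == "__main__":
    for variant in ["/abc", "/abc/", "/abc.html", "/abc/index.html"]:
        print(variant, "->", canonical_path(variant))
```

Every variant handled here ends up at the same path, which is the URL you would want search engines and internal links to use.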
-
As Nick said, you don't need to do this, but if you do:
1. REWRITE the new URL to the old URL, as your web server needs to know the extension.
2. REDIRECT the old URL to the new one in case you already have links to the old URLs; you don't want duplicate content.
3. Make sure that all internal links point to the new URL; you don't want unnecessary redirects, as they leak link juice.
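The three steps above can be sketched as a small Python model of what the server decides for each request. This is only an illustration: `resolve`, `internal_href` and the `files` set are hypothetical names, and a real site would express the same logic in its web server's rewrite configuration rather than in application code.

```python
from typing import Set, Tuple


def resolve(path: str, files: Set[str]) -> Tuple[int, str]:
    """Return (status, target) for a requested URL path.

    `files` is a hypothetical set of files that exist on disk,
    e.g. {"/abc.html"}.
    """
    # Step 2: the old .html URL gets a 301 to the extension-less URL.
    if path.endswith(".html") and path in files:
        return 301, path[: -len(".html")]
    # Step 1: the extension-less URL is served from the .html file
    # internally; the visitor's address bar does not change.
    if path + ".html" in files:
        return 200, path + ".html"
    # Anything else does not exist.
    return 404, path


def internal_href(path: str) -> str:
    """Step 3: write internal links without the .html extension."""
    return path[: -len(".html")] if path.endswith(".html") else path


if __name__ == "__main__":
    files = {"/abc.html"}
    print(resolve("/abc.html", files))
    print(resolve("/abc", files))
    print(internal_href("/abc.html"))
```

On a real server the redirect in step 2 has to match only the URL the visitor originally requested; otherwise the internal rewrite from step 1 triggers the redirect again and the two rules loop.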
-
I'm about to make a whole lot of assumptions about your website to give this answer, just be aware.
Your website is built statically, using HTML, hence the .html file extension. If you're seeing websites that don't have a file extension, it's most likely they are using content management systems (or have some serious /folder/index.html stuff going on).
Having a file extension like .html or .aspx or .php is not a bad thing. On websites like yours, it is required (unless you do the above subfolder thing) because it's an actual file the browser is grabbing rather than something being dynamically generated by a CMS. It has nothing to do with future-proofing.
As for 301'ing extension-less URLs to ones with an extension... well, I don't know why you'd need to do that for your type of site.