The end of April, we made the switch from http to https and I was prepared for a surge in crawl errors while Google sorted out our site. However, I wasn't prepared for the surge in impossibly incorrect URLs and partial URLs that I've seen since then.
I have learned that as Googlebot grows up, he'she's now attempting to read more javascript and will occasionally try to parse out and "read" a URL in a string of javascript code where no URL is actually present. So, I've "marked as fixed" hundreds of bits like
/TRo39,
category/cig
etc., etc....
But they are also returning hundreds of otherwise correct URLs with a .html extension when our CMS system generates URLs with a .uts extension like this:
https://www.thompsoncigar.com/thumbnail/CIGARS/90-RATED-CIGARS/FULL-CIGARS/9012/c/9007/pc/8335.html
when it should be:
https://www.thompsoncigar.com/thumbnail/CIGARS/90-RATED-CIGARS/FULL-CIGARS/9012/c/9007/pc/8335.uts
Worst of all, when I look at them in GSC and check the "linked from" tab it shows they are linked from themselves, so I can't backtrack and find a common source of the error.
Is anyone else experiencing this? Got any suggestions on how to stop it from happening in the future? Last month it was 50 URLs, this month 150, so I can't keep creating redirects and hoping it goes away.
Thanks for any and all suggestions!
Liz Micik