Kill your .htaccess file: take the risk to learn a little
-
Last week I ran a "site:www.mydomain.com" search to scan over what Google had indexed for my site. I came across a URL that had been indexed by mistake. It went something like this:
www.mydomain.com/link1/link2/link1/link4/link3
I didn't understand why Google had indexed a page like that, since the "link" pages were site-wide links in my main navigation bar. The URL seemed to loop over and over. So I started checking how many of these Google had indexed and found about 20. I removed those URLs in Webmaster Tools, but I still wanted to know why it was happening. It turned out I had mistakenly written some of the links in my header without a leading "/".
If you know HTML, you will realize that by leaving the "/" off the front of the link, I was telling the browser (and Google) to append that link to the URL of the page it was currently on. This created an infinite loop of links, which is not good.
Basically, when Google went to www.mydomain.com/link1/, it found the other header links, which told Google to append each one to the existing URL and then follow it.
Something like: www.mydomain.com/link1/link2/...
That is what happens when you leave the "/" off the front of the directory you are linking to. A leading "/" refers to the site root, so with it in place the link is always resolved from the root, followed by the path.
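To see the difference concretely, here is a minimal Python sketch that resolves hypothetical header links the way a browser or crawler would, using the standard library's `urllib.parse.urljoin`. The link names and the depth limit are assumptions for illustration, and the sketch assumes directory-style URLs ending in "/" (without the trailing slash, a relative link replaces the last path segment instead of being appended).

```python
from urllib.parse import urljoin

# Hypothetical site-wide header links.
RELATIVE_NAV = ["link1/", "link2/", "link3/", "link4/"]           # no leading "/"
ROOT_RELATIVE_NAV = ["/link1/", "/link2/", "/link3/", "/link4/"]  # leading "/"

START = "http://www.mydomain.com/"


def crawl(nav_links, start=START, max_depth=3):
    """Breadth-first 'crawl' that follows every header link on every page
    for max_depth rounds and returns the set of distinct URLs found."""
    seen = {start}
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for page in frontier:
            for href in nav_links:
                # Each href is resolved against the URL of the current page.
                url = urljoin(page, href)
                if url not in seen:
                    seen.add(url)
                    next_frontier.append(url)
        frontier = next_frontier
    return seen


print(len(crawl(RELATIVE_NAV)))       # 85
print(len(crawl(ROOT_RELATIVE_NAV)))  # 5
```

With four relative links the number of distinct URLs multiplies by four at every level (1 + 4 + 16 + 64 = 85 after three levels), while the root-relative version never gets past the home page plus the same four pages, no matter how deep the crawl goes.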
So what did I do?
Even though I was able to find about 20 URLs using the "site:" search, there had to be more out there. I kept searching but could not find any more, yet I was not convinced.
The light bulb went on at this point.
My .htaccess file contained many 301 redirects from my attempts to send those pages to a real page, even though the target pages were not really relevant. Because those redirects answered the looping URLs, they never showed up as errors. So how could I really find out what Google had indexed for me, when Webmaster Tools only reports the top 1,000 links?
I decided to kill my .htaccess file. Knowing that Google is "forgiving" when major changes happen on a site, I knew Google would not penalize my site right away just for removing my .htaccess file.
I waited 3 days, then BOOM! Webmaster Tools reported a ton of 404s on my site. I looked at the Crawl Errors report and there they were: all those infinite-loop links that I knew had to be out there.
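A rough way to pick the looping URLs out of a crawl-error list, sketched in Python under a few assumptions: the URLs have already been copied into a list, and `NAV_SEGMENTS` holds the hypothetical names of the site-wide header links. Any URL whose path stacks two or more of those names is treated as part of the loop, and the hits are tallied by their first two path segments, the same level the robots.txt rules in this post target.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical names of the site-wide header links.
NAV_SEGMENTS = {"link1", "link2", "link3", "link4", "link5"}

# Hypothetical crawl-error list.
SAMPLE_ERRORS = [
    "http://www.mydomain.com/link1/link2/link1/link4/link3",
    "http://www.mydomain.com/link1/link2/",
    "http://www.mydomain.com/link1/link3/link1/",
    "http://www.mydomain.com/link1/",           # a real page, kept
    "http://www.mydomain.com/blog/some-post/",  # unrelated error, ignored
]


def path_segments(url):
    """Split a URL path into its non-empty segments."""
    return [s for s in urlparse(url).path.split("/") if s]


def is_nav_loop(url, nav=NAV_SEGMENTS):
    """True if the path stacks two or more header-link names, e.g. /link1/link2/."""
    segments = path_segments(url)
    return len(segments) >= 2 and all(s in nav for s in segments)


def group_loops(urls, depth=2):
    """Count looping URLs by their first `depth` path segments."""
    counts = Counter()
    for url in urls:
        if is_nav_loop(url):
            prefix = "/" + "/".join(path_segments(url)[:depth]) + "/"
            counts[prefix] += 1
    return counts


for prefix, n in group_loops(SAMPLE_ERRORS).most_common():
    print(prefix, n)
```

On the sample list this reports `/link1/link2/` twice and `/link1/link3/` once; the bare `/link1/` page and the unrelated blog URL are left alone. Matching on known header-link names, rather than on repeated segments, also catches loop URLs like `/link1/link2/` that have not yet revisited a segment.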
How many were there?
In the first crawl, Google found over 5,000 of them. OMG! Can you imagine the "low quality" score I was getting on those pages? Looking through all those links, I was able to identify about 4 patterns. For example:
Now, I wanted to keep all the URLs pointing to www.mydomain.com/link1, but anything after that had to go. I went into my robots.txt file and added this:
User-agent: *
# Disallow values are paths from the site root, not full host names
Disallow: /link1/link2/
Disallow: /link1/link3/
Disallow: /link1/link4/
Disallow: /link1/link5/
There were many more indexed pages that went deeper into those paths, but I wanted anything after the second path segment gone, since that was where the loop I had detected started. With that, I was able to get rid of at least 5,000 links, from what I know, if not more.
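robots.txt Disallow values are matched against the URL path, which starts with "/", rather than against the host name. One quick way to check rules like these before deploying them is Python's standard-library `urllib.robotparser`, which does plain prefix matching. In this sketch the host-name form is included only for contrast, and the URLs are the hypothetical examples used in this post.

```python
from urllib.robotparser import RobotFileParser

# Rules written with the host name: the path never starts with this, so no match.
HOST_STYLE = [
    "User-agent: *",
    "Disallow: www.mydomain.com/link1/link2/",
]

# The same intent written as root-relative paths.
PATH_STYLE = [
    "User-agent: *",
    "Disallow: /link1/link2/",
    "Disallow: /link1/link3/",
    "Disallow: /link1/link4/",
    "Disallow: /link1/link5/",
]


def build_parser(lines):
    """Parse robots.txt lines directly instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(lines)  # parse() also marks the rules as loaded
    return parser


LOOP_URL = "http://www.mydomain.com/link1/link2/link1/link4/link3"
KEEP_URL = "http://www.mydomain.com/link1/"

host_rules = build_parser(HOST_STYLE)
path_rules = build_parser(PATH_STYLE)

print(host_rules.can_fetch("*", LOOP_URL))  # True: host-style rule does not block
print(path_rules.can_fetch("*", LOOP_URL))  # False: blocked by /link1/link2/
print(path_rules.can_fetch("*", KEEP_URL))  # True: /link1/ itself stays crawlable
```

Because matching is by prefix, a single `Disallow: /link1/link2/` covers every deeper looping URL that starts with that path, which is why blocking at the second segment is enough.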
What did I learn from this?
Kill your .htaccess file for a few days and see what comes back in your reports. You might learn something.
After doing this, I simply restored my .htaccess file, and I am now on my way to removing a ton of "low quality" links I didn't even know I had.
-
Interesting post. Yeah, the .htaccess file is the most important file out there and it is easy to mess up (as I am sure most everyone has at one time or another).