Kill your htaccess file, take the risk to learn a little
-
Last week I was browsing Google's index with "site:www.mydomain.com" because I wanted to see which pages of my site Google had indexed. I came across a URL that had been indexed by mistake. It looked something like this:
www.mydomain.com/link1/link2/link1/link4/link3
I didn't understand why Google had indexed a page like that, since the "link" pages were site-wide links in my main navigation bar. The URL seemed to loop over and over. So I started checking how many of these Google had indexed and found about 20 pages. I removed those URLs in Webmaster Tools, but I still wanted to know why it was happening. It turned out I had mistakenly placed some links in my header as relative links, without a leading "/".
If you know HTML, you will realize that by leaving the "/" off the front of the link, I was telling the browser (and Google) to append that link to the URL of the page it was currently on. The result was an infinite loop of links, which is not good.
Basically, when Google visited www.mydomain.com/link1/, it found the other header links, each of which told Google to append its path to the current URL and then follow it.
Something like: www.mydomain.com/link1/link2/...
When you leave the "/" off the front of the directory you are linking to, this is what happens. A leading "/" refers to the site root, so if you put it in front of the directory you are linking to, the path will always be resolved from the root.
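A small simulation makes the growth concrete. This is a sketch under assumptions: it supposes four hypothetical header links (`link1/` to `link4/`) written without a leading "/", and a crawler that resolves each href against the current page with Python's `urllib.parse.urljoin`. It is a toy model, not a description of how Googlebot actually schedules crawls.

```python
from urllib.parse import urljoin

# Hypothetical site root and header links. The links are written WITHOUT a
# leading "/", which is the mistake described in the post.
ROOT = "http://www.mydomain.com/"
HEADER_LINKS = ["link1/", "link2/", "link3/", "link4/"]


def discover(max_depth):
    """Breadth-first crawl where every page repeats the same header links.

    Returns the set of distinct URLs found within max_depth levels of ROOT
    (ROOT itself is not counted).
    """
    seen = set()
    frontier = [ROOT]
    for _ in range(max_depth):
        next_frontier = []
        for page in frontier:
            for href in HEADER_LINKS:
                # Resolve the relative href against the page it appears on.
                url = urljoin(page, href)
                if url not in seen:
                    seen.add(url)
                    next_frontier.append(url)
        frontier = next_frontier
    return seen


for depth in range(1, 7):
    print(depth, len(discover(depth)))
```

With four links, each extra level multiplies the new URLs by four, so levels 1 through 6 yield 4, 20, 84, 340, 1,364 and 5,460 distinct URLs in total.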
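The rule can be checked with Python's `urllib.parse.urljoin`, which resolves an href against the current page's URL using the standard relative-URL rules. Note that the nesting happens because the page URL ends in "/". The second half is a hypothetical helper, `find_relative_links`, a sketch built on the standard `html.parser` module that flags `<a href>` values that are neither absolute nor root-relative, as a quick way to audit a header template.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical page URL, following the example in the post.
PAGE = "http://www.mydomain.com/link1/"

# Without a leading "/", the href is resolved against the current directory...
print(urljoin(PAGE, "link2/"))   # http://www.mydomain.com/link1/link2/
# ...with a leading "/", it is resolved against the site root.
print(urljoin(PAGE, "/link2/"))  # http://www.mydomain.com/link2/


class RelativeLinkFinder(HTMLParser):
    """Collects <a href> values that are neither absolute nor root-relative."""

    def __init__(self):
        super().__init__()
        self.relative = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        # Skip absolute URLs (scheme or //host), root-relative paths, fragments.
        if parsed.scheme or parsed.netloc or href.startswith(("/", "#")):
            return
        self.relative.append(href)


def find_relative_links(html):
    """Hypothetical audit helper: return hrefs that would nest like link2/."""
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative
```

Running `find_relative_links` over the rendered header would list exactly the hrefs that resolve relative to the current page.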
So what did I do?
Even though I had found about 20 URLs using the "site:" search, I was convinced there had to be more. I kept searching but could not find any others.
The light bulb went on at this point.
My .htaccess file contained many 301 redirects from my attempt to send those pages to a real page, but the targets were not really relevant pages to redirect to. So how could I find out what else Google had indexed, when Webmaster Tools only reports the top 1,000 links?
I decided to kill my .htaccess file. Knowing that Google is "forgiving" when major changes happen on a site, I knew it would not immediately kill my site just because I removed the file.
I waited 3 days, then BOOM! Webmaster Tools reported a ton of 404 errors on my site. I looked at the Crawl Errors report and there they were: all the infinite-loop links that I knew had to be out there.
How many were there?
In the first crawl, Google found over 5,000 of them. OMG! Can you imagine the "low quality" score I was getting on those pages? Looking through all those links, I was able to identify about 4 patterns.
Now, I wanted to keep all the URLs pointing to www.mydomain.com/link1, but anything after that needed to go. I opened my robots.txt file and added this:
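Spotting the patterns can be scripted. This is a minimal sketch, assuming the crawl-error URLs have been copied into a Python list (the URLs below are hypothetical), that buckets them by their first two path segments with `collections.Counter`. Grouping by two segments is an assumption chosen because, in this case, the loop starts at the second segment.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample of crawl-error URLs, e.g. copied from the report.
ERROR_URLS = [
    "http://www.mydomain.com/link1/link2/link1/",
    "http://www.mydomain.com/link1/link2/link3/link4/",
    "http://www.mydomain.com/link1/link3/link1/",
    "http://www.mydomain.com/link1/link4/",
    "http://www.mydomain.com/link1/link5/link2/",
    "http://www.mydomain.com/link1/link3/",
]


def pattern_of(url, depth=2):
    """Return the first `depth` path segments as a "/a/b/" prefix."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth]) + "/"


patterns = Counter(pattern_of(u) for u in ERROR_URLS)
for prefix, count in patterns.most_common():
    print(prefix, count)
```

Each distinct prefix is a candidate for one Disallow rule; the counts show which loops are producing the most junk URLs.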
User-agent: *
Disallow: /link1/link2/
Disallow: /link1/link3/
Disallow: /link1/link4/
Disallow: /link1/link5/
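One way to sanity-check rules like these before uploading them is Python's standard `urllib.robotparser`. Treat this as an approximation: Python's parser is not Googlebot's, though both treat a Disallow value as a URL-path prefix that should begin with "/". The rule text and URLs are hypothetical, modeled on the post; the sketch compares a rule written with the host name in it (www.mydomain.com/link1/link2/) against the same rules written as plain paths.

```python
from urllib.robotparser import RobotFileParser

# Both variants include "User-agent: *", since Disallow rules only apply
# inside a user-agent group.
WITH_HOST = """\
User-agent: *
Disallow: www.mydomain.com/link1/link2/
"""

PATHS_ONLY = """\
User-agent: *
Disallow: /link1/link2/
Disallow: /link1/link3/
Disallow: /link1/link4/
Disallow: /link1/link5/
"""


def parser_for(robots_txt):
    """Build a RobotFileParser from robots.txt text instead of fetching it."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


looping_url = "http://www.mydomain.com/link1/link2/link1/"
keep_url = "http://www.mydomain.com/link1/"

# A rule containing the host name never matches a URL path, so nothing is blocked.
print(parser_for(WITH_HOST).can_fetch("Googlebot", looping_url))   # True
# Path-prefix rules block the looping URL but leave /link1/ crawlable.
print(parser_for(PATHS_ONLY).can_fetch("Googlebot", looping_url))  # False
print(parser_for(PATHS_ONLY).can_fetch("Googlebot", keep_url))     # True
```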
There were many more indexed pages that went deeper into those paths, but I knew I wanted everything after the second path segment gone, since that was where the loop I had detected started. With these rules I was able to block, as far as I know, at least 5,000 URLs, if not more.
What did I learn from this?
Kill your .htaccess file for a few days and see what comes back in your reports. You might learn something.
After that, I simply put my .htaccess file back, and I am on my way to removing a ton of "low quality" links I didn't even know I had.
-
Interesting post. Yeah, the .htaccess file is the most important file out there, and it is easy to mess up (as I am sure almost everyone has done at one time or another).