OK Robert,
First, I'm going to tip my hat to Ryan, who has perfectly explained that some of what you see in your site: search may be there because the 301s have not yet been recognized by the search engine.
Second, an apology to Alan, as I went straight to the LAMP solution because I knew from a previous thread or two that you were going to be talking about .htaccess.
Now... I will spell out a couple of things, because I have a feeling you are likely to come across them again in the future, and quick recognition can often save a lot of time.
So here goes.
When I first read your question, my little web developer antennae suddenly started twitching! When I hear that there are multiple versions of a file with different file names deployed on a server, I generally suspect one of two things:
- The site has been developed from a standard Template package, or
- There has just been a little "untidiness" taking place in the development process.
In your example, the /contact.php was the original file deployed live to the server, then the /contact-us.php file was created to replace it (presumably for SEO purposes - debatable, but that is a whole other conversation). As I'm sure you can imagine, /contact is pretty common in template packages, although the biggest template producer out there is much easier to spot, as the pages in their templates are always in the format /index-1.htm etc. It may just be that the developer creates their own standard template from an original design and rather than pre-planning and creating the file names to maximize SEO, they create standard page names and change them later.
While there is nothing really wrong with either of these things (unless you are charging the client for an original design and buying a pre-designed template at a fraction of the cost), both methods do open the way for mistakes and errors. As a result, there are a few things to keep in mind if you are working this way:
- It is a much better idea to build on a development server, so that none of the files that will become obsolete during the process get indexed by search engines in the meantime. Tidy the architecture, remove the obsolete files, test, then push to production.
- When changing file names it is ALWAYS better to rename the existing file and do a global update of links rather than create a duplicate with a different name. As soon as you create two files, you open up the possibility of accidentally linking to both within the site. You could have /contact.php linked from the home page and /contact-us.php linked from the footer, for example. The danger is that if you delete the unwanted file you create broken links without knowing it, and if you keep it you have duplicate content. Either way, you have to recognize the problem and either fix the links or put a 301 in place to catch it.
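- NEVER hard code your links (by which I mean writing the full path, directory name and all, into the link), because as soon as you change the name of the directory you placed your files in, you create a broken link! If you use relative links, the change of directory name will not matter, as long as the linking page and the file it points to move together.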
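To make that last point a bit more concrete, here is a minimal Python sketch of how a browser would resolve the two kinds of link before and after a directory rename. The domain, the /downloads directory name and brochure.pdf are all made up for illustration, and `resolve` is just a thin helper around the standard library's `urljoin`; none of it comes from your site.

```python
from urllib.parse import urljoin

# Hypothetical links, for illustration only.
HARD_CODED = "/pdfs/brochure.pdf"  # path with the directory name baked in
RELATIVE = "brochure.pdf"          # relative to whichever page holds the link


def resolve(page_url, href):
    """Resolve href the way a browser does for a link found on page_url."""
    return urljoin(page_url, href)


# The same page before and after its directory is renamed (assumed names).
before = "http://www.example.com/pdfs/index.php"
after = "http://www.example.com/downloads/index.php"

for page in (before, after):
    print(page)
    print("  hard coded ->", resolve(page, HARD_CODED))
    print("  relative   ->", resolve(page, RELATIVE))
```

After the rename, the hard coded link still asks for /pdfs/brochure.pdf, which no longer exists, while the relative link follows the page into /downloads. Note that this only helps when the page and the file sit in the same renamed directory.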
I can see from Screaming Frog that some of the URLs for the PDF files have 301s in place, but it appears that the redirect URL may also be hard coded to the /pdfs directory. The fact that they all return a 404 when the directory name is changed to match that section makes it purely a guess as to what is happening here. It seems both the www and non-www PDFs are returning 404s in the browser.
The picture is muddied a little by the fact that there appear to be internal URL rewrites in the mix as well (to produce those pretty URLs with trailing slashes). So, there are a few possible reasons why the PDFs are not accessible:
- They are not actually on the server at all (unlikely).
- The names of the PDFs themselves have been changed, so even if the URL rewrite is sending the request to the new directory, the file requested does not exist.
- The /pdfs directory has been renamed to something completely different and the hard coding is the problem.
- The /pdfs directory has been moved to another location within the site architecture.
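If it helps to picture how a 301 that is hard coded to /pdfs ends in a 404, here is a rough Python sketch that follows a redirect chain through a pretend table of server responses. Everything in it is an assumption for illustration: `trace` is a hypothetical helper, the `responses` table stands in for whatever your server really returns, and the URLs are invented. It makes no network requests.

```python
from urllib.parse import urljoin


def trace(url, responses, max_hops=10):
    """Follow 301/302 responses and return the chain as (url, status) pairs.

    `responses` maps a URL to (status, location); any URL missing from the
    table is treated as a 404, which mimics a file that is not on the server.
    """
    chain = []
    seen = set()
    while len(chain) < max_hops:
        status, location = responses.get(url, (404, None))
        chain.append((url, status))
        # Stop on anything that is not a redirect, or when a loop is detected.
        if status not in (301, 302) or location is None or url in seen:
            break
        seen.add(url)
        url = urljoin(url, location)
    return chain


# Hypothetical setup: the old URL 301s to a target hard coded to /pdfs,
# but nothing lives at that location any more.
responses = {
    "http://www.example.com/docs/guide.pdf": (301, "/pdfs/guide.pdf"),
}

for hop_url, hop_status in trace("http://www.example.com/docs/guide.pdf", responses):
    print(hop_status, hop_url)
```

Here the chain is a 301 followed by a 404 at /pdfs/guide.pdf, which is the pattern you would expect if the directory had been renamed or moved but the redirect target had not been updated. A crawl from Screaming Frog gives you the real version of this table.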
I tried guessing a couple of dozen of the obvious options, but no luck, I'm afraid.
There is one other possibility, in that the internal URL rewrites and 301 redirects could be creating a problem for each other. I am not clever enough to identify whether this is the case without a hint from the code, but will ask the God of All Things Code (my Boss) if he can answer that for me when daytime arrives 8D
OK... this is now so long that I really need to read the whole thread back to see if I have forgotten anything! If I find something I have missed, or can find anything else when help arrives, I'll be back!
Hope it makes some sort of sense and ultimately helps,
Sha