Mod rewrite question
-
Sorry in advance if this isn't the best place to ask this question.
Google Webmaster Tools has recently identified a ton of "Not Found" pages, which are actual pages with some digits appended at the end.
For example, suppose an actual page on my blog is:
(A) http://www.example.com/blog/2012/09/my-post-title/
This page works just fine.
However, GWT has identified the following page as a "not found" page:
(B) http://www.example.com/blog/2012/09/my-post-title/9157586677/1846732913010
This appears to be happening to hundreds of posts on my site. In each case, the "9157586677" portion of the URL is identical, but the remaining 13 digits change from page to page.
I haven't been able to determine exactly what is causing this to happen - it's probably a social plug-in for Wordpress, or perhaps Disqus, but I'm not sure which one. I'll go through a process of elimination to narrow it down over the coming week.
As a quick fix, I'd like to create a ModRewrite rule so that requests for (B) get 301 redirected to (A). Since there are hundreds of posts, I need to do this in a way that works regardless of what's in the "/2012/09/my-post-title/" part of the URL.
Unfortunately, mod-rewrite is outside of my area of expertise. Can somebody please suggest how I can handle this? Thanks in advance.
PS - As for tracking down the cause, I've looked at the source of the pages in the "Linked From" area of GWT and the Not Found link is nowhere to be found. That is why I assume the bad link is being generated by some javascript that is a part of one of my plug-ins.
Update: It seems like Disqus is the source of these phantom links. There's considerable discussion here. I'll continue searching for a long-term solution. Meanwhile, I'd still appreciate help with the mod-rewrite question above. Thanks again.
-
I've found a solution and am posting it here in case anybody else is having the same problem:
RewriteRule ^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+ /blog/$1/$2/$3/ [L,R=301]
-
I hadnt seen the update over Disquss at the end of the post.
Please, post all your advances on this topic Ahirai
Best regards!
-
Hi ahirai,
I was gonna say you should check the linked from tab in GWT but since you actually did it, for me its pretty sure that a plugin that drives content is creating this issue from scratch.
Since i´m neither an apache expert, i can´t give you a method to do the dirty work, but i can tell you the problem is created by some 3rd party plugin driving content of site.
Please, post your advances in the topic!
Good luck!!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Geo Targeting Content Question
Hi, all First question here so be gentle, please My question is around geo targeted dynamic content; at the moment we run a .com domain with, for example, an article about running headphones and then at the end - taking up about 40% of the content - is a review of some people can buy, with affiliate links. We have a .co.uk site with the same page about running headphones and then 10 headphones for the UK market. Note: rel alternative is used on the pages to point to each other, therefore (hopefully) removing duplicate content issues. This design works well but it involves having to build links to two pages, in the case of this example. What we are thinking of doing is to just use the .com domain and having the product page of the page served dynamically, ie, people in the UK see UK products and people in US see US products. What are people's thoughts on this technique, please? From my understanding, it wouldn't be any problem with Google for cloaking etc because a googlebot and a human from the same country will see the same content. The site is made in Wordpress and has <....html lang="en-US"> (for the .com) in the header. Would this cause problems for the page ranking in the UK etc? The ultimate goal of doing this would be to reduce link building efforts by halving the number of pages which links would have to be built for. I welcome any feedback. Many thanks
Technical SEO | | TheMuffinMan0 -
301 redirect question
Hi Everyone When doing 301 redirects for a large site, if a page has 0 inbound links would you still redirect it or just leave it? Im just curious on the best practice for this Thanks in advance
Technical SEO | | TheZenAgency0 -
Url rewrite subfolder
Hi, How can i rewrite example.com/example1/example2/example3 to example.com/example3 And is there tools or software that can generate url rewrite... (not a plugin) Thanks !
Technical SEO | | bigrat950 -
Hard-working newbie question: benefit of moving my blog to my online store's domain?
Hi all, I've been running an online wine store in Switzerland for a month and have been working hard on SEO (I love learning about it). Anyway, for a couple of years prior to launching the store, I had been running a wine blog whose articles are ranking well in Google. I now want to link the two. My questions are: A) will the addition of the blog (store.com/blog) contribute to the store's domain authority (currently, the blog authority is higher than the site authority)? B) technically, can I 301 the whole blog to store.com/blog? Any help and tips would be appreciated. Thank you!
Technical SEO | | fkupfer0 -
Indexation question
Hi Guys, i have a small problem with our development website. Our development website is website.dev.website.nl This page shouldn't be indexed bij Google but unfortunately it is. What can i do to deindex it and ask google not to index this website. In the robots.txt or are there better ways to do this? Kind regards Ruud
Technical SEO | | RuudHeijnen0 -
Questions about Redirects
Hi, I am trying to make sure that I can determine if a site has a 301 redirect set up to redirect the site from domain.com to www.domain.com and am hoping that you can confirm the following for me, or let me know if I am off track: is http://www.internetofficer.com/seo-tool/redirect-check/ a reliable way to check if a 301 redirect is set up? is Screaming Frog SEO Spider a good tool to use to see if a redirect is in place? if I search for site:www.domain.com and site:domain.com, I should only get results for the site being indexed, not for the site that has the 301 redirect set up, right? For example, if www.domain.com is set up to redirect to domain.com, then I should get no search results for site:www.domain.com and only show indexed pages for domain.com. If I search for site:www.domain.com and site:domain.com and get results for both, then does this mean that the redirect is not set up? if a redirect is set up from www.domain.com to domain.com, should the crawl report should only show one page crawled on www.domain.com? if a crawl report shows same number of pages for www.domain.com as for domain.com, does that mean that redirect is not set up properly? Thanks in advance for your help! Carolina
Technical SEO | | csmm0 -
Robots.txt question
I want to block spiders from specific specific part of website (say abc folder). In robots.txt, i have to write - User-agent: * Disallow: /abc/ Shall i have to insert the last slash. or will this do User-agent: * Disallow: /abc
Technical SEO | | seoug_20050 -
Home Page Indexing Question/Problem
Hello Everyone, Background: I recently decided to change the preferred domain settings in WM Tools from the non www version of my site to the www version. I did this because there is a redirect from the non www to the www and I've built all of my internal links with the www. Everything I read on SEO Moz seemed to indicate that this was a good move. Traffic has been down/volatile but I think it's attributable mostly to a recent site change/redesign. Having said that the preferred domain change did seem to drop traffic an additional notch. I made the move two weeks ago. Here is the question: When I google my site, the home page shows up as the site title without the custom title tags I've written. The page that displays in the SERP is still the non www version of the site. a site:www.mysite.com search shows an internal page first but doesn't return the home page as a result. All other pages pop up indexed with the www version of the page. a site:mysite.com (notice lack of www) search DOES SHOW my home page and my custom title tags but with a non www version of the page. All other pages pop up indexed with the www version of the page. Any one have thoughts on this? Is this a classic example of waiting on Google to catch up with the changes to my tiny little site?
Technical SEO | | JSOC0