Mod rewrite question

ahirai

Sorry in advance if this isn't the best place to ask this question.

Google Webmaster Tools has recently identified a ton of "Not Found" pages, which are actual pages with some digits appended at the end.

For example, suppose an actual page on my blog is:

(A) http://www.example.com/blog/2012/09/my-post-title/

This page works just fine.

However, GWT has identified the following page as a "not found" page:

(B) http://www.example.com/blog/2012/09/my-post-title/9157586677/1846732913010

This appears to be happening to hundreds of posts on my site. In each case, the "9157586677" portion of the URL is identical, but the remaining 13 digits change from page to page.

I haven't been able to determine exactly what is causing this to happen - it's probably a social plug-in for Wordpress, or perhaps Disqus, but I'm not sure which one. I'll go through a process of elimination to narrow it down over the coming week.

As a quick fix, I'd like to create a ModRewrite rule so that requests for (B) get 301 redirected to (A). Since there are hundreds of posts, I need to do this in a way that works regardless of what's in the "/2012/09/my-post-title/" part of the URL.

Unfortunately, mod-rewrite is outside of my area of expertise. Can somebody please suggest how I can handle this? Thanks in advance.

PS - As for tracking down the cause, I've looked at the source of the pages in the "Linked From" area of GWT and the Not Found link is nowhere to be found. That is why I assume the bad link is being generated by some javascript that is a part of one of my plug-ins.

Update: It seems like Disqus is the source of these phantom links. There's considerable discussion here. I'll continue searching for a long-term solution. Meanwhile, I'd still appreciate help with the mod-rewrite question above. Thanks again.

ahirai

I've found a solution and am posting it here in case anybody else is having the same problem:

RewriteRule ^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+ /blog/$1/$2/$3/ [L,R=301]

roquelagellera

I hadnt seen the update over Disquss at the end of the post.

Please, post all your advances on this topic Ahirai

Best regards!

roquelagellera

Hi ahirai,

I was gonna say you should check the linked from tab in GWT but since you actually did it, for me its pretty sure that a plugin that drives content is creating this issue from scratch.

Since i´m neither an apache expert, i can´t give you a method to do the dirty work, but i can tell you the problem is created by some 3rd party plugin driving content of site.

Please, post your advances in the topic!

Good luck!!

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Mod rewrite question

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Bing rankings question

Launching large content project - date-stamp question

Specific question about pagination prompted by Adam Audette's Presentation at RKG Summit

Indexation question

Question about duplicate content in crawl reports

How to find original URLS after Hosting Company added canonical URLs, URL rewrites and duplicate content.

Sub-domains for keyword targeting? (specific example question)

Robots.txt questions...