mod_rewrite question
-
Sorry in advance if this isn't the best place to ask this question.
Google Webmaster Tools (GWT) has recently flagged a large number of "Not Found" pages, which are actually real page URLs with some digits appended to the end.
For example, suppose an actual page on my blog is:
(A) http://www.example.com/blog/2012/09/my-post-title/
This page works just fine.
However, GWT has identified the following page as a "Not Found" page:
(B) http://www.example.com/blog/2012/09/my-post-title/9157586677/1846732913010
This appears to be happening to hundreds of posts on my site. In each case, the "9157586677" portion of the URL is identical, but the remaining 13 digits change from page to page.
I haven't been able to determine exactly what is causing this. It's probably a social plug-in for WordPress, or perhaps Disqus, but I'm not sure which. I'll go through a process of elimination to narrow it down over the coming week.
As a quick fix, I'd like to create a mod_rewrite rule so that requests for (B) are 301-redirected to (A). Since there are hundreds of posts, the rule needs to work regardless of what's in the "/2012/09/my-post-title/" part of the URL.
Unfortunately, mod_rewrite is outside my area of expertise. Can somebody please suggest how I can handle this? Thanks in advance.
PS: As for tracking down the cause, I've looked at the source of the pages listed in the "Linked From" area of GWT, and the Not Found link is nowhere to be found. That is why I assume the bad link is being generated by JavaScript in one of my plug-ins.
Update: It seems that Disqus is the source of these phantom links. There's considerable discussion here. I'll keep looking for a long-term solution. Meanwhile, I'd still appreciate help with the mod_rewrite question above. Thanks again.
-
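I've found a solution and am posting it here in case anybody else has the same problem. The rule goes in the .htaccess file in the /blog/ directory (the pattern is matched relative to that directory), before WordPress's own rewrite rules:
RewriteRule ^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+ /blog/$1/$2/$3/ [L,R=301]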
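For anyone who wants to check what the RewriteRule above actually matches before putting it on a live site, here is a minimal Python sketch that applies the same regular expression and the same $1/$2/$3 substitution. It assumes the rule lives in /blog/.htaccess, so the paths tested have no leading "/blog/". The STRICT pattern and the redirect_target helper are hypothetical illustrations, not part of the posted solution: the original pattern also catches a single trailing number (for example, the "/2/" URL of a multi-page post), which the stricter variant leaves alone.

```python
import re

# The pattern from the RewriteRule above, unchanged. In /blog/.htaccess,
# Apache strips the "/blog/" prefix before matching, so the test paths
# below have no leading "/blog/".
ORIGINAL = re.compile(r"^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+")

# Hypothetical stricter variant (an assumption, not the poster's rule):
# it requires two numeric segments and nothing after them, so a path with
# a single trailing number, such as "my-post-title/2/", is not redirected.
STRICT = re.compile(r"^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+/[0-9]+/?$")


def redirect_target(path, pattern=ORIGINAL):
    """Return the 301 target with $1/$2/$3 substituted, or None if no match."""
    m = pattern.match(path)
    if m is None:
        return None
    year, month, slug = m.groups()
    return f"/blog/{year}/{month}/{slug}/"


phantom = "2012/09/my-post-title/9157586677/1846732913010"
print(redirect_target(phantom))                              # /blog/2012/09/my-post-title/
print(redirect_target("2012/09/my-post-title/"))             # None: the real post is untouched
print(redirect_target("2012/09/my-post-title/2/"))           # /blog/2012/09/my-post-title/
print(redirect_target("2012/09/my-post-title/2/", STRICT))   # None
```

Both patterns send the phantom URL (B) to the real post (A); the difference only shows up on paths with one trailing number, so which one fits depends on whether the blog uses paginated posts.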
-
I hadn't seen the update about Disqus at the end of the post.
Please post all your progress on this topic, Ahirai.
Best regards!
-
Hi ahirai,
I was going to say you should check the "Linked From" tab in GWT, but since you've already done that, I'm fairly sure that a plugin that drives content is generating these URLs on its own.
Since I'm not an Apache expert either, I can't give you a method to do the dirty work, but I can tell you the problem is being created by some third-party plugin that drives the site's content.
Please post your progress in this topic!
Good luck!!
Related Questions
-
Duplicate content question
Hey Mozzers! I received a duplicate content notice from my Cycle7 Communications campaign today. I understand the concept of duplicate content, but none of the suggested fixes quite seems to fit. I have four pages with HubSpot forms embedded in them. (Only two of these pages have shown up so far in my campaign.) Each page contains a title (Content Marketing Consultation, Copywriting Consultation, etc.), plus an embedded HubSpot form. The forms are all outwardly identical, but I use a separate form for each service that I offer. I’m not sure how to respond to this crawl issue: Using a 301 redirect doesn’t seem right, because each page/form combo is independent and serves a separate purpose. Using a rel=canonical link doesn’t seem right for the same reason that a 301 redirect doesn’t seem right. Using the Google Search Console URL Parameters tool is clearly contraindicated by Google’s documentation (I don’t have enough pages on my site). Is a meta robots noindex the best way to deal with duplicate content in this case? Thanks in advance for your help. AK
Technical SEO | AndyKubrin
-
Webmaster tools question
Hi all. I have a question regarding http vs. https. I have an https site and was wondering how to tell Google in Webmaster Tools to combine the versions and use https. I have set up all the site versions in Webmaster Tools: both www and non-www, for both http and https. I see where to set the www vs. non-www preference but don't quite understand how to do the https part. I want all traffic to go to: https://www-creative -technology-solutions.com Thanks
Technical SEO | twoacejr
-
Duplicate Content Question (E-Commerce Site)
Hi All, I have a page that ranks well for the keyword “refurbished Xbox 360”. The ranking page is an eCommerce product details page for a particular XBOX 360 system that we do not currently have in stock (currently, we do not remove a product details page from the website, even if it sells out – as we bring similar items into inventory, e.g. more XBOX 360s, new additional pages are created for them). Long story short, given this way of doing things, we have now accumulated 79 “refurbished XBOX 360” product details pages across the website that currently, or at some point in time, reflected some version of a refurbished XBOX 360 in our inventory. From an SEO standpoint, it’s clear that we have a serious duplicate content problem with all of these nearly identical XBOX 360 product pages. Management is beginning to question why our latest, in-stock, XBOX 360 product pages aren't ranking and why this stale, out-of-stock, XBOX 360 product page still is. We are in obvious need of a better process for retiring old, irrelevant (product) content and eliminating duplicate content, but the question remains, how exactly is Google choosing to rank this one versus the others since they are primarily duplicate pages? Has Google simply determined this one to be the original? What would be the best practice approach to solving a problem like this from an SEO standpoint – 301 redirect all out of stock pages to in stock pages, remove the irrelevant page? Any thoughts or recommendations would be greatly appreciated. Justin
Technical SEO | JustinGeeks
-
Drupal Question
So on our site we have a plugin for our fan gallery. The issue is that I am getting a lot of duplication errors and it's saying the URL is too long and all the errors are coming from the Fan Gallery, which has over 8,000 errors. It seems to be pulling a long-form query URL that has over 100 characters. You can't physically see it on the site, but the crawlers can. Anyway, I'm trying to figure out a fix for this. One method would be to just stop those pages from being crawled, but I would hate to do that, as the fan gallery would be a great source of links and content for us. So I'm wondering if anyone else has had an issue with these types of plugins before, where the user can upload a photo or do a video embed and then it submits to the site. If you have a better method, please let me know. I usually work on e-comm platforms, so my experience with Drupal is limited.
Technical SEO | KateGMaker
-
Technical SEO question re: java
Hi, I have an SEO question that came my way, but it's a bit too technical for me to handle. Our entire ecom site is in java, which apparently writes to a page after it has loaded and is not SEO-friendly. I was presented with a work-around that would basically consist of pre-rendering an HTML page for search engines and leaving the java page for the customer. It sounds like G's definition of "cloaking" to me, but I wanted to know if anyone has any other ideas or work-arounds (if there are any) on how we can make the java-based site more SEO-friendly. Any thoughts/comments you have would be much appreciated. Thanks!!
Technical SEO | Improvements
-
Canonical tags/WordPress permalink question
Need help: Do canonical tags do the exact same thing that WordPress already does with its permalink function? Or are these two separate things? Thank you.
Technical SEO | bonnierSEO
-
Mobile Domain / URL Structure SEO questions
Hi, we are building a mobile version of one of our partner sites, and I would like to know which of the following URL structures you would recommend as far as SEO is concerned: mobile.mywebsite.com or mywebsite.mobi? Also, should I worry about duplicate content on my mobile site?
Technical SEO | CookingCom
-
Home Page Canonical Question
I have an online store through the hosting service Volusion. I have asked them about this and was told that this is normal. I would like to confirm this with you guys because I'm not convinced of the quality of their customer service and I'm not an expert. When I check Analytics, the landing page that is visited most often is www....../default.asp and the second most visited is www........./ . These are, of course, both my home page. Volusion has a radio button that allows the admin to "enable canonical links", which I have enabled, and they told me that it is normal to see this in Google Analytics regardless. When I type in either of those addresses, the home page comes up at the address that I typed. In other words, it doesn't redirect to one consistent URL. Am I right to be concerned about this?
Technical SEO | berglin