Thanks for the endorsement, Christy! Funny, I only just now saw Rand's recent WBF related to this topic, but I'm pleased to see my answer lines up exactly with his info.
P.
You need to be aware, Jonathan, that there is absolutely nothing about a robots.txt disallow that will help remove a URL from the search engine indexes. Robots.txt is a crawling directive, NOT an indexing directive. In fact, in most cases, blocking URLs in robots.txt will actually cause them to remain in the index even longer.
I'm assuming you have cleaned up the site so the actual spam URLs no longer resolve. Those URLs should now result in a 404 error page. You must confirm they are actually returning the correct 404 code in the headers. As long as that's the case, it's a matter of waiting while the search engines crawl the spam URLs often enough to recognise they really are gone and drop them from the index. The problem with adding them to robots.txt is that it actually tells the search engines NOT to crawl them, so they are unlikely to discover that the URLs lead to 404s, and hence may remain in the index even longer.
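If you want a quick way to spot-check those headers, a little script along these lines will do it (a minimal sketch in Python using the requests library - the URLs below are just hypothetical placeholders for your own spam URLs):

import requests

# Hypothetical placeholders - swap in the actual spam URLs you cleaned up
urls_to_check = [
    "https://www.example.com/spam-page-1/",
    "https://www.example.com/spam-page-2/",
]

for url in urls_to_check:
    # allow_redirects=False so we see the status of the URL itself,
    # not whatever it might redirect to
    response = requests.head(url, allow_redirects=False, timeout=10)
    print(url, response.status_code)  # you want to see 404 (or 410) here

Anything in that list that doesn't come back as a 404 (or 410) needs another look before you rely on the search engines dropping it.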
Unfortunately you can't use a noindex tag on the offending pages, because the pages should no longer exist on the site. I don't think even a careful implementation of an X-Robots noindex directive in htaccess would work, because the URLs should be returning a 404.
Make certain the problem URLs return a clean 404, use the Google Search Console Remove URLs tool for as many of them as you can (for example you can request removal for entire directories, if the spam happened to be built that way), and then be patient for the rest. But do NOT block them in robots.txt - you'll just prolong the agony and waste your time.
Hope that all makes sense?
Paul
The one thing you haven't mentioned, which is likely to be most critical for this issue, is your XML sitemap. I couldn't find it at any of the standard URLs (/sitemap.xml and /sitemap_index.xml both lead to generic 404 pages). Also, there's no Sitemap directive pointing to it in your robots.txt.
Given that the sitemap.xml is the clearest and fastest way for you to help Google to discover new content, I'd strongly recommend you get a clean, dynamically updated sitemap.xml implemented for the site, submit it through both Google and Bing webmaster tools, and place the proper pointer to it in your robots.txt file.
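If the site's CMS can't generate one dynamically right away, even a simple script can produce a valid file as a stopgap. Here's a minimal sketch in Python - the URLs are hypothetical placeholders, and a dynamically updated sitemap from the CMS is still the better long-term fix:

# Hypothetical placeholder URLs - list the site's real pages here
page_urls = [
    "https://www.example.com/",
    "https://www.example.com/about/",
    "https://www.example.com/blog/some-post/",
]

entries = "\n".join(
    "  <url><loc>{}</loc></url>".format(url) for url in page_urls
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + entries +
    "\n</urlset>\n"
)

with open("sitemap.xml", "w") as f:
    f.write(sitemap)

Then add a line like "Sitemap: https://www.example.com/sitemap.xml" to your robots.txt so crawlers can find it even without the webmaster tools submission.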
Once it's been submitted to the webmaster tools, you'll be able to see exactly how frequently it's being discovered and crawled.
Hope that helps?
Paul
Actually, if you already have 302 redirects in place, you're going to have to find the source of those 302s and change them to 301s, Joel. Simply adding an extra 301 redirect via a plugin is not going to fix your issue.
If you don't already have a plugin in place that is creating the 302 redirects, they are almost certainly being set in the htaccess file and must be corrected there.
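To track down where the 302 is coming from, it can help to trace the full redirect chain and see what status code each hop returns. Here's a minimal sketch in Python using the requests library (the URL is just a placeholder for one of your redirected pages):

import requests

response = requests.get("https://www.example.com/old-page/", allow_redirects=True, timeout=10)

for hop in response.history:
    # Every hop here should be a 301 - any 302 is what needs fixing at its source
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))

print("Final:", response.status_code, response.url)

Whatever is generating the 302 hop (a plugin, an htaccess rule, or the hosting configuration) is where the change needs to be made.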
Paul
Google Analytics' stored data can run anywhere from 18 to 24 hours behind the real-time data. Did you just implement the event tracking? If so, you'll need to wait a day or so for it to show up in the "regular" (non-real-time) reports in Google Analytics.
Hope that helps?
Paul
PS Don't forget, if you used IP filters to keep your own IP address from registering data in Google Analytics, you'll need to look at an unfiltered view in order to see your own test events.
Google Webmaster Tools has upgraded its Fetch as Googlebot tool to show you a visual representation of what the search crawler will see on your page, Tiffany. I'd absolutely start there, as it's Google's own tool, and you need to have a GWT account anyway.
Paul
Another quick update, gang. We're going to meet today in the lunch room, at the fifth table from the front on the right side (near the washroom entrances). Look for the black Hawaiian shirt!
Looking forward to seeing everybody!
Paul
So here's the scoop... Between here and the MozCon Facebook Group we've got a number of folks interested, so we're going to grab a table together for lunch on Tuesday at the conference. I'll try to mark it with something noticeable, or you can tweet me @thompsonpaul and I'll let you know where to find us. Hope to see lots of us there!
Great to hear, Jesse. Look forward to meeting you. Will let you know if it looks like a lunch group will happen.
Great stuff, Irving! Will keep an eye out for you. I'll let you know if enough folks chime in to make up a lunch group.
[Update] Between here and the MozCon Facebook Group we've got a number of folks interested, so we're going to grab a table together for lunch on Tuesday at the conference. I'll try to mark it with something noticeable, or you can tweet me @thompsonpaul and I'll let you know where to find us. Hope to see lots of us there! [/Update]
Hey gang - I'm wondering which Q&A regular (or irregular!) contributors are coming to MozCon next week? I know Dana Tan and Marie Haynes are for sure.
So who else is coming? If there's a bit of a gang of us, it would be fun to connect up for a lunch table or a drink at some point during the event to meet each other face-to-face.
Let me know in the comments, and if there are enough of us, I'll set something up.
Paul
To implement a canonical tag for an individual page/file in IIS, you need to insert a custom response header via an outbound rule in the IIS Rewrite module, not through the web.config.
Sorry I don't have a specific example handy (haven't had to wrassle with IIS in some time). I'll see if I can dig one up.
Meanwhile, here's a link to the relevant section of the general Rewrite Module info in case Alan can suggest the specifics.
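Once the outbound rule is in place, one quick way to confirm the header is actually being sent is to check the response from the page itself. A minimal sketch in Python with the requests library (the URL is a hypothetical placeholder):

import requests

response = requests.get("https://www.example.com/some-page.aspx", timeout=10)

# The outbound rule should add a header along the lines of:
# Link: <https://www.example.com/some-page.aspx>; rel="canonical"
print(response.headers.get("Link"))

If that prints None, the rule isn't firing for that page yet.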
Paul
Most welcome Tom. Happy to assist.
P.
Gotcha. Then Dana's suggestions will be good. The challenge you'll probably encounter is matching the cost of an audit against the business value. The companies recommended by Moz tend to be at the higher end of the cost range, because they're large companies with plenty of experience and expertise.
Paul
Are you looking for tools that provide in-depth data so you can do the SEO review/audit yourself, Gordian? Or are you looking for companies/consultants that will do the whole audit for you? Just wanted to clarify.
Paul
This is the same kind of issue you encounter with any system that uses URL parameters to append extra info to a URL, Tom. (For example, even Google Analytics uses additional URL parameters to track incoming campaigns, and those can get indexed as additional URLs.)
These extra URLs can cause duplicate content issues, and the first line of defence is to ensure that each of your pages includes a rel=canonical link tag in its head. As long as each page refers to itself in the rel=canonical, any new URL that just adds parameters will still carry the canonical tag pointing back to the non-parameter version. (This is a best practice in WordPress anyway, and can be enabled using many of the WordPress SEO plugins.)
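If you want to confirm it's working, you can spot-check that a parameterised URL still declares the clean URL as its canonical. A rough sketch in Python using requests (the URL and parameter are hypothetical placeholders, and the regex is only a quick check, not a full HTML parser):

import re
import requests

clean_url = "https://www.example.com/sample-page/"
parameter_url = clean_url + "?some_parameter=12345"

html = requests.get(parameter_url, timeout=10).text
match = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']', html, re.I)

if match and match.group(1) == clean_url:
    print("OK - canonical points back to the clean URL")
else:
    print("Check the canonical tag:", match.group(1) if match else "not found")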
As an additional step, you can use Google and Bing Webmaster Tools to manually tell those two search engines to ignore the extra parameters.
Hope that helps?
Paul
P.S. The best solution for this issue is to avoid having BackupBuddy add those parameters in the first place. They're created by BackupBuddy having to use the alternate cron process, which in turn is only necessary if your host has disabled loopbacks. If loopbacks are in fact enabled, you can turn off the alternate cron in your wp-config file and the extra URL parameters will no longer be generated.
To answer your specific question, Jason, yes, there's an issue with those URLs going through two consecutive redirects.
Each redirect, like any link, costs a little bit of "link juice". So running through two consecutive redirects wastes twice as much link juice as having the origin URL redirect immediately to the final URL without the intermediate step. It's not a massive difference, but on an e-commerce site especially, there's no point in wasting any. (Some folks reckon the loss could be as high as 15% per link/redirect.) Plus, I've occasionally seen problems with referrer data being lost across multiple redirects (anecdotal).
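Just to put rough numbers on it - and this is only back-of-the-envelope arithmetic using that commonly quoted (but unconfirmed) ~15% figure:

# Assumes a hypothetical ~15% loss per redirect hop - illustration only
retained_one_hop = 0.85
retained_two_hops = 0.85 * 0.85
print(retained_one_hop)               # 0.85   -> ~85% of the value retained
print(round(retained_two_hops, 4))    # 0.7225 -> only ~72% retained after two hops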
Hope that answers your specific question?
Paul
Just to follow up on your last question about 404s, Kim...
No, having a bunch of 404s like that will be no more work for the server than if they were landing on actual blog pages - in fact, somewhat less work, as the 404 page generally has less content and far fewer database calls.
Also, a page timing out due to server load (the server working too hard) doesn't generally result in a 500 error - the request simply times out without ever returning a status code. 500 errors are delivered when something actually breaks the server's ability to deliver the correct page content.
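You can see the difference for yourself with a quick check - a minimal sketch in Python using the requests library (the URL is a hypothetical placeholder):

import requests

try:
    response = requests.get("https://www.example.com/some-page/", timeout=10)
    if response.status_code == 500:
        print("Server error - something is actually broken")
    else:
        print("Returned status:", response.status_code)
except requests.exceptions.Timeout:
    print("Timed out - no status code was ever returned")

An overloaded server typically lands in the Timeout branch, with no status code at all, rather than producing a 500.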
Paul