Welcome to the Q&A Forum

ThompsonPaul

YouTube video transcriptions can be edited by the channel's administrator, Dan. Essentially, you download the transcription file, correct it in a text editor, then re-upload it to replace the existing one.

1. On your My Uploaded Videos page, select the video you want to caption and click Edit.
2. Click **Captions and Subtitles** in the dropdown menu.
3. Under the Available Caption Tracks header, click the **Download** button beside English: Machine Transcription.
4. Open the downloaded file in a text editor and correct any errors. (You can change where the captions fall on the video by changing the time stamps, but be sure to keep the formatting.)
5. When you have finished editing your transcript, save the file, keeping the same file name.
6. Upload the corrected file back to YouTube by clicking Add New Captions or Transcripts under Add a Caption Track.
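
If every caption is off by the same amount, editing each time stamp by hand gets tedious. Below is a minimal Python sketch that shifts all time stamps in a caption file by a fixed offset. It assumes the time stamps look like `0:00:01.500` (hours:minutes:seconds.milliseconds); that layout is an assumption, so check your downloaded file and adjust the pattern if it differs. The function names are made up for illustration.

```python
import re

# Assumed time stamp shape: H:MM:SS.mmm (e.g. 0:00:01.500).
# Adjust this pattern if your caption file uses a different layout.
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")


def _shift_match(match, offset_ms):
    """Return one time stamp moved by offset_ms, never going below zero."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    total = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
    total = max(total, 0)
    hours, rest = divmod(total, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    # Re-emit in the same layout so the file formatting is preserved.
    return f"{hours}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift_captions(text, offset_ms):
    """Shift every time stamp found in text; all other text is left alone."""
    return TIMESTAMP.sub(lambda m: _shift_match(m, offset_ms), text)
```

For example, `shift_captions(open("captions.sbv").read(), 750)` would push every caption three quarters of a second later (the file name here is hypothetical). Caption text that happens to contain something shaped like a time stamp would be shifted too, so give the result a quick read before uploading.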

The joys of self-serve

Paul

ThompsonPaul

There was a Moz blog post published last year that gives a complete rundown of the Whiteboard Fridays production methods. Includes lighting, sound, editing, camera choice etc.

That should give you a great head start.

Paul

ThompsonPaul

Thanks!

Typically these blacklists are created and maintained by security specialists who have done testing on the different bots to determine which are legit/beneficial and which are crapbots. They then provide these lists for others to use. Often the lists are amalgamations of bots detected and analysed on a number of different sites and by a number of different specialists to act as a double-check for each other.

You do need to be careful that you are using a well-curated list, as a careless list can end up blocking legitimate bots. You would check out the creator of such a list the same way you'd check out the creator of a plugin you're considering using - check reviews, look at comments and responses on the post that provides the blacklist, etc.

That answer your question?

Paul

ThompsonPaul

As Lynn mentions, these entries form a blacklist for "bad bots". These are bots that are identified as being harmful (or at least non-helpful) to the real use of a website. Bots are essentially spiders that crawl and record the pages of your site the same way Googlebot does. There are two main reasons for blocking them:

1. Too many unnecessary bots can put a real strain on server resources, causing the site to slow down for real users. This can be especially problematic with bad bots as they do not respect the entries in your robots.txt file and so will crawl even blocked pages. This can mean huge numbers of extra pages get crawled, leading to even more load.
2. Many (most?) of these bots are collecting data for nefarious purposes. Some are scrapers that collect your site content in order to re-use it illegally on another site, some are scanning for certain files/plugins on your site known to be insecure so they can target them for attack, etc.

Best case scenario, these bots waste your bandwidth and can cause site slowdowns on low-powered (e.g. shared) servers. Worst case, they can actually cause harm to your site.

There are literally many thousands of these types of bots out there, and their creators often change their identifying user agents just to get around these types of blacklists. But many have been around for some time and still use the same identifier. So having a blacklist to block the most common of them is actually very good security practice. To be totally proactive however, you'd need to update the list every couple of months.
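
Whatever form the blacklist takes on your server, the underlying check is the same: compare the visitor's user agent string against a list of known identifiers. Here is a minimal Python sketch of that matching logic. The bot names are invented for illustration and are not taken from any real blacklist.

```python
# Hypothetical blacklist entries, for illustration only. A real list should
# come from a well-curated, regularly updated source.
BAD_BOT_TOKENS = ["examplescraper", "fakevulnscanner"]


def is_blocked(user_agent, tokens=BAD_BOT_TOKENS):
    """Return True if any blacklisted token appears in the user agent.

    Matching is case-insensitive and substring-based, so it keeps working
    when a bot changes its version number, but not when it changes its name.
    """
    ua = user_agent.lower()
    return any(token in ua for token in tokens)
```

Because the match is on the identifying string alone, a bot that renames itself slips straight through, which is why the list needs refreshing from time to time.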

Bottom line - those entries are providing some security and overload protection for your site, and there's essentially no downside to having them in place even if they're not catching everything.

Hope that helps - if any of my explanation isn't clear, just holler

Paul

ThompsonPaul

"I'd have to quit my job and just drink full-time."

You say that like it's a bad thing?!

Paul

ThompsonPaul

That tool that Matt mentioned looked interesting, but it would have been painful to have to go through your site one page at a time.

As usual for crawling tasks like this, the paid version of Screaming Frog will do what you want. You can tell it to crawl your site looking for **href="yoursite.com** to find all occurrences of absolute internal links. You'd have to do a bit of regex magic to get it to find the relative links, but since by their nature relative links will keep working after the domain change, I'm not sure why you'd be looking for those.
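
To give a feel for the kind of pattern involved, here is a rough Python sketch that pulls absolute internal links out of a chunk of HTML. This is not how Screaming Frog works internally; it is just an illustration of the matching, and the function name is made up. It assumes links are written with `http://`, `https://` or `//` in front of the domain, optionally with `www.`.

```python
import re


def find_absolute_internal_links(html, domain):
    """Return href values that point at the given domain with a full URL.

    Relative links such as href="/contact" are deliberately not matched.
    """
    pattern = re.compile(
        r'href=["\']((?:https?:)?//(?:www\.)?' + re.escape(domain) + r'[^"\']*)["\']',
        re.IGNORECASE,
    )
    return pattern.findall(html)
```

Calling `find_absolute_internal_links(page_html, "yoursite.com")` on each page's source returns the links that would need updating. The pattern is loose: a host that merely starts with the same text (say `yoursite.com.example.net`) would also match, so tighten it if that matters for your site.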

Or you could just do a find and replace of the URL string using something like phpMyAdmin directly in your database. That would be fastest as it would find & replace in one go, instead of having to manually edit each page.
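
As a sketch of what that one-pass find and replace amounts to, the Python below runs it against an in-memory SQLite database. SQLite is only a stand-in here so the example is self-contained; the table and column names are hypothetical, and on a real site you would run the equivalent UPDATE against your own database (MySQL has a similarly named `REPLACE()` string function).

```python
import sqlite3


def replace_url_in_column(conn, table, column, old, new):
    """Replace every occurrence of old with new in one column, in one pass.

    table and column are inserted into the SQL text directly, so only pass
    names you control. Returns the number of rows that were changed.
    """
    cursor = conn.execute(
        f"UPDATE {table} SET {column} = replace({column}, ?, ?) "
        f"WHERE instr({column}, ?) > 0",
        (old, new, old),
    )
    conn.commit()
    return cursor.rowcount


# Usage with a throwaway in-memory table (names are made up for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO posts (content) VALUES (?)",
    [('<a href="http://oldsite.com/a">A</a>',), ("no links here",)],
)
changed = replace_url_in_column(
    conn, "posts", "content", "http://oldsite.com", "http://newsite.com"
)
```

Take a database backup before running anything like this for real, since a blanket replace has no undo.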

If this is a WordPress site, there's a plugin specifically for finding and automatically updating these links. (It basically automates and puts a UI on the phpMyAdmin process mentioned above.)

Any of those ideas help?

Paul

ThompsonPaul

Hey Oscar - my second language is French and I couldn't possibly have asked the question in my other language as well as you have here, so don't worry about the language issue at all!

For your question about changing the index.htm to something with a keyword: no, that would have absolutely no effect. As long as the redirect was with a 301 code, the search engines and the visitors will never know that page even exists. That's the whole purpose of the 301 redirect. It makes that page invisible to the search engines.
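
To make the mechanics concrete, here is a minimal sketch of a 301 redirect written as a tiny Python WSGI app. It assumes index.htm redirects to the site root (`/`); that target is an assumption for the example, as is the idea that your site runs on Python at all. Most sites would set this up in the web server configuration instead.

```python
def app(environ, start_response):
    """Tiny WSGI app: permanently redirect /index.htm to the site root."""
    path = environ.get("PATH_INFO", "/")
    if path == "/index.htm":
        # 301 tells browsers and crawlers the move is permanent, so they
        # go straight to the Location URL and never see this page's content.
        start_response("301 Moved Permanently", [("Location", "/")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Home page"]
```

Since the old URL only ever answers with the redirect, whatever the file behind it is called makes no difference to what visitors or search engines see.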

I would also strongly advise not to switch to a keyword-rich subdomain. That's just asking for a lot of headaches and problems in the future.

What you are suggesting here is trying to create what is called an Exact Match Domain (EMD), meaning the title of your domain matches the keyword you are trying to rank for. The value of these has recently been lowered by Google anyway, and as I say, doing it using a subdomain will cause all kinds of hassles, not to mention confusion for your users.

If I were you, the area I would be focusing most of my efforts on would be building and earning links from other websites. At the moment, you essentially only have one other domain linking to you.

In addition, trying to rank with a very competitive general term like impresión digital will be very difficult for a new site. Better to use terms like impresión digital en Chihuahua and others related to where you offer your services. This is assuming that most of your customers are fairly local, not from all over the Internet.

Hope that answers your question? If not, be sure to let me know.

Paul

ThompsonPaul

Yup - Chris has the solution. The robots.txt disallow directive simply instructs the crawler not to crawl; it doesn't carry any instructions about removing URLs from the index. I'm betting there are other pages linking in to the subdomains that the bots are following to find and index as the URL Removal requests are expiring.

Do note though that when you add the no-index meta-robots tag, you're going to need to remove the robots.txt disallow directive. Otherwise the crawlers won't make any attempt to crawl all the pages and so won't even discover most of the no-index requests.
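
One way to catch that conflict before it bites is to check your list of no-indexed URLs against the robots.txt rules. The sketch below uses Python's standard `urllib.robotparser` for this; the function name and the example URLs are hypothetical, and it only approximates how any particular search engine interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser


def blocked_noindex_urls(robots_txt, noindex_urls, user_agent="*"):
    """Return the no-indexed URLs that robots.txt stops crawlers from fetching.

    Any URL in the result carries a no-index tag that a well-behaved
    crawler will never get to read.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


# Usage with made-up rules and URLs.
robots = "User-agent: *\nDisallow: /private/\n"
urls = [
    "https://sub.example.com/private/page.html",
    "https://sub.example.com/public/page.html",
]
conflicts = blocked_noindex_urls(robots, urls)
```

Anything that shows up in `conflicts` needs its disallow rule lifted before the no-index tag can do its job.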

Paul

[Edited to add - there's no reason you can't implement the no-index meta-tags and then also request removal again via the Webmaster Tools removal tool. Kind of a "belt & suspenders" approach. The removal request will get it out quicker, and the meta-no-index will do the job of keeping it out. Remember to do this in Bing Webmaster Tools as well.]

ThompsonPaul

There's no real way to estimate how long the re-crawl will take, Ben. You can get a bit of an idea by looking at the crawl rate reported in Google Webmaster Tools.

Yes, asking for a page fetch then submitting with linked pages for each of the main website sections can help speed up the crawl discovery. In addition, make sure you've submitted a current sitemap and that it's getting found correctly (also reported in GWT). You should also do the same in Bing Webmaster Tools. Too many sites forget about optimizing for Bing - even if it's only 20% of Google's traffic, there's no point throwing it away.

Lastly, earning some new links to different sections of the site is another great signal. This can often be effectively & quickly done using social media - especially Google+ as it gets crawled very quickly.

As far as your other question - yes, once you get the unwanted URLs out of the index, you can add the robots.txt disallow back in to optimise your crawl budget. I would strongly recommend you leave the meta-robots no-index tag in place though as a "belt & suspenders" approach to keep pages linking into those unwanted pages from triggering a re-indexing. It's OK to have both in place as long as the de-indexing has already been accomplished, as we've discussed.

Hope that answers your questions?

Paul

ThompsonPaul

You're right to be confused, B. The terminology is unfortunate and misleading.

To answer your questions:

1. Yes.

2. Yes.

A disallow in robots.txt does nothing to remove already-indexed pages. That's not its purpose. Its only purpose is to tell the search crawlers not to waste their time crawling those pages. Even if pages have been blocked in robots, they will remain in the index if already there. Even if never crawled, and blocked in robots.txt, they can still end up indexed if some other indexed page links to them and the crawlers find those pages by following links. Again, nothing in a robots.txt disallow tells the engines to remove a page from the index, just not to waste time crawling it.

Put another way, the robots.txt disallow directive only disallows crawling - it says nothing about what to do if the page gets into the index in other ways.

The meta-robots no-index tag however explicitly states to the crawler "if you arrive at this page, do not add it to the index. If it is already in the index, remove it".
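
For reference, the tag in question is a `<meta name="robots" content="noindex">` element in the page head. Here is a small Python sketch, using only the standard library, that checks whether a page's HTML carries it. The class and function names are made up for the example, and it ignores other ways of sending the same signal (such as HTTP headers).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag includes noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # content can hold several comma-separated directives.
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a meta-robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```

Running `has_noindex(page_html)` over the pages you want dropped is a quick way to confirm the tag actually made it into the served HTML.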

And yea - as you suspected - if pages are blocked in robots.txt, the crawler obeys and doesn't visit those pages, so it can't discover the no-index command to drop them from the index. Thus the only way a page could get dropped is if a crawler followed a link from an external site and discovered the page that way. A very inefficient way of trying to get all those pages out of the index.

Bottom line - robots.txt is never the correct tool to deal with duplicate content issues. Its sole purpose is to keep the crawlers from wasting time on unimportant pages so they can spend more time finding (and therefore indexing) more important pages.

The three tools for dealing with duplicate content are meta-robots no-index tags in a page header, 301 redirects, and canonical tags. Which one to use depends on the architecture of your site, your intended purpose, and the site's technical limitations.

Hope that makes sense?

Paul


Moz Q&A is closed.

ThompsonPaul

@ThompsonPaul

Posts made by ThompsonPaul
