This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, and so it would not work in the same way.
Posts made by CleverPhD
-
RE: Can you use Screaming Frog to find all instances of relative or absolute linking?
-
RE: Can you use Screaming Frog to find all instances of relative or absolute linking?
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a posting on GitHub: https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text.
I assumed you would want the full URL and that was the issue you were running into.
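If you would rather test the idea outside of Screaming Frog, here is a rough Python sketch (my own illustration, not part of the original answer - it uses a simplified version of the pattern above and a hypothetical folder name) that scans downloaded HTML files for absolute http/https links:

```python
import re
from pathlib import Path

# Simplified pattern: href= followed by " or ' or nothing, then an absolute http(s) URL.
# This is a loose illustration of the idea above, not the full pattern from the gist.
ABSOLUTE_LINK = re.compile(r'''href=("|'|)(https?://[^"'\s>]+)''', re.IGNORECASE)

def find_absolute_links(folder):
    """Print every absolute link found in the .html files under `folder`."""
    for html_file in Path(folder).rglob("*.html"):
        text = html_file.read_text(errors="ignore")
        for match in ABSOLUTE_LINK.finditer(text):
            print(f"{html_file}: {match.group(2)}")

if __name__ == "__main__":
    find_absolute_links("downloaded_site")  # hypothetical folder of downloaded HTML files
```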
As another solution why not just fix the https in the main navigation etc, then once you get the staging/testing site setup, run ScreamingFrog on that site and find all the 301 redirects or 404s and then use that report to find all the URLs to fix.
I would also ping ScreamingFrog - this is not the first time they have been asked this question. They may have a better regexp and/or solution vs what I have suggested.
-
RE: Robots.txt for Facet Results
Google ignores everything after the hash to start with, so you do not need to block it at all. It is a clever way to pass parameters without having to worry about Google getting lost.
-
RE: Strange rankings on new website
Two things I found when using the Wayback Machine to look at your site
- Did you set up 301 redirects from old to new content?
https://web.archive.org/web/20150225025506/http://advanced-driving.co.uk/
I pulled links to various random advanced driving groups and they 404ed instead of 301ing:
http://www.advanced-driving.co.uk/advanced-driving-lessons/region5/
http://www.advanced-driving.co.uk/advanced-driving-lessons/region6/
Also on the page for driving lessons
https://web.archive.org/web/20150219123145/http://www.advanced-driving.co.uk/driving-lessons/
I pulled links to various random advanced driving lessons pages and they 404ed instead of 301ing:
http://www.advanced-driving.co.uk/driving-lessons/norwich/
http://www.advanced-driving.co.uk/driving-lessons/liverpool/
Looks like you did not properly migrate the URLs from the old site to the new site. You should be seeing 404 errors in Google Search Console, so that can be a starting point to find what pages need to be updated. It sounds like you tracked rankings for pages on the old site; start with those URLs and make sure they are 301ed to the correct new URL.
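If it helps, here is a rough Python sketch (my own illustration, not something from the original audit - swap in the old URLs you actually tracked) that checks a list of old URLs and reports whether each one 301s or 404s:

```python
import requests

# Placeholder list - use the old URLs you tracked rankings for
old_urls = [
    "http://www.advanced-driving.co.uk/advanced-driving-lessons/region5/",
    "http://www.advanced-driving.co.uk/driving-lessons/norwich/",
]

for url in old_urls:
    # Do not follow redirects so we see the first response code the old URL returns
    response = requests.get(url, allow_redirects=False, timeout=10)
    if response.status_code in (301, 308):
        print(f"{url} -> {response.status_code} to {response.headers.get('Location')}")
    else:
        print(f"{url} -> {response.status_code} (should probably be a 301 to the new page)")
```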
- Site structure
On the new home page, you are not linking into any of the pages such as your Driver Instruction and Traffic Reports page. Likewise, your header and sidebars used to link into your content. Your new design is cleaner, but you pretty much blew up your previous internal linking structure. Google will use your internal links to find pages to crawl and also to determine what pages are important on the website. I see much much less of this on the new site. You may want to consider updating how your internal linking is structured so that you are showing your users and Google what your most important pages are.
Good luck!
-
RE: Removing duplicated content using only the NOINDEX in large scale (80% of the website).
Just seeing the other responses. Agree with what EGOL mentions. A content audit would be even better to see if there was any value at all on those pages (GA traffic, links, etc). Odds are though that there was not any and you already killed all of it with the noindex tag in place.
-
RE: Removing duplicated content using only the NOINDEX in large scale (80% of the website).
Couple of things here.
-
If a second Panda update has not occurred since the changes that were made then you may not get credit for the noindexed content. I don't think this is "cheating" as with the noindex, it just told Google to take 350K of its pages out of the index. The noindex is one of the best ways to get your content out of Google's index.
-
If you have not spent time improving the non-syndicated content then you are missing the more important part and that is to improve the quality of the content that you have.
A side point to consider here is your crawl budget. I am assuming that the site still internally links to these 350K pages, so users and bots will still go to them and have to process them, which is mostly a waste of time. As all of these pages are out of Google's index thanks to the noindex tag, why not take out all internal links to those pages (i.e. from sitemaps, paginated index pages, menus, internal content) so that you can have the user and Google focus on the quality content that is left over. I would then also 404/410 all those low quality pages, as they are now out of Google's index and not linked internally. Why maintain the content?
-
-
RE: How does Google Home Services Ads work?
Here is the information you need on qualification
https://support.google.com/ads/answer/6230381
Google will be performing a background and license check in addition to performing a "reputation assessment" of the business online. I would assume that if this business has good reviews on Yelp or Google that this would help.
A good overview of the program is here
http://searchengineland.com/google-home-services-ads-plumbers-locksmiths-san-francisco-beta-226469
The service just launched at the end of August, so it is pretty new; keep that in mind when taking your client through the process.
-
RE: How keywords and subfolders connect
My experience with subtopics in URL structure is that they are overrated. Use the main category to help the user know where they are in the site and potentially what topic the page might be about. If you want to drop keywords in, do it in the category or in the slug for the name of the page. You can work it in there and it gives you more flexibility. This also helps keep your page closer to the root folder instead of ending up too far down in a folder structure.
When I have used sub-categories, you always end up with content that could fit in two different ones, and then you have to decide which one is the better one, etc. You will end up having to rework your URLs later due to issues with your sub-categories. The only way I have seen subcats work well is when you have something like /state/city/zip or something else where your end item is only in one cat, subcat, etc.
I would not have /peanut-butter/ redirect to /peanut-butter/subtopic-1 - that makes no sense from an organizational perspective. If peanut butter is not a strong enough category by itself, it should not be a category to start with. You need to rethink what your category topics are. Ideally, /peanut-butter/ is a keyword combo you want to rank for and has great traffic potential that converts. It should be a hub page for your site for that topic.
Find good categories, and work the keyword into that category or, if not, into the slug for the name of the page. If you want a good example, look at how the Moz site is set up. Also, remember that keywords in the URL are good for SEO, but you really get more bang for the buck from a good title, content, and links into that content. Don't overthink the URL.
Good luck!
-
RE: List of SEO "to do's" to increase organic rankings
I agree with this. I would just add: make sure that, whatever company you work with, they can explain themselves to you in plain English vs Geek Speak. You need to understand what they are doing and why, so that you can work with them on this project. Ideally you can learn and collaborate with them. SEO is not about sending the company off to work on stuff you don't understand and then having them come back with results. They could be buying links on spam sites, and while you get a short-term gain, you end up with a long-term ruined site reputation with Google. If they have issues being transparent, then walk away.
-
RE: Product Description Blurb on Category Page
It might or it might not. I would test it on a single page and see. The other thing you have to consider is that on the current category page you are just reposting duplicate content (descriptions) from the product page. Google may not consider that to be exceptional to start with. See if there is a way on the category page to post some original text on the category itself. Make it a real hub information page that people would want to share and link to.
-
RE: Should I block google indexing "search.php"
You want to block Google from any URL that produces a search result that is essentially a resorting or refiltering of a master list of search results that they have already crawled/indexed.
If you already have a set of pages that lets Google crawl all the pages in your site (could be all the products in your store, all the articles in your blog, etc), having Google crawl through variants of that same page causes a couple of problems. 1) You are wasting Google's time spidering pages that it has already seen, vs having Google crawl your more important pages. Depending on how you have these set up, you may end up sending Google into an endless loop of non-important pages to crawl. 2) You are creating pages that are generally low quality, have nothing truly original on them, will not rank for anything anyway, and may give the impression that your site consists primarily of low quality pages.
What I show Google is a single simple path to browse my content. For a blog this would be a chronological listing of articles that is paginated so that Google and the user can browse from my most recent to my oldest articles. For an ecommerce site, I might set up basic category pages, make sure the category pages have great content on them, and then allow Google to crawl back through all the products based on those main category pages. If I have some products in 2 or 3 categories I do not sweat it. If, in either of these examples, I show the user options to re-sort, filter, etc. the results, I block Google with a nofollow or with robots.txt.
In your example, you already have "pretty" URLs by country and town, keep those, that will let Google and your users find your content and also provide context around that content. The crazy a$$ search URL you show is handy for your PHP code to give a search result, but would just waste Google's time. Unless you think it would be useful for a user to save the search URL results, I would see if there is a way to simply hide all the parameters from the user (submit the parameters using a POST vs a GET request for example) so that all they see in the URL result is /search/search.php
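If you do go the robots.txt route, a minimal sketch might look something like this (assuming the search script lives at /search/search.php - adjust the path to your setup):

```
User-agent: *
Disallow: /search/search.php
```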
Good luck!
-
RE: Old pages STILL indexed...
Yea. If you cannot do it dynamically, it gets to be a real PIA, and also, depending on how you setup the 301s, you may get an overstuffed .htaccess file that could cause problems.
If these pages were so young and did not have any link equity or rank to start with, they are probably not worth 301ing.
One tool you may want to consider is URL Profiler http://urlprofiler.com/ You could take all the old URLs and have URL Profiler pull in GA data (from when they were live on your site) and then also pull in OSE data from Moz. You can then filter them and see what pages got traffic and links. Take those select "top pages" and make sure they 301 to the correct page on the new URL structure and then go from there. URL Profiler has a free 15-day trial that you could use for this project and get it done at no charge. But after using the product, you will see it is pretty handy and may buy it anyway.
Ideally, if you could have dynamically 301ed the old pages to the new, that would have been the simplest method, but with your situation, I think you are ok. Google is just trying to help to make sure you did not "mess up" and 404 those old pages on accident. It wants to give you the benefit of the doubt. It is crazy sometimes how they keep things in the index.
I am monitoring a site that scraped one of my sites. They shut the entire site down after we threatened legal action. The site has been down for weeks and showing 404s, but I can still do a site: search and see them in the index. Meh.
-
RE: Old pages STILL indexed...
Forgot to add this - just some free advice. You have your CSS inlined in your HTML. Ideally, you want to have that in an external CSS file. That way, once the user loads that external file, it is cached and does not have to be downloaded again, so the experience is faster on subsequent pages.
If you were testing your page with Google site speed and they mentioned render blocking CSS issues and that is why you inlined your CSS, the solution is not to inline all your CSS, but to just inline what is above the fold and put the rest in an external file.
Hope that makes sense.
-
RE: Anybody experience with speeding up loading time for visitors from China mainland?
This may not be you, but simply what has been called "The Great Firewall of China" slowing things down. This is an article that looked at a bunch of sites from non-Chinese companies that hosted outside of the mainland and did a lot of business in China.
https://www.internetretailer.com/2014/08/14/foreign-retail-web-sites-load-slowly-china
Catchpoint says sites hosted outside of China often load slowly, because they are slowed by the so-called Great Firewall of China. For example, sites with the .cn domain—which are almost always hosted within China—typically are more than twice as fast as sites with .com in their domain names, Catchpoint says.
Regardless, I would not disable GA as you need that data to improve and understand your site traffic. Facebook widgets can slow you down, but you have to determine if they help with your sales or not. Your best bet is to try and design your site to be as simple as possible. Cut down on all the Javascript menus etc. Design your site for a mobile experience, then expand out from there. Trim down your CSS file as much as possible. Don't use typekit or other services with custom fonts. Get lean and mean.
-
RE: I currently have a canonical tag pointing to a different url for single page categories on eCommerce site. Is this wrong ?
Just to clarify what is happening here, I looked at your example links and here is what I see.
Your website has a home page (e.g. homepage.com) and site-wide links in navigation, etc., to various categories such as:
http://www.website.com/category-keyword1/
http://www.website.com/category-keyword2/
http://www.website.com/category-keyword3/
As I look at these, which I will call the "original" category pages, they have canonical links that point to the following URLs (note: I do not see this on any of your product pages or other pages on the site):
<link rel="canonical" href="http://website.com/category-keyword2/limit:9999" />
<link rel="canonical" href="http://website.com/category-keyword3/limit:9999" />
The URLs with the limit:9999 also return a 200 if you visit them, are duplicate pages, and canonical to themselves.
This is not good. What you are telling Google is that for each of your "original" category pages that you link to extensively with your internal link structure, the actual (aka canonical) page is the URL with the limit:9999.
I would say that you did not need the canonical to start with, but now that it is there, here is how you fix it.
-
On all the original category pages (i.e. http://www.website.com/category-keyword1/) you need to add a canonical to self. Just update the canonical tag and remove the "limit:9999". There is somewhere in your CMS that is doing this; you may need a dev to help. You absolutely have to do this.
-
On all the limit:9999 pages you have 4 possible options. I put these in order of preference, with option a being your best approach, option b your second best, and so on. Therefore, if you cannot do option a, then try option b, and so on.
a) 301 redirect the limit:9999 pages to the original category pages
b) set the canonical on the limit:9999 pages to the original category pages
c) 404 the limit:9999 pages
d) block the limit:9999 pages in robots.txt, but be careful that you do not block the original pages. Search Console has a great robots.txt testing tool for figuring this out.
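If you end up on option d, here is a small Python sketch (my own illustration, with placeholder URLs) that uses urllib.robotparser to double-check that the limit:9999 URLs are blocked while the original category pages stay crawlable:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.website.com/robots.txt")  # placeholder domain
rp.read()

urls_to_check = [
    "http://www.website.com/category-keyword1/",             # should stay crawlable
    "http://www.website.com/category-keyword1/limit:9999",   # should be blocked
]

for url in urls_to_check:
    allowed = rp.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawlable' if allowed else 'blocked'}")
```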
Good luck!
-
-
RE: Old pages STILL indexed...
General rule of thumb: if a page 404s and it is supposed to 404, don't worry about it. The Search Console 404 report does not mean that you are being penalized, although it can be diagnostic. If you block the 404 pages in robots.txt, yes, it will take the 404 errors out of the Search Console report, but then Google never "deals" with those 404s. It can take 3 months (maybe longer) to get things out of Search Console, and I have noticed it taking longer lately, but what you need to do first is ask the following questions:
-
Do I still link internally to any of these /product/ URLs? If you do, Google may assume that you are 404ing those pages by mistake and leave them in the report longer, since if you are still linking internally to them they must be viable pages.
-
Do any of these old URLs have value? Do they have links to them from external sites? Did they use to rank for a KW? If so, you should probably 301 them to a semantically relevant page instead of 404ing them, and get some use out of them.
If you have either of the above, Google may continue to remind you of the 404, as it thinks the page might be valuable and wants to "help" you out.
You mention 5,000 URLs that were indexed and then you 404 them. You cannot assume that Search Console works in real time or that Google checks all 5,000 of these URLs at the same time. Google has a given crawl budget for your site on how often it will crawl a given page. Some pages they crawl more often (home page) some pages they crawl less often. They then have to process those crawls once they get the data back. What you will see in a situation like this is that if you 404 several thousand pages, you will first see several hundred show up in your Search Console report, then the next day some more, then some more, etc. Over time, the total will build and then may peak and then gradually start to fall off. Google has to find the 404s, process them and then show them in the report. You may see 500 of your 404 pages today, but then 3 months later, there may be 500 other 404 pages that show up in the report and those original 500 are now gone. This is why you might be seeing 404 errors after 3 months in addition to the examples I gave above.
It would be great if the process were faster and the data was cleaner. The report has a checkbox for "this is fixed" and that is great if you fixed something, but they need a checkbox for "this is supposed to 404" to help clear things out. If I have learned anything about Search Console, it is helpful, but the data in many cases is not real time.
Good luck!
-
-
RE: How to make this informational site successful
Option 1 - If you want to reach out to the mainstream, tier your information out. I have a site that is meant for the general public on a given health issue. If you have the health professionals write the articles, they are way too technical for the general public, but it is information that the public needs (and ultimately wants). We hired a journalist/editor to work with the experts on writing the everyday blog-type items and social media posts. The health professionals give input into the topics, etc., but we let the journalist/editor do the writing. The articles are fact checked to make sure that they are accurate, but we try not to edit too much. We then have a second level of content that is more advanced and is really the reference section of the site. When topics get too complex for the blog, we link to the reference articles if readers need to read more. It is a classic hub and spoke type setup, but we find it works for blog vs reference types of articles. On the reference articles, we do get a little more technical, some more than others. We do not feel that this is "dumbing it down" per se, but making it more accessible.
Option 2 - If you only care about the experts, play to that niche and see if you can find topics that may have low search volume but convert really well. Give the site more of an exclusive feel. You actually may be surprised at how "non-experts" want to find and read that information. If you layered this in with Option 1, you could hit both audiences potentially.
Good luck!
-
RE: We're considering making notable changes to our website's navigation. Other than 301 redirects from old pages to new, what do I need to consider with this type of move or update?
Go through everything on your website (navigation, content, XML sitemap) and find links to the old URLs. Make sure that your internal links are all updated in addition to having the 301s in place.
Let's say all the old URLs are in the folder /stuff/. You can set up a spider like Screaming Frog to spider your current site and let you know all the pages that link to internal URLs with /stuff/ in them using the Custom Search feature. This will let you know all the pages internally that you need to update the links on. You can also generate a list of all the /stuff/ pages you link to internally for testing later.
Once you make updates to your site with the links and 301s, you can then use the spider to check things 2 ways. Ideally you would first do this on a development server, test, and then go live and test again once you are live.
-
Have the spider go through your site (spider mode) and your XML sitemap and make sure there are no links to /stuff/ and/or that it finds no internal 301s.
-
Have the spider go through the list of old /stuff/ URLs (list mode) and make sure they all 301 to the correct page.
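If you want a quick script to do that list-mode style check yourself, here is a rough Python sketch (my own illustration - the CSV file name and two-column layout are hypothetical) that reads an old-to-new mapping and confirms each old URL lands on the expected new URL:

```python
import csv
import requests

# Hypothetical CSV with exactly two columns per row: old_url, expected_new_url
with open("redirect_map.csv", newline="") as f:
    for old_url, expected_new_url in csv.reader(f):
        response = requests.get(old_url, allow_redirects=True, timeout=10)
        landed_on = response.url
        ok = landed_on == expected_new_url and response.status_code == 200
        status = "OK" if ok else "CHECK"
        print(f"{status}: {old_url} -> {landed_on} ({response.status_code})")
```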
You could go a step further and use OSE (Majestic, ahrefs, etc) or the data from Google Search Console to find external sites that link to your old /stuff/ pages and do two things. 1) If the link is from an authoritative site ask them to update the link. 2) Cross check all the links to /stuff/ pages to see if there were any that you missed in your internal audit to make sure that those 301 to the correct page.
This all assumes that you are doing a 1 to 1 redirect from your old pages to new pages, i.e. you are keeping the content all the same on the old and the new pages and just updating the URL. If you have any old content that does not have links or is of low quality, you may want to consider a content audit and let those pages 404/410.
-
-
RE: Issues with Duplicates and AJAX-Loader
I would really need to see the page you mention to make sure I am following you, but I think one approach would be that when the page is called via AJAX, call the actual URL, not the one with the parameter. That way you do not have the 2 URLs that need to be canonicalized to start with. You would still need to test this with a spider program to make sure the URLs are found. I am thinking you would also need a sitemap or alternative navigation to allow the spiders to find the pages and get them cataloged.
All of that said, I have to be honest, my gut is telling me that if you are having to work this hard to get the spider to find the URLs correctly, then you may also have an issue with this design being too clever for its own good. You may need to rethink how you approach this. USA Today uses a setup that seems similar to yours; check it out: http://www.usatoday.com/ When you click on a tile to view a story, there is an AJAX-type overlay of the home page with the article on top. It allows you to X out and go back to the home page. Likewise, from the article you can page through other articles (left and right arrows). While you do this, notice that USA Today is updating the address bar with an SEO-friendly URL. I have not tested this site spider-wise, but just by the look of it they seem to have the balance correct.
Good luck!
-
RE: Customer Reviews inputted by a single person
I agree with all of the above, just to add another layer, if you take this from a Yelp perspective, they actually specify that you are not supposed to solicit reviews to start with so that the reviews you get are more objective and less biased.
http://www.yelp.com/guidelines
- Don't ask customers for reviews: Don't ask your customers to review your business on Yelp. Over time, solicited reviews create bias in your business listing — a bias that savvy consumers can smell from a mile away. Learn why you shouldn't ask for reviews.
I really like the option that Tim mentioned to put the reviews on your own site and mark up with schema so you get the chance at the rich snippet. We have seen some nice increases in CTR when we have done this.
-
RE: Researching search volume drop
If you want to look at relative search volume, you can look at Google Trends: https://www.google.com/trends/ I would also see if you notice any trends in Google Search Console under Search Traffic > Search Analytics > Impressions.
What your graph has me wondering is whether this is an attribution issue with GA. On the grey line, Moz is simply taking your GA traffic that is tagged as organic and showing it in the graph. If you have an attribution issue in GA, organic traffic may be showing up as direct traffic. If there is anything wonky in the traffic attribution, GA will put it as Direct. There is a classic article on an experiment Groupon ran that is a good example of how organic can be attributed incorrectly: http://searchengineland.com/60-direct-traffic-actually-seo-195415
Look at your overall traffic in GA and then add a segment for organic traffic and then direct traffic. If your overall traffic is constant and you see organic going down while direct traffic is going up, you have your answer. As I understand it, this phenomenon is due to browser issues, so see if you have had more traffic recently from a given browser and that may give you another clue.
Another thing to check, you should be able to look at your organic traffic in GA and see if it is the same as Moz, or not. If not, ping the Moz folks to make sure your data from GA is coming in properly. May be some data import issues there.
My other guess here is that your ranking is ok, but your click rate has been jacked. Google Search Console will show you CTR over time, and that may help. Look and see: did you change meta descriptions? Did you change your schema markup so that previously you had rich snippets in the SERP, but now you do not? You could potentially keep ranking, but lose CTR.
These are all things I would look at, but at this point, your guess is as good as mine. Looking through the above will probably prompt you to check other things that might give you an answer.
Good luck!
-
RE: Trailing slash
http://googlewebmastercentral.blogspot.com/2010/04/to-slash-or-not-to-slash.html
Rest assured that for your root URL specifically, http://example.com is equivalent to http://example.com/ and can’t be redirected even if you’re Chuck Norris.
Note the quote from John Mueller in the comments
_Lots of good advice from an older blog post that's still valid & relevant today._
There may be some technical merit to the idea that Googlebot starts at the slashed version of the root URL, but I do not think there is any type of SEO advantage to that (if that is even the case).
-
RE: Trailing slash
Trailing slash or no trailing slash - it does not matter, you just need to be consistent. I would also make sure that the no trailing slash 301 redirects to the URL with the trailing slash (or vice versa).
The only reason you need to look at trailing slashes is when you have a reporting system that needs the trailing slash to differentiate between folders, i.e. website.com/folder vs website.com/folder/. The first, without the slash, is a page within the root folder and the second is a folder within the root folder.
I am not aware of an "advantage" of having the slash vs not on the home page URL, per se, though. Your SEO company may reason that more people will link to you with a trailing slash (I am not aware of any data on average to support this - your mileage may vary), and if that is the case you are losing link juice through the 301 from the non-slashed version to the slashed version on those links to your home page. 301 redirects work and pass equity as long as the two pages are semantically related, so I would not think that a 301 from a non-slashed to a slashed version would cause any issue. Getting back to the above, I am not sure that I see any reason to change what you have. Ask the SEO company what reason they have and have them give a real reason vs a generic "best practice" answer.
-
RE: 301 redirects aren't passing value.
I am seeing a double hop on the example 301
http://startupfashion.com/product/fashion-brand-line-sheet-template > 301 redirects to > http://startupfashion.com/shop/product/wholesale-line-sheet-template > 301 redirects to > http://startupfashion.com/shop/product/fashion-line-sheet-template > and the final page sends a 200.
I made some assumptions on your original URL structure (took out /shop/) looked around and found something similar on
http://startupfashion.com/product/fashion-designers-guide-creating-websites-sell >301> http://startupfashion.com/shop/product/fashion-designers-guide-creating-websites-sell/ >301> http://startupfashion.com/shop/product/fashion-designers-guide-creating-websites-sell > shows 200
The second instance is redirecting a slashed to a non-slashed version.
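If you want to hunt for chains like this in bulk, here is a rough Python sketch (my own illustration, not part of the original audit) that prints every hop in a redirect chain so double hops stand out:

```python
import requests

def show_redirect_chain(url):
    """Follow a URL and print each hop so double (or longer) redirect chains stand out."""
    response = requests.get(url, allow_redirects=True, timeout=10)
    for hop in response.history:
        print(f"{hop.status_code}  {hop.url}")
    print(f"{response.status_code}  {response.url}  (final)")
    if len(response.history) > 1:
        print("-> more than one hop; consider pointing the first URL straight at the final one")

show_redirect_chain("http://startupfashion.com/product/fashion-brand-line-sheet-template")
```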
Your "category" URL has a typo in it
http://startupfashion.com/shop/catgory/fashion-business-guides-and-ebooks
I checked your sitemap
http://startupfashion.com/product-sitemap.xml
It does not have any links to the new product pages. They all reference the old ones.
You also have a category sitemap with different URLs than your new catgory URL
http://startupfashion.com/category-sitemap.xml
Just on my quick 10 min look, I think you need to
-
Double check your 301 redirects
-
Make sure there are no "old" links on your site to the old urls
-
Make sure the new URLs are properly linked to in your site structure (menus and XML and old blog posts).
It looks like your update may not be as "clean" as you realized.
-
-
RE: Duplicate Content - Bulk analysis tool?
I have not used this tool in this way, but have used it for other crawler projects related to content clean up and it is rock solid. They have been very responsive to me on questions related to use of the software. http://urlprofiler.com/
Duplicate content search is the next project on my list; here is how they do it:
http://urlprofiler.com/blog/duplicate-content-checker/
You let URL profiler crawl the section of your site that is most likely to be copied (say your blog) and you tell URL profiler what section of your HTML to compare against (i.e. the content section vs the header or footer). URL profiler then uses proxies (you have to buy the proxies) to perform Google searches on sentences from your content. It crawls those results to see if there is a site in the Google SERPs that has sentences from your content word for word (or pretty close).
I have played with Copyscape, but my markets are too niche for it to work for me. The logic from URL Profiler is that you are searching the database that matters most: Google.
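To give a feel for the approach without buying proxies, here is a rough Python sketch (my own illustration, with a placeholder URL) that pulls sentences out of a page and turns them into exact-match queries you could paste into Google by hand:

```python
import re
import requests
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Crude text extractor - in practice you would target the content div, not the whole page."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

url = "http://example.com/blog/some-article/"  # placeholder URL
parser = TextExtractor()
parser.feed(requests.get(url, timeout=10).text)
text = " ".join(parser.chunks)

# Split on sentence-ending punctuation and keep reasonably long sentences
sentences = [s.strip() for s in re.split(r"[.!?]\s+", text) if len(s.split()) >= 10]

for sentence in sentences[:5]:
    print(f'"{sentence}"')  # exact-match query to paste into Google
```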
Good luck!
-
RE: My 404 page is returning a 404
You need to look at the CSV version of your Moz crawl. It will show what pages link to it.
-
RE: My 404 page is returning a 404
How is http://domain.com/404 being found? You should not have any links to that page to start with.
-
RE: My 404 page is returning a 404
If the Moz crawl is showing that it is finding your 404 page and it is showing a 404 response, that is good, as this is what your 404 page is supposed to show. What you need to see (and you need to download the CSV version of the report in Moz for this) is what page(s) are pointing to the 404 page. Figure out if you need to have the links removed or updated, or if you possibly need to redirect them.
Search Console 404s are different, as Google may be finding those pages from other websites linking to yours. If there are old pages that are supposed to 404 then that is ok. Let them 404 and they will eventually go away in Search Console. If the pages are not supposed to 404, then you set up a 301 redirect, etc.
-
RE: Sitemap Contains Blocked Resources
I would recommend that you try and get those pages out of your sitemap. If you look through the Google sitemap best practices, it states that the sitemap should be for pages that Googlebot can access.
http://googlewebmastercentral.blogspot.com/2014/10/best-practices-for-xml-sitemaps-rssatom.html
URLs
URLs in XML sitemaps and RSS/Atom feeds should adhere to the following guidelines:
- Only include URLs that can be fetched by Googlebot. **A common mistake is** including URLs disallowed by robots.txt — which cannot be fetched by Googlebot, or including URLs of pages that don't exist.
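If you want to see which of your sitemap URLs are currently disallowed, here is a rough Python sketch (my own illustration - the domain and standard sitemap location are placeholders) that pulls the XML sitemap and checks each URL against robots.txt:

```python
import requests
import xml.etree.ElementTree as ET
from urllib import robotparser

site = "http://www.example.com"  # placeholder domain

rp = robotparser.RobotFileParser()
rp.set_url(f"{site}/robots.txt")
rp.read()

# Parse the XML sitemap and pull out every <loc> entry
sitemap_xml = requests.get(f"{site}/sitemap.xml", timeout=10).text
namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
locs = [loc.text for loc in ET.fromstring(sitemap_xml).findall(".//sm:loc", namespace)]

blocked = [url for url in locs if not rp.can_fetch("Googlebot", url)]
print(f"{len(blocked)} of {len(locs)} sitemap URLs are blocked by robots.txt:")
for url in blocked:
    print(url)
```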
-
RE: Duplicate content
If you have to quote the law, then why not make the page more unique and provide more analysis around it? Why not add information from other laws and legal input? Nothing is ever 100% original. All modern science is built upon the "shoulders of giants": researchers reference previous works and expand from there, and very commonly summarizing (known as a meta-analysis) is a new way of looking at old data and is considered original and helpful.
Say you needed to quote the law on drunk driving in a particular city. You probably need to not just quote the law, but answer the question, "If I get pulled over for drunk driving, what should I do?" "If I need a lawyer what should I do?" "How do I find a good lawyer who specializes in drunk driving?" Show stats on how many drunk driving offenses occur in that particular city and suburbs. You get the idea. If it is appropriate quote the law and then link back to the government page that you found it on. Shoot a video with an expert talking about all these things - you get the idea.
None of the individual pieces are "original" per se, but pulling it all together is, and this is not only helpful to the user (who now does not have to spend all this time researching), but you have a great page that covers a nice range of keywords related to drunk driving laws. The page I mention above is a very linkable and shareable page on the topic.
Quote the law, link to the reference, but build content around it and you can potentially rank for it.
Good luck!
-
RE: How to switch brand domain and address previous use of domain
Howdy,
This brings into question whether you should use the new domain name at all. You are going to have to start from a "negative" SEO standpoint and you may or may not be able to work your way out of it. Here is what I would do.
Setup Google Search Console and Google Analytics for the new domain. Start getting some data on how Google looks at this domain and if it sends traffic. If you do not have access to the old website setup, look at the Search Console errors to see what pages Google expects to see and get an idea of the URL structure. You can also use the wayback machine potentially for this. Search Console will also give you sites that link to this domain and what URLs they were pointing to.
Just because this domain has links from sites with significant spam scores according to Moz, does not mean that your new domain is penalized, it just has a higher potential.
Perform a link audit using the links you find in OSE, Google Search Console and any other tools such as Majestic, Ahrefs, etc. This will allow you to find all the bad links. Go ahead and disavow the low quality links at the domain level.
For all the pages that these "bad links" pointed to, just let them 404. I would let every referenced URL from the old site 404. Do not 301 redirect them to the home page or to new pages you have set up. The 301 will not pass any link equity unless the pages are semantically related, and it sounds like you are setting up a completely new site. Don't worry about all the 404 errors in Google Search Console. Just check them to make sure they are for pages from the old site, vs pages on the new site. The 404 errors will fade away. Likewise, any bad (and good) link equity to those pages is gone, as the links point to pages that do not exist.
Some folks around here would say that if you simply 404 the old pages, you do not need to disavow, but you would not be able to do this for the home page. Plus, if you want a conservative, "belt and suspenders" approach to eliminate link equity from the old links, this has you covered.
Finally, even if you only have a little organic SEO on your current site, I would 301 redirect it to the new site to cover that base.
This will hopefully start you from zero, but just know that you will still have an uphill battle. Google has looked at this site before and had it associated with "Red Widgets" and so if the new site is about "Blue Bunnies" it may take a while for the basic classification to change in the Google system, let alone the impact of links etc. Really take some time to consider if you feel like the new domain name is that much better than your old domain name, or some other domain that is related to your current site and does not have a significant spam score.
-
RE: How to measure the penalty of duplicate content if we populate our provider bios on WebMD?
Thanks Andy for sharing that post!
-
RE: Anyways to pull anchor text?
Thanks Jay. If I look on the backlinks side, they all seem to have the same subdomain in some form or another. You would just need to set up the regex in Screaming Frog to look for just that keyword in the subdomain so it should match all the variants of it.
That said, ignore everything I just posted. I was thinking earlier, "Surely there is scraper software out there that does this already." I did not take the time to look. Your mention of Scrapebox reminded me of that.
Scrapebox has a separate addon that does this
http://www.scrapebox.com/anchor-text-checker
The ScrapeBox Anchor Text Checker allows you to enter your domain and then load a list of URL’s that contain your backlink. It will scan all the URL’s containing your link and extract the anchor text used by the websites that link to you.
-
RE: Anyways to pull anchor text?
Ok. Can you be more specific on what you are trying to accomplish with this data? I think that may help my understanding of what you are trying to do.
-
RE: Deindexed from Google images Sep17th
Bummer. This smells of a technical change that occurred on your site.
Check: robots.txt - are you blocking access to images? You can also look in Search Console and under Crawl use the Robots.txt tester and see if your image URLs fail there. It will show you where the issue is.
Check for things like all your images having been moved to a CDN with no 301 redirects from the old image URLs put in place.
Talk to your dev and look at every ticket prior to Sept 17th and see if there is anything else that was changed.
The good news is that if this is something technical and you fix it quickly, you should recover.
Good luck!
-
RE: Website Redesign, 301 Redirects, and Link Juice
Great answer. A good tool to use for testing the 301s in bulk is Screaming Frog. Save a CSV list of your old URLs before you migrate. When you update sites, set Screaming Frog in list mode and it will show you where all the old URLs 301 to. Makes it really easy to test.
If you have any sort of staging site to do this with, that would be optimal before you go live. Once you do go live, I would make checking those 301s the first thing you do. Screaming Frog will quickly check a ton of them and give you some peace of mind.
Side note, the only way link juice is lost in a 301 is if you 301 to a page that does not have semantically related content to the original page, i.e. if you have a page on Red Widgets and you 301 it to a page on Blue Bangles, Google will not pass the juice, as it sees you trying to manipulate the link juice. As you are using a 301 redirect to a new URL with the exact same content, you should be fine, assuming the other points that Dirk mentions.
-
RE: Anyways to pull anchor text?
Screaming Frog can do this with custom extraction and list mode. If I am reading your question correctly, you have a list of URLs and know which pages on your site they link to.
You would upload the list of URLs into Screaming Frog so it knows what pages to scan and run it in list mode
http://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/#15
You would then use the custom extraction tool to grep for the a href code that has a link to your domain
http://www.screamingfrog.co.uk/web-scraper/
You would need to plug in a regular expression to look for your domain (or versions of it) and then include the rest of the HTML tag that includes the anchor text, all the way through the closing </a>.
You should then be able to import that data into a spreadsheet and use text to columns to split the anchor text into its own column.
It is a little tricky as the regular expression may have to be tweaked depending on how other sites link to your site. Run the Frog on a test group of 10 or so to make sure it works. If you have a bunch of errors, take the error examples and tweak the regular expression based on those.
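If you wanted to do the same thing outside of Screaming Frog, here is a rough Python sketch (my own illustration - the domain and backlink URLs are placeholders, and the regex is the crude kind that will need tweaking as described above) that fetches each backlinking page and pulls the anchor text of any link pointing at your domain:

```python
import re
import requests

my_domain = "example.com"  # placeholder - your domain (or a variant of it)
backlink_pages = [
    "http://some-site.com/page-that-links-to-you/",  # placeholder list of backlinking URLs
]

# Match <a ... href="...example.com..." ...>anchor text</a> - crude, markup varies a lot in the wild
link_pattern = re.compile(
    r"""<a[^>]+href=["'][^"']*""" + re.escape(my_domain) + r"""[^"']*["'][^>]*>(.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)

for page in backlink_pages:
    html = requests.get(page, timeout=10).text
    for anchor_text in link_pattern.findall(html):
        print(f"{page}\t{anchor_text.strip()}")
```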
-
RE: URL / sitemap structure for support pages
Agree with Dirk. You can use links to show the structure more effectively than the URLs per se.
-
RE: Where are the crawled URLS in webmaster tools coming from?
Just to make this complete: Google Search Console will list errors for pages with links coming from a few general locations:
-
Crawling links on your website. Starting from somewhere on your site and going link to link.
-
Crawling links in your sitemap.
-
Crawling URLs from your site that do not exist anymore on your site or sitemap. I have seen Google keep things in memory and come back to hit pages again that no longer come from option 1 or option 2. If you used to have a bunch of 301 redirects in place for an old version of your website and then your developer changes something to delete all those 301s and they become 404s, you will find those pages showing up as errors again. This is really useful as it can help diagnose the issue so you can fix it.
-
Crawling links from other sites. Sometimes, this is how links get crawled for #3.
Here is what really sucks about Search Console (and I mean sucks big bananas) if you are trying to diagnose an issue. On your Search Console error page, you can click on the URL in the report, it will pop up a box, and then you can click the tab "Linked From" and see what pages are linking to the URL in question. That is good! If you then download the CSV, all of that info is lost. If you have more than 20 errors to deal with, you do not have a practical way to manage things and see if there is a trend, etc. Otherwise you are left with clicking a lot of links in the report, taking lots of notes, and going a little insane.
Good luck!
-
-
RE: How to measure the penalty of duplicate content if we populate our provider bios on WebMD?
Just to follow up on Russ' point, if you want to estimate cost: contract out a couple of part-time writers to go and do some web research on the providers and rewrite the bios/profiles. You will need someone from your internal team who is familiar with the healthcare industry and with writing to supervise the part-timers, look through their work, and make sure that what the writers put down is correct. This should take you 4-6 months. Your costs will be 60-70% of the salary for the full-time person (as they will not just be doing this project), plus plan to pay about 20 bucks an hour for 20 hours a week from each part-timer. You can adjust and get another (third) part-timer if you like for a bit more cost but faster results.
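As a rough back-of-the-envelope illustration of those numbers (my own arithmetic - adjust the figures to your situation):

```python
writers = 2            # part-time writers
rate = 20              # dollars per hour
hours_per_week = 20
weeks = 20             # roughly 4-6 months of part-time work

writer_cost = writers * rate * hours_per_week * weeks
print(f"Part-time writer cost: ~${writer_cost:,}")  # ~$16,000, before the internal supervisor's time
```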
We did this for about 2,000 locations for a site I work on. We found that you would not want to have anyone doing this full-time, as they would probably go insane and quality suffers. Find a way to break up the tasks so that each person spends part of the time researching, part proofing the others' work, and part writing. It helps with a better output. Sure, you could use software to "spin" the bios, but they would come out looking like crapola. That was why we used people and were happy with the results.
We did see a significant jump in our organic traffic, so for us it was worth it. You may take a look and decide not to, but wanted to put this option out there.
-
RE: Do permanent redirect solve the issue of duplicate content?
Sometimes things will take a while to roll out of Search Console. It tends to be slow.
You may also want to put a self referencing canonical link on your final page
www.mysite.com/Main-category/SubCatagory/product-page.html
I have seen that help when Google gets "confused".
-
RE: My First SEO strategy - What's next?
Followerwonk will help you find influencers that may give you an opportunity to collaborate on additional content, spread your social media influence or even have those followers notice what you are doing and link to your site.
More here: https://moz.com/help/guides/followerwonk
Fresh web explorer is kind of like Google Alerts. You set it up to look for keywords that you want to build content around and your brand name. If there is some content that you see published on one of your keywords, reach out to the site and see if they would add you. If a site mentions your website but does not link to it, then ask for a link.
More here: https://moz.com/help/guides/research-tools/fresh-web-explorer
-
RE: Should I subscribe my site to bellnet.de?
Paying for links is against Google guidelines. Google does not look at the Moz DA and PA, so for all you know the link is worthless from the site. Spend the 50 bucks on making some great content on your site or upgrading your server to make your site faster.
-
RE: How to de-index old URLs after redesigning the website?
I respectfully disagree with all of the above. Please repeat after me, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic.
After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain).
**Part 1 - Internal links that 404 from the Moz crawl:** The 404s that show up in the Moz crawl are only going to be from an internal link on your website. The Moz crawl only looks at internal links and not links from other websites. In other words, if you see 404s in your Moz crawl, that means, somewhere, you are linking to those pages and that is why the 404s are showing up. Download the CSV and you will find them in your Moz crawl. Other tools such as Screaming Frog, Botify, and DeepCrawl will show you a similar analysis.
Simple solution. Go through your code and remove the internal links on your site that direct the Moz crawler to those pages and the 404s will go away. (FYI this same approach will work for any internal 301s) These 404 errors in the Moz report are great diagnostic signals on where to fix your site. It is bad for users to click on a link within your website and get sent to a page that does not exist.
**Part 2 - External links from Search Console:** The 404s that show up in Search Console can come from your internal links on your site AND external links from other sites. Google will keep trying to crawl these links due to other sites linking to pages on your site and your own internal links. For internal link fixing, see the suggestion above. For external links you need a different approach.
Look at the external links: where are they coming from? Are they from quality websites? Do they go to formerly important pages on your website (i.e. pages that were good converters)? If so, then use a 301 redirect to send them to the correct replacement page (and this is not always the home page). You get users to the correct page and any link equity is passed along as well, which can help with your site rankings. If the link goes to a former page on your site that was not any good to start with and the links that come into it are poor quality, then you just let the page 404. Tools such as Moz Open Site Explorer or Ahrefs or Majestic can help with this assessment - but usually you can just look at a site linking to you and tell if it is crap or not.
You need to consider the above regardless of whether you want to get the 404ing pages in question out of the Google index, because even if you get Google to remove a page from the index, it will then see the internal link on your site and find the 404 again. If you have removed the links to the 404 pages on your site, eventually Google will stop crawling them and they will drop out of the index.
Important note regarding the use of robots.txt. Blocking Google from crawling the 404s will not remove the pages from the index; Google will just stop crawling them. Google has to be able to crawl the URL to see the 404, recognize that it is a bad page, and then remove the page from the index. Blocking with robots.txt stops Google from doing that. As soon as you take the page out of robots.txt, Google will recrawl it and the 404 shows up again. Robots.txt treats a symptom and is a red herring; allowing the 404 to occur takes care of the issue permanently.
Dead pages are a natural part of the web. Let Google see the 404 (if it truly is a page that should 404 and has no link equity that should be passed along with a 301). Google will crawl the 404 several times, you will see it in search console several times. It is ok. You are not penalized for X number of 404s. You may lose ranking if you 404 a page that Google used to rank well, but this is just because Google will not keep a page highly ranked that does not exist :-). Help Google out by cleaning up your internal link structure so when it sees that you do not link to the page any more, then that is a signal that the page should 404. Google knows that due to the nature of the web, pages will time out on occasion and show an error. Google will continue to recrawl a page just to make sure, it wants to give you the benefit of the doubt. Therefore, you have to give clear directives by not linking to dead pages so that after Google double and triple checks the page, it will finally drop it. You will see the 404 in your Search Console for several months then it will eventually go away.
Hope that makes sense. Good luck!
-
RE: HTML snapshot creating soft 404
A side note first. Something to consider for transient content like job listings, which I have used on job sites I have worked on and which worked pretty well: the unavailable_after meta tag.
http://searchengineland.com/googles-matt-cutts-seo-advice-unavailable-e-commerce-products-186882
"The “unavailable_after” Meta tag will allow you to tell Google that a page should expire from the search results at a specific time. "
This way your pages would be removed from the index on the date you list, and if you have also removed the links from your sitemap, etc., Google may not need to crawl them and find the 404 and/or soft 404 to begin with.
The soft 404 (according to Google) means your server is not showing a 404 server response for the HTML snapshot version. I would try Fetch as Google on those pages to see what Google is seeing, and that may help you diagnose the situation. It may be that your server is giving a different response than the 404 and Google is questioning it.
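If you want to confirm what the snapshot URLs actually return, here is a rough Python sketch (my own illustration - the snapshot URL format is just a placeholder) that prints the raw status code for each one:

```python
import requests

snapshot_urls = [
    "http://example.com/jobs/12345?_escaped_fragment_=",  # placeholder snapshot URL
]

for url in snapshot_urls:
    response = requests.get(url, allow_redirects=False, timeout=10)
    print(f"{url} -> {response.status_code}")
    if response.status_code == 200:
        # A 200 whose content looks like an error page is exactly what Google flags as a soft 404
        print("  Returns 200 - check whether the page body is really an error/empty page")
```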