Posts made by LoganRay
-
RE: Measuring the size of a competitors website?
I highly recommend buying a license for Screaming Frog. At $100/year, you won't find a more valuable SEO tool for the money, and you won't find a free (and trustworthy) tool that will crawl a site that large.
-
RE: Why can I not add Schema Mark up to my homepage?
Correction to my earlier statement: reviews/ratings only apply to products.
However, you can use Organization markup if you've got that info on the homepage. This schema generator will build the code you'll need: https://webcode.tools/microdata-generator/organization.
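To give a rough idea of what that generator outputs, a minimal Organization block in microdata might look like this (the name, URL, and phone number are placeholders):

<div itemscope itemtype="https://schema.org/Organization">
  <span itemprop="name">Example Company</span>
  <a itemprop="url" href="https://www.example.com">www.example.com</a>
  <span itemprop="telephone">+1-555-0100</span>
</div>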
-
RE: Why can I not add Schema Mark up to my homepage?
Do you have reviews on your homepage? On most sites, reviews are attached to specific products or services rather than the whole company. You would need an aggregate review score displayed on your homepage in order to mark it up with schema. Schema is meant to identify data types that are visible on the page; it's not like metadata, which runs in the background for search engines to see and not people.
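For example, if your homepage did display a visible aggregate score, the markup would wrap that visible text rather than hide it (the numbers here are made up):

<div itemscope itemtype="https://schema.org/Organization">
  <span itemprop="name">Example Company</span>
  <div itemprop="aggregateRating" itemscope itemtype="https://schema.org/AggregateRating">
    Rated <span itemprop="ratingValue">4.6</span>/5 based on <span itemprop="reviewCount">128</span> reviews
  </div>
</div>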
-
RE: Why can I not add Schema Mark up to my homepage?
Hi,
What kind of schema are we talking about here? There are tons of different data types you can mark up, most of which would be better suited for interior pages.
-
RE: What does Disallow: /french-wines/?* actually do - robots.txt
Disallow: /?* is the same thing as Disallow: /?. Since robots.txt rules match URL prefixes and the asterisk is a wildcard, the trailing * adds nothing; both of those disallows prevent any URL that begins with /? from being crawled.
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.
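To make the prefix matching concrete (the URLs are hypothetical):

User-agent: *
Disallow: /?
# Blocked:     https://example.com/?sort=price        (path starts with /?)
# Not blocked: https://example.com/wines/?sort=price  (path starts with /wines/)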
-
RE: What does Disallow: /french-wines/?* actually do - robots.txt
Disallow: /*?
This disallow literally says to crawlers: 'if a URL starts with a slash (all URLs do) and has a parameter, don't crawl it'. The * is a wildcard, so anything between the / and the ? still matches the pattern.
It's very easy to disallow the wrong thing, especially where parameters are concerned, so I always do these two things instead of using robots.txt:
- Set the purpose of each parameter in Search Console - go to Crawl > URL Parameters to configure them for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but a disallow only prevents crawling. A canonical on the parameterized URL pointing to its parameter-free version will prevent indexing of URLs with parameters (see the example below).
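For instance, a hypothetical URL like www.example.com/french-wines/?page=2 would carry this in its <head>:

<link rel="canonical" href="https://www.example.com/french-wines/" />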
Hope that's helpful!
-
RE: What does Disallow: /french-wines/?* actually do - robots.txt
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before the folder name, the URL would have to start with /french-wines/. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended.
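In other words (the parameter values are made up):

Disallow: /french-wines/?*
# Blocked:     /french-wines/?vintage=2015
# Not blocked: /wines/french-wines/?vintage=2015   (doesn't start with /french-wines/)
# Not blocked: /french-wines/bordeaux?vintage=2015 (no ? directly after the folder)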
-
RE: Should I use noindex or robots to remove pages from the Google index?
Rhys,
Your web dev team is confused. You cannot de-index pages simply by disallowing them in your robots.txt file. Google will still index anything it finds via a link (that doesn't have a noindex tag); that's why you often see search results with "A description for this result is not available because of this site's robots.txt" as the description.
Here's a quote from Google regarding the subject: "You should not use robots.txt as a means to hide your web pages from Google Search results." - https://support.google.com/webmasters/answer/6062608?hl=en
-
RE: Should I use noindex or robots to remove pages from the Google index?
Hi Tyler,
Yes, remove the robots.txt disallow for that section and add a noindex tag. Noindex is the only sure-fire way to de-index URLs, but the crawlers need to be allowed to crawl those pages to see the tag.
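For reference, the tag itself is just a meta element in the page's <head>, with an HTTP header equivalent for non-HTML files like PDFs:

<meta name="robots" content="noindex">
X-Robots-Tag: noindex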
-
RE: Taxonomy question - best approach for site structure
Honestly, search engines aren't that particular about URL structure. It matters, but not to the degree where one of these two examples is going to make or break your SEO campaign. That being said, I usually set up my URLs with the broadest category in the first folder and get more granular from there. In your first example, the assessment and treatment folders make more sense to me, since additional content could live in each of those respective folders. In your second example, there's less opportunity for future content to live in those folders.
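To illustrate with made-up paths (I don't know your exact URLs), broadest-category-first looks like:

/assessment/anxiety/
/assessment/depression/
/treatment/anxiety/

rather than condition-first paths like /anxiety/assessment/, where each condition folder only ever holds a page or two.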
-
RE: URL has caps, but canonical does not. Now what?
I've had some run-ins with case-sensitive URLs in the past, and they drive me crazy; I don't understand why CMSs still do that! While canonical tags are a perfectly fine way to handle this, there's a better solution: Brian Love wrote a great blog post on how to do server-side URL lower-casing. I've used it on a few sites and it works great.
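As a rough sketch of the idea (this is the common Apache mod_rewrite recipe, not necessarily the exact method from that post; IIS and Nginx have their own equivalents):

# In the server or vhost config - RewriteMap isn't allowed in .htaccess
RewriteEngine On
RewriteMap lowercase int:tolower

# If the requested path contains an uppercase letter, 301 it to the lowercased version
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*)$ ${lowercase:$1} [R=301,L]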
-
Thoughts on RankScience?
I'm sure most of you have heard about this startup, RankScience, which has big ambitions to disrupt the SEO industry with its automated (I know, I know...the words 'automated' and 'SEO' in the same sentence!!!) optimization software. Their claim is that by running thousands of concurrent A/B tests on your site, they can maximize rankings and organic traffic.
Initially my thought was "oh crap, there goes my (and a lot of other people's) career". But then I thought about it a bit more and realized a couple of things. First, software can't replace a face-to-face client meeting; being in the agency world as most of us are, client interactions are vital to a sustained partnership. Second, someone is going to have to understand what this software does, configure it, and monitor it, and I'm okay with that being part of my job if that's how the industry shifts. Third, and most importantly, this software could in theory reverse engineer search algorithms. If they have data from 10,000 websites using their platform and are collecting data on what works and what doesn't, it's only a matter of time before they can pick the algorithm apart piece by piece and figure out exactly how it works. Google is obviously not going to like that very much and will almost certainly put a stop to it.
That's my 2 cents; I'm looking forward to hearing your thoughts on RankScience and the future of our industry.
-
RE: [Very Urgent] More 100 "/search/adult-site-keywords" Crawl errors under Search Console
Oh yeah, I missed that. That's very strange; not sure how to explain that one!