Initial Crawl Questions
-
Hello.
I just joined and used the Crawl tool. I have many questions and am hoping the community can offer some guidance.
1. I received an Excel file with 3k+ records. Is there a friendly online viewer for the Crawl report? Or is the Excel file the only output?
2. Assuming the Excel file is the only output, the Time Crawled is a number (e.g. 1305798581). I have tried changing the field to a date/time format but that did not work. How can I view the field as a normal date/time such as May 15, 2011 14:02?
3. I use the ™ symbol in my Title. This symbol appears in the output as a few ASCII characters. Is that a concern? Should I remove the trademark symbol from my Title?
4. I am using XenForo forum software. All forum threads automatically receive a Title Tag and Meta Description as part of a template. The Crawl Test report shows my Title Tag and Meta Description as blank for many threads. I have looked at the source code of several pages and they all have clean Title tags and I don't understand why the Crawl Report doesn't show them. Any ideas?
5. In some cases the HTTP Status Code field shows a result of "3". What does that mean?
6. For every URL in the Crawl Report there is an entry in the Referrer field. What exactly is the relationship between these fields? I thought the Crawl Tool would inspect every page on the site. If a page doesn't have a referring page is it missed? What if a page has multiple referring pages? How is that information displayed?
7. Under Google Webmaster Tools > Site Configurations > Settings > Parameter Handling I have the options set as either "Ignore" or "Let Google Decide" for various URL parameters. These are "pages" of my site which should mostly be ignored. For example, a forum may have 7 headers, each one of which can be sorted in ascending or descending order. The only page that matters is the initial page. All the rest should be ignored by Google and the Crawl.
Presently there are 11 records for many pages which really should only have one record due to these various sort parameters. Can I configure the crawl so it ignores parameter pages?
I am anxious to get started on my site. I dove into the crawl results and it's just too messy in its present state for me to pull out any actionable data. Any guidance would be appreciated.
-
Good question. There are a few ways of doing it but I'd advise using a canonical URL on each page to tell the search engines where the content stems from. I had a quick look at XenForo and this looks relatively simple to do... although make sure you test things thoroughly just in case.
-
Thank you very much for the detailed reply.
For #1, I did start my campaign and I will follow up.
2. That worked perfectly!
3. Thank you for the information.
4. I realize the problem. It appears the crawler differentiates on the slightest difference in a URL. There are many pages which it shows ending with a slash "/" but those pages are often linked to without an ending slash. The latter pages do not show their Titles nor Meta tags in the crawler report. I presume this is just a crawler issue and would not affect SEO performance.
5. I checked the cell formatting and it is "General" which should be fine. All of the rest of the HTTP Status codes appear normally. What I did notice is that all of the "3" codes refer to attachments. Most attachments show a "3" code, but a few show as 301s.
6. Good to know, thanks for sharing.
7. My main follow-up question would be: is there any harm in setting up robots.txt to disregard all parameter URLs? Basically I want to clean things up, and all of those URLs which are style or sorting variations aren't helpful to any crawler, and those pages shouldn't be indexed.
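While the crawler itself is still following the sort and style URLs, the exported report can be tidied up offline. The following is only a rough sketch of that idea, not anything the Crawl tool provides: it strips a set of ignorable query parameters and the trailing slash so that variants of the same page group together. The parameter names (`order`, `direction`, `style`) and the example.com URLs are hypothetical placeholders, so swap in whatever your forum actually uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; replace with the ones your forum really uses.
IGNORED_PARAMS = {"order", "direction", "style"}


def normalize(url, ignored=IGNORED_PARAMS):
    """Return the URL without ignorable parameters or a trailing slash."""
    parts = urlsplit(url)
    # Keep only the query parameters that change the page content.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in ignored]
    # Treat "/threads/foo/" and "/threads/foo" as the same page.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(kept), ""))


def group_by_page(urls):
    """Map each normalized URL to the list of crawled variants."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize(url), []).append(url)
    return groups


if __name__ == "__main__":
    crawled = [
        "http://example.com/forums/general/",
        "http://example.com/forums/general",
        "http://example.com/forums/general/?order=title&direction=asc",
        "http://example.com/forums/general/?order=title&direction=desc",
    ]
    for page, variants in group_by_page(crawled).items():
        print(page, len(variants))
```

Grouping this way only cleans up the spreadsheet. It does not change what any crawler fetches, so the canonical or robots.txt question above still needs its own answer.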
-
I can help with a few of those:
1. Looks like you're using the crawl tool. If this is for an on-going project, go to http://www.seomoz.org/campaigns and set one up. That way you get a sexy GUI (if you like robots that is) and weekly crawls / rank tracking.
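2. That number is almost certainly a UNIX timestamp. To convert it inside Excel use the formula below, where 25569 is Excel's serial number for 1 January 1970 and -5/24 shifts the result from UTC to UTC-5, so adjust that term for your own time zone (don't forget to format the cell as a date, otherwise you just see a random number!):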
=(A1/86400)+25569+(-5/24)
3. I wouldn't worry about that at all - the crawler converts any non-standard characters to ASCII but, as far as I know, it won't affect your SERP performance.
4. Could you give a few examples of the pages that are affected so I can take a look?
5. That's either a bug or (not too likely but worth checking) an issue with how the numbers are formatted in your spreadsheet. I'd advise opening the file using a text editor to check that the numbers that Excel shows match up with the raw format and, if they do, submitting a bug report to the SEOMoz team.
6. The referrer cell tells you how the crawler got to that page. If you don't have any internal links to a page on your site then, chances are, the crawler won't find it. The only caveat to that (and I'm not 100% sure so would need confirmation) is if the crawl tool uses external linking data. I'd always assumed it didn't but SEOMoz will know where some of your pages are even if you don't link to them internally as external sites will point to them. If that's the case it could be the reason that the referrer cell is blank.
7. Remember that this is SEOMoz crawling your site, not Google. Anything you set in Webmaster Tools isn't visible to other search engine spiders such as those used by Bing, Yahoo!, SEOMoz, Majestic, etc. Because of that they won't know how to handle your URL parameters. You're best setting this through either a meta robots tag, robots.txt, or .htaccess (depending on what you're trying to do). Be careful though - if you mess it up there's a strong possibility that you'll end up blocking pages that you want the search engines to be able to access!
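For point 2, if you'd rather convert the Time Crawled values outside Excel, here is a small Python sketch of the same arithmetic. It assumes the column really does hold UNIX timestamps in seconds, and the function names are just illustrative. The second function mirrors the spreadsheet formula, so you can cross-check what Excel gives you.

```python
from datetime import datetime, timedelta, timezone

# Excel serial number for 1970-01-01 in the default 1900 date system.
EXCEL_EPOCH_OFFSET = 25569


def to_datetime(unix_seconds, utc_offset_hours=0):
    """Convert a UNIX timestamp (seconds) to a datetime at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_seconds, tz=tz)


def to_excel_serial(unix_seconds, utc_offset_hours=0):
    """Same calculation as =(A1/86400)+25569+(offset/24) in a spreadsheet."""
    return unix_seconds / 86400 + EXCEL_EPOCH_OFFSET + utc_offset_hours / 24


if __name__ == "__main__":
    stamp = 1305798581  # the value quoted in the question
    print(to_datetime(stamp).strftime("%B %d, %Y %H:%M"))      # May 19, 2011 09:49 (UTC)
    print(to_datetime(stamp, -5).strftime("%B %d, %Y %H:%M"))  # May 19, 2011 04:49 (UTC-5)
    print(to_excel_serial(stamp, -5))
```

The fixed offset ignores daylight saving time, same as the spreadsheet formula, so treat the local time as approximate.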
Hope that's all helpful... give me a shout if there's anything else.
- Matt