Initial Crawl Questions
-
Hello.
I just joined and used the Crawl tool. I have many questions and am hoping the community can offer some guidance.
1. I received an Excel file with 3k+ records. Is there a friendly online viewer for the Crawl report? Or is the Excel file the only output?
2. Assuming the Excel file is the only output, the Time Crawled is a number (e.g. 1305798581). I have tried changing the field to a date/time format but that did not work. How can I view the field as a normal date/time such as May 15, 2011 14:02?
3. I use the ™ symbol in my Title. This symbol appears in the output as a few ASCII characters. Is that a concern? Should I remove the trademark symbol from my Title?
4. I am using XenForo forum software. All forum threads automatically receive a Title Tag and Meta Description as part of a template. The Crawl Test report shows my Title Tag and Meta Description as blank for many threads. I have looked at the source code of several pages and they all have clean Title tags and I don't understand why the Crawl Report doesn't show them. Any ideas?
5. In some cases the HTTP Status Code field shows a result of "3". What does that mean?
6. For every URL in the Crawl Report there is an entry in the Referrer field. What exactly is the relationship between these fields? I thought the Crawl Tool would inspect every page on the site. If a page doesn't have a referring page is it missed? What if a page has multiple referring pages? How is that information displayed?
7. Under Google Webmaster Tools > Site Configurations > Settings > Parameter Handling I have the options set as either "Ignore" or "Let Google Decide" for various URL parameters. These are "pages" of my site which should mostly be ignored. For example, a forum may have 7 headers, each one of which can be sorted in ascending or descending order. The only page that matters is the initial page. All the rest should be ignored by Google and the Crawl.
Presently there are 11 records for many pages which really should only have one record due to these various sort parameters. Can I configure the crawl so it ignores parameter pages?
I am anxious to get started on my site. I dove into the crawl results and it's just too messy in its present state for me to pull out any actionable data. Any guidance would be appreciated.
-
Good question. There are a few ways of doing it, but I'd advise using a canonical URL on each page to tell the search engines where the content stems from. I had a quick look at XenForo and this looks relatively simple to do... although make sure you test things thoroughly just in case.
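For the testing part, a minimal sketch along these lines might help confirm that a canonical tag actually shows up in a page's HTML. It uses only Python's standard library. The sample markup and the example.com URL are made up for illustration and are not taken from XenForo's templates.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical markup for a sorted view of a forum listing.
sample_page = """
<html><head>
<title>General Discussion</title>
<link rel="canonical" href="http://www.example.com/forums/general/" />
</head><body></body></html>
"""

print(find_canonicals(sample_page))
```

An empty list would mean the template change has not reached that page, and more than one entry would suggest the tag is being emitted twice.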
-
Thank you very much for the detailed reply.
For #1, I did start my campaign and I will follow up.
2. That worked perfectly!
3. Thank you for the information.
4. I realize the problem. It appears the crawler differentiates on the slightest difference in a URL. There are many pages which it shows ending with a slash "/" but those pages are often linked to without an ending slash. The latter pages do not show their Titles nor Meta tags in the crawler report. I presume this is just a crawler issue and would not affect SEO performance.
5. I checked the cell formatting and it is "General" which should be fine. All of the rest of the HTTP Status codes appear normally. What I did notice is that all of the "3" codes refer to attachments. Most attachments show a "3" code, but a few show as 301s.
6. Good to know, thanks for sharing.
7. My main follow-up question would be: is there any harm in setting up robots.txt to disregard all parameter URLs? Basically I want to clean things up, and all of those URLs which are style or sorting variations aren't helpful to any crawler, and those pages shouldn't be indexed.
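To get a feel for how many rows in the crawl report are really the same page (points 4 and 7), a rough Python sketch like the one below could collapse the variants. It treats URLs as equal once the query string and any trailing slash are removed, which is an assumption about this site and not how the crawler itself compares URLs. The example.com URLs and the `order` / `direction` parameter names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Drop the query string, the fragment, and any trailing slash on the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))


def group_variants(urls):
    """Map each normalized URL to the list of crawled URLs that reduce to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


# Hypothetical rows from the URL column of a crawl export.
crawled = [
    "http://www.example.com/forums/general/",
    "http://www.example.com/forums/general",
    "http://www.example.com/forums/general/?order=title&direction=asc",
    "http://www.example.com/forums/general/?order=title&direction=desc",
    "http://www.example.com/threads/welcome.42/",
]

for page, variants in group_variants(crawled).items():
    print(page, len(variants))
```

Any page with a count above one is a candidate for a canonical tag or a crawl exclusion.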
-
I can help with a few of those:
1. Looks like you're using the crawl tool. If this is for an on-going project, go to http://www.seomoz.org/campaigns and set one up. That way you get a sexy GUI (if you like robots that is) and weekly crawls / rank tracking.
2. That number is almost certainly a UNIX timestamp. To convert it inside Excel, use the formula below (don't forget to format the cell as a date, otherwise you'll just see a random number!):
=(A1/86400)+25569+(-5/24)
3. I wouldn't worry about that at all - the crawler converts any non-standard characters to ASCII but, as far as I know, it won't affect your SERP performance.
4. Could you give a few examples of the pages that are affected so I can take a look?
5. That's either a bug or (not too likely but worth checking) an issue with how the numbers are formatted in your spreadsheet. I'd advise opening the file in a text editor to check that the numbers Excel shows match the raw data and, if they do, submitting a bug report to the SEOMoz team.
6. The referrer cell tells you how the crawler got to that page. If you don't have any internal links to a page on your site then, chances are, the crawler won't find it. The only caveat to that (and I'm not 100% sure, so it would need confirmation) is if the crawl tool uses external linking data. I'd always assumed it didn't, but SEOMoz will know where some of your pages are even if you don't link to them internally, as external sites will point to them. If that's the case, it could be the reason that the referrer cell is blank.
7. Remember that this is SEOMoz crawling your site, not Google. Anything you set in Webmaster Tools isn't visible to other crawlers such as those used by Bing, Yahoo!, SEOMoz, Majestic, etc. Because of that, they won't know how to handle your URL parameters. You're best setting this through either a meta robots tag, robots.txt, or .htaccess (depending on what you're trying to do). Be careful though - if you mess it up there's a strong possibility that you'll end up blocking pages that you want the search engines to be able to access!
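If you'd rather do the conversion from #2 outside Excel, here is a minimal Python sketch of the same arithmetic. The formula divides by 86400 (seconds per day) and adds 25569, which is Excel's serial number for January 1, 1970 in the 1900 date system. The -5/24 term shifts the result five hours behind UTC, so swap in your own offset. The function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400
EXCEL_UNIX_EPOCH = 25569  # Excel serial for 1970-01-01 (1900 date system)


def to_excel_serial(unix_seconds, utc_offset_hours=0):
    """Mirror the spreadsheet formula: =(A1/86400)+25569+(offset/24)."""
    return unix_seconds / SECONDS_PER_DAY + EXCEL_UNIX_EPOCH + utc_offset_hours / 24


def to_datetime(unix_seconds, utc_offset_hours=0):
    """Convert a UNIX timestamp to a timezone-aware datetime at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_seconds, tz=tz)


time_crawled = 1305798581  # the sample value from the question

print(to_datetime(time_crawled).isoformat())      # UTC
print(to_datetime(time_crawled, -5).isoformat())  # five hours behind UTC
print(to_excel_serial(time_crawled, -5))          # what the Excel formula yields
```

A fixed offset ignores daylight saving time, so times may be an hour out for part of the year.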
Hope that's all helpful... give me a shout if there's anything else.
- Matt