Initial Crawl Questions
-
Hello.
I just joined and used the Crawl tool. I have many questions and am hoping the community can offer some guidance.
1. I received an Excel file with 3k+ records. Is there a friendly online viewer for the Crawl report? Or is the Excel file the only output?
2. Assuming the Excel file is the only output, the Time Crawled is a number (e.g. 1305798581). I have tried changing the field to a date/time format but that did not work. How can I view the field as a normal date/time such as May 15, 2011 14:02?
3. I use the trademark symbol in my Title. This symbol appears in the output as a few ASCII characters. Is that a concern? Should I remove the trademark symbol from my Title?
4. I am using XenForo forum software. All forum threads automatically receive a Title Tag and Meta Description as part of a template. The Crawl Test report shows my Title Tag and Meta Description as blank for many threads. I have looked at the source code of several pages and they all have clean Title tags and I don't understand why the Crawl Report doesn't show them. Any ideas?
5. In some cases the HTTP Status Code field shows a result of "3". What does that mean?
6. For every URL in the Crawl Report there is an entry in the Referrer field. What exactly is the relationship between these fields? I thought the Crawl Tool would inspect every page on the site. If a page doesn't have a referring page is it missed? What if a page has multiple referring pages? How is that information displayed?
7. Under Google Webmaster Tools > Site Configurations > Settings > Parameter Handling I have the options set as either "Ignore" or "Let Google Decide" for various URL parameters. These are "pages" of my site which should mostly be ignored. For example a forum may have 7 headers, each one of which can be sorted in ascending or descending order. The only page that matters is the initial page. All the rest should be ignored by Google and the Crawl.
Presently there are 11 records for many pages which really should only have one record due to these various sort parameters. Can I configure the crawl so it ignores parameter pages?
I am anxious to get started on my site. I dove into the crawl results and it's just too messy in its present state for me to pull out any actionable data. Any guidance would be appreciated.
-
Good question. There are a few ways of doing it, but I'd advise using a canonical URL on each page to tell the search engines where the content stems from. I had a quick look at XenForo and this looks relatively simple to do, although make sure you test things thoroughly just in case.
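To illustrate the canonical idea, here is a rough Python sketch of how a sorted or filtered URL maps back to one canonical address. The parameter names in IGNORED_PARAMS and the example.com URL are made up for illustration, and this is not how XenForo itself builds the tag; it only shows the mapping you would want the template to produce.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical sort/display parameters; replace with the ones your forum really emits.
IGNORED_PARAMS = {"order", "direction"}


def canonical_url(url: str) -> str:
    """Return the URL with the ignored query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element that would sit in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


# Every sort variation of the same forum page points back to one address.
print(canonical_link_tag("http://example.com/forums/general/?order=title&direction=desc"))
```

Parameters that change the content, such as a page number, are deliberately kept, so only the sort variations collapse into one URL.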
-
Thank you very much for the detailed reply.
For #1, I did start my campaign and I will follow up.
2. That worked perfectly!
3. Thank you for the information.
4. I realize the problem. It appears the crawler differentiates on the slightest difference in a URL. There are many pages which it shows ending with a slash "/" but those pages are often linked to without an ending slash. The latter pages do not show their Titles nor Meta tags in the crawler report. I presume this is just a crawler issue and would not affect SEO performance.
5. I checked the cell formatting and it is "General" which should be fine. All of the rest of the HTTP Status codes appear normally. What I did notice is that all of the "3" codes refer to attachments. Most attachments show a "3" code, but a few show as 301s.
6. Good to know, thanks for sharing.
7. My main follow-up question would be: is there any harm in setting up robots.txt to disallow all parameter URLs? Basically I want to clean things up, and all of those URLs which are style or sorting variations aren't helpful to any crawler, and those pages shouldn't be indexed.
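To make the robots.txt idea in point 7 concrete, here is a small Python sketch of how a wildcard rule would match parameter URLs. It assumes the crawler follows the common wildcard convention where "*" matches any run of characters and a trailing "$" anchors the end of the URL; not every crawler is guaranteed to support that, so treat this as a way to reason about a rule, not as proof of how any particular bot behaves. The rule "/*?" and the forum paths are hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern: '*' is any characters, a trailing '$' is end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where the pattern had "*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which mirrors robots.txt prefix matching.
    return re.match(regex, path) is not None


# Hypothetical rule: disallow any URL that carries a query string.
DISALLOW = "/*?"

for path in ["/forums/general/", "/forums/general/?order=title&direction=desc"]:
    print(path, "->", "blocked" if rule_matches(DISALLOW, path) else "allowed")
```

One thing to weigh before using a rule this broad: a URL blocked in robots.txt is never fetched, so a crawler cannot see a canonical tag or meta robots tag placed on it.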
-
I can help with a few of those:
1. Looks like you're using the crawl tool. If this is for an on-going project, go to http://www.seomoz.org/campaigns and set one up. That way you get a sexy GUI (if you like robots that is) and weekly crawls / rank tracking.
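2. That number is almost certainly a UNIX timestamp. To convert it inside Excel, use the formula below, where A1 holds the timestamp, 86400 is the number of seconds in a day, 25569 is Excel's serial number for January 1, 1970, and -5/24 shifts the result from UTC to UTC-5 (adjust that last term for your own time zone). Don't forget to format the cell as a date, otherwise you just see a random number!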
=(A1/86400)+25569+(-5/24)
3. I wouldn't worry about that at all - the crawler converts any non-standard characters to ASCII but, as far as I know, it won't affect your SERP performance.
4. Could you give a few examples of the pages that are affected so I can take a look?
5. That's either a bug or (not too likely but worth checking) an issue with how the numbers are formatted in your spreadsheet. I'd advise opening the file using a text editor to check that the numbers that Excel shows match up with the raw format and, if they do, submitting a bug report to the SEOMoz team.
6. The referrer cell tells you how the crawler got to that page. If you don't have any internal links to a page on your site then, chances are, the crawler won't find it. The only caveat to that (and I'm not 100% sure so would need confirmation) is if the crawl tool uses external linking data. I'd always assumed it didn't but SEOMoz will know where some of your pages are even if you don't link to them internally as external sites will point to them. If that's the case it could be the reason that the referrer cell is blank.
7. Remember that this is SEOMoz crawling your site, not Google. Anything you set in Webmaster Tools isn't visible to other crawlers such as those used by Bing, Yahoo!, SEOMoz, Majestic, etc. Because of that they won't know how to handle your URL parameters. You're best setting this through either a meta robots tag, robots.txt, or .htaccess (depending on what you're trying to do). Be careful though - if you mess it up there's a strong possibility that you'll end up blocking pages that you want the search engines to be able to access!
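Coming back to point 2: if you'd rather check the timestamps outside Excel, the same conversion can be sketched in Python. The function names here are just illustrative, and the -5 hour offset mirrors the one in the Excel formula; it is an assumption about your time zone, so change it as needed.

```python
from datetime import datetime, timedelta, timezone


def crawl_time(timestamp: int, utc_offset_hours: int = 0) -> datetime:
    """Convert a UNIX timestamp (seconds since 1970-01-01 UTC) to an aware datetime."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(timestamp, tz)


def excel_serial(timestamp: int, utc_offset_hours: int = 0) -> float:
    """Reproduce the spreadsheet formula: days since 1970 plus Excel's serial for that date."""
    return timestamp / 86400 + 25569 + utc_offset_hours / 24


# The example value from the crawl report.
print(crawl_time(1305798581))        # 2011-05-19 09:49:41+00:00
print(crawl_time(1305798581, -5))    # 2011-05-19 04:49:41-05:00
```

If the Python result and the formatted Excel cell agree, the Time Crawled column is being read correctly.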
Hope that's all helpful... give me a shout if there's anything else.
- Matt