Crawl questions
-
My first website crawl indicating many issues. I corrected the issues, requested another crawl and received the results. After viewing the excel file I have some questions.
1. There are many pages with missing Titles and Meta Descriptions in the Excel file. An example is http://www.terapvp.com/threads/help-us-decide-on-terapvp-com-logo.25/page-2
That page clearly has a meta description and title. It is a forum thread. My forum software does a solid job of always providing those tags. Why would my crawl report not show this information? This occurs on numerous pages.
2. I believe all my canonical URLs are properly set. My crawl report has 3k+ records, largely due to there being 10 records for many pages. These extra records are various sort orders and style differences for the same page i.e. ?direction=asc.
My need for a crawl report is to provide actionable data so I can easily make SEO improvements to my site where necessary. These extra records don't provide any benefit.
IF the crawl report determined there was not a clear canonical URL, then I could understand. But that is not the case. An example is http://www.terapvp.com/forums/news/ If you look at the source you will clearly see
Where is the benefit to including the 10 other records in the Crawl report which show this same page in various sort orders? Am I missing anything?
3. My robots.txt appropriately blocks many pages that I do not wish to be crawled. What is the benefit to including these many pages in the crawl report?
Perhaps I am over analyzing this report. I have read many articles on SEO, but now that I have found SEOmoz, I can see I will need to "unlearn what I have learned". Many things such as setting meta keyword tags are clearly not helpful. I wish to focus my energy and I was looking to the crawl report as my starting point. Either I am missing something, or the report design needs improvement.
-
Those are commends added to the code. My site has is part of the effort to rid the internet of IE6 browsers. You can read more about it http://www.theie6countdown.com/join-us.html. There is a simple script placed in the site which detects IE 6 & 7 browsers and asks users to update their software.
TYNT is a SEO tool http://www.tynt.com/. It allows webmasters to track any information which is copied and pasted from their site. It creates links back to the site, and tracks all activity on those links.
Both of those scripts are working normally and neither should negatively impact crawls. My Google WMT show my site being crawled normally, no errors. I have multiple pages ranked #1. With that said, there are plenty of opportunities for me to improve, which is why I am here.
The position of the Title and meta tags within the head should not be a factor at all. I did hear Matt Cutts share once that webmasters could move the information to the top to help if there are other issues such as a page where the tag is not properly closed, but that is not really relevant in this case.
-
Your source code looks different to me, if not .... abnormal. What's with doing above the description anyway. I don't know if that code is blocking the crawler from that code to . I'm no code monkey, but all my metas are right next to each other. Yours are completely separated from the title. Take out that code and see if it happens again. OR better yet, go to GWT and FETCH AS GOOGLEBOT. For sure that will tell you what Google bot is seeing.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Question on Indexing, Hreflang tag, Canonical
Dear All, Have a question. We've a client (pharma), who has a prescription medicine approved only in the US, and has only one global site at .com which is accessed by all their target audience all over the world.
Intermediate & Advanced SEO | | jrohwer
For the rest of the US, we can create a replica of the home page (which actually features that drug), minus the existence of the medicine, and set IP filter so that non-US traffic see the duplicate of the home page. Question is, how best to tackle this semi-duplicate page. Possibly no-index won't do because that will block the site from the non-US geography. Hreflang won't work here possibly, because we are not dealing different languages, we are dealing same language (En) but different Geographies. Canonical might be the best way to go? Wanted to have an insight from the experts. Thanks,
Suparno (for Jeff)1 -
Google Adsbot crawling order confirmation pages?
Hi, We have had roughly 1000+ requests per 24 hours from Google-adsbot to our confirmation pages. This generates an error as the confirmation page cannot be viewed after closing or by anyone who didn't complete the order. How is google-adsbot finding pages to crawl that are not linked to anywhere on the site, in the sitemap or linked to anywhere else? Is there any harm in a google crawler receiving a higher percentage of errors - even though the pages are not supposed to be requested. Is there anything we can do to prevent the errors for the benefit of our network team and what are the possible risks of any measures we can take? This bot seems to be for evaluating the quality of landing pages used in for Adwords so why is it trying to access confirmation pages when they have not been set for any of our adverts? We included "Disallow: /confirmation" in the robots.txt but it has continued to request these pages, generating a 403 page and an error in the log files so it seems Adsbot doesn't follow robots.txt. Thanks in advance for any help, Sam
Intermediate & Advanced SEO | | seoeuroflorist0 -
Prevent Google from crawling Ajax
With Google figuring out how to make Ajax and JS more searchable/indexable, I am curious on thoughts or techniques to prevent this. Here's my Situation, we have a page that we do not ever want to be indexed/crawled or other. Currently we have the nofollow/noindex command, but due to technical changes for our site the method in which this information is being implemented if it is ever displayed it will not have the ability to block the content from search. It is also the decision of the business to not list the file in robots.txt due to the sensitivity of the content. Basically, this content doesn't exist unless something super important happens, and even if something super important happens, we do not want Google to know of its existence. Since the Dev team is planning on using Ajax/JS to pull in this content if the business turns it on, the concern is that it will be on the homepage and Google could index it. So the questions that I was asked; if Google can/does index, how long would that piece of content potentially appear in the SERPs? Can we block Google from caring about and indexing this section of content on the homepage? Sorry for the vagueness of this question, it's very sensitive in nature and I am trying to avoid too many specifics. I am able to discuss this in a more private way if necessary. Thanks!
Intermediate & Advanced SEO | | Shawn_Huber0 -
Will a disclaimer affect Crawling?
Hello everyone! My German users will have to get a disclaimer according to German laws, now my question is the following: Will a disclaimer affect crawling? What's the best practice to have regarding this? Should I have special care in this? What's the best disclaimer technique? A Plain HTML page? Something overlapping the site? Thank you all!
Intermediate & Advanced SEO | | NelsonF0 -
Is it safe to not have a sitemap if Google is already crawling my site every 5-10 min?
I work on a large news site that is constantly being crawled by Google. Googlebot is hitting the homepage every 5-10 minutes. We are in the process of moving to a new CMS which has left our sitemap nonfunctional. Since we are getting crawled so often, I've met resistance from an overwhelmed development team that does not see creating sitemaps as a priority. My question is, are they right? What are some reasons that I can give to support my claim that creating an xml sitemap will improve crawl efficiency and indexing if we are already having new stories appear in Google SERPs within 10-15 minutes of publication? Is there a way to quantify what the difference would be if we added a sitemap?
Intermediate & Advanced SEO | | BostonWright0 -
Social Buttons Help SEO, 2 Questions...
Howdy Guys, I noticed a weird thing over the weekend - our main keyword has been hit pretty hard by penguin and we had dropped down to #79. On Friday I decided to change some on-page optimisation and changed the title tag and some tags. When I've ran my rank tracker this morning we have jumped up to #62... Has anyone else noticed just a simple change boosts rankings? Second Questions We took all our social buttons off the website back in January as no-body was using them but from a few recent reports I've seen having the buttons on the site help organic rankings... Is this true? Scott
Intermediate & Advanced SEO | | ScottBaxterWW0 -
How to stop Google crawling after 301 redirect?
I have removed all pages from my old website and set 301 redirect to new website. But, I have verified old website with Google webmaster tools' HTML verification file which enable me to track all data and existence of pages in Google search for my old website. I was assumed that, Google will stop crawling and DE-indexed all pages after 301 redirect. Because, I have set 301 redirect before 3 months. Now, I'm able to see Google bot activity on my website with help of Google webmaster tools. You can find out attachment to know more about it. How can it possible & How Google can crawl removed pages? You can see following image to know more about it. First & Second
Intermediate & Advanced SEO | | CommercePundit0 -
Sitemap.xml Question
I am pretty new to SEO and I have been creating new pages for our website for niche terms. Should I include ALL pages on our website in the sitemap.xml or should I only have our "main" pages listed on the sitemap.xml file? Thanks
Intermediate & Advanced SEO | | threebiz0