Working out exactly how Google is crawling my site if I have loooots of pages
-
I am trying to work out exactly how Google is crawling my site, including its entry points and the path it takes from there. The site has millions of pages, of which hundreds of thousands are indexed. I have simple log files with a timestamp and the URL Googlebot requested. Unfortunately there are hundreds of thousands of entries even for a single day, and because the site is so large I am finding it hard to work out the spider's paths. Is there a simple way to work this out from the log files using Excel or other tools? Also, I was expecting the bot to go through each level almost instantaneously, e.g. main page --> category page --> subcategory page (with the same timestamp), but this does not appear to be the case. Does the bot follow a path right through to the deepest level it can reach (or is allowed to reach) in that crawl and then return to the higher-level category pages at a later time? Any help would be appreciated.
Cheers
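With hundreds of thousands of rows per day, a short script is easier to work with than Excel. Below is a minimal Python sketch, assuming each log line looks like `<ISO timestamp> <URL>` (e.g. `2013-05-01T10:15:02 /shoes/running/`) and that URL path depth mirrors the site hierarchy (home = 0, category = 1, subcategory = 2, ...). Both are assumptions, so adapt `parse_log` and `TIMESTAMP_FORMAT` to your real logs. The sketch counts hits per depth, records when each depth was first and last requested, counts requests per hour, and tallies depth-to-depth moves between consecutive requests. That last count shows whether the bot really walks home -> category -> subcategory in sequence.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit

# Assumed log line format: "<ISO-8601 timestamp> <URL>", one request per line.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def url_depth(url: str) -> int:
    """Number of non-empty path segments: '/' is 0, '/shoes/' is 1, and so on.
    Assumes the URL path mirrors the site hierarchy."""
    return len([seg for seg in urlsplit(url).path.split("/") if seg])


def parse_log(lines):
    """Yield (timestamp, url) pairs, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stamp, url = line.split(maxsplit=1)
        yield datetime.strptime(stamp, TIMESTAMP_FORMAT), url


def summarize(hits):
    """Hits per depth, plus the first and last time each depth was requested."""
    per_depth = Counter()
    first_seen, last_seen = {}, {}
    for ts, url in hits:
        depth = url_depth(url)
        per_depth[depth] += 1
        first_seen[depth] = min(first_seen.get(depth, ts), ts)
        last_seen[depth] = max(last_seen.get(depth, ts), ts)
    return per_depth, first_seen, last_seen


def hits_per_hour(hits):
    """Requests per clock hour, e.g. to see whether Googlebot crawls overnight."""
    return Counter(ts.strftime("%Y-%m-%d %H:00") for ts, _ in hits)


def depth_transitions(hits):
    """Count (depth of request N, depth of request N+1) pairs in time order.
    A walk that goes home -> category -> subcategory in one pass would show
    many (d, d + 1) moves."""
    ordered = sorted(hits)
    moves = Counter()
    for (_, prev_url), (_, next_url) in zip(ordered, ordered[1:]):
        moves[(url_depth(prev_url), url_depth(next_url))] += 1
    return moves
```

For example, `with open("googlebot.log", encoding="utf-8") as fh: hits = list(parse_log(fh))` (the file name is hypothetical), then `depth_transitions(hits).most_common(10)`. If moves like (0, 1) and (1, 2) are rare compared with jumps such as (3, 3) or (2, 0), the bot is not crawling level by level in one pass, which would match the timestamps you are seeing.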
-
Can you explain to me how you did your site map for this please?
-
I've run into the same issue on a site with 40k+ pages. That's far from your overall page count, but maybe the overall flow is the same.
The site I was working on had a structure about 5 levels deep. Some of the areas within the last level were out of reach and didn't get indexed. More than that, even a few areas on level 2 were missing from the Google index, and Googlebot didn't visit those either.
I created a large XML sitemap and a dynamic HTML sitemap with all the pages on the site, and submitted the XML sitemap via Webmaster Tools, but that didn't solve the issue: the same areas stayed out of the index and didn't get hit. In any case, the huge HTML sitemap was impossible to follow from a user's point of view, so I didn't keep it online for long, and I'm sure it wouldn't have worked that way either.
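For reference, the sitemaps.org protocol caps a single sitemap file at 50,000 URLs (and 50 MB uncompressed), so a site with millions of pages needs several sitemap files tied together by a sitemap index. A minimal Python sketch of that split follows. The base URL `https://www.example.com/sitemaps/` and the `sitemap-N.xml` file names are placeholders, and the 50 MB size limit is not checked. Splitting sitemaps by site section rather than purely by count can also make it easier to see which sections are under-indexed, by comparing submitted and indexed counts per sitemap.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol
XML_ENTITIES = {"'": "&apos;", '"': "&quot;"}  # escape() already handles &, < and >


def chunked(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive slices of at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def sitemap_xml(urls):
    """A <urlset> sitemap file listing the given absolute URLs."""
    entries = "\n".join(f"  <url><loc>{escape(u, XML_ENTITIES)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}\n</urlset>\n'
    )


def sitemap_index_xml(sitemap_urls):
    """A <sitemapindex> file pointing at each child sitemap."""
    entries = "\n".join(
        f"  <sitemap><loc>{escape(u, XML_ENTITIES)}</loc></sitemap>" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}\n</sitemapindex>\n'
    )


def build_sitemaps(urls, base="https://www.example.com/sitemaps/"):
    """Return {file name: XML text} for every child sitemap plus the index.
    `base` is a placeholder for wherever the files will be served from."""
    files, locations = {}, []
    for n, part in enumerate(chunked(list(urls)), start=1):
        name = f"sitemap-{n}.xml"
        files[name] = sitemap_xml(part)
        locations.append(base + name)
    files["sitemap-index.xml"] = sitemap_index_xml(locations)
    return files
```

Each value can be written to disk under its file name, and only `sitemap-index.xml` then needs to be submitted in Webmaster Tools.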
What finally solved the issue was to spot the exact areas that were left out and identify the "head" of those pages, meaning the several pages that acted as a gateway to the entire module. I then built a few outside links pointing directly to those pages, and a few pointing to the main internal pages of the modules that were left out.
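One way to spot the areas that were left out is to compare the full URL list (from your database or sitemap) with the URLs Googlebot requested in the logs. Below is a minimal Python sketch, assuming a "section" is the first two path segments (e.g. `/shoes/running`) and that URLs are matched on path only, so host, query-string and trailing-slash differences are not normalized. Sections with a coverage ratio near zero are the candidates for the gateway pages described above.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_of(url: str, levels: int = 2) -> str:
    """Section key = the first `levels` path segments, e.g. '/shoes/running'."""
    segs = [s for s in urlsplit(url).path.split("/") if s]
    return "/" + "/".join(segs[:levels])


def crawl_coverage(all_urls, crawled_urls, levels=2):
    """Return [(section, crawled, total, ratio)], least-crawled sections first.
    URLs are compared by path only (an assumption; normalize further if needed)."""
    crawled = {urlsplit(u).path for u in crawled_urls}
    total, hit = Counter(), Counter()
    for url in all_urls:
        sec = section_of(url, levels)
        total[sec] += 1
        if urlsplit(url).path in crawled:
            hit[sec] += 1
    report = [(sec, hit[sec], total[sec], hit[sec] / total[sec]) for sec in total]
    return sorted(report, key=lambda row: (row[3], row[0]))
```

The `levels` parameter controls how coarse the sections are: `levels=1` groups by top-level category, while larger values drill down towards individual modules.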
Those pages gained authority fast, and within only a few days we spotted Googlebot staying on the site overnight.
All pages are now indexed and even ranking well.
If you can spot some entry pages that can lead the spider to the rest, you can try this approach; it should work for you too.
As for links, I started with social network links, a few posts on the site's own blog linking to those pages (so those are internal links), and only a couple of outside links: articles with in-content links to those pages. Overall I think we are talking about 20-25 social network links (Twitter, Facebook, Digg, StumbleUpon and Delicious), about 10 blog posts published over a 2-3 day span, and about 10 articles on outside sources.
Since you have a much larger number of pages, you will probably need more gateways, which means more links, but overall it's not a very time-consuming exercise and it can solve your issue... hopefully.
Related Questions
-
Multi-Location SEO: Sites vs Pages
I just started with a new company that requires multi-location SEO for its niche product/service. Currently we have a main corporate website as well as 40+ individual dealer websites (we host all of them). Keep in mind that each of these dealers consists of only 1-2 people, so I will be managing the site or sites and the content strategy at the corporate level. Many of the individual dealer sites actually rank very well (#1-#3) in their areas for our targeted keywords, but they all use the same duplicate content. Also, many dealer sites have dropped off the radar in the last year, probably because of the duplicate and static content. So I'm at a crossroads: either attempt to redo all of these location sites with unique, local content for each, or create optimized unique pages for each of them on our main site and redirect their current local domains to their page on our site. Any advice on which direction to go in, and why? The why is very important: it will be very difficult to convince a dealer who is #1 with his local site that we are redirecting it to our main site, so I need some good ammo and reasoning. Also, any tips toward achieving local SEO success will be greatly appreciated, too! Thank you!
Intermediate & Advanced SEO | | the-coopersmith0 -
Street Address Not Appearing on Business Google+ Page
I run a local business in New York City, a commercial real estate brokerage. My firm has both a website and Google+ accounts: one Google+ account for me personally and one for my business. Under address, my Google+ account shows "New York, NY" but not a street address. Similarly, when my business name is entered in the Google search bar, my website is the first result, but under address (directly to the right of a black dot with a grey circle around it) "New York, NY" appears with the phone number beneath it. No sign of my street address. My business is registered in Google Places and we have entered the correct street address. Any ideas on how I can get Google to display our street address? This is obviously very, very detrimental for local SEO. Thanks,
Intermediate & Advanced SEO | | Kingalan1
Alan0 -
HTML5 one page website on-site SEO
Hey guys, if, for example, I'm faced with a client who has a website similar to http://www.symphonyonline.co.uk/, how should I proceed with the on-site optimization? Should I create new pages on the website? Should I create a blog for the site to increase my reach? Please give me your tips on how to proceed with this kind of website. Thanks.
Intermediate & Advanced SEO | | BruLee0 -
How to resubmit a Web 2.0 site to Google?
I have 3 Web 2.0 sites that look like they've been hit by a penalty. I have checked their backlinks and there are a lot of backlinks from sites that have been deindexed. I have requested the removal of many of the links, but now I need to resubmit the sites to Google. Is this even possible given that they are Web 2.0 sites? I don't have Webmaster Tools for the sites, so how would I do this?
Intermediate & Advanced SEO | | JohnPeters0 -
Google giving me only partial site links?
Hi Guys, My site is #1 ranked for the term "waiting till marriage," but Google only gives me partial site links. See "Forums - Articles - Questions - Videos" links in attached screenshot. How do I get the full, page-dominating, mini-description-having site links? Any suggestions? Note: I've got a ton of content and decent traffic, but I haven't put much time into developing back links yet. I'm a php developer, but I'm new to professional-level SEO. Any help would be hugely appreciated. Also, sorry about the inflammatory nature of the site. It's not a preachy site; it's just a support group. Hope it doesn't offend. partial-sitelinks.png
Intermediate & Advanced SEO | | MikeAM270 -
How to see which site Google views as a scraper site?
If we have content on our site that is found on another site, what is the best way to know which site Google views as the original source? If you search for a line of the content such as "xyz abc etc" and the other site shows before yours in search results, does that mean that Google views that site as the original source?
Intermediate & Advanced SEO | | nicole.healthline0 -
One page wordpress site - what are the steps for SEO
Hello, I am launching 5 sites on exact-match keyword domains. I am developing them on WordPress as one-page sales-funnel sites. What do I need to do to optimize them? I'd really appreciate any bullet points or directions. Thanks
Intermediate & Advanced SEO | | brianmaher0 -
How Fast Is Too Fast to Increase Page Volume of Your Site
I am working on a project that is basically a site listing apartments for rent (similar to apartments.com or rent.com). We want to add a bunch of amenity pages, price pages, etc., basically increasing the page count on the site and helping users find more pages relevant to their searches and long-tail phrases. For example, "Denver apartments with a pool" would be one page and "Seattle apartments under 900" would be another. By doing this we will take the site from about 14,000 pages to over 2 million once we add a list of amenities to every city in the US. My question is: should I worry about releasing them gradually? That is, do you think we would get penalized for launching that many pages overnight or over the course of a week? How fast is too fast to increase the content on your site? The site is about a year old, and we are not trying to scam anything; we are just looking to improve site functionality and page volume. Any advice?
Intermediate & Advanced SEO | | ioV0