Google Search Results...
-
I'm trying to download every Google search result for my company (site:company.com), but the most I can retrieve is 100. I tried SEOquake and it also tops out at 100.
The reason: I'd like to see which pages are indexed. The www pages and subdomain pages should only account for around 7,000, yet the search results show 23,000, and I'd like to see what the rest of that 23,000 is.
Any advice on how to go about this? I can check subdomains individually (site:www.company.com, site:static.company.com), but I don't know all of the subdomains.
Has anyone cracked this? I tried a scraper tool, but it was only able to retrieve 200.
-
I see. If you have some idea of which sections of your site might be indexed when they shouldn't be, you can use site:company.com inurl:whatever to narrow things down. You should know the file path or query string your search and shop pages use, and you can put that string after the inurl: modifier.
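If you want those narrowed results in a spreadsheet rather than copied off the SERPs by hand, one scriptable route is Google's Custom Search JSON API. Here's a rough Python sketch, assuming you've created a Programmable Search Engine and an API key (the two placeholders below are hypothetical); note the API still caps each query at roughly 100 results, so segmenting with inurl: patterns is what makes it add up:

```python
import csv
import requests

API_KEY = "YOUR_API_KEY"    # hypothetical placeholder: key from the Google Cloud console
ENGINE_ID = "YOUR_CX_ID"    # hypothetical placeholder: Programmable Search Engine ID

def fetch_urls(query, max_results=100):
    """Page through results 10 at a time; the API stops around 100 per query."""
    urls = []
    for start in range(1, max_results, 10):
        resp = requests.get(
            "https://www.googleapis.com/customsearch/v1",
            params={"key": API_KEY, "cx": ENGINE_ID, "q": query, "start": start},
        )
        items = resp.json().get("items", [])
        if not items:
            break
        urls.extend(item["link"] for item in items)
    return urls

# Query section by section so no single query hits the ~100-result cap.
patterns = ["search", "basket", "checkout"]  # path fragments you suspect shouldn't be indexed
with open("indexed_urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["pattern", "url"])
    for pattern in patterns:
        for url in fetch_urls(f"site:company.com inurl:{pattern}"):
            writer.writerow([pattern, url])
```

Each pattern's results land in their own rows, so you can filter the CSV for anything that shouldn't be in the index.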
-
The goal is to identify which pages Google is indexing and whether any of them shouldn't be there (we don't index search pages, and we don't index basket or checkout pages).
I don't know all of the subdomains, and searching the ones I do know individually doesn't add up to the total count I get for site:company.com.
My Moz reports show no duplicate pages, so it can't be that. If I could download the full Google search results into a spreadsheet, I could quickly filter it and see which pages are indexed that shouldn't be.
-
OK, but what's your goal with this? And why don't you know the subdomains you've created yourself? It seems like you could work backwards from a better starting point by pinning those down first.
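If the list genuinely isn't documented anywhere, one way to rediscover subdomains is to query certificate transparency logs, e.g. via crt.sh. A rough sketch, with the caveat that the endpoint and field names here are assumptions from memory, and that it will only surface subdomains that have had TLS certificates issued:

```python
import requests

DOMAIN = "company.com"

# crt.sh exposes certificate-transparency logs as JSON; "%." acts as a wildcard.
resp = requests.get(
    "https://crt.sh/",
    params={"q": f"%.{DOMAIN}", "output": "json"},
    timeout=60,
)

subdomains = set()
for entry in resp.json():
    # A single certificate can list several newline-separated hostnames.
    for name in entry.get("name_value", "").splitlines():
        name = name.lstrip("*.").lower()
        if name.endswith(DOMAIN):
            subdomains.add(name)

for sub in sorted(subdomains):
    print(sub)  # feed each of these into its own site: query
```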
-
My GA is only set up for a single domain, as the subdomains just hold PDFs, images, etc. Traffic reports from GA are focused on the www.company.com pages.
The only way to see exactly which URLs have been indexed seems to be paging through the Google search results, but those cap out after 7 pages.
-
Hi Cyto. Why don't you try exporting the pages receiving google/organic visits from Google Analytics, using Landing Page as a secondary dimension? It won't be all-inclusive, but it will give you a good idea of which pages are indexed and drawing in visitors. You can then compare that data against your sitemaps.
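To make that comparison quick, here's a minimal sketch, assuming the organic landing pages were exported to landing_pages.csv with a "Landing Page" column (file name and header are assumptions based on GA's standard export) and that the sitemap sits at the usual location:

```python
import csv
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.company.com/sitemap.xml"  # assumed location
GA_EXPORT = "landing_pages.csv"                      # assumed GA export file

# Collect every <loc> path listed in the sitemap (namespace included).
LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"
root = ET.fromstring(requests.get(SITEMAP_URL).content)
sitemap_paths = {loc.text.split("company.com", 1)[-1] for loc in root.iter(LOC)}

# GA landing pages are already site-relative paths (e.g. "/products/widget").
with open(GA_EXPORT, newline="") as f:
    ga_paths = {row["Landing Page"] for row in csv.DictReader(f)}

# Pages drawing organic visits but missing from the sitemap deserve a look.
for path in sorted(ga_paths - sitemap_paths):
    print(path)
```

Anything in that diff is drawing organic visits without being something you deliberately listed in the sitemap, which makes a good shortlist of pages to investigate.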