Download all GSC crawl errors: Possible today?
-
Hey guys:
I tried to download all the crawl data from Google Search Console using the API and solutions like this one: https://github.com/eyecatchup/php-webmaster-tools-downloads but it seems that it is no longer working (or I did something wrong; I just get a blank page when running the PHP file after some load time)... The last time I needed to download more than 1,000 URLs was a long time ago, and I haven't tried this method since then.
Is there any other way to grab all the crawl errors through the API, or is this no longer possible?
Thanks!
-
Hi Antonio,
Not sure which language you prefer - but you can find some sample code here: https://developers.google.com/webmaster-tools/v3/samples - I tried the Python example, which is quite well documented inside the code; I guess it's the same for the other languages. If I have some time I could give it a try - but it won't be before the end of next week (and it would be based on Python).
Dirk
-
Thanks Dirk. So far I haven't found any alternative, so maybe it would be a good idea to get hands-on with this.
If anyone else has solved this, it would be great if you could share the solution with us.
-
The script worked for the previous version of the API - it won't work on the current version.
You could search to check whether somebody else has built the same thing for the new API - or build something yourself - the API is quite well documented, so it shouldn't be too difficult to do. I built a Python script for the Search Analytics part in less than a day (without previous knowledge of Python), so it's certainly feasible.
Rgds,
Dirk
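For anyone who wants to build this themselves, here is a minimal Python sketch of the two pieces that need no credentials: building the request URL for the crawl error samples and flattening the JSON response into CSV. It assumes the v3 `urlCrawlErrorsSamples` endpoint and the response field names (`urlCrawlErrorSample`, `pageUrl`, `responseCode`, `first_detected`, `last_crawled`) as I remember them from the v3 reference, so check them against the current docs before relying on it. The function names are made up for illustration, and the authenticated request itself (OAuth 2.0) is not shown.

```python
import csv
import io
from urllib.parse import quote

# Assumed API root for the Webmasters API v3; verify against the current docs.
API_ROOT = "https://www.googleapis.com/webmasters/v3"


def build_samples_url(site_url, category, platform):
    """Build the request URL for one crawl error category and platform.

    The site URL is part of the path, so it must be fully percent-encoded.
    """
    encoded_site = quote(site_url, safe="")
    return (
        f"{API_ROOT}/sites/{encoded_site}/urlCrawlErrorsSamples"
        f"?category={category}&platform={platform}"
    )


def samples_to_rows(response):
    """Flatten the decoded JSON response into one list per sample URL."""
    rows = []
    # The key is assumed to be absent when there are no samples.
    for sample in response.get("urlCrawlErrorSample", []):
        rows.append([
            sample.get("pageUrl", ""),
            sample.get("responseCode", ""),
            sample.get("first_detected", ""),
            sample.get("last_crawled", ""),
        ])
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["pageUrl", "responseCode", "first_detected", "last_crawled"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical site and a hand-written response, for illustration only.
    print(build_samples_url("http://www.example.com/", "notFound", "web"))
    fake_response = {
        "urlCrawlErrorSample": [
            {"pageUrl": "old-page.html", "responseCode": 404,
             "first_detected": "2015-10-01T00:00:00.000Z",
             "last_crawled": "2015-10-20T00:00:00.000Z"},
        ]
    }
    print(rows_to_csv(samples_to_rows(fake_response)))
```

To grab everything, you would call the URL once per category and platform combination with an authorized HTTP client and append the rows from each response to the same CSV.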