Setting A Custom User Agent in Screaming Frog
-
Hi all,
Probably a dumb question, but I wanted to make sure I get this right.
How do we set a custom user agent in Screaming Frog? I know it's in the configuration settings, but what do I have to do to create a custom user agent specifically for a website?
Thanks much!
- Malika
-
Setting a custom user agent can affect things like whether a crawl takes advantage of HTTP/2, so there can be a big difference if you change it to something that identifies a client that doesn't support it.
Apparently, HTTP/2 support is coming to Pingdom very soon, just like it is to Googlebot:
http://royal.pingdom.com/2015/06/11/http2-new-protocol/
This is an excellent example of how a user agent can modify the way your site is crawled, as well as how efficient the crawl is:
https://www.keycdn.com/blog/https-performance-overhead/
"It is important to note that we didn't use Pingdom in any of our tests because they use Chrome 39, which doesn't support the new HTTP/2 protocol. HTTP/2 in Chrome isn't supported until Chrome 43. You can tell this by looking at the User-Agent in the request headers of your test results."
Note: WebPageTest uses Chrome 47 which does support HTTP/2.
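If you want to check for yourself which protocol version a site actually serves to a client that does support HTTP/2, here's a rough sketch. It's my own example, assuming the third-party httpx library installed with its HTTP/2 extra; the URL and user agent below are just placeholders.

# Rough sketch; assumes: pip install "httpx[http2]". URL and UA below are placeholders.
import httpx

custom_ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/53.0.2747.0 Safari/537.36")

with httpx.Client(http2=True, headers={"User-Agent": custom_ua}) as client:
    response = client.get("https://www.example.com/")
    # "HTTP/2" if the server negotiated it, otherwise "HTTP/1.1"
    print(response.http_version)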
Hope that clears things up,
Tom
-
Hi Malika,
Think about Screaming Frog and what it has to detect: to do that correctly it needs the correct user agent syntax, or it will not be able to produce a crawl that would satisfy anyone.
Using proper syntax for a user agent is essential. I have tried to keep this explanation non-technical; I hope it works.
The reason Screaming Frog needs the user agent is that the User-Agent header was added to HTTP to help web application developers deliver a better user experience. By respecting the syntax and semantics of the header, we make it easier and faster for header parsers to extract useful information from the headers that we can then act on.
Browser vendors are motivated to make web sites work no matter what specification violations are made. When the developers building web applications don't care about following the rules, the browser vendors work to accommodate that. It is only by us application developers developing a healthy respect for the standards of the web that the browser vendors will be able to start tightening up their codebases, knowing that they don't need to account for non-conformances.
With client libraries that do not enforce the syntax rules, you run the risk of using invalid characters that many server-side frameworks will not detect. It is possible that only certain users, in particular environments, would trigger the syntax violation, which can lead to bugs that are difficult to track down.
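For instance, here is a rough sketch (my own example, not anything Screaming Frog ships) of sending a well-formed custom user agent and printing back exactly what a server-side parser received. It assumes the third-party requests library; httpbin.org is just a public echo service, and the crawler name in the string is a placeholder.

# Rough sketch: send a custom User-Agent and see what the server's parser receives.
# Assumes the third-party "requests" library; httpbin.org echoes request headers back.
import requests

custom_ua = "MyCompanyCrawler/1.0 (+https://www.example.com/bot-info)"  # placeholder name/URL

response = requests.get(
    "https://httpbin.org/headers",
    headers={"User-Agent": custom_ua},
)

# The echoed value is exactly what a server-side framework would have to parse.
print(response.json()["headers"]["User-Agent"])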
I hope this is a good explanation; I've tried to keep it to the point.
Respectfully,
Thomas
-
Hi Thomas,
Would you have a simpler tutorial for me to follow? I am struggling a bit to understand.
Thanks heaps in advance
-
I think I need something that is dumbed down to my level for me to understand. The above tutorials are great, but not being a full-time coder, I get lost while reading them.
-
Hi Matt,
I haven't had any luck with this one yet.
-
Hi Malika! How'd it go? Did everything work out?
-
Happy I could be of help. Let me know if there's any issue and I will try to help with it. All the best.
-
Hi Thomas,
That's a lot of useful information there. I will have a go at it and let you know how it went.
Thanks heaps!
-
Please let me know if I did not answer the question, or if you have any other questions.
-
This article gives you a very clear breakdown of user agents and their syntax rules, please read it:
http://www.bizcoder.com/the-much-maligned-user-agent-header
The following is a valid example of a user-agent that is full of special characters:
user-agent: foo&bar-product!/1.0a$*+ (a;comment,full=of/delimiters
More references below, but pay particular attention to the first URL:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Gecko_user_agent_string_reference
Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0
http://stackoverflow.com/questions/15069533/http-request-header-useragent-variable
-
If you format it correctly, per the grammar below,
User-Agent = product *( RWS ( product / comment ) )
and it is accepted in your request headers, then yes, you could fill in the blanks and test it.
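If you want a quick way to sanity-check a string against that grammar, here is a rough sketch; the regex is my own simplified approximation (no nested comments, not a full RFC validator).

# Rough sanity check of the User-Agent grammar above -- simplified, not a full validator.
import re

TOKEN   = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"   # "token" characters (tchar)
PRODUCT = rf"{TOKEN}(?:/{TOKEN})?"          # product ["/" product-version]
COMMENT = r"\([^()]*\)"                     # simplified: no nested comments
UA_RE   = re.compile(rf"{PRODUCT}(?:\s+(?:{PRODUCT}|{COMMENT}))*")

def looks_valid(ua: str) -> bool:
    return UA_RE.fullmatch(ua) is not None

print(looks_valid("Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0"))  # True
print(looks_valid("not/ok/here()"))                                                       # False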
https://mobiforge.com/research-analysis/webviews-and-user-agent-strings
http://mobiforge.com/news-comment/standards-and-browser-compatibility
-
No, you cannot just put anything in there; the site has to recognize it, so ask yourself why you are doing it.
Below I have listed how to build a user agent, lists of already-built ones, and what your own browser creates, which you can see at useragentstring.com.
It must be formatted correctly and work as a header. It is not as easy as it sometimes seems, but it is not that hard either.
You can use useragentstring.com to check the string your Mac or PC browser sends and build your own from it:
http://www.useragentstring.com/
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2747.0 Safari/537.36
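To see how a string like that breaks down into the product and comment pieces described in the references below, here is a rough sketch (my own example):

# Rough sketch: split a user agent into its product tokens and parenthesised comments.
import re

ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/53.0.2747.0 Safari/537.36")

for part in re.findall(r"\([^)]*\)|\S+", ua):
    kind = "comment" if part.startswith("(") else "product"
    print(f"{kind:8} {part}")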
how to build a user agent
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Gecko_user_agent_string_reference
- https://developer.mozilla.org/en-US/docs/Setting_HTTP_request_headers
- https://msdn.microsoft.com/en-us/library/ms537503(VS.85).aspx
Lists of user agents
https://support.google.com/webmasters/answer/1061943?hl=en
https://msdn.microsoft.com/en-us/library/ms537503(v=vs.85).aspx
-
Hi Thomas,
Thanks for responding, much appreciated!
Does that mean, if I type in something like -
HTTP request user agent -
Crawler access V2
&
Robots user agent
Crawler access V2
This will work too?
-
To crawl using a different user agent, select ‘User Agent’ in the ‘Configuration’ menu, then select a search bot from the drop-down or type in your desired user agent strings.
http://i.imgur.com/qPbmxnk.png
&
Video http://cl.ly/gH7p/Screen Recording 2016-05-25 at 08.27 PM.mov
Also see:
http://www.seerinteractive.com/blog/screaming-frog-guide/
https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#user-agent
https://www.screamingfrog.co.uk/seo-spider/user-guide/
https://www.screamingfrog.co.uk/seo-spider/faq/
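One note on the "Robots user agent" field you asked about: that is the token matched against the rules in a site's robots.txt. Here is a rough sketch of how that matching works, using Python's standard urllib.robotparser; the site and the "Crawler access V2" token are just the examples from your question, not anything a real site necessarily allows.

# Rough sketch: check what a site's robots.txt allows for a given robots user agent token.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder site
rp.read()

print(rp.can_fetch("Crawler access V2", "https://www.example.com/some-page/"))
print(rp.can_fetch("Googlebot", "https://www.example.com/some-page/"))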