Setting A Custom User Agent in Screaming Frog
-
Hi all,
Probably a dumb question, but I wanted to make sure I get this right.
How do we set a custom user agent in Screaming Frog? I know it's in the configuration settings, but what do I have to do to create a custom user agent specifically for a website?
Thanks much!
- Malika
-
Setting a custom user agent can affect things like HTTP/2, so there can be a big difference if you change it to something that might not take advantage of HTTP/2.
Apparently, HTTP/2 support is coming to Pingdom very soon, just as it is to Googlebot:
http://royal.pingdom.com/2015/06/11/http2-new-protocol/
This is an excellent example of how a user agent can change the way your site is crawled, as well as how efficient that crawl is:
https://www.keycdn.com/blog/https-performance-overhead/
It is important to note that we didn’t use Pingdom in any of our tests because they use Chrome 39, which doesn’t support the new HTTP/2 protocol. HTTP/2 in Chrome isn’t supported until Chrome 43. You can tell this by looking at the User-Agent in the request headers of your test results.
Note: WebPageTest uses Chrome 47 which does support HTTP/2.
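If you want to see the effect for yourself outside of a crawler, here's a rough Python sketch (my own illustration, not a Screaming Frog feature) using the third-party httpx library; the user agent string and URL below are just placeholders:

```python
# Minimal sketch: does the server give us HTTP/2 when we connect with a given
# client and User-Agent? Requires: pip install "httpx[http2]"
import httpx

custom_ua = "Mozilla/5.0 (compatible; ExampleCrawler/1.0; +https://example.com/bot)"

# http2=True lets httpx offer HTTP/2 during TLS negotiation (ALPN); whether you
# actually get it depends on the client and server, not the User-Agent header alone.
with httpx.Client(http2=True, headers={"User-Agent": custom_ua}) as client:
    response = client.get("https://example.com/")
    # http_version is "HTTP/2" or "HTTP/1.1" depending on what was negotiated
    print(response.http_version, response.status_code)
```

That's the point of the KeyCDN quote above: the protocol depends on the client doing the fetching (Chrome 39 vs Chrome 43, or whatever your crawler uses under the hood), which is why two tools testing the same site can report very different performance.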
Hope that clears things up,
Tom
-
Hi Malika,
Think about what Screaming Frog has to detect: to do that correctly it needs the correct user agent syntax, otherwise it won't be able to produce a crawl that satisfies people.
Using proper syntax for a user agent is essential. I have tried to be non-technical in this explanation, and I hope it works.
The reason Screaming Frog needs the user agent is that the User-Agent header was added to HTTP to help web application developers deliver a better user experience. By respecting the syntax and semantics of the header, we make it easier and faster for header parsers to extract useful information from the headers that we can then act on.
Browser vendors are motivated to make web sites work no matter what specification violations are made. When the developers building web applications don't care about following the rules, the browser vendors work to accommodate that. It is only by us application developers developing a healthy respect for the standards of the web that the browser vendors will be able to start tightening up their codebases, knowing that they don't need to account for non-conformances.
With client libraries that do not enforce the syntax rules, you run the risk of using invalid characters that many server-side frameworks will not detect. It is possible that only certain users, in particular environments, would trigger the syntax violation. This can lead to bugs that are difficult to track down.
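To make that concrete, here is a small Python sketch; the regex is my own rough simplification of the RFC 7230/7231 token and comment rules, not a complete parser, and the example strings are made up:

```python
import re

# tchar from RFC 7230: letters, digits and ! # $ % & ' * + - . ^ _ ` | ~
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
PRODUCT = rf"{TOKEN}(?:/{TOKEN})?"            # product = token ["/" product-version]
COMMENT = r"\([^()\\]*\)"                     # very loose take on comment: no nesting
UA_RE = re.compile(rf"{PRODUCT}(?:\s+(?:{PRODUCT}|{COMMENT}))*")

def looks_valid(user_agent: str) -> bool:
    """Return True if the string roughly matches the User-Agent grammar."""
    return UA_RE.fullmatch(user_agent) is not None

print(looks_valid("ExampleCrawler/1.0 (compatible; just a test)"))  # True
print(looks_valid("my crawler, version 1.0"))   # False: comma is not a token character
```

A check like this catches the invalid characters before a stricter server-side framework does.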
I hope this is a good explanation; I've tried to keep it very much to the point.
Respectfully,
Thomas
-
Hi Thomas,
Would you have a simpler tutorial for me to understand? I am struggling a bit.
Thanks heaps in advance
-
I think I want something that is dumbed down to my level for me to understand. The above tutorials are great, but not being a full-time coder, I get lost while reading them.
-
Hi Matt,
I haven't had any luck with this one yet.
-
Hi Malika! How'd it go? Did everything work out?
-
Happy I could be of help. Let me know if there's any issue and I will try to help with it. All the best
-
Hi Thomas,
That's a lot of useful information there. I will have a go at it and let you know how it went.
Thanks heaps!
-
Please let me know if I did not answer the question, or if you have any other questions.
-
This article gives you a very clear breakdown of user agents and their syntax rules, so please read it: http://www.bizcoder.com/the-much-maligned-user-agent-header
The following is a valid example of a user agent that is full of special characters:
user-agent: foo&bar-product!/1.0a$*+ (a;comment,full=of/delimiters
Here are some references, but you want to pay attention to the first URL:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Gecko_user_agent_string_reference
Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0
http://stackoverflow.com/questions/15069533/http-request-header-useragent-variable
-
If you formatted it correctly, following the grammar below, and it was received in your headers, then yes, you could fill in the blanks and test it (there's a small sketch after the links below).
User-Agent = product *( RWS ( product / comment ) )
https://mobiforge.com/research-analysis/webviews-and-user-agent-strings
http://mobiforge.com/news-comment/standards-and-browser-compatibility
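For what it's worth, here is a tiny Python sketch of that "fill in the blanks and test it" idea; the product name and comment are placeholders I made up, and httpbin.org is just a convenient echo service (any endpoint you control would do):

```python
# Build a User-Agent that follows  product *( RWS ( product / comment ) )
# and check what a server actually receives.
import httpx

product = "ExampleCrawler/2.1"                      # product "/" product-version
comment = "(+https://example.com/crawler-info)"     # a comment in parentheses
user_agent = f"{product} {comment}"

response = httpx.get("https://httpbin.org/user-agent",
                     headers={"User-Agent": user_agent})
# httpbin simply echoes back the header it received, e.g.
# {'user-agent': 'ExampleCrawler/2.1 (+https://example.com/crawler-info)'}
print(response.json())
```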
-
No, you cannot just put anything in there. The site has to recognize it, so ask yourself why you are doing this.
I have listed below how to build a user agent, some already-built lists, and what your own browser will report, via useragentstring.com.
It must be formatted correctly and work as a header; it is not as easy as it sometimes seems, but it is not that hard either.
You can use this to check and build your own from your Mac or PC:
http://www.useragentstring.com/
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2747.0 Safari/537.36
How to build a user agent:
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Gecko_user_agent_string_reference
- https://developer.mozilla.org/en-US/docs/Setting_HTTP_request_headers
- https://msdn.microsoft.com/en-us/library/ms537503(VS.85).aspx
Lists of user agents:
https://support.google.com/webmasters/answer/1061943?hl=en
https://msdn.microsoft.com/en-us/library/ms537503(v=vs.85).aspx
-
Hi Thomas,
Thanks for responding, much appreciated!
Does that mean that if I type in something like:
HTTP request user agent: Crawler access V2
Robots user agent: Crawler access V2
this will work too?
-
To crawl using a different user agent, select ‘User Agent’ in the ‘Configuration’ menu, then select a search bot from the drop-down or type in your desired user agent strings.
Screenshot: http://i.imgur.com/qPbmxnk.png
Video: http://cl.ly/gH7p/Screen Recording 2016-05-25 at 08.27 PM.mov
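Once it's set, a quick way to double-check from outside Screaming Frog that the site responds sensibly to your custom string is a comparison like this Python sketch (the URL is a placeholder; the browser string is the Chrome example from above and "Crawler access V2" is the custom string you suggested):

```python
# Fetch the same URL with a browser-like User-Agent and with the custom one,
# then compare what comes back. Big differences in status code or size suggest
# the site treats the two user agents differently.
import httpx

url = "https://example.com/"   # placeholder: use the site you plan to crawl
agents = {
    "browser-like": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
                    "(KHTML, like Gecko) Chrome/53.0.2747.0 Safari/537.36",
    "custom": "Crawler access V2",
}

for label, ua in agents.items():
    r = httpx.get(url, headers={"User-Agent": ua}, follow_redirects=True)
    print(f"{label}: HTTP {r.status_code}, {len(r.content)} bytes")
```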
Also see:
http://www.seerinteractive.com/blog/screaming-frog-guide/
https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#user-agent
https://www.screamingfrog.co.uk/seo-spider/user-guide/
https://www.screamingfrog.co.uk/seo-spider/faq/