Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between using an x-path extractor as well as a regex and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
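One detail about the XPath attempt: a trailing [1] after //*[...] keeps only the first matching child of each parent element, not every match on the page. Outside Screaming Frog, the same "every href that starts with http://" extraction can be sketched on one page's saved HTML with Python's standard-library parser. This is an illustration only, not how Screaming Frog's extractor works; the class name, function name and sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class AbsoluteHttpLinkFinder(HTMLParser):
    """Collects (tag, href) for every href value starting with http:// (any case)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased, value may be None
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("http://"):
                self.links.append((tag, value))


def find_http_links(html):
    finder = AbsoluteHttpLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


# Hypothetical page content
sample = """
<a href="http://www.example.com/old">old</a>
<a href='/relative/page'>relative</a>
<link rel="stylesheet" href="http://www.example.com/style.css">
<a href="https://www.example.com/secure">secure</a>
"""

for tag, url in find_http_links(sample):
    print(tag, url)
```

Because it parses attributes rather than matching text, it catches double-quoted, single-quoted and unquoted hrefs alike; it does not look at src attributes or at URLs inside scripts.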
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, and so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows your link to have a ", a ' or nothing between the = and the http. If you have any other TLDs, you can just keep adding them with the |.
I modified this from a posting on GitHub: https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text.
I assumed you would want the full URL and that this was the issue you were running into.
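For comparison, here is a simpler pattern (a sketch written for this thread, not the gist-based one) that captures only href values beginning with http://, whether they sit in double quotes, single quotes or no quotes. It is checked against Python's re module; other regex engines, including whatever Screaming Frog uses, may differ in small syntax details. The function name and sample HTML are hypothetical.

```python
import re

# href, optional whitespace around '=', an optional quote (group 1),
# then the http:// URL (group 2) up to a quote, whitespace or '>'.
# \1 requires the same closing quote as the opening one (or none).
HTTP_HREF = re.compile(r"""href\s*=\s*(["']?)(http://[^"'\s>]+)\1""", re.IGNORECASE)


def extract_http_hrefs(html):
    """Return every absolute http:// URL found in an href attribute."""
    return [m.group(2) for m in HTTP_HREF.finditer(html)]


page = (
    '<a href="http://example.com/a">A</a> '
    "<a href='http://example.com/b?x=1'>B</a> "
    "<a href=http://example.com/c>C</a> "
    '<a href="https://example.com/d">D</a> '
    '<a href="/relative">E</a>'
)

print(extract_http_hrefs(page))
```

The https:// and relative links are skipped because the literal http:// must follow the optional quote directly. Like any regex over raw HTML, it will also match hrefs inside comments or scripts.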
As another solution, why not just fix the links in the main navigation etc. to use HTTPS? Then, once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects or 404s, and use that report to find all the URLs to fix.
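A minimal sketch of the report step, assuming the staging crawl has been exported to CSV with columns named "Address" and "Status Code" (check the actual header names in your export; the sample rows below are made up). It simply lists every URL that answered with one of the chosen status codes.

```python
import csv
import io


def urls_to_fix(csv_text, codes=(301, 404)):
    """Return (url, status) for rows whose status code is in `codes`.

    Assumes hypothetical column names "Address" and "Status Code".
    """
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = (row.get("Status Code") or "").strip()
        if code.isdigit() and int(code) in codes:
            hits.append((row["Address"], int(code)))
    return hits


# Made-up export from a staging crawl
export = """Address,Status Code
https://staging.example.com/,200
http://staging.example.com/about,301
https://staging.example.com/missing,404
"""

for url, code in urls_to_fix(export):
    print(code, url)
```

Passing a different tuple for codes (for example, adding 302) widens the report without changing the rest of the logic.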
I would also ping Screaming Frog - this is not the first time they have been asked this question. They may have a better regexp and/or solution than what I have suggested.
-
Depending on how you've coded everything, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the string to look for, and the Custom tab will then show you all the pages that match the string.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
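To see why consistent coding matters, here is a rough model of a literal "contains" search, assuming it behaves like a plain, case-sensitive substring count on each page's raw HTML. That is an assumption for illustration, not Screaming Frog's actual code; the function name and page data are hypothetical.

```python
def custom_search(pages, needle):
    """Return {url: match_count} for every page whose raw HTML contains needle."""
    results = {}
    for url, html in pages.items():
        count = html.count(needle)  # literal, case-sensitive, non-overlapping
        if count:
            results[url] = count
    return results


# Hypothetical crawled pages: URL -> raw HTML
pages = {
    "/a": '<a href="http://www.yourdomain.com/x">x</a> <a href="http://www.yourdomain.com/y">y</a>',
    "/b": "<a href='http://www.yourdomain.com/z'>z</a>",  # single quotes: missed by the quoted needle
    "/c": '<a href="/relative">r</a>',
}

print(custom_search(pages, 'href="http://www.yourdomain.com'))
```

Page /b is only found when the needle drops the href=" prefix (searching for http://www.yourdomain.com alone), which is the trade-off between a precise string and one that tolerates inconsistent markup.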

-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
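If you prefer a script to Notepad++'s Find in Files, a minimal Python equivalent might look like this. It assumes the site's template files are on disk and use the extensions listed (the suffix tuple is a guess to adjust); the function name is hypothetical. It reports the file, line number and line text for every line containing an href that starts with http://.

```python
import re
from pathlib import Path

# href, optional whitespace around '=', optional quote, then http:// (any case)
HTTP_HREF = re.compile(r"""href\s*=\s*["']?http://""", re.IGNORECASE)


def find_in_files(root, suffixes=(".html", ".htm", ".php")):
    """Return (file, line_number, line_text) for each line containing an http:// href."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if HTTP_HREF.search(line):
                matches.append((str(path), lineno, line.strip()))
    return matches
```

Calling find_in_files with the site's root folder returns one entry per matching line, much like a Find in Files result list. It works line by line, so an href split across two lines would be missed.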
Good luck!
Related Questions
-
Why does a site that is worse than mine by every objective measure I can find, keep outranking me in search?
I’ve been working on educating myself about SEO all day, again. We're All-Star Telescope, up in Canada. We have a competitor that consistently ranks #1, and I don't get it. Their site is full of duplicate content (straight copy and paste from the manufacturer's site). They don't have any meaningful blog or video content to add relevance or value to their site. We have higher page authority and higher domain authority, and the keyword analyzer in Moz says that our page is higher quality than the competitor's page. Our site is slow, but theirs is slower. I can’t find a single metric on any tool (Ubersuggest, Moz, Ahrefs, SEMrush) that says Telescopes Canada is a better site or has a better NexStar 8SE product page (a popular telescope).
Here’s the link to Telescopes Canada’s page for their Celestron 8SE: https://telescopescanada.ca/products/celestron-nexstar-8se-computerized-telescope-11069?_pos=1&_sid=f0aa91cc2&_ss=r
Here’s a link to the Celestron 8SE page on the manufacturer's website: https://www.celestron.com/products/nexstar-8se-computerized-telescope?_pos=1&_sid=56abdabd4&_ss=r#description
Telescopes Canada has just copied and pasted. There is no original content aside from adding the shipping and return policy to the tab and having some options for selecting accessories on the page. Here is our page: https://all-startelescope.com/products/celestron-nexstar-8se
Our titles are good, and our metadata is good (but I don’t think that’s been a serious ranking factor for about ten years). The text is original, it’s relevant, and we have healthy internal links to the page. We have invested in some excellent blog content, and we’re adding new products to the website so that we rank for more keywords. All of those things are helping, but I fundamentally don’t understand why Telescopes Canada is #1 almost across the board on every key product in our market. There is something that I’m not seeing here, something that isn't being captured by the tools that I have.
Is it simply the fact that they get more traffic? Is that why some people go and buy traffic? Can you see any metric, any tool in your toolbox, that indicates why they rank at the top, or even higher than we do, for these search terms specific to that product:
Celestron NexStar 8SE
NexStar 8SE
Celestron NexStar 8SE Canada
NexStar 8SE Canada
We've worked with two highly ranked SEOs to try and figure this out, one in Canada and one in the USA. I haven't seen a confidence-inspiring answer from either of them. Posting on a forum is a bit of an act of desperation. I'll continue to work the problem, but it's discouraging to see the leader in my industry look like he's just phoning it in with his website.
Technical SEO | nkennett
-
How to find orphan pages
Hi all, I've been checking these forums for an answer on how to find orphaned pages on my site, and I can see a lot of people are saying that I should cross-check my XML sitemap against a Screaming Frog crawl of my site. However, the sitemap is created using Screaming Frog in the first place (I'm sure this is the case for a lot of people too). Are there any other ways to get a full list of orphaned pages? I assume it would be a developer request, but where should I ask them to look, and what should they extract? Thanks!
Technical SEO | KJH-HAC
-
How can I stop a tracking link from being indexed while still passing link equity?
I have a marketing campaign landing page, and it uses a tracking URL to track clicks. The tracking links look something like this: http://this-is-the-origin-url.com/clkn/http/destination-url.com/ The problem is that Google is indexing these links as pages in the SERPs. Of course, when they get indexed and then clicked, they show a 400 error because the /clkn/ link doesn't represent an actual page with content on it. The tracking link is set up to instantly 301 redirect to http://destination-url.com. Right now my dev team has blocked these links from crawlers by adding Disallow: /clkn/ to the robots.txt file; however, this blocks the flow of link equity to the destination page. How can I stop these links from being indexed without blocking the flow of link equity to the destination URL?
Technical SEO | UnbounceVan
-
Updating inbound links vs. 301 redirecting the page they link to
Hi everyone, I'm preparing myself for a website redesign and finding conflicting information about inbound links and 301 redirects. If I have a URL (we'll say website.com/website) that is linked to by outside sources, should I get those outside sources to update their links when I change the URL to website.com/webpage? Or is it just as effective from a link juice perspective to simply 301 redirect the old page to the new page? Are there any other implications to this choice that I may want to consider? Thanks!
Technical SEO | Liggins
-
Screaming Frog Content Showing charset=UTF-8
I am running a site through Screaming Frog, and many of the pages under "Content" are reading text/html; charset=UTF-8. Does this harm one's SEO, and what does this really mean? I'm running this site along with its competitor's, and the competitor's site seems very clean, with content pages reading text/html. What does one do to change this if it is a negative thing? Thank you.
Technical SEO | seoessentials
-
Links from Instructables.com?
This is a silly newbie question. But will posting on www.instructables.com with some valuable content and url link back to my site help with "linking"? Or do they put a no-follow on all links on their site? Thanks for answering! Ron
Technical SEO | yatesandcojewelers
-
How can I find my Webmaster Tools HTML file?
So, totally amateur hour here, but I can't for the life of me find our HTML verification file for Webmaster Tools. I see nowhere to look at it in the Google Webmaster Tools console; I tried a site: search and I googled it, but all the info out there is about how to verify a site. Ours is verified, but I need the verification file code to sync up with the Google API, and no one seems to have it. Any thoughts?
Technical SEO | healthgrades
-
Does the Referral Traffic from a Link Influence the SEO Value of that Link?
If a link exists, and nobody clicks on it, could it still be valuable for SEO? Say I have 1000 links on 500 sites with Domain Authority ranging from 35 to 80. Let's pretend that 900 of those links generate referral traffic. Let's assume that the remaining 100 links are spread between 10 domains of the 500, but nobody ever clicks on them. Are they still valuable? Should an SEO seek to earn more links like those, even though they don't earn referral traffic? Does Google take referral data into account in evaluating links?
Technical SEO | glennfriesen