Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between using an XPath extractor and a regex and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, though, so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist on GitHub: https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text.
I assumed you would want the full URL and that was the issue you were running into.
As another solution, why not just fix the HTTPS links in the main navigation, etc.? Then, once you get the staging/testing site set up, run Screaming Frog on that site to find all the 301 redirects or 404s, and use that report to find all the URLs to fix.
I would also ping Screaming Frog, as this is not the first time they have been asked this question. They may have a better regexp and/or solution than what I have suggested.
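If it helps to sanity-check a pattern outside of Screaming Frog, here is a minimal Python sketch of the same idea. Note that the pattern below is a simplified one I am assuming for illustration (it is not the gist's regex): it matches an href whose value starts with http://, whether the value is wrapped in ", ' or nothing. The sample HTML and the example.com URLs are made up.

```python
import re

# Assumed, simplified pattern (not the regex from the gist): an href attribute
# whose value starts with http://, quoted with ", ' or not quoted at all.
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE)


def find_absolute_links(html):
    """Return every http:// URL found in an href attribute, in document order."""
    return ABSOLUTE_HREF.findall(html)


# Hypothetical sample markup covering the three quoting styles,
# plus a relative link and an https link that should be ignored.
sample = """
<a href="http://www.example.com/a">A</a>
<a href='http://www.example.com/b'>B</a>
<a href=http://www.example.com/c>C</a>
<a href="/relative/d">D</a>
<a href="https://www.example.com/e">E</a>
"""

print(find_absolute_links(sample))
```

Relative links and links that are already https:// are skipped, so what comes back is only the list of URLs that would still need updating.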
-
Depending on how you've coded everything, you could try to set up a Custom Search under Configuration. This will scan the HTML of each page, so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for, and the Custom tab will show you all the pages that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
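For anyone who would rather script the file search than do it in an editor, below is a rough Python sketch of the same approach. It assumes the site's files are available locally; the file extensions, the regex and the demo file are my own assumptions for illustration, so adjust them to match the actual site.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: an href whose value starts with http:// (quoted or unquoted).
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE)


def scan_directory(root, suffixes=(".html", ".htm", ".php")):
    """Yield (path, line number, URL) for each http:// href found under root."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we are not interested in.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for url in ABSOLUTE_HREF.findall(line):
                yield path, lineno, url


# Demo on a throwaway directory; point scan_directory at your own site root instead.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_text(
        '<a href="/about">About</a>\n'
        '<a href="http://www.example.com/shop">Shop</a>\n',
        encoding="utf-8",
    )
    hits = [(p.name, n, u) for p, n, u in scan_directory(tmp)]

print(hits)
```

This only reports where the absolute links are. Doing the replacement by hand (or with the editor's find and replace) keeps a human in the loop for any links that point to third-party sites that may not support HTTPS.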