Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of _href="http://" _?
I have gone back and forth between using an x-path extractor as well as a regex and have had no luck with either.
Ex. X-path: //*[starts-with(@href, “http://”)][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database driven site and so would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http If you have any other TLDs you can just keep expanding on the |
I modified this from a posting in github https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text
I assumed you would want the full URL and that was the issue you were running into.
As another solution why not just fix the https in the main navigation etc, then once you get the staging/testing site setup, run ScreamingFrog on that site and find all the 301 redirects or 404s and then use that report to find all the URLs to fix.
I would also ping ScreamingFrog - this is not the first time they have been asked this question. They may have a better regexp and/or solution vs what I have suggested.
-
Depending on how you've coded everything you could try to setup a Custom Search under Configuration. This will scan the HTML of the page so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for and in the Custom tab on the resulting pages it'll show you all the ones that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Internal no follow links
I have just discovered that the WordPress theme I have been using for some time has no follow internal links on the blog. Simply put each post has an image and text link plus a 'read more'. The Read more is a no-follow which is also on my homepage. The developer is saying duplicate follow links are worse than an internal no follow. What is your opinion on this? Should I spend time removing the no follow?
Technical SEO | | Libra_Photographic0 -
Spammy nofollow links
Hello, One of our clients - a cleaning business - has a heck of a lot of spammy nofollow links pointing to their site. The majority of the links are from comments or 'pingbacks', most with the anchor text 'cheap nfl jerseys' or 'cyber monday ugg boots'. After researching the subject of spammy nofollow links, it seems there is a lot of uncertainty regarding the negative affect these could have on your SEO efforts. So I guess my question to the community is: if your site was suddenly hit by a plethora of spammy nofollow links, what would you do and why? Cheers, Lewis
Technical SEO | | PeaSoupDigital0 -
We just can't figure out the right anchor text to use
We have been trying everything we can with anchor text. We have read here that we should try naturalistic language. Our competitors who are above us in Google search results don't do any of this. They only use their names or a single term like "austin web design". Is what we are doing hurting our listings? We don't have any black hat links. Here's what we are doing now. We are going crazy trying to figure this out. We are afraid to do anything in fear it will damage our position. Bob | pallasart web design | 31 | 1,730 |
Technical SEO | | pallasart
| website by pallasart a texas web design company in austin | 15 | 1,526 |
| website by the austin design company pallasart | 14 | 1,525 |
| created by pallasart a web design company in austin texas | 13 | 1,528 |
| created by an austin web design company pallasart | 12 | 1,499 |
| website by pallasart web design an austin web design company | 12 | 1,389 |
| website by pallasart an austin web design company | 11 | 1,463 |
| pallasart austin web design | 9 | 2,717 |
| website created by pallasart a web design company in austin texas | 9 | 1,369 |
| website by pallasart | 8 | 910 |
| austin web design | 5 | 63 |
| pallasart website design austin |0 -
What if I point my canonicals to a URL version that is not used in internal links
My web developer has pointed the "good" URLs that I use in my internal link structure (top-nav/footer) to another duplicate version of my pages. Now the URLs that receive all the canonical link value are not the ones I use on my website. is this a problem and why??? In theory the implementation is good because both have equal content. But does it harm my link equity if it directs to a URL which is not included in my internal link architecture.
Technical SEO | | DeptAgency0 -
Effective use of hReview
Hi fellow Mozzers! I am just in the process of adding various reviews to our site (a design agency), but I wanted to use the ratings in different ways depending on the page. So for the home page and the services (branding, POS, direct mail etc) I wanted to aggregate relevant reviews (giving us an average of all reviews for the home page, an average of ratings from all brand projects and so on). Then, I wanted to put specific reviews on our portfolio pages, so the review relates specifically to that project. This is the easiest to do as the hReview generator is geared up for reviews that come from one source, but I can't find a way of aggregating the star ratings to make an average rating rich snippet. Anyone know where I can get the coding for this? Thanks in advance! Nick.
Technical SEO | | themegroup0 -
If two links from one page link to another, how can I get the second link's anchor text to count?
I am working on an e-commerce site and on the category pages each of the product listings link to the product page twice. The first is an image link and then the second is the product name. I want to get the anchor text of the second link to count. If I no-follow the image link will that help at all? If not is there a way to do this?
Technical SEO | | JordanJudson0 -
We have a ton of legacy links that include /?ref=tracking-goes-here. We need concile this, can the conical tag be used to fix this? How?
www.firehost.com/?ref=pressrelease example - http://cl.ly/2O1d1x2m3b1b3K1K0h2J
Technical SEO | | FirePowered0