Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between an XPath extractor and a regex, and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, though, in which case it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows your link to have a " or a ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist posted on GitHub: https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text.
I assumed you would want the full URL and that was the issue you were running into.
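For anyone who wants to try the idea outside of Screaming Frog first, here is a minimal Python sketch of the same approach: match href followed by an optional quote and then http://, and capture the full URL. The pattern is a simplified stand-in I wrote for illustration, not the regex above and not anything from Screaming Frog, and the sample HTML and the function name find_absolute_links are made up.

```python
import re

# Simplified pattern (an assumption for illustration, not the regex quoted above):
# href, optional whitespace around "=", an optional opening quote, then a URL
# starting with http:// that runs until a quote, whitespace, or ">".
ABSOLUTE_HREF = re.compile(
    r"""href\s*=\s*(["']?)(http://[^"'\s>]+)\1""",
    re.IGNORECASE,
)


def find_absolute_links(html: str) -> list[str]:
    """Return every http:// URL found in an href attribute, in document order."""
    return [match.group(2) for match in ABSOLUTE_HREF.finditer(html)]


# Hypothetical sample markup covering the quoting styles mentioned above.
sample = """
<a href="http://www.example.com/page">double quotes</a>
<a href='/relative/page'>relative, ignored</a>
<a href=http://www.example.com/other>no quotes</a>
<a href="https://www.example.com/secure">already HTTPS, ignored</a>
"""

for url in find_absolute_links(sample):
    print(url)
```

Because it only looks for the text href=, it will also pick up attributes such as data-href, so treat the output as a list to review rather than a final answer.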
As another solution, why not just fix the HTTPS links in the main navigation etc.? Then, once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects or 404s, and use that report to find all the URLs to fix.
I would also ping Screaming Frog; this is not the first time they have been asked this question. They may have a better regexp and/or solution than what I have suggested.
-
Depending on how you've coded everything, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the string to look for, and the Custom tab will then show you all the pages that match the string.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
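If you would rather script the search than use an editor, the same find-in-files idea can be sketched in Python. This is only a rough illustration: the file extensions, the function name scan_directory, and the demo files written to a temporary folder are all assumptions, so point it at your own copy of the site and adjust the extensions to match.

```python
import re
import tempfile
from pathlib import Path

# Matches href= followed by an optional quote and http:// (case-insensitive).
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?http://""", re.IGNORECASE)

# Assumed set of file types worth scanning; extend it for your own site.
EXTENSIONS = {".html", ".htm", ".php"}


def scan_directory(root):
    """Return (file name, line number, line text) for every line with an http:// href."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ABSOLUTE_HREF.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


# Demo on a throwaway folder with one made-up page.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "index.html").write_text(
        '<a href="/about">About</a>\n<a href="http://www.example.com/">Home</a>\n',
        encoding="utf-8",
    )
    for name, lineno, line in scan_directory(tmp):
        print(f"{name}:{lineno}: {line}")
```

It only reports matches and does not change any files, which makes it a safe first pass before running a find and replace.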
Good luck!