Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between an XPath extractor and a regex and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, though, so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows your link to have a " or a ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist on GitHub: https://gist.github.com/gruber/8891611
You can use a tool like http://regexpal.com/ to test your regexp against example text.
I assumed you would want the full URL and that this was the issue you were running into.
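If you would rather test offline than in a web tool, here is a minimal Python sketch. The pattern as posted above looks cut off after the last |, so this uses a simplified variant rather than the gist's regex: an optional quote, then everything from http:// up to the next quote, space, or >. The function name and sample HTML are made up for illustration.

```python
import re

# Simplified pattern (an assumption, not the original gist regex):
# group 1 is the optional opening quote, group 2 is the full http:// URL.
ABSOLUTE_HREF = re.compile(r"""href=("|'|)(http://[^"'\s>]+)""", re.IGNORECASE)


def find_absolute_links(html):
    """Return every http:// URL found in an href attribute."""
    return [match.group(2) for match in ABSOLUTE_HREF.finditer(html)]


# Hypothetical sample markup covering the three quoting styles.
sample_html = """
<a href="http://www.example.com/page">absolute, double quotes</a>
<a href='http://www.example.com/other'>absolute, single quotes</a>
<a href=http://www.example.com/bare>absolute, no quotes</a>
<a href="/relative/page">relative</a>
<a href="https://www.example.com/secure">already HTTPS</a>
"""

for url in find_absolute_links(sample_html):
    print(url)
```

Relative links and links that are already https:// are skipped, so what is left should be the list of URLs to update.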
As another solution, why not just switch the links in the main navigation etc. to HTTPS first? Then, once the staging/testing site is set up, run Screaming Frog on that site to find all the 301 redirects or 404s, and use that report to find all the URLs to fix.
I would also ping Screaming Frog; this is not the first time they have been asked this question. They may have a better regexp and/or solution than what I have suggested.
-
Depending on how you've coded everything, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the string it looks for, and the Custom tab will show you all the pages that match the string.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
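To show what that kind of literal string match gives you, here is a rough Python sketch of the same idea outside Screaming Frog. It is a plain substring check, not how the tool itself works, and the pages dictionary is hypothetical stand-in data for crawled HTML.

```python
def pages_containing(pages, needle):
    """Return {url: occurrence_count} for pages whose raw HTML contains needle."""
    return {
        url: html.count(needle)
        for url, html in pages.items()
        if needle in html
    }


# Hypothetical crawl results: page URL -> raw HTML source.
pages = {
    "/": '<a href="http://www.yourdomain.com/about">About</a>',
    "/about": '<a href="/contact">Contact</a>',
    "/blog": (
        '<a href="http://www.yourdomain.com/">Home</a> '
        '<a href="http://www.yourdomain.com/blog/1">Post</a>'
    ),
}

# Leaving off the closing quote lets the string match deeper URLs too.
matches = pages_containing(pages, 'href="http://www.yourdomain.com')
print(matches)
```

Like the Custom Search, this only finds links written exactly the way the search string is written, so single-quoted or unquoted hrefs would be missed.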
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
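If you would rather script the search than use an editor, a minimal Python sketch of the same find-in-files idea might look like this. The file extensions, the function name, and the simplified pattern are all assumptions; adjust them to match how the site is actually built.

```python
import re
from pathlib import Path

# Simplified pattern (an assumption): optional quote, then an http:// URL.
ABSOLUTE_HREF = re.compile(r"""href=["']?(http://[^"'\s>]+)""", re.IGNORECASE)

# Assumed set of template/page extensions worth scanning.
EXTENSIONS = {".html", ".htm", ".php"}


def scan_directory(root):
    """Yield (path, line_number, url) for each absolute http:// href under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in ABSOLUTE_HREF.finditer(line):
                yield path, line_number, match.group(1)
```

Looping over scan_directory("path/to/site") and printing each result gives a file, line number, and URL for every hit, which is a safer starting point than a blind find and replace.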
Good luck!