Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between an XPath extractor and a regex and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have a " or ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist on GitHub: https://gist.github.com/gruber/8891611
You can use tools like http://regexpal.com/ to test your regex against example text.
I assumed you would want the full URL and that was the issue you were running into.
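If capturing the full URL is the goal, a narrower pattern may be easier to reason about than one adapted from a general URL matcher. The sketch below is only an illustration in Python, not the gist's regex and not a Screaming Frog setting: the pattern, the function name find_absolute_links, and the example.com sample markup are all assumptions made for the example. It allows a double quote, a single quote, or no quote after the =, and captures everything up to the next quote, whitespace, or >.

```python
import re
from typing import List

# Hypothetical pattern (an assumption for this sketch, not the gist's regex):
# href, optional whitespace around "=", an optional quote, then an absolute
# http:// URL captured up to the next quote, whitespace, or ">".
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE)


def find_absolute_links(html: str) -> List[str]:
    """Return every absolute http:// URL that appears in an href attribute."""
    return ABSOLUTE_HREF.findall(html)


# Made-up sample markup covering quoted, unquoted, relative, and https links.
sample = """
<a href="http://www.example.com/page">double-quoted absolute link</a>
<a href='/relative/page'>relative link</a>
<a href=http://www.example.com/unquoted>unquoted absolute link</a>
<a href="https://www.example.com/secure">already on https</a>
"""

for url in find_absolute_links(sample):
    print(url)
```

Because the pattern requires the literal text http://, links that already use https:// and relative links are left out, which is what you want when building a list of URLs to update.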
As another solution, why not just fix the HTTPS links in the main navigation, etc., and then, once the staging/testing site is set up, run Screaming Frog on that site to find all the 301 redirects and 404s? You can then use that report to find all the URLs to fix.
I would also ping Screaming Frog, as this is not the first time they have been asked this question. They may have a better regex or solution than what I have suggested.
-
Depending on how you've coded everything, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the string to look for, and the Custom tab will show you all the pages that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
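For anyone who would rather script the same search than run it in an editor, here is a rough Python sketch. It assumes the site's pages exist as local .html files; the class name AbsoluteLinkParser and the function scan_directory are made up for this example. It uses the standard library's HTML parser to report the file, line number, and URL of every href that starts with http://.

```python
from html.parser import HTMLParser
from pathlib import Path


class AbsoluteLinkParser(HTMLParser):
    """Collects (line number, URL) for every href that starts with http://."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("http://"):
                line, _offset = self.getpos()  # line where the tag starts
                self.links.append((line, value))


def scan_directory(root):
    """Map each .html file under root (by relative path) to its absolute links."""
    results = {}
    for path in sorted(Path(root).rglob("*.html")):
        parser = AbsoluteLinkParser()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        if parser.links:
            results[str(path.relative_to(root))] = parser.links
    return results
```

Calling scan_directory on the folder that holds the downloaded files returns a dictionary you can print or write out as a fix list. It only reads files, so the actual replacement can still be done by hand or with find and replace.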
Good luck!