Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between using an XPath extractor and a regex and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows your link to have a ", a ', or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist on GitHub: https://gist.github.com/gruber/8891611
You can use tools like http://regexpal.com/ to test your regex against example text.
I assumed you would want the full URL and that was the issue you were running into.
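To show the idea of capturing the full URL regardless of quoting, here is a minimal Python sketch. It uses a simpler stand-in pattern, not the regex quoted above, and the names ABSOLUTE_HREF and find_absolute_links are made up for illustration. It assumes you already have a page's HTML as a string and that URLs contain no spaces or quote characters.

```python
import re

# Assumed pattern (not the one from the post): an href whose value starts with
# http://, wrapped in double quotes, single quotes, or no quotes at all.
ABSOLUTE_HREF = re.compile(
    r"""href\s*=\s*(["']?)(http://[^"'\s>]+)\1""",
    re.IGNORECASE,
)


def find_absolute_links(html: str) -> list[str]:
    """Return every http:// URL used as an href value, in document order."""
    return [match.group(2) for match in ABSOLUTE_HREF.finditer(html)]


sample = (
    '<a href="http://www.example.com/a">A</a>'
    "<a href='http://www.example.com/b'>B</a>"
    "<a href=http://www.example.com/c>C</a>"
    '<a href="/relative/d">D</a>'
    '<a href="https://www.example.com/e">E</a>'
)
print(find_absolute_links(sample))
```

Relative links and links that are already https:// are skipped, so the result is only the list of URLs that would need updating for the migration.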
As another solution, why not just fix the HTTPS links in the main navigation etc.? Then, once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects or 404s, and use that report to find all the URLs to fix.
I would also ping Screaming Frog; this is not the first time they have been asked this question. They may have a better regex and/or solution than what I have suggested.
-
Depending on how you've coded everything, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the search string, and the Custom tab will then show all the pages that match it.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
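A literal string search depends on the markup being consistent (same quote style, no extra whitespace). If it is not, one option outside Screaming Frog is to parse the HTML and inspect the href attributes directly. The sketch below uses Python's standard html.parser module; the class and function names are hypothetical, and it assumes you can obtain each page's HTML as a string.

```python
from html.parser import HTMLParser


class AbsoluteHrefCollector(HTMLParser):
    """Collect href values that start with a given prefix (hypothetical helper)."""

    def __init__(self, prefix: str = "http://") -> None:
        super().__init__()
        self.prefix = prefix.lower()
        self.matches: list[tuple[str, str]] = []  # (tag name, href value)

    def handle_starttag(self, tag, attrs):
        # attrs holds (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith(self.prefix):
                self.matches.append((tag, value))


def absolute_hrefs(html: str, prefix: str = "http://") -> list[tuple[str, str]]:
    """Return (tag, href) pairs for every href beginning with the prefix."""
    collector = AbsoluteHrefCollector(prefix)
    collector.feed(html)
    collector.close()
    return collector.matches


page = (
    '<link href="http://www.yourdomain.com/style.css" rel="stylesheet"/>'
    "<a href='http://www.yourdomain.com/about'>About</a>"
    '<a href="/contact">Contact</a>'
)
for tag, href in absolute_hrefs(page):
    print(tag, href)
```

Passing a narrower prefix such as "http://www.yourdomain.com" limits the results to internal links, which mirrors the Custom Search string suggested above.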
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
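The same find-in-files idea can be scripted. This is a rough sketch, assuming the site's files are on disk as text; the function name scan_directory, the NEEDLE pattern, and the default list of file extensions are all assumptions to adjust for your own setup.

```python
import re
from pathlib import Path

# Assumed pattern: an href that starts with http://, with or without quotes.
NEEDLE = re.compile(r"""href\s*=\s*["']?http://""", re.IGNORECASE)


def scan_directory(root: str, suffixes=(".html", ".htm", ".php")):
    """Yield (path, line number, line text) for each line containing an http:// href."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files whose extension is not in the list.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if NEEDLE.search(line):
                yield path, number, line.strip()
```

Looping over scan_directory("path/to/site") and printing each path and line number gives a report similar to a Notepad++ "Find in Files" result. It only reports matches and does not modify any files.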
Good luck!