Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://?
I have gone back and forth between using an XPath extractor and a regex and have had no luck with either.
Ex. X-path: //*[starts-with(@href, “http://”)][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows your link to have a " or ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist on GitHub: https://gist.github.com/gruber/8891611
You can use a tool like http://regexpal.com/ to test your regex against example text.
I assumed you would want the full URL and that was the issue you were running into.
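If you want to sanity-check the idea outside Screaming Frog, here is a minimal Python sketch. It assumes you already have a page's HTML as a string. The pattern is a simplified stand-in, not the regex above: it allows ", ' or nothing after the = and captures the full http:// URL up to the closing quote, whitespace or >. The names ABSOLUTE_HREF and find_absolute_links and the example.com sample markup are made up for illustration.

```python
import re

# Simplified, assumed pattern: href=, an optional quote, then an http:// URL.
# The capture group holds the full URL, stopping at a quote, whitespace or ">".
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE)


def find_absolute_links(html):
    """Return every absolute http:// URL found in an href attribute."""
    return ABSOLUTE_HREF.findall(html)


# Hypothetical sample markup covering quoted, unquoted, relative and https links.
sample = """
<a href="http://www.example.com/page">absolute</a>
<a href='/relative/page'>relative</a>
<a href=http://www.example.com/unquoted>unquoted</a>
<a href="https://www.example.com/secure">already https</a>
"""

for url in find_absolute_links(sample):
    print(url)
```

Relative links and links that are already https:// are skipped, which is what you want when building the list of URLs to update.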
As another solution, why not just update the links in the main navigation, etc. to HTTPS first? Then, once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects and 404s, and use that report to find the remaining URLs to fix.
I would also ping Screaming Frog; this is not the first time they have been asked this question. They may have a better regex and/or solution than what I have suggested.
-
Depending on how you've coded everything, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the search string, and the Custom tab will then show you all the pages that match it.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
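To show why the "if the coding was consistent" caveat matters, here is a rough Python sketch of a literal string search over page HTML. It only illustrates the idea and is not how Screaming Frog implements Custom Search. The pages dictionary and the pages_containing function are hypothetical.

```python
def pages_containing(pages, needle):
    """Return the URLs of pages whose raw HTML contains the literal string `needle`."""
    return [url for url, html in pages.items() if needle in html]


# Hypothetical crawl results: URL path -> raw HTML.
pages = {
    "/": '<a href="http://www.yourdomain.com/shop">Shop</a>',
    "/about": "<a href='http://www.yourdomain.com/team'>Team</a>",
    "/contact": '<a href="/form">Form</a>',
}

matches = pages_containing(pages, 'href="http://www.yourdomain.com')
print(matches)
```

Only "/" is reported here. "/about" also has an absolute link, but it uses single quotes, so an exact-string search misses it. You would need a second search string for that variant.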
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
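If you would rather script the search than use an editor, something along these lines should do a similar find-in-files pass. It is a sketch that assumes the site's files are on local disk and are UTF-8 text. The function name find_in_files and the list of file extensions are my own assumptions, so adjust them for your site.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: href=, an optional quote, then http://
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?http://""", re.IGNORECASE)


def find_in_files(root, extensions=(".html", ".htm", ".php")):
    """Yield (path, line_number, line) for each line with an absolute http:// href."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if ABSOLUTE_HREF.search(line):
                    yield path, number, line.strip()


if __name__ == "__main__":
    # Small self-contained demo using a temporary directory and a made-up page.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "index.html").write_text(
            '<a href="/about">About</a>\n<a href="http://www.example.com/">Home</a>\n',
            encoding="utf-8",
        )
        for path, number, line in find_in_files(tmp):
            print(f"{path.name}:{number}: {line}")
```

This only reports matches. I would review the list before doing any bulk replace.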
Good luck!