Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between an XPath extractor and a regex, and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, though, so it would not work in the same way.
-
Regex: href=("|'|)(?:http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+[.](?:com|net|org))
This allows the link to have ", ', or nothing between the = and the http, and the second alternative catches scheme-less links to bare domains. If you have any other TLDs, just keep expanding the (?:com|net|org) alternation.
I modified this from a GitHub gist: https://gist.github.com/gruber/8891611
You can test the regex against example text with a tool like http://regexpal.com/.
I assumed you would want the full URL, and that this was the issue you were running into.
As another solution: why not just fix the links to https:// in the main navigation etc., then, once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects or 404s, and use that report to find all the URLs to fix?
I would also ping Screaming Frog support - this is not the first time they have been asked this question. They may have a better regex and/or solution than what I have suggested.
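If the built-in extractor keeps fighting you, the same check is easy to script outside Screaming Frog. Here is a minimal Python sketch along those lines - the page list is a placeholder for illustration (in practice you would paste in the URL list exported from a crawl), not anything Screaming Frog produces by default:

```python
import re
import urllib.request

# Placeholder list: in practice, paste in the URL list exported
# from a crawl of the site.
pages = [
    "http://www.example.com/",
    "http://www.example.com/about/",
]

# href= followed by an optional " or ' and an absolute http:// URL.
absolute_href = re.compile(r"""href=("|'|)http://[^"'\s>]+""", re.IGNORECASE)

for page in pages:
    try:
        html = urllib.request.urlopen(page, timeout=10).read().decode("utf-8", "replace")
    except OSError as err:
        print(f"{page}: fetch failed ({err})")
        continue
    # Report every absolute http:// link found in the page source.
    for match in absolute_href.finditer(html):
        print(f"{page}: {match.group(0)}")
```

Anything it prints is a link that will need rewriting for the HTTPS migration.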
-
Depending on how the site is coded, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could use something like href="http://www.yourdomain.com" as the search string; the Custom tab will then show you every crawled page that matches it.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
-
If you have access to all of the website's files, you could find every instance across the directory using something like Notepad++'s Find in Files. You could even use Find and Replace to fix them in bulk.
This is how I tend to locate those one-liners among hundreds of files.
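If the directory is huge, the same Find in Files idea also works as a quick script - a rough Python sketch, assuming the downloaded site files live in a local folder (the path is a placeholder):

```python
import re
from pathlib import Path

# Placeholder path: point this at the folder of downloaded site files.
site_root = Path("site_files")

# Same idea as Find in Files: href= plus an absolute http:// URL.
absolute_href = re.compile(r"""href=("|'|)http://[^"'\s>]+""", re.IGNORECASE)

for path in site_root.rglob("*.htm*"):  # picks up both .htm and .html
    for line_no, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
        # Print file, line number, and the matching href for review.
        for match in absolute_href.finditer(line):
            print(f"{path}:{line_no}: {match.group(0)}")
```

Once you have eyeballed the matches, a Find and Replace pass can swap them over to https://.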
Good luck!