Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between using an XPath extractor and a regex, and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
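For what it's worth, that second pattern searches for href="// (a protocol-relative link), not an absolute http:// link, which may be part of why it came up empty. A minimal check in Python, using invented sample HTML:

```python
import re

html = '<a href="http://old.example.com/page">a</a> <a href="//cdn.example.com/x.js">b</a>'

# The question's pattern only catches protocol-relative links:
print(re.findall(r'href="//', html))

# A pattern that catches the absolute http:// link instead:
print(re.findall(r'href="http://[^"]+"', html))
```

The first findall matches only the `//cdn...` link; the second returns the full absolute URL attribute.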
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, though, in which case it would not work the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])[a-z0-9.\-]+[.](?:com|net|org)
This allows your link to have ", ', or nothing between the = and the http. If you have any other TLDs, just keep expanding the (?:com|net|org) alternation with |.
I modified this from a posting on GitHub: https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regex against example text.
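You can also sanity-check it in a few lines of Python; here is a sketch using a cleaned-up version of the pattern above against some invented sample strings:

```python
import re

# Quote (double, single, or none) after the =, then an absolute http: URL
# ending in a known TLD (expand the alternation for other TLDs).
pattern = re.compile(r'''href=("|'|)http:(?:/{1,3}|[a-z0-9%])[a-z0-9.\-]+[.](?:com|net|org)''')

samples = [
    'href="http://example.com"',   # double quotes
    "href='http://example.net'",   # single quotes
    'href=http://example.org',     # no quotes
    'href="/products/widget"',     # relative link, should not match
]
for s in samples:
    print(s, '->', bool(pattern.search(s)))
```

The first three samples match; the relative link does not.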
I assumed you would want the full URL and that was the issue you were running into.
As another solution: why not just fix the https links in the main navigation etc., then once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects and 404s, and use that report to find all the URLs to fix?
I would also ping Screaming Frog's support - this is not the first time they have been asked this question. They may have a better regex and/or solution than what I have suggested.
-
Depending on how you've coded everything, you could try setting up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the string to look for; the Custom tab will then list every crawled page that matches the string.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances across the directory using something like Notepad++'s Find in Files. You could even use Find and Replace.
This is how I tend to locate those one-liners among hundreds of files.
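If you'd rather script it, a rough equivalent of that Find in Files approach can be sketched in Python; the directory name here is a hypothetical path to a local copy of the site's HTML files:

```python
import os
import re

# Hypothetical local directory holding the downloaded HTML files.
site_dir = "site_files"
pattern = re.compile(r'href=("|\')http://')

# Walk every .html/.htm file and report each line with an absolute http:// href.
for root, _dirs, files in os.walk(site_dir):
    for name in files:
        if not name.endswith((".html", ".htm")):
            continue
        path = os.path.join(root, name)
        with open(path, encoding="utf-8", errors="ignore") as f:
            for lineno, line in enumerate(f, 1):
                if pattern.search(line):
                    print(f"{path}:{lineno}: {line.strip()}")
```

Each hit prints as file:line, which makes a handy checklist for the migration.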
Good luck!