Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of href="http://"?
I have gone back and forth between an XPath extractor and a regex and have had no luck with either.
Ex. XPath: //*[starts-with(@href, "http://")][1]
Ex. Regex: href=\"//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database-driven site, though, so it would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows your link to have a " or a ' or nothing between the = and the http. If you have any other TLDs, you can just keep expanding on the |.
I modified this from a gist posted on GitHub: https://gist.github.com/gruber/8891611
You can use tools like http://regexpal.com/ to test your regex against example text.
I assumed you would want the full URL and that this was the issue you were running into.
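For anyone who wants to try the regex idea outside Screaming Frog first, here is a minimal Python sketch. The pattern is a simplified stand-in and not the exact regex quoted above: it assumes that an absolute link is any href whose value starts with http://, with or without quotes. The names ABSOLUTE_HREF, find_absolute_links and the example.com sample markup are made up for illustration.

```python
import re
from typing import List

# Assumed pattern (a simplification, not the regex from the answer above):
# href=, an optional " or ', then an http:// URL running up to the next
# quote, whitespace or closing angle bracket.
ABSOLUTE_HREF = re.compile(r"""href\s*=\s*["']?(http://[^"'\s>]+)""", re.IGNORECASE)


def find_absolute_links(html: str) -> List[str]:
    """Return every http:// URL that appears as an href value in the HTML."""
    return ABSOLUTE_HREF.findall(html)


# Hypothetical sample markup covering the quoting styles mentioned above.
sample = """
<a href="http://www.example.com/about">About</a>
<a href='/contact'>Contact</a>
<a href=http://www.example.com/blog>Blog</a>
<a href="https://www.example.com/secure">Secure</a>
"""

for url in find_absolute_links(sample):
    print(url)
```

Relative links and links that are already https are skipped, so whatever is printed is the list that still needs updating before the migration.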
As another solution, why not just update the links in the main navigation etc. to https? Then, once you get the staging/testing site set up, run Screaming Frog on that site, find all the 301 redirects or 404s, and use that report to find all the URLs to fix.
I would also ping Screaming Frog; this is not the first time they have been asked this question. They may have a better regex and/or solution than what I have suggested.
-
Depending on how you've coded everything, you could try to set up a Custom Search under Configuration. This scans the HTML of each page, so if the coding is consistent you could enter something like href="http://www.yourdomain.com" as the string to look for, and the Custom tab will show you all the pages that match it.
That's the only way I can think of to get Screaming Frog to pull it, but I'm looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
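If Notepad++ is not an option, the same find-in-files idea can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the site's templates live on disk as .html, .htm or .php files (the suffix list is a guess, adjust it to your site), and scan_directory and HREF_HTTP are made-up names for illustration.

```python
import re
from pathlib import Path
from typing import Iterator, Sequence, Tuple

# Matches href= followed by an optional quote and then http://
HREF_HTTP = re.compile(r"""href\s*=\s*["']?http://""", re.IGNORECASE)


def scan_directory(
    root: str,
    suffixes: Sequence[str] = (".html", ".htm", ".php"),  # assumed file types
) -> Iterator[Tuple[str, int, str]]:
    """Yield (file path, line number, line text) for each line with an http:// href."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files whose extension is not in the list.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Undecodable bytes are replaced so one odd file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            if HREF_HTTP.search(line):
                yield str(path), line_number, line.strip()
```

Looping over scan_directory("path/to/site") and printing each tuple gives a file-by-file list of lines to fix. As with Notepad++, this only sees files on disk, so links stored in a database would be missed.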