Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Tools/Software that can crawl all image URLs in a site
-
Excluding Screaming Frog, what other tools/software to use in order to crawl all image URLs in a site? Because in Screaming Frog, they don't crawl image URLs which are not under the site domain.
Example of an image URL outside the client site:
http://cdn.shopify.com/images/this-is-just-a-sample.png
If the client is: http://www.example.com, Screaming Frog only crawls images under it like, http://www.example.com/images/this-is-just-a-sample.png
-
Oh I see, I think I looked on the wrong section, I was checking on the Images section instead of External. Thanks for your help!
-
Hi Jay
Actually ScreamingFrog does that perfectly. It depends how you have configured the tool.
I can successfully see all external images within the report. (see attached screenshot)Have you checked your spider configuration?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
SEO advice on ecommerce url structure where categories contain "/c/"
Hi! We use Hybris as plattform and I would like input on which url to choose. We must keep "/c/" before the actual category. c stands for category. I.e. this current url format will be shortened and cleaned:
Technical SEO | | hampgunn
https://www.granngarden.se/Sortiment/Husdjur/Hund/Hundfoder-%26-Hundmat/c/hundfoder To either: a.
https://www.granngarden.se/husdjur/hund/hundfoder/c/hundfoder b.
https://www.granngarden.se/husdjur/hund/c/hundfoder (hundfoder means dogfood) The question is whether we should keep the duplicated category name (hundfoder) before the "/c/" or not. Will there be SEO disadvantages by removing the duplicate "hundfoder" before the "/c/"? I prefer the shorter version ofc, but do not want to jeopardize any SEO rankings or send confusing signals to search engines or customers due to the "/c/" breaking up the url breadcrumb. What do you guys say and prefer from the above alternatives? Thanks /Hampus0 -
Best Web-site Structure/ SEO Strategy for an online travel agency?
Dear Experts! I need your help with pointing me in the right direction. So far I have found scattered tips around the Internet but it's hard to make a full picture with all these bits and pieces of information without a professional advice. My primary goal is to understand how I should build my online travel agency web-site’s (https://qualistay.com) structure, so that I target my keywords on correct pages and do not create a duplicate content. In my particular case I have very similar properties in similar locations in Tenerife. Many of them are located in the same villa or apartment complex, thus, it is very hard to come up with the unique description for each of them. Not speaking of amenities and pricing blocks, which are standard and almost identical (I don’t know if Google sees it as a duplicate content). From what I have read so far, it’s better to target archive pages rather than every single property. At the moment my archive pages are: all properties (includes all property types and locations), a page for each location (includes all property types). Does it make sense adding archive pages by property type in addition OR in stead of the location ones if I, for instance, target separate keywords like 'villas costa adeje' and 'apartments costa adeje'? At the moment, the title of the respective archive page "Properties to rent in costa adeje: villas, apartments" in principle targets both keywords... Does using the same keyword in a single property listing cannibalize archive page ranking it is linking back to? Or not, unless Google specifically identifies this as a duplicate content, which one can see in Google Search Console under HTML Improvements and/or archive page has more incoming links than a single property? If targeting only archive pages, how should I optimize them in such a way that they stay user-friendly. I have created (though, not yet fully optimized) descriptions for each archive page just below the main header. But I have them partially hidden (collapsible) using a JS in order to keep visitors’ focus on the properties. I know that Google does not rank hidden content high, at least at the moment, but since there is a new algorithm Mobile First coming up in the near future, they promise not to punish mobile sites for a collapsible content and will use mobile version to rate desktop one. Does this mean I should not worry about hidden content anymore or should I move the descirption to the bottom of the page and make it fully visible? Your feedback will be highly appreciated! Thank you! Dmitry
Technical SEO | | qualistay1 -
Bulk URL Removal in Webmaster Tools
One of Wordpress sites was hacked (for about 10 hours), and Google picked up 4000+ urls in the index. The site is fixed, but I'm stuck with all those urls in the index. All the urls of of the form: walkerorthodontics.com/index.php?online-payday-cash-loan.htmloncewe The only bulk removal option I could find was to remove an entire folder, but I can't do that, as it would only leave the homepage and kill off everything else. For some crazy reason, the removal tools doesn't support wildcards, so that obvious solution is right out. So, how do it get rid of 4000 results? And no, waiting around for them to 404 out of the index isn't an option.
Technical SEO | | MichaelGregory0 -
Google stopped crawling my site. Everybody is stumped.
This has stumped the Wordpress staff and people in the Google Webmasters forum. We are in Google News (have been for years), and so new posts are crawled immediately. On Feb 17-18 Crawl Stats dropped 85%, and new posts were no longer indexed (not appearing on News or search). Data highlighter attempts return "This URL could not be found in Google's index." No manual actions by Google. No changes to the website; no custom CSS. No Site Errors or new URL errors. No sitemap problems (resubmitting didn't help). We're on wordpress.com, so no odd code. We can see the robot.txt file. Other search engines can see us, as can social media websites. Older posts still index, but loss of News is a big hit. Also, I think overall Google referrals are dropping. We can Fetch the URL for a new post, and many hours later it appears on Google and News, and we can then use Data Highlighter. It's now 6 days and no recovery. Everybody is stumped. Any ideas? I just joined, so this might be the wrong venue. If so, apologies.
Technical SEO | | Editor-FabiusMaximus_Website0 -
How to Remove /feed URLs from Google's Index
Hey everyone, I have an issue with RSS /feed URLs being indexed by Google for some of our Wordpress sites. Have a look at this Google query, and click to show omitted search results. You'll see we have 500+ /feed URLs indexed by Google, for our many category pages/etc. Here is one of the example URLs: http://www.howdesign.com/design-creativity/fonts-typography/letterforms/attachment/gilhelveticatrade/feed/. Based on this content/code of the XML page, it looks like Wordpress is generating these: <generator>http://wordpress.org/?v=3.5.2</generator> Any idea how to get them out of Google's index without 301 redirecting them? We need the Wordpress-generated RSS feeds to work for various uses. My first two thoughts are trying to work with our Development team to see if we can get a "noindex" meta robots tag on the pages, by they are dynamically-generated pages...so I'm not sure if that will be possible. Or, perhaps we can add a "feed" paramater to GWT "URL Parameters" section...but I don't want to limit Google from crawling these again...I figure I need Google to crawl them and see some code that says to get the pages out of their index...and THEN not crawl the pages anymore. I don't think the "Remove URL" feature in GWT will work, since that tool only removes URLs from the search results, not the actual Google index. FWIW, this site is using the Yoast plugin. We set every page type to "noindex" except for the homepage, Posts, Pages and Categories. We have other sites on Yoast that do not have any /feed URLs indexed by Google at all. Side note, the /robots.txt file was previously blocking crawling of the /feed URLs on this site, which is why you'll see that note in the Google SERPs when you click on the query link given in the first paragraph.
Technical SEO | | M_D_Golden_Peak0 -
Do we need to manually submit a sitemap every time, or can we host it on our site as /sitemap and Google will see & crawl it?
I realized we don't have a sitemap in place, so we're going to get one built. Once we do, I'll submit it manually to Google via Webmaster tools. However, we have a very dynamic site with content constantly being added. Will I need to keep manually re-submitting the sitemap to Google? Or could we have the continually updating sitemap live on our site at /sitemap and the crawlers will just pick it up from there? I noticed this is what SEOmoz does at http://www.seomoz.org/sitemap.
Technical SEO | | askotzko0 -
Does Google pass link juice a page receives if the URL parameter specifies content and has the Crawl setting in Webmaster Tools set to NO?
The page in question receives a lot of quality traffic but is only relevant to a small percent of my users. I want to keep the link juice received from this page but I do not want it to appear in the SERPs.
Technical SEO | | surveygizmo0 -
How long will Google take to stop crawling an old URL once it has been 301 redirected
I need to do a clean-up old urls that have been redirected in sitemap and was wondering about this.
Technical SEO | | Ant-8080