Site crawl warning - concatenated URLs from WordPress
-
I could use some help on how to fix this. I asked at the walkthrough but was told it was a WordPress issue, and so far I can't find anything to point me in the right direction. There are no errors in the files on the server side, and I have asked my hosting company too. I am hoping someone here may be able to shed some light on it.
One of my websites is giving 404 errors on links that are formed as below, and there are over 12.7K of them!
Example: <mydomainurl>/www.instagram.com/www.instagram.com/<instagram-username>
The link that relates to my website is valid and working, but I don't understand the rest. I am totally stumped on how to move forward with this.
Any advice, suggestions, or tips on how to fix these errors and stop these types of links from being generated would be much appreciated.
Thanks.
-
You're a star Jo! Thanks so much.
It was such a simple fix. The site has been sitting there, and I need to get it going again.
It just required the https:// to be added to the link in the theme; the theme never complained it was missing.
Recrawling now, so hopefully that will sort out the issues with Site Crawler. Class tool! I never would have spotted it without it.
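For anyone hitting the same issue, here are the mechanics in miniature: an href without a scheme, like "www.instagram.com/...", is treated as a relative path and resolved against the current page, and each crawl pass stacks another copy onto the URL. A short Python sketch (the domain and username are placeholders):

```python
from urllib.parse import urljoin

page = "https://mydomain.example/"
href = "www.instagram.com/myusername"  # missing the https:// scheme

# First resolution: the "link" becomes a path on the site itself.
first = urljoin(page, href)
# Crawling that page and resolving the same href again doubles it up.
second = urljoin(first, href)

print(first)   # https://mydomain.example/www.instagram.com/myusername
print(second)  # https://mydomain.example/www.instagram.com/www.instagram.com/myusername
```

With the scheme added ("https://www.instagram.com/myusername"), urljoin returns the Instagram URL unchanged, which is why the one-character fix above clears thousands of 404s.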
Have a great weekend.
Emer
-
Hi Emercarr.
Thanks for reaching out, Jo here from the Moz help team.
I had a look at your Campaign and your site and it looks like there is a link in your social panel that is creating this issue.
https://screencast.com/t/EJHCvTyFj
If you hover over the Instagram button, you'll see the URL show up in this format as a preview at the bottom of your browser:
<mydomainurl>/www.instagram.com/www.instagram.com/<instagram-username>
To check whether this is the cause, I would recommend removing the Instagram link temporarily, or checking and updating the link format, and then prompting a recrawl of your site.
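One way to check the link format in bulk (a hedged sketch, not a Moz feature): collect the hrefs from a page and flag any that look like external domains but lack a scheme, since browsers and crawlers will resolve those as paths on the current site.

```python
import re
from urllib.parse import urlparse

def flag_schemeless_hrefs(hrefs):
    """Return hrefs that look like bare domains (no scheme, not
    root-relative), which will resolve relative to the current page."""
    looks_like_domain = re.compile(r"^(www\.)?[A-Za-z0-9-]+\.[A-Za-z]{2,}")
    return [
        h for h in hrefs
        if not urlparse(h).scheme            # no https:// (or mailto:, etc.)
        and not h.startswith(("/", "#"))     # not root-relative or a fragment
        and looks_like_domain.match(h)
    ]

print(flag_schemeless_hrefs([
    "www.instagram.com/myusername",          # broken: resolves as a local path
    "https://www.instagram.com/myusername",  # fine
    "/about/",                               # fine: root-relative internal link
]))  # ['www.instagram.com/myusername']
```

Running this over the hrefs of each template (header, footer, social panel) should surface any other scheme-less social links before the next crawl.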
Please do feel free to reach out to help@moz.com if you get stuck :]
Cheers!
Jo
Related Questions
-
Crawl tests stuck in queue
I have tried to run a number of crawl tests recently for our client's sites outside the US and they have been stuck in the queue for over a week. 3 of them completed, but then 5 are stuck. Anyone experience this? I haven't seen anything about crawl tests having issues right now.
Moz Bar | rmcgrath810
-
How to turn off automated site crawls
Hi there, Is there a way to turn off the automated site crawl feature for an individual campaign? Thanks
Moz Bar | SEONOW1230
-
Is there a way to export all your crawl errors for multiple Moz campaigns at once?
We're looking for a simple way to export all crawl errors for our Moz campaigns. More than likely we could use the API, but was wondering if there was any functionality already built into Moz for exporting all crawl errors.
Moz Bar | ReunionMarketing0
-
Not sure where this URL has come from
Can anyone please let me know why this has happened on my site? I have just done a crawl test and it comes back with the following URL:
http://howtodrinkless.com/web/20150709201150/http:/www.howtodrinkless.com/
Moz Bar | in2townpublicrelations0
-
How can I find duplicate pages from a Moz Crawl?
We have many duplicate pages that show up on the Moz Crawl, and we're trying to fix these but it's very difficult because I can't see a way to isolate the code where the duplicate is found. For instance, http://experiencemission.org/immersion/ is one of our main pages, and the crawl shows one duplicate of http://experiencemission.org/immersion. It appears that one of our staff manually edited the source code in one of our pages but forgot the trailing slash. This would be an easy fix but the problem is that this page is linked to internally on our website 2423 times, so it's next to impossible to find the code that is incorrect. We have many other pages with this same basic problem. We know we have duplicates, but it's next to impossible to isolate them. So my question is this: When viewing the Moz Crawl data is there any way to see where a specific duplicate page link is located on our website? Thanks for any and all help!
Moz Bar | expmission0
-
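For the trailing-slash hunt in the question above, one approach (a sketch; it assumes you can fetch your own pages' HTML, and uses only the stdlib HTMLParser) is to scan each page's hrefs for the exact slash-less variant, which narrows 2423 internal links down to the templates or posts that actually contain the bad one:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect every <a href="..."> value in a page's HTML."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def pages_linking_to(pages_html, target):
    # pages_html: {page_url: html_source}; returns the pages whose
    # markup contains an href matching the slash-less target exactly.
    offenders = []
    for url, html in pages_html.items():
        parser = HrefCollector()
        parser.feed(html)
        if target in parser.hrefs:
            offenders.append(url)
    return offenders
```

Called with the site's pages and "http://experiencemission.org/immersion" (no trailing slash) as the target, this lists only the pages that need the source-code fix.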
Moz Crawl Test Trying to Crawl Contact Form Submit Button Location?
Moz Crawl Test for some reason is trying to crawl a contact form widget's submit location. My obvious guess is that the crawl cannot submit to the required fields. I believe this because it's only kicking back these errors on the pages I have a contact form widget on.
http://crawfordspest.com/pest-control/crawfords@crawfordspest.com 1412553693 404 : Received 404 (Not Found) error response for page. Error attempting to request page; see title for details. 404
http://crawfordspest.com/tree-services/crawfords@crawfordspest.com 1412553693 404 : Received 404 (Not Found) error response for page. Error attempting to request page; see title for details. 404
http://crawfordspest.com/lawn-care/crawfords@crawfordspest.com 1412553693 404 : Received 404 (Not Found) error response for page. Error attempting to request page; see title for details. 404
http://crawfordspest.com/specialty-services/crawfords@crawfordspest.com 1412553693 404 : Received 404 (Not Found) error response for page. Error attempting to request page; see title for details. 404
Can you shed any insight into this? I'm a bit worried that I'll have to completely gut the contact form, which was one of the major features my client requested. Or, in a worse scenario, make all fields not required, which would let so much spam in. I have never seen anything like this at all. But I've learned a lot from Moz, and major errors like 404s damage Domain Authority greatly. I've fixed 404 issues on newly acquired clients' existing sites and tracked them through Moz, and the Domain Authority flies up once these errors are fixed, along with fixing what Google's Webmaster Tools reports back. Let me know if you have any expertise on this matter.
Moz Bar | Funk-Creative-Media0
-
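The 404 URLs in the contact-form question above all have the shape page URL + email address, which is the signature of an email href written without the "mailto:" scheme: the crawler resolves it as a relative path, not a form-submission problem. A quick demonstration:

```python
from urllib.parse import urljoin

page = "http://crawfordspest.com/pest-control/"
bad_href = "crawfords@crawfordspest.com"         # missing mailto:
good_href = "mailto:crawfords@crawfordspest.com"

# Scheme-less: resolved as a path on the current page -> the crawled 404.
print(urljoin(page, bad_href))
# http://crawfordspest.com/pest-control/crawfords@crawfordspest.com

# With mailto:, the href stands on its own and nothing gets crawled.
print(urljoin(page, good_href))
# mailto:crawfords@crawfordspest.com
```

If the widget's email link is written this way, adding "mailto:" should clear the errors without gutting the form or its required fields.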
Ajax #! URL support?
Hi Moz, My site is currently following the convention outlined here: https://support.google.com/webmasters/answer/174992?hl=en Basically, since pages are generated via Ajax, we are set up to direct bots that replace the #! in a URL with ?escaped_fragment to cached versions of the Ajax-generated content. For example, if the bot sees this URL: http://www.discoverymap.com/#!/California/Map-of-Carmel/73 it will instead access the page: http://www.discoverymap.com/?escaped_fragment=/California/Map-of-Carmel/73 In which case my server serves the cached HTML instead of the live page. This is all per Google's direction and is indexing fine. However, the Moz bot does not do this. It seems like a fairly straightforward feature to support: rather than ignoring the hash, you look to see if it is a #! and then try to spider the URL with ?escaped_fragment substituted. Our server does the rest. If this is something Moz plans on supporting in the future I would love to know. If there is other information that would be great. Also, pushState is not practical for everyone due to limited browser support, etc. Thanks, Dustin

Updates: I am editing my question because it won't let me respond to my own question. It says I need to sign up for Moz Analytics. I was signed up for Moz Analytics?! Now I am not? I responded to my invitation weeks ago. Anyway, you are misunderstanding how this process works. There is no sitemap involved. The bot reads this URL on the page: http://www.discoverymap.com/#!/California/Map-of-Carmel/73 And when it is ready to spider the page for content, it spiders this URL instead: http://www.discoverymap.com/?escaped_fragment=/California/Map-of-Carmel/73 The server does the rest; it is simply telling Roger to recognize the #! format and replace it with ?escaped_fragment. Though I obviously do not know how Roger is coded, it is a simple string replacement. Thanks.
Moz Bar | oneactlife0
-
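The hash-bang rewrite described in the question above is indeed a simple string transformation on the crawler side. A sketch of that step (note: Google's now-deprecated AJAX crawling scheme spells the query parameter "_escaped_fragment_", with underscores; the poster's server may accept a variant):

```python
def escaped_fragment_url(url):
    # Rewrite a hash-bang URL to its crawlable form, per the
    # (now-deprecated) Google AJAX crawling scheme.
    if "#!" not in url:
        return url
    base, fragment = url.split("#!", 1)
    separator = "&" if "?" in base else "?"
    return f"{base}{separator}_escaped_fragment_={fragment}"

print(escaped_fragment_url("http://www.discoverymap.com/#!/California/Map-of-Carmel/73"))
# http://www.discoverymap.com/?_escaped_fragment_=/California/Map-of-Carmel/73
```

URLs without a #! pass through unchanged, so a crawler could apply this to every URL it is about to fetch.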
Dupe content report showing in 'Errors' section when surely it should be in 'Warnings' section?
Why is the dupe content info showing in errors and not warnings? Since dupe content can get your site penalised (as per Panda) or, worse, banned, surely it should be in that section of the reports? Cheers
Dan
Moz Bar | Dan-Lawrence0