Help with Roger finding phantom links
-
It's Monday, Roger has done another crawl, and now I have a couple of issues:
- I have two pages showing 404->302 or 500 because the links do not exist. I have to fix the 500, but the 404 is trapped correctly.
http://www.oznappies.com/nappies.faq & http://www.oznappies.com/store/value-packs/
The issue is that when I do a site scan, no anchor text contains these links. So what I would like to find out is where Roger is finding them. I cannot see anywhere in the Crawl Report that tells me the origin of these links.
- I also created a blog on Tumblr, and now every tag and RSS feed entry is producing a duplicate content error in the crawl stats. I cannot see anywhere in Tumblr to fix this issue.
Any ideas?
-
Thanks again, Ryan. You have been very helpful answering a lot of my questions.
-
Someone else asked the same question regarding tag pages yesterday. I would suggest posting a separate Q&A on that topic.
Tag pages and forum category pages are both often used as containers: they don't have any content except links to articles. I would ask for feedback on the best practice. I suspect noindex, follow on those pages would be best, but I don't have the experience to feel comfortable offering that advice.
-
I have been looking at the data that Roger is reporting for the duplicate content, and in ALL cases there is either a 301 or a noindex. So now I do not know why Roger is reporting them as duplicates; robots should not see the second entry.
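As a sanity check on that report, a small script can flag which reported duplicates genuinely lack a 301 or a noindex. This is only a sketch: the field names (`status`, `meta_robots`) and the sample rows are hypothetical stand-ins for whatever the crawl export actually contains.

```python
# Hedged sketch: flag duplicate-content reports that should already be
# excluded by a 301 redirect or a noindex robots meta tag.
# Field names and sample data below are hypothetical, not Roger's real schema.
def should_be_exempt(row):
    """True if a URL reported as duplicate is already handled."""
    if row["status"] == 301:                     # permanent redirect in place
        return True
    robots = (row.get("meta_robots") or "").lower()
    return "noindex" in robots                   # noindex keeps it out of the index

rows = [
    {"url": "http://example.com/a", "status": 200, "meta_robots": "noindex, follow"},
    {"url": "http://example.com/b", "status": 301, "meta_robots": ""},
    {"url": "http://example.com/c", "status": 200, "meta_robots": ""},
]
unexplained = [r["url"] for r in rows if not should_be_exempt(r)]
print(unexplained)  # only /c genuinely lacks a 301 or noindex
```

Anything that comes out as "unexplained" is worth a closer look; the rest may just be a crawler reporting quirk.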
-
I did not think of looking at the CSV report. I see it now, thanks Ryan. There should be a soft 404 handler in place to process the bad URLs; I will have to see why it is not working.
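For anyone else digging through that export, something like the following pulls the referrer for each broken URL out of the CSV. The column names ("URL", "HTTP Status Code", "Referrer") and the referrer values in the sample are assumptions; match them against the headers in the actual download.

```python
import csv
import io

# Hedged sketch: list where the crawler found each broken link, using an
# in-memory stand-in for the crawl CSV. Column names and referrer values
# are hypothetical; check them against the real export.
sample = io.StringIO(
    "URL,HTTP Status Code,Referrer\n"
    "http://www.oznappies.com/nappies.faq,404,http://www.oznappies.com/faq-index\n"
    "http://www.oznappies.com/store/value-packs/,500,http://www.oznappies.com/store\n"
    "http://www.oznappies.com/,200,\n"
)
broken = {}
for row in csv.DictReader(sample):
    if int(row["HTTP Status Code"]) >= 400:
        broken[row["URL"]] = row["Referrer"]     # page where the link was found

for url, ref in broken.items():
    print(f"{url} <- {ref}")
```

Swap `sample` for `open("crawl_report.csv")` to run it on a real export.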
With Tumblr, I was looking for an easy way to add a blog to the site.
The RSS feed is coming from Tumblr, as is all the content.
When we specify tags in Tumblr, it creates URLs (e.g. mypage.com/article/tag1, mypage.com/article/tag2, mypage.com/article/tag3) that all contain the content of mypage.com/article, without a canonical to the original. It is a really strange, non-SEO-friendly approach, so I wondered if anyone has had similar problems.
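One way to confirm which of those tag pages are missing a canonical is to parse each page and look for the `rel="canonical"` link element. A minimal sketch using the standard library (the sample HTML and URL are placeholders):

```python
from html.parser import HTMLParser

# Hedged sketch: report the rel="canonical" target of a page, or None if
# the page does not declare one (as the Tumblr tag pages above seem not to).
class CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")

# Placeholder page source; in practice, fetch each tag URL and feed its HTML.
page = '<html><head><link rel="canonical" href="http://mypage.com/article"></head></html>'
finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)  # http://mypage.com/article
```

Running this over each tag URL would show at a glance which ones point back to the original article and which leave the duplicate unresolved.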
-
The crawl report offers a "referrer" field, which shows where Roger found the offending link. In my experience that field has always been accurate.
When I try to access www.oznappies.com/faq, I receive a 302 redirect and then a 500 error. I would recommend turning non-existent pages into a soft 404 page: still return a 404 response to browsers, but offer users a friendly way to find information (i.e. links or search) and stay on your site.
A great example of a soft 404 page is http://www.orangecoat.com/a-404-page.html
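The key point is that the "soft" part is only the body, never the status code. A minimal WSGI sketch of that idea (the paths and link targets are placeholders, not oznappies.com's actual routing):

```python
# Hedged sketch of a soft 404 handler as a minimal WSGI app: the response
# status stays 404 so crawlers record the error, but the body gives human
# visitors links back into the site. Routes and copy are placeholders.
def app(environ, start_response):
    known = {"/": b"<h1>Home</h1>"}
    path = environ.get("PATH_INFO", "/")
    if path in known:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [known[path]]
    body = (b"<h1>Page not found</h1>"
            b'<p>Try the <a href="/">home page</a> or our site search.</p>')
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [body]

# Quick demo without a server: call the app with a fake environ.
status_seen = []
body = app({"PATH_INFO": "/missing"}, lambda s, h: status_seen.append(s))
print(status_seen[0])  # 404 Not Found
```

Returning 200 (or a redirect) for missing pages is what search engines classify as a true soft 404, which is exactly what this setup avoids.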
For the Tumblr issue, I am not clear on the problem. Are you writing content and publishing it on both the oznappies.com site and your Tumblr site? And is this content then being published again on your site via an RSS import?
-
I removed the links and just left the text, so these will cut and paste now. It still confuses me where Roger found the links.
Thanks for running the Xenu scan. I have tried other site scanners and come up blank.
-
That second link is anchored to the wrong place.
Regardless, I also cannot find the .faq page. I just ran Xenu over the site to see what it could find, but no broken links showed up.
Afraid I don't use Tumblr either, so eh, pretty useless post. Sorry.