PDF web traffic hitting our site
-
Hi there,
Over the last few months our traffic has spiked because irrelevant PDF documents are sending us crap traffic, and our bounce rate is sky high, along with other metrics. I don't want to just filter this traffic out in GA; I'd rather try to stop our site from being attacked.
Any advice on a way forward would be great.
Thanks
-
Based on this I don't think you have anything to worry about. It doesn't appear to be an attack, as you described in your original post. An actual attack on your website would have much higher volume. The worst this could possibly be is spam, which is mainly just annoying.
Easy solution: you don't want to throw this traffic away in GA entirely, because it may be useful at some point. So create a second view in GA with no filters and name it "unfiltered"; that view shows all traffic in its raw glory. Then name your main view something like "master" or "the one view to view them all" or whatever you want, and add filters to it that exclude the PDF traffic.
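Before adding an exclude filter to the main view, it can help to test the pattern offline against a few request URIs. This is only a rough sketch: it assumes the unwanted hits are all request URIs ending in .pdf, and the pattern, function names, and sample paths are illustrative rather than taken from the site's real reports.

```python
import re

# Assumed pattern: any request URI ending in .pdf (case-insensitive),
# optionally followed by a query string. Adjust to the URIs in your reports.
PDF_URI = re.compile(r"\.pdf(\?.*)?$", re.IGNORECASE)


def is_pdf_hit(request_uri: str) -> bool:
    """Return True if the URI would be caught by the exclude pattern."""
    return PDF_URI.search(request_uri) is not None


def split_hits(request_uris):
    """Split URIs into (kept, excluded), the way an exclude filter would."""
    kept = [u for u in request_uris if not is_pdf_hit(u)]
    excluded = [u for u in request_uris if is_pdf_hit(u)]
    return kept, excluded
```

If the pattern catches a page you care about, tighten it before using it in the filtered view; the unfiltered view keeps the raw data either way.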
Personally, it looks more to me like these are old PDFs that other websites are linking to, which is what your hosting provider has also said. Your best move here is actually to set up redirects to relevant pages, to recapture some of those links that are probably ending in 404s and pass some link equity to important pages.
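To make the redirect idea concrete, here is a minimal sketch of the lookup logic. The mapping is hypothetical: the target "/" is only a placeholder, and in practice each old PDF path would point at whichever live page is most relevant. How the rule is actually wired in depends on your server or CMS.

```python
from typing import Optional, Tuple

# Hypothetical mapping from old PDF paths (lowercase) to a relevant live page.
# The "/" target is a placeholder, not a recommendation.
REDIRECTS = {
    "/lulu-the-lioness-a-heroines-story.pdf": "/",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location): 301 for mapped paths, otherwise 404."""
    target = REDIRECTS.get(path.lower())
    if target is not None:
        return 301, target
    return 404, None
```

Only map PDFs that have a genuinely relevant destination; anything else can keep returning a 404.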
-
Hi Alick, it seems to be coming from an external source. I've included a screen grab for you too.
I've also discussed this with our hosting provider who gave the following response:
Thanks for the info from Webmaster Tools. That screenshot that shows the HTTP response is just showing that a request to http://www.icmp.co.uk/lulu-the-lioness-a-heroines-story.pdf throws a 301 redirect over to https://www.icmp.ac.uk/lulu-the-lioness-a-heroines-story.pdf — this runs because of the standard HTTPS/primary domain redirect code in settings.php and unfortunately doesn’t tell us much here.
I pulled down the database again and ran a search for a few of these filenames, and those came up empty. Looks like these don’t touch Drupal at all. When we saw them in the database before, in the sessions table, that was likely just because that filter module was storing browser history in user session data for some reason.
I did a little research here, and I think that leaves a few potential causes:
Another site is linking to these files (even though they don’t exist), and this is where Google is picking up/indexing the URLs from. This should be checkable in Google Analytics if you look at Referrals to those files.
These were listed on the sitemap at some point (but not any longer: https://www.icmp.ac.uk/sitemap.xml).
These files existed at some point in the past, but have since been deleted.
There was a DNS misconfiguration at some point, and that domain name was pointing to a different server where these files did exist.
While these are a little annoying to see in Analytics, from what I’ve read, 404s don’t negatively impact the site from an SEO standpoint, and there’s no evidence that the site itself is compromised at all, so unless we see evidence otherwise, I wouldn’t worry about these.
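If the Referrals report in Analytics is inconclusive, the same check can be run against the server's access logs. The sketch below is an assumption-heavy illustration: it presumes the logs are in the common Apache/nginx "combined" format, and the function name is made up for this example.

```python
import re
from collections import Counter

# Assumes "combined" log format: ... "METHOD path protocol" status bytes "referrer" "agent"
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def pdf_referrers(lines):
    """Count referrers for requests whose path (minus query string) ends in .pdf."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and m.group("path").split("?")[0].lower().endswith(".pdf"):
            counts[m.group("referrer")] += 1
    return counts
```

A referrer of "-" means the request carried no referrer, which is typical of bots and direct hits; a small set of repeating external domains would support the "another site is linking to these files" explanation.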
-
Hi,
Is the PDF traffic coming from your own site or from other sites?