How to detect where Google finds the URLs it indexes
-
Google is somehow indexing some links that create duplicate content. We don't understand how these links are created, so we would like to detect where Googlebot finds them.
We tried:
- Moz Crawl Diagnostics, but it shows 0 as the Internal Link Count for these links.
- Looking in Google Analytics (Site Content > All Content) for some trace of visitors reaching these URLs. There wasn't one.
- Looking in Webmaster Tools under Internal Links and HTML Improvements, but we didn't find any trace.
- Some search commands. Is there a good one to use for this?
- Searching for the URLs in our code with https://search.nerdydata.com.
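One way to look for the source yourself is to crawl your own pages and record every page whose HTML links to the parameterized URLs. A minimal sketch, assuming you already have the HTML of your pages (saved by any crawler) in a dict keyed by page URL; the `/et?id=` and `/et?productId=` patterns are the ones described later in this thread, while the function names and sample pages are hypothetical. It only inspects `href`, `src` and `action` attributes.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pattern for the duplicate URLs described in this thread
# (/et?id=xx and /et?productId=xx, where xx is a number).
SUSPECT = re.compile(r"/et\?(?:id|productId)=\d+")


class LinkCollector(HTMLParser):
    """Collect every URL found in href, src or action attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def find_link_sources(pages, pattern=SUSPECT):
    """Map each suspect URL to the set of pages whose HTML links to it.

    pages: dict of {page_url: html_text}, e.g. saved by your own crawler.
    """
    sources = {}
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for link in collector.links:
            absolute = urljoin(page_url, link)  # resolve relative links
            if pattern.search(absolute):
                sources.setdefault(absolute, set()).add(page_url)
    return sources


if __name__ == "__main__":
    sample = {  # hypothetical pages
        "https://oursite.example/": '<a href="/products">Products</a>',
        "https://oursite.example/products": '<a href="/et?productId=42">Home</a>',
    }
    for url, found_on in find_link_sources(sample).items():
        print(url, "is linked from", sorted(found_on))
```

If the suspect URLs turn up on none of your pages, the links are more likely generated by JavaScript, redirects, or external sites, none of which this static check can see.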
-
It really isn't possible for an outsider to know why your website is generating those URLs in error; you would have to talk to your developer about that.
As far as canonicals go: if your problem is that page.com is being duplicated by added parameters (page.com/?id=1, page.com/?id=2, page.com/?id=3, etc.), then as long as you have the canonical on page.com, all of the parameter pages will carry the correct canonical as well. (But you are right that you should track down the source; your developer will know.)
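To illustrate, here is a minimal sketch of the canonical URL such a page would declare: the query string is dropped, so page.com/?id=1, ?id=2 and ?id=3 all point to the same clean URL. It assumes the parameters never change the page content (true for the duplicates discussed here, not for things like pagination), and the function names are hypothetical. The key point is that the template's canonical must be built like this (or hard-coded), not echoed from the requested URL with its parameters.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(requested_url):
    """Drop the query string and fragment so every parameter variant maps to one URL."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_tag(requested_url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(requested_url), quote=True)


for variant in ("https://page.com/?id=1", "https://page.com/?id=2", "https://page.com/"):
    print(canonical_tag(variant))
```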
-
Thank you for your answer, but yes, I know these URLs are generated by our site. The problem is that I can use a canonical tag for the ones that are indexed right now, but new ones will somehow be created later. The root of the problem isn't that we don't know how to use canonicals; it's finding out where Google finds, indexes, or detects these URLs.
These URLs have been there for months, so we can't just hope they will somehow be dropped. We need to find a solution and detect the real problem.
-
If you found those URLs by doing a site: search, then those parameters are being generated by your site. (I am surprised that Google is even indexing them; I assume that pretty soon all but one will be dropped.) Here is an article that explains more about those types of duplicate pages: http://moz.com/blog/which-page-is-canonical
You can fix this by putting a canonical tag on your homepage that points to the version without the parameter.
-
Our front page has almost 50 duplicate versions. They show up when we search site:oursite.com: there are /et?id=xx, /et?productId=xx, etc., where xx is a different number in each URL.
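If you copy the URLs from the site: results (or from server logs) into a list, a minimal sketch like the following groups them by host and path and shows which query parameters produce the variants, which helps count how many distinct patterns need tracing. The input list and function name are hypothetical; the sample URLs follow the /et?id=xx and /et?productId=xx shapes described above.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def group_variants(urls):
    """Group URLs by host + path; count them and record which query parameters appear."""
    groups = defaultdict(lambda: {"count": 0, "params": set()})
    for url in urls:
        parts = urlsplit(url)
        key = f"{parts.netloc}{parts.path or '/'}"
        groups[key]["count"] += 1
        groups[key]["params"].update(parse_qs(parts.query).keys())
    return dict(groups)


urls = [  # hypothetical export from a site: search
    "http://oursite.example/et?id=12",
    "http://oursite.example/et?id=57",
    "http://oursite.example/et?productId=3",
    "http://oursite.example/",
]
for path, info in sorted(group_variants(urls).items()):
    print(path, info["count"], sorted(info["params"]))
```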
-
Where are you seeing these duplicate content links? Does Webmaster Tools say that they are duplicate content? Or does this show up in your Moz crawl? What do these URLs look like?
Related Questions
-
Strange - Search Console page indexing "../Detected" as 404
Has anyone seen this lately? All of a sudden, the Page indexing report in Google Search Console insists that there is a 404 for a page that has never existed on our client's site (https://........com.au/Detected). We've noticed this across a number of sites, always in exactly this form, with a capitalised "/Detected". To me it looks like something spammy is being submitted to the SERPs (somehow), and Google is trying to index it and then getting a 404. Naturally Moz isn't picking it up, because the page simply never existed; it's only happening in Search Console. It comes and goes in the 404 alerts in Search Console and is really annoying. I reckon it started happening in late 2022.
Reporting & Analytics | DanielDL
Tool to check page size for multiple URLs at once
In Google Analytics under Site Speed > Page Timings, you can see all pages and their loading time compared to the average. This is very handy for checking which pages might need some optimization. I would also like to check the size of these pages in a similar way. There are multiple tools out there, like GTmetrix and Pingdom, that give specific information and performance insights. The problem is that they are limited to checking one URL at a time. Does anyone know of a tool that checks the page size of multiple URLs at once (and, if possible, exports easily to Excel)? That way I can see which pages are large and research and optimize them. Thanks in advance.
Reporting & Analytics | Mark.
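A short script can do this in bulk. A minimal sketch, assuming you only need the size of the HTML document itself (not images, CSS or scripts, which GTmetrix and Pingdom include in their totals) and that the URL list comes from, for example, a Google Analytics export. It writes a CSV file that Excel can open. The function names, file name and URLs are hypothetical; the fetch function is a parameter so it can be swapped out or tested without network access.

```python
import csv
import http.client
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download the raw HTML body of one URL (images, CSS and JS are not fetched)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def page_sizes(urls, fetch=fetch_bytes):
    """Return (url, size_in_kb) rows, largest first; URLs that fail get None."""
    rows = []
    for url in urls:
        try:
            size_kb = round(len(fetch(url)) / 1024, 1)
        except (OSError, ValueError, http.client.HTTPException):
            size_kb = None  # HTTP error, timeout or bad URL: keep going
        rows.append((url, size_kb))
    # Failed URLs sort last; the rest by size, largest first.
    rows.sort(key=lambda row: (row[1] is None, -(row[1] or 0)))
    return rows


def write_csv(rows, path="page_sizes.csv"):
    """Write the rows to a CSV file that Excel can open."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["url", "html_size_kb"])
        writer.writerows(rows)
```

For example, `write_csv(page_sizes(["https://www.example.com/", "https://www.example.com/contact"]))`. Because urllib does not request compressed responses by default, the sizes are for uncompressed HTML.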
How to get multiple pages to appear under the main URL in search (photo attached)
How do you get a site to show an organized site map under the main URL when it is searched, as in the example photo?
Reporting & Analytics | marketingmediamanagement
URL opens with a double domain name when clicking the visit-URL link in Google Analytics
I have configured an advanced filter to track sub-domain traffic as follows:
Filter Type: Custom filter > Advanced
Field A: Hostname, Extract A: (.*)
Field B: Request URI, Extract B: (.*)
Output To: Request URI
Constructor: $A1$B1
After that, I can see the sub-domain records and view the full page URL in reports. But when I check reports under All Pages (Behavior > All Pages), or select Landing Page as the primary dimension, and then click the icon next to the displayed full URL to visit that page, the page opens in the browser with a double domain name, so it does not load. For example, the landing page list contains this URL:
www.sitegeek.com/compareHosting/arvixe_vs_hostgator
If I click the icon next to it, the browser opens:
https://sitegeek.comwww.sitegeek.com/compareHosting/arvixe_vs_hostgator
Is this first domain (with HTTPS) taken from the Google Analytics 'View' settings? How can I remove the double domain? Thanks, Rajiv
Reporting & Analytics | gamesecure
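The behaviour can be reproduced with a minimal sketch of the string handling involved, assuming (as the question suspects) that the report's link icon prepends the Website URL from the view settings, taken here to be https://sitegeek.com, to the stored Request URI. Because the filter has already put the hostname at the front of the Request URI, the host ends up in the link twice. The function names are hypothetical; this simulates the strings only and is not Google Analytics code.

```python
import re


def apply_advanced_filter(hostname, request_uri):
    """Simulate the filter: Extract A (.*) on Hostname, Extract B (.*) on Request URI,
    Constructor $A1$B1 written back to Request URI."""
    a1 = re.match(r"(.*)", hostname).group(1)
    b1 = re.match(r"(.*)", request_uri).group(1)
    return a1 + b1


def report_link(view_website_url, stored_request_uri):
    """Assumed behaviour of the link icon: view Website URL + stored Request URI."""
    return view_website_url + stored_request_uri


stored = apply_advanced_filter("www.sitegeek.com", "/compareHosting/arvixe_vs_hostgator")
print(stored)  # the value shown in the All Pages report
print(report_link("https://sitegeek.com", stored))  # the double-domain link
```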
Getting Google impressions for a site not in the index
Hi all, wondering if I could pick the brains of those wiser than myself. My client has an HTTPS website with tons of pages indexed and all ranking well; however, they somehow also set their server up so that non-HTTPS versions of the pages were getting indexed, so we had the same page indexed twice on slightly different URLs (it uses a CMS, so all the internal links are relative too). The non-HTTPS version is mainly used as a dev testing environment. Upon seeing this, we submitted a Google removal request in WMT and added a robots noindex, and the indexed pages dropped overnight. See image 1. However, the site still appears to be returned for a couple of hundred searches a day! The main site gets about 25,000 impressions, so it's way down, but I'm puzzled as to how a site that has been blocked can appear for that many searches, and whether we are still liable for duplicate content issues. Any thoughts are most welcome. Sorry, I'm afraid I am unable to share the site name; the client is very strict on this. Thanks, Carl
Reporting & Analytics | carl_daedricdigital
Google Analytics subdomain: dot or not?
Good morning from 16°C Wetherby, UK, famous for the Wetherby Whaler chippy 😉 OK... on this site, http://www.philpotts.co.uk/, I've set up subdomain tracking like so:
Parent site: http://www.philpotts.co.uk/
Subdomain: http://shop.philpotts.co.uk/
So my question is: should a dot be placed in the domain line, as in _gaq.push(['_setDomainName', '.philpotts.co.uk']); ? Some advice places a dot in _setDomainName; other advice doesn't 😞
Any insights welcome. Thanks, David
Reporting & Analytics | Nightwing
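Part of the question is which hosts a cookie set for that domain is shared with. A minimal sketch of the cookie domain-matching rule from RFC 6265 (section 5.2.3 drops a leading dot from the Domain attribute; section 5.1.3 defines the match) shows that, with or without the dot, the cookie domain covers both www.philpotts.co.uk and shop.philpotts.co.uk. It does not model ga.js itself, which may treat the two strings differently internally (for example when hashing the domain), so it is not a definitive answer for _setDomainName; the function names are hypothetical.

```python
def cookie_domain(attribute_value):
    """RFC 6265 section 5.2.3: ignore a leading dot; compare case-insensitively."""
    value = attribute_value.lower()
    return value[1:] if value.startswith(".") else value


def domain_matches(host, domain):
    """RFC 6265 section 5.1.3 domain-match for host names (IP addresses not handled)."""
    host = host.lower()
    return host == domain or host.endswith("." + domain)


for setting in (".philpotts.co.uk", "philpotts.co.uk"):
    domain = cookie_domain(setting)
    for host in ("www.philpotts.co.uk", "shop.philpotts.co.uk", "example.co.uk"):
        print(setting, host, domain_matches(host, domain))
```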
Homepage disappeared from Google's index
The title says it all. I just discovered that the homepage is no longer in Google's index. Homepage rankings have plummeted and our top keywords are nowhere to be found, but most keywords for deeper pages have dropped just one or two places. We just changed the website design and some content, but I strongly believe this is something else, because it all happened so fast! There is one thing I have to mention that might have caused all this: our email client (Outlook) is using a lot of our server's resources, and for the last couple of days the website was down quite a lot.
There are some crawl errors in GWT, but the homepage is not among them, so it seems to have been crawled; there are no other messages. Where should I look?
Reporting & Analytics | echo1
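A first thing to rule out is whether the homepage was returning errors or a noindex signal when Googlebot visited, especially given the downtime. A minimal sketch of that check, assuming you have captured the homepage's HTTP status code, response headers and HTML with any HTTP client; the function name is hypothetical, and it only looks for common blocking signals (a non-200 status, noindex in an X-Robots-Tag header or a robots meta tag), not every possible cause. The meta-tag regex is a rough check that expects `name` before `content`.

```python
import re

# Rough pattern: expects the name attribute before content, as most templates write it.
META_ROBOTS = re.compile(
    r"""<meta[^>]+name=["'](?:robots|googlebot)["'][^>]*content=["']([^"']*)["']""",
    re.IGNORECASE,
)


def indexability_problems(status, headers, html):
    """Return a list of reasons the page might be dropped from the index."""
    problems = []
    if status >= 500:
        problems.append(f"server error {status} (e.g. downtime)")
    elif status != 200:
        problems.append(f"status {status} instead of 200")
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        problems.append("X-Robots-Tag header contains noindex")
    for content in META_ROBOTS.findall(html):
        if "noindex" in content.lower():
            problems.append("robots meta tag contains noindex")
    return problems


print(indexability_problems(503, {"Retry-After": "3600"}, ""))
print(indexability_problems(200, {}, '<meta name="robots" content="noindex, follow">'))
```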
Google Analytics bug?
Hi, I'm getting hits to this page and it's skewing my Google Analytics stats. I know it's related to Google Website Optimizer, because it correlates with the tests I ran on the same dates. I'm just wondering if you've seen this before and what I should do to clean up my stats. And is this affecting my SEO? It's still registering hits to these pages, even though I stopped the tests a while ago. Should I disallow this in my robots.txt file? Thanks!
Reporting & Analytics | rhenster99
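On the robots.txt question: a Disallow rule only tells crawlers such as Googlebot not to fetch those URLs; Google Analytics hits are recorded by the tracking code running in visitors' browsers, so robots.txt will not remove them from your stats. If you do add a rule, Python's standard `urllib.robotparser` can confirm what it blocks. A minimal sketch; the `/test-variation/` path and example.com URLs are hypothetical, since the actual page is only visible in the screenshot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; replace /test-variation/ with the real path of the test page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test-variation/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://www.example.com/test-variation/page-b.html",
            "https://www.example.com/index.html"):
    print(url, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```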