My automated build system is creating a duplicate website

RoxBrock

Because of the tools my company uses for CI/CD (a CI/CD pipeline automates steps in your software delivery process, such as starting code builds, running automated tests, and deploying to a staging or production environment), an extra URL is generated. The canonical tags on the generated site point to our main website, but otherwise it is the same website.

Could this new URL compete with our website?
Will Google count it against us, since it is the same content, just with canonical tags (it is not noindexed)?
Does it matter?
Surely others are using this method?

Answers/thoughts will be greatly appreciated. Thank you.

BlueprintMarketing

Do you have any control over the CI/CD pipeline URL?

If you control the domain well enough to verify it in Search Console, then by all means do so. But it does not seem like you have control over the domain?

Am I correct?

https://support.google.com/webmasters/answer/7440203?hl=en

If it is a third-party domain, you have to trust the third party. Alternatively, if you control the pages on which links to the third-party URLs are embedded, you can add noindex, nofollow.

https://www.deepcrawl.com/blog/best-practice/noindex-disallow-nofollow/

I hope that helps,

Tom

RoxBrock

Unfortunately, since the URL is generated from the original site, I cannot change its robots.txt; it uses the same one as the main site. That also rules out adding a noindex meta tag. Any other ideas?

Is there a way to add the duplicate URL to Search Console and tell Google not to crawl it?

Thank you.
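Because the duplicate is built from the same codebase, one option is to make the decision per request hostname rather than per file. The sketch below assumes you can run a little code in whatever layer serves both hosts; `www.example.com` and `ci-build-42.example.net` are hypothetical stand-ins for the main domain and the CI/CD URL. It adds an `X-Robots-Tag: noindex` response header, which Google treats like a robots meta tag, to any request that does not arrive on the production hostname.

```python
# Minimal sketch: send a noindex header on every host except production.
# Assumption: "www.example.com" is a hypothetical placeholder for the main site.
PRODUCTION_HOST = "www.example.com"


def robots_headers(host_header: str) -> dict[str, str]:
    """Return extra response headers for a request with this Host header."""
    # Strip an optional port (e.g. "preview.example.net:8080") and normalize case.
    host = host_header.split(":", 1)[0].strip().lower()
    if host == PRODUCTION_HOST:
        return {}  # Production stays indexable; nothing extra to send.
    # Any other host (e.g. a CI/CD preview URL) asks search engines not to index it.
    return {"X-Robots-Tag": "noindex"}


print(robots_headers("www.example.com"))
print(robots_headers("ci-build-42.example.net:443"))
```

For the header to take effect, Google has to be able to crawl the page, so it should not be combined with a robots.txt block on the same host.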

BlueprintMarketing

I understand, and using CI is great.

I agree: get the bad content being generated by CI blocked ASAP.

“an extra URL is generated. The canonical for the generated site is that of our main website, but other than that it is the same website.”

But it’s not the duplicated content being generated that will hurt you, unless you’re pointing the canonicals to a page that is only similar (still, get the automated content off your domain).

Remember to add self-referencing canonicals to the good pages you want Google and other search engines to index.
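If the canonical is generated from the request path but always uses the main site's origin, the same template yields a self-referencing canonical on production and a canonical pointing back to the original on the CI/CD copy. A minimal sketch, assuming `https://www.example.com` is a hypothetical placeholder for the main origin and that query strings should be dropped (whether they belong in the canonical depends on the site):

```python
# Minimal sketch: build a canonical <link> that always points at the main site.
# Assumption: "https://www.example.com" is a hypothetical placeholder origin.
from html import escape
from urllib.parse import urlsplit, urlunsplit

CANONICAL_ORIGIN = "https://www.example.com"


def canonical_link(request_url: str) -> str:
    """Return a canonical link element for the page at request_url.

    The path is kept, but scheme and host are replaced with the main site's,
    so production pages get a self-referencing canonical and copies served on
    other hosts point back to the original. Query string and fragment are dropped.
    """
    path = urlsplit(request_url).path or "/"
    origin = urlsplit(CANONICAL_ORIGIN)
    href = urlunsplit((origin.scheme, origin.netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


print(canonical_link("https://www.example.com/pricing?utm_source=newsletter"))
print(canonical_link("https://ci-build-42.example.net/pricing"))
```

Both calls produce the same element, which is the point: the production page and its CI/CD copy declare the same preferred URL.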

Hope this is of help,

Tom

Martijn_Scheijbeler

To answer your questions:

Technically it could compete with your current site, as it's on its own domain. In reality, that's unlikely, since you're canonicalizing the pages back to the originals, which ensures the content is attributed to your original site.
What I would recommend is excluding the CI/CD site from search engines through robots.txt or a similar technique. That way you make sure the staging site isn't crawled at all. In the end, I'd say there's very little upside to leaving it crawlable as it is now.
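Even when both hosts share one robots.txt file on disk, the server can still return different robots.txt content depending on the requested host. A minimal sketch as a plain WSGI app, assuming `www.example.com` is a hypothetical production hostname and `PRODUCTION_ROBOTS` stands in for whatever the main site's robots.txt actually contains. One caveat: a robots.txt block stops crawling, so Google would no longer see the canonical tags on that host, and blocked URLs can still show up in results if they are linked from elsewhere.

```python
# Minimal sketch: serve a different robots.txt depending on the requested host.
# Assumptions: the hostname is hypothetical, and PRODUCTION_ROBOTS is a
# placeholder for the main site's real robots.txt contents.
PRODUCTION_HOST = "www.example.com"
PRODUCTION_ROBOTS = "User-agent: *\nDisallow:\n"  # empty Disallow: allow everything
BLOCK_ALL_ROBOTS = "User-agent: *\nDisallow: /\n"  # block all crawling


def robots_txt_for(host_header: str) -> str:
    """Pick robots.txt content based on the Host header (port ignored)."""
    host = host_header.split(":", 1)[0].strip().lower()
    return PRODUCTION_ROBOTS if host == PRODUCTION_HOST else BLOCK_ALL_ROBOTS


def app(environ, start_response):
    """WSGI app that only answers /robots.txt; everything else is a 404."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    body = robots_txt_for(environ.get("HTTP_HOST", "")).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain; charset=utf-8"),
         ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` can serve it. In practice the same host check would more likely live in the existing web server or CDN configuration.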
