Welcome to the Q&A Forum

Ria_

If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.

This is how I tend to locate those one-liners among hundreds of files.
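If you'd rather script the search than use an editor, here is a minimal sketch of the same idea in Python. The function name, the default list of file extensions, and the example folder and search string are all placeholders, not anything specific to your site.

```python
from pathlib import Path

def find_in_files(root, needle, suffixes=(".php", ".html", ".htm", ".js", ".css")):
    """Yield (path, line_number, line) for every line under root that contains needle."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we are not interested in.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

Something like `for hit in find_in_files("./site", "old-snippet"): print(hit)` would then list every file and line to look at. It only reports matches; the replacing is left to you.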

Good luck!

Ria_

I'd recommend checking this out for JSON-LD schema examples for location pages for local businesses:

https://developers.google.com/webmasters/business-location-pages/schema.org-examples

https://developers.google.com/webmasters/business-location-pages/

Check out the branchOf schema property and how Google recommends structuring your location pages for different store locations and departments.
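As a rough illustration of what that markup can look like, here is a sketch in Python that builds a LocalBusiness block with a branchOf reference and wraps it in a script tag. The helper names and the dictionary keys for the branch and parent details are my own, so adapt them to your data. Also note that schema.org now lists branchOf as superseded by parentOrganization, so it's worth checking the current guidance before settling on one.

```python
import json

def location_jsonld(branch, parent):
    """Build a LocalBusiness JSON-LD dict for one branch of a parent organisation."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": branch["name"],
        "url": branch["url"],
        "telephone": branch["telephone"],
        "address": {
            "@type": "PostalAddress",
            "streetAddress": branch["street"],
            "addressLocality": branch["city"],
            "postalCode": branch["postcode"],
            "addressCountry": branch["country"],
        },
        # branchOf points at the larger organisation this location belongs to.
        "branchOf": {"@type": "Organization", "name": parent["name"], "url": parent["url"]},
    }

def as_script_tag(data):
    """Serialise the dict and wrap it in a JSON-LD script tag for the page head."""
    # Escape "</" so a stray "</script>" inside the data cannot close the tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

You would call `as_script_tag(location_jsonld(branch, parent))` once per location page and test the output in a structured data validator.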

Hope that helps!

Ria_

Thank you!

Ria_

Haha, I think the train passed the station on that one. I would have realised eventually... XD

Thanks for your help!

Ria_

Yep, have done. (Briefly mentioned in my previous response.) Doesn't pass.

Ria_

I thought so too, but according to Google the trailing wildcard is completely unnecessary, and only needs to be used mid-URL.

Ria_

Hi Andy,

Disallowing them would be my first priority really, before removing them from the index. I didn't want to remove them before I've blocked Google from crawling them, in case they get added back the next time Google comes a-crawling, as has happened before when I've simply removed a URL here and there. Does that make sense, or am I getting myself mixed up here?

My other hack of a solution would be to check the URL in the page.php, and if URL includes par1=ABC then insert noindex meta tag. (Not sure if that would work well or not...)
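To make the idea concrete, here is a minimal sketch of that check, written in Python rather than PHP purely for illustration. It assumes the aim described elsewhere in this thread, that only URLs whose par1 starts with ABC should stay indexed, so the noindex tag is emitted for everything else. Flip the condition if you want the opposite. The function name and the keep_prefix argument are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

NOINDEX_TAG = '<meta name="robots" content="noindex">'

def robots_meta_for(url, keep_prefix="ABC"):
    """Return a noindex meta tag unless the URL's par1 value starts with keep_prefix."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # parse_qs gives a list per parameter, since a parameter can repeat.
    values = query.get("par1", [])
    keep = any(value.startswith(keep_prefix) for value in values)
    return "" if keep else NOINDEX_TAG
```

The returned string would be written into the page head. Bear in mind that Google has to be able to crawl a page to see a noindex tag on it, so this approach and a robots.txt disallow on the same URLs work against each other.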

Ria_

Hi Martijn, thanks for your response!

I'm currently looking at something like this...

User-agent: * # disallowing page.php and any parameters after it
Disallow: /page.php # but leaving anything that starts with par1=ABC
Allow: /page.php?par1=ABC

I would have thought that you could disallow things broadly like that and give an exception, as you can with files in disallowed folders. But it's not passing Google's robots.txt Tester.

One thing that's probably worth mentioning is that there are only two values of the par1 parameter that I want to allow: for example's sake, ABC123 and ABC456. So it would need to be either a partial match or a "this or that" kind of deal, disallowing everything else.
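For reference, here is a small sketch of how I understand Google's documented precedence for robots.txt rules: rules are prefix matches, `*` matches any run of characters, a trailing `$` anchors the end, the longest matching rule wins, and allow wins a tie. It is my own approximation for reasoning about rules like the ones above, not Google's parser and not a replacement for the Tester. The function names and the RULES list are made up for the example.

```python
import re

def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    # A trailing '$' anchors the end of the URL; '*' matches any run of characters.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """rules: list of ("allow" | "disallow", pattern); path includes the query string."""
    best_len, best_allow = -1, True  # no matching rule means the URL is allowed
    for kind, pattern in rules:
        if not pattern:
            continue
        # re.match anchors at the start, which gives the prefix-match behaviour.
        if _pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            # Longest pattern wins; on equal length, allow beats disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow

RULES = [
    ("disallow", "/page.php"),
    ("allow", "/page.php?par1=ABC123"),
    ("allow", "/page.php?par1=ABC456"),
]
```

With these rules, `is_allowed(RULES, "/page.php?par1=ABC123=&par2=DEF456=")` comes out allowed and `/page.php?par1=XYZ789` comes out blocked. Two caveats: because matching is by prefix, a value such as ABC1234 would also be allowed, and a URL with the parameters in a different order (par2 first) would not match the allow rules at all.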

Ria_

So I currently have approximately 1000 of these URLs indexed, when I only want roughly 100 of them.

Let's say the URL is www.example.com/page.php?par1=ABC123=&par2=DEF456=&par3=GHI789=

All the indexed URLs follow that same kinda format, but I only want to index the URLs that have a par1 of ABC (but that could be ABC123 or ABC456 or whatever). Using URL Parameters tool in Search Console, I can ask Googlebot to only crawl URLs with a specific value. But is there any way to get a partial match, using regex maybe?

Am I wasting my time with Search Console, and should I just disallow any page.php without par1=ABC in robots.txt?
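Whatever the answer on the Search Console side, a partial match is easy to do offline. As a sketch, assuming I have the indexed URLs exported as a plain list, something like this would split them into the ones to keep and the ones to get rid of. The function name and the default pattern are just for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

def partition_urls(urls, pattern=r"ABC"):
    """Split urls into (keep, drop) by whether par1 starts with the given regex."""
    wanted = re.compile(pattern)
    keep, drop = [], []
    for url in urls:
        params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
        value = params.get("par1", "")
        # re.match anchors at the start of the value, so this is a "starts with" test.
        (keep if wanted.match(value) else drop).append(url)
    return keep, drop
```

Passing a pattern such as `r"ABC(123|456)"` would narrow it down to specific values instead of anything beginning with ABC.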

Ria_

I'd recommend linking to all your own properties using rel="me". You can see the tag in common usage on Twitter and Instagram profiles, where the user's website link is tagged using rel="me". You can basically connect up all your online properties as belonging to the same person/brand/entity, and who wouldn't want that? You're indicating to Google that all those webpages are related to you. By linking to your social profiles from your website using rel="me", you're confirming that those profiles are officially yours.
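If your site templates are generated by a script, a small helper along these lines could produce the links. This is only a sketch: the function name is hypothetical and the profile URLs you pass in are your own.

```python
from html import escape

def rel_me_links(profiles):
    """profiles: mapping of link text to profile URL. Returns one rel="me" anchor per line."""
    return "\n".join(
        # Escape both the URL and the label so they are safe inside HTML.
        f'<a href="{escape(url, quote=True)}" rel="me">{escape(label)}</a>'
        for label, url in profiles.items()
    )
```

Calling it with `{"Twitter": "https://twitter.com/example"}` gives a single anchor tag carrying rel="me", which you would place in your site's footer or about page.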


Posts made by Ria_
