If you have access to all of the website's files, you could search the whole directory for every instance using something like Notepad++ (its Find in Files feature). You could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
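If you'd rather script the search than use an editor, something along these lines should do a similar job. It's only a rough sketch: the function name, the default file extensions, and the example folder and search string in the usage comment are my own assumptions, so adjust them to your site.

```python
from pathlib import Path


def find_in_files(root, needle, suffixes=(".php", ".html", ".js")):
    """Return (path, line_number, line) for every line under root that contains needle."""
    hits = []
    for path in Path(root).rglob("*"):
        # Only look at regular files with one of the chosen extensions (assumed list).
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="ignore" stops an oddly encoded file from aborting the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits


# Example usage (hypothetical folder and search string):
# for path, number, line in find_in_files("public_html", "old-tracking-snippet"):
#     print(f"{path}:{number}: {line}")
```

This only reports matches. I'd still do the actual replacing in an editor (or at least on a backup copy) so you can review each change.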
Good luck!
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
I'd recommend checking these out for JSON-LD schema examples for local business location pages:
https://developers.google.com/webmasters/business-location-pages/schema.org-examples
https://developers.google.com/webmasters/business-location-pages/
Check out branchOf schema and how Google recommends structuring your location pages for different store locations and departments.
Hope that helps!
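For a rough idea of what that markup looks like, here is a minimal sketch that builds the JSON-LD for one location page. Everything in it is a placeholder: the function names and all the business details are made up for illustration. I've used parentOrganization to point at the main brand because, as far as I know, schema.org now lists branchOf as superseded by that property. Check Google's current guidelines before relying on it.

```python
import json


def location_jsonld(name, street, city, postcode, phone, parent_name, parent_url):
    """Build a schema.org LocalBusiness description for a single store location."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "postalCode": postcode,
        },
        # Links this branch back to the main brand (assumed property choice, see above).
        "parentOrganization": {
            "@type": "Organization",
            "name": parent_name,
            "url": parent_url,
        },
    }


def as_script_tag(data):
    """Wrap the data in the script tag that goes into the page's HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


# Placeholder values only.
example = location_jsonld(
    "Example Shop (Leeds)", "1 High Street", "Leeds", "LS1 1AA",
    "+44 113 000 0000", "Example Shop", "https://www.example.com/",
)
print(as_script_tag(example))
```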
Haha, I think the train passed the station on that one. I would have realised eventually... XD
Thanks for your help!
Yep, have done. (Briefly mentioned in my previous response.) It doesn't pass.
I thought so too, but according to Google the trailing wildcard is completely unnecessary, and only needs to be used mid-URL.
Hi Andy,
Disallowing them would be my first priority really, before removing from index. Didn't want to remove them before I've blocked Google from crawling them in case they get added back again next time Google comes a-crawling, as has happened before when I've simply removed a URL here and there. Does that make sense or am I getting myself mixed up here?
My other hack of a solution would be to check the URL in page.php, and if the URL doesn't include par1=ABC then insert a noindex meta tag. (Not sure if that would work well or not...)
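To make the idea concrete, here's a rough sketch of that check. It's in Python purely for illustration (the real thing would live in page.php), and it assumes the goal is to keep URLs whose par1 value starts with ABC indexable and to noindex everything else. The function name and the ALLOWED_PREFIX constant are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

ALLOWED_PREFIX = "ABC"  # assumption: par1 values starting with this stay indexable


def robots_meta_for(url):
    """Return the meta tag to print in the page head, or "" if the page may be indexed."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query.get("par1", [])
    if any(value.startswith(ALLOWED_PREFIX) for value in values):
        return ""
    # No par1 at all, or a par1 that doesn't start with ABC: keep it out of the index.
    return '<meta name="robots" content="noindex">'


print(repr(robots_meta_for("https://www.example.com/page.php?par1=ABC123&par2=DEF456")))
print(repr(robots_meta_for("https://www.example.com/page.php?par1=XYZ999&par2=DEF456")))
```

One thing to bear in mind with this route: Google can only see the noindex tag on pages it is allowed to crawl, so it won't take effect on URLs that are also disallowed in robots.txt.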
Hi Martijn, thanks for your response!
I'm currently looking at something like this...
user-agent: *
#disallowing page.php and any parameters after it
disallow: /page.php
#but leaving anything that starts with par1=ABC
allow: /page.php?par1=ABC
I would have thought that you could disallow things broadly like that and give an exception, as you can with files in disallowed folders. But it's not passing Google's robots.txt Tester.
One thing that's probably worth mentioning is that there are only two values of the par1 parameter that I want to allow. For example's sake, ABC123 and ABC456. So it would need to be either a partial match or a "this or that" kinda deal, disallowing everything else.
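For what it's worth, Google's robots.txt documentation describes the precedence as "the most specific (longest) matching rule wins, and allow wins a tie". Below is a small sketch of that logic so the rules can be sanity-checked offline. It is a simplified model written for this thread, not Google's actual parser or the Tester, and the helper names and the RULES list (one allow line per wanted value) are my own assumptions.

```python
import re


def _to_regex(pattern):
    """Turn a robots.txt path pattern into a regex: * is a wildcard, a trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of ("allow" | "disallow", pattern). Longest matching pattern wins."""
    best_length, allowed = -1, True  # nothing matching means the path is crawlable
    for kind, pattern in rules:
        if not pattern:
            continue
        # re.match anchors at the start, so patterns behave as prefixes.
        if _to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length, allowed = len(pattern), is_allow
    return allowed


# One allow line per wanted value, everything else under page.php blocked.
RULES = [
    ("disallow", "/page.php"),
    ("allow", "/page.php?par1=ABC123"),
    ("allow", "/page.php?par1=ABC456"),
]

for path in [
    "/page.php?par1=ABC123=&par2=DEF456=",
    "/page.php?par1=XYZ999=&par2=DEF456=",
    "/page.php",
]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Because patterns are matched as prefixes, "/page.php" and "/page.php*" behave the same here, which fits the point about the trailing wildcard being unnecessary. Note that the allow lines only match when par1 is the first parameter in the query string. If it can appear later, a mid-URL wildcard such as /page.php?*par1=ABC123 would be needed.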
So I currently have approximately 1000 of these URLs indexed, when I only want roughly 100 of them.
Let's say the URL is www.example.com/page.php?par1=ABC123=&par2=DEF456=&par3=GHI789=
All the indexed URLs follow that same kinda format, but I only want to index the URLs that have a par1 of ABC (but that could be ABC123 or ABC456 or whatever). Using URL Parameters tool in Search Console, I can ask Googlebot to only crawl URLs with a specific value. But is there any way to get a partial match, using regex maybe?
Am I wasting my time with Search Console, and should I just disallow any page.php without par1=ABC in robots.txt?
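Whichever route is used, it may help to first split the list of indexed URLs into the ones to keep and the ones to get rid of. Here's a quick sketch using a regex partial match on par1. The pattern, the function name, and the sample URLs are assumptions for illustration, and the list of indexed URLs would have to come from your own export.

```python
import re

# Partial match: par1 as any query parameter, with a value starting with ABC.
KEEP_PATTERN = re.compile(r"[?&]par1=ABC")


def split_urls(urls):
    """Return (keep, drop): URLs whose par1 starts with ABC, and everything else."""
    keep = [url for url in urls if KEEP_PATTERN.search(url)]
    drop = [url for url in urls if not KEEP_PATTERN.search(url)]
    return keep, drop


sample = [
    "www.example.com/page.php?par1=ABC123=&par2=DEF456=&par3=GHI789=",
    "www.example.com/page.php?par1=XYZ999=&par2=DEF456=&par3=GHI789=",
]
keep, drop = split_urls(sample)
print("keep:", keep)
print("drop:", drop)
```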
I'd recommend linking to all your own properties using rel="me". You can see the tag in common usage on Twitter and Instagram profiles, where the user's website link is tagged using rel="me". You can basically connect up all your online properties as belonging to the same person/brand/entity, and who wouldn't want that? You're indicating to Google that all those webpages are related to you. By linking to your social profiles from your website using rel="me", you're confirming that those profiles are officially yours.