Removing URLs in bulk when directory exclusion isn't an option?
-
I had a bunch of URLs on my site that followed the form:
http://www.example.com/abcdefg?q=&site_id=0000000048zfkf&l=
There were several million pages, each associated with a different site_id. They weren't very useful, so we've removed them entirely and now return a 404. The problem is, they're still stuck in Google's index. I'd like to remove them manually, but how? There's no proper directory (i.e. /abcdefg/) to remove, since there's no trailing /, and removing them one by one isn't an option. Is there any other way to approach the problem or specify URLs in bulk?
Any insights are much appreciated.
Kurus
-
I'd go into Google Webmaster Tools, open the URL parameter settings, and tell Google to ignore this parameter (site_id).
I would need to look up the exact syntax, but Google does accept some wildcard patterns in robots.txt, so you may be able to block these URLs in robots.txt and then use the URL removal tool.
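To make the robots.txt idea a bit more concrete, here is a rough Python sketch for checking which URLs a wildcard rule would cover before you rely on it. The rule `/abcdefg?*site_id=` is only a guess based on the URL in the question, and the helper names are made up for illustration. It assumes Google-style matching, where `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a literal prefix match against the path plus query string. It ignores details such as percent-encoding normalization, so treat it as a sanity check and confirm the rule in Google's own robots.txt tester.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule for the URLs in the question; not taken from a real robots.txt.
# In robots.txt it would appear as:  Disallow: /abcdefg?*site_id=
DISALLOW_PATTERN = "/abcdefg?*site_id="


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(url: str, pattern: str = DISALLOW_PATTERN) -> bool:
    """Return True if the URL's path plus query string matches the pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return pattern_to_regex(pattern).match(target) is not None


if __name__ == "__main__":
    samples = [
        "http://www.example.com/abcdefg?q=&site_id=0000000048zfkf&l=",
        "http://www.example.com/abcdefg",
        "http://www.example.com/about?site_id=1",
    ]
    for url in samples:
        print(is_blocked(url), url)
```

Running a sample of the removed URLs and a sample of pages you want to keep through `is_blocked` should show whether the rule is too broad or too narrow before it goes live.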
-
There are no links to these pages, so no juice. There are also no 'new' replacement pages. We just want them out of the index ASAP by any means necessary.
-
You should have 301-redirected your most important pages to the new URLs, so that you would keep your link juice.
-
Thanks, but the goal is to expedite the removal process via the URL removal tool. We've already 404'd the pages, so they'll be removed from the index. It's a question of timing, since the pages in question are low quality and hurting us in the context of Panda.
-
Try a 301 redirect for your most important links: http://www.seomoz.org/learn-seo/redirection