Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Disallow: /jobs/? is this stopping the SERPs from indexing job posts
-
Hi,
I was wondering what this would be used for as it's in the Robots.exe of a recruitment agency website that posts jobs. Should it be removed?Disallow: /jobs/?
Disallow: /jobs/page/*/Thanks in advance.
James -
Hi James,
So far as I can see you have the following architecture:
- job posting: https://www.pkeducation.co.uk/job/post-name/
- jobs listing page: https://www.pkeducation.co.uk/jobs/
Since from the robots.txt the listing page pagination is blocked, the crawler can access only the first 15 job postings are available to crawl via a normal crawl.
I would say, you should remove the blocking from the robots.txt and focus on implementing a correct pagination. *which method you choose is your decision, but allow the crawler to access all of your job posts. Check https://yoast.com/pagination-seo-best-practices/
Another thing I would change is to make the job post title an anchor text for the job posting. (every single job is linked with "Find out more").
Also if possible, create a separate sitemap.xml for your job posts and submit it in Search Console, this way you can keep track of any anomaly with indexation.
Last, and not least, focus on the quality of your content (just as Matt proposed in the first answer).
Good luck!
-
Hi Istvan,
Sorry I've been away for a while. Thanks for all of your advice guys.
Here is the url if that helps?
https://www.pkeducation.co.uk/jobs/
Cheers,
James
-
The idea is (which we both highlighted), that blocking your listing page from robots.txt is wrong, for pagination you have several methods to deal with (how you deal with it, it really depends on the technical possibilities that you have on the project).
Regarding James' original question, my feeling is, that he is somehow blocking their posting pages. Cutting the access to these pages makes it really hard for Google, or any other search engine to index it. But without a URL in front of us, we cannot really answer his question, we can only create theories that he can test
-
Ah yes when it's pointed out like that, it's a conflicting signal isn't It. Makes sense in theory, but if you're setting it to noindex and then passing that on via a canonical it's probably not the best is it.
They're was link out in that thread to a discussion of people who still do that with success, but after reading that I would just use noindex only as you said. (Still prefer the no index on the robots block though)
-
Sorry Richard, but using noindex with canonical link is not quite a good practice.
It's an old entry, but still true: https://www.seroundtable.com/noindex-canonical-google-18274.html
-
I don't think it should be blocked by robots.txt at all. It's stopping Google from crawling the site fully. And they may even treat it negatively as they've been really clamping down on blocking folders with robots.txt lately. I've seen sites with warning in search console for: Disallow: /wp-admin
You may want to consider just using a noindex tag on those pages instead. And then also use a canonical tag that points back to the main job category page. That way Google can crawl the pages and perhaps pass all the juice back to the main job category page via the canonical. Then just make sure those junk job pages aren't in the sitemap either.
-
Hi James,
Regarding the robots.txt syntax:
Disallow: /jobs/? which basically blocks every single URL that contains /jobs/**? **
For example: domain.com**/jobs/?**sort-by=... will be blocked
If you want to disallow query parameters from URL, the correct implementation would be Disallow: /jobs/*? or even specify which query parameter you want to block. For example Disallow: /jobs/*?page=
My question to you, if these jobs are linked from any other page and/or sitemap? Or only from the listing page, which has it's pagination, sorting, etc. is blocked by robots.txt? If they are not linked, it could be a simple case of orphan pages, where basically the crawler cannot access the job posting pages, because there is no actual link to it. I know it is an old rule, but it is still true: Crawl > Index > Rank.
BTW. I don't know why you would block your pagination. There are other optimal implementations.
And there is always the scenario, that was already described by Matt. But I believe in that case you would have at least some of the pages indexed even if they are not going to get ranked well.
Also, make sure other technical implementations are not stopping your job posting pages from being indexed.
-
I'd guess that the jobs get pulled from a job board. If this is the case, then the content ( job description, title etc.) will just be a duplication of the content that can be found in many other locations. If a plugin is used, they sometimes automatically add a disallow into the robots.txt file as to not hurt the parent version of the job page by creating thousands of duplicate content issues.
I'd recommend creating some really high-quality hub pages based on job type, or location and pulling the relevant jobs into that page, instead of trying to index and rank the actual job pages.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Trying to get Google to stop indexing an old site!
Howdy, I have a small dilemma. We built a new site for a client, but the old site is still ranking/indexed and we can't seem to get rid of it. We setup a 301 from the old site to the new one, as we have done many times before, but even though the old site is no longer live and the hosting package has been cancelled, the old site is still indexed. (The new site is at a completely different host.) We never had access to the old site, so we weren't able to request URL removal through GSC. Any guidance on how to get rid of the old site would be very appreciated. BTW, it's been about 60 days since we took these steps. Thanks, Kirk
Intermediate & Advanced SEO | | kbates0 -
Ogranization Schema/Microformat for a content/brand website | Travel
Hi, One of our clients have a website specific to a place, for eg. California Tourism in which they publish local information related to tourism, blogs & other useful content. I want to understand how useful is to publish Organization Schema on such website mentioning the actual Organization, which in this case is a Travel Agency? Or any other schema would fit in for such websites?
Intermediate & Advanced SEO | | ds9.tech0 -
What does Disallow: /french-wines/?* actually do - robots.txt
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?* Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark? Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL? I think this has been done to block URLs containing query strings. Thanks, Luke
Intermediate & Advanced SEO | | McTaggart0 -
Moving html site to wordpress and 301 redirect from index.htm to index.php or just www.example.com
I found page duplicate content when using Moz crawl tool, see below. http://www.example.com
Intermediate & Advanced SEO | | gozmoz
Page Authority 40
Linking Root Domains 31
External Link Count 138
Internal Link Count 18
Status Code 200
1 duplicate http://www.example.com/index.htm
Page Authority 19
Linking Root Domains 1
External Link Count 0
Internal Link Count 15
Status Code 200
1 duplicate I have recently transfered my old html site to wordpress.
To keep the urls the same I am using a plugin which appends .htm at the end of each page. My old site home page was index.htm. I have created index.htm in wordpress as well but now there is a conflict of duplicate content. I am using latest post as my home page which is index.php Question 1.
Should I also use redirect 301 im htaccess file to transfer index.htm page authority (19) to www.example.com If yes, do I use
Redirect 301 /index.htm http://www.example.com/index.php
or
Redirect 301 /index.htm http://www.example.com Question 2
Should I change my "Home" menu link to http://www.example.com instead of http://www.example.com/index.htm that would fix the duplicate content, as indx.htm does not exist anymore. Is there a better option? Thanks0 -
Links from non-indexed pages
Whilst looking for link opportunities, I have noticed that the website has a few profiles from suppliers or accredited organisations. However, a search form is required to access these pages and when I type cache:"webpage.com" the page is showing up as non-indexed. These are good websites, not spammy directory sites, but is it worth trying to get Google to index the pages? If so, what is the best method to use?
Intermediate & Advanced SEO | | maxweb0 -
Is it worth removing date from Blog Posts / Articles
Wondering, is it worth to remove date from articles from seo perspective. Am sure, Google search algorithm would like demote a post written a year back, as against an article on the same post (unless a year old post has very strong Authoritative links) May be it can turn out a bad user experience of removing dates, but if can hide date using Javascripts so as to show it as image to user and hide it from search engines, is it a good idea !!
Intermediate & Advanced SEO | | Modi0 -
Blog posts not showing in serps for exact match title search
hi- my first client ranks #1 for the exact phrase of each blog post title the 2nd client doesnt rank anywhere when i search for the exact post title 2nd client has robots.txt User-agent: *
Intermediate & Advanced SEO | | Ezpro9
Disallow: /wp-admin/
Disallow: /wp-includes/ so that shouldnt noindex any posts right? his site ranks for many kw's - but oddly none of his blog posts are anywhere to be found - i dont mean for a kw search - i mean for searching for the entire title he doesnt rank anywhere in first 5 pages for any of 6-7 posts i checked any idea what could cause this? thanks0 -
Indexed Pages in Google, How do I find Out?
Is there a way to get a list of pages that google has indexed? Is there some software that can do this? I do not have access to webmaster tools, so hoping there is another way to do this. Would be great if I could also see if the indexed page is a 404 or other Thanks for your help, sorry if its basic question 😞
Intermediate & Advanced SEO | | JohnPeters0