Writing a program to extract data from a web page and post it to our site
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough County car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, and one that combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table that will fit on the webpage where you want it published.
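Here is a rough sketch of what that might look like in Perl. The county list, output path, and row-matching logic are placeholders you would adapt to the FHP page's actual HTML, and a production script would be better off using an HTML parser module than regexes:

    #!/usr/bin/perl
    # Rough sketch of steps a) to c). County list, output path, and the
    # row-matching logic are placeholders; adapt them to the real page.
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    my $url      = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
    my @counties = ('Hillsborough', 'Pinellas');                       # your locations
    my $out_file = '/home/username/public_html/accident-ticker.html';  # hypothetical path

    # a) read the target webpage
    my $page = get($url) or die "Could not fetch $url\n";

    # b) extract the table rows that mention one of your locations
    my @matches;
    my @all_rows = $page =~ m{(<tr[^>]*>.*?</tr>)}gis;
    for my $row (@all_rows) {
        for my $county (@counties) {
            if ($row =~ /\b\Q$county\E\b/i) {
                push @matches, "$row\n";
                last;
            }
        }
    }

    # c) write a small HTML fragment that the webpage can include
    open my $fh, '>', $out_file or die "Cannot write $out_file: $!\n";
    print {$fh} qq{<table class="accident-ticker">\n}, @matches, qq{</table>\n};
    close $fh;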
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions to allow the Perl program to execute and the small HTML file to be overwritten.)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
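For example, a crontab entry along these lines (the paths and script name are hypothetical) would run the script at the top of every hour:

    0 * * * * /usr/bin/perl /home/username/cgi-bin/fhp_accidents.pl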
-
Place a server-side include, sized and shaped to fit your data table, on the webpage where you want the information to appear.
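Assuming your server has SSI enabled and the Perl script writes its output to /accident-ticker.html (a made-up path), the include would look something like this:

    <!--#include virtual="/accident-ticker.html" -->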
This set-up will work until the URL or format of the target webpage changes; at that point your script will produce errors or write garbage. When that happens you will need to update the URL in the script and/or the way it parses the page.
-
You need a developer who understands HTTP requests well: someone who knows how to run a spidering program that polls the website, looks for changes, and scrapes data off those sites. You will also need the program to check whether the coding on the page has changed, because if it has, the scraping program will need to be rewritten to account for it.
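One simple way to catch that situation is to hash the page's tag structure on each run and compare it with the previous run's hash. A rough Perl sketch (file paths are hypothetical) follows:

    #!/usr/bin/perl
    # Sketch: hash the page's markup skeleton and warn if it changed since last run.
    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use Digest::MD5 qw(md5_hex);
    use Encode qw(encode_utf8);

    my $url       = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
    my $hash_file = '/home/username/fhp_page_structure.md5';   # hypothetical path

    my $page = get($url) or die "Could not fetch $url\n";

    # Strip the text between tags so routine data updates don't trigger an alert;
    # only changes to the page's markup will alter the hash.
    (my $structure = $page) =~ s/>[^<]*</></gs;
    my $new_hash = md5_hex(encode_utf8($structure));

    my $old_hash = '';
    if (open my $in, '<', $hash_file) {
        $old_hash = <$in> // '';
        chomp $old_hash;
        close $in;
    }

    if ($old_hash && $old_hash ne $new_hash) {
        warn "Page structure changed - review the scraper before trusting its output.\n";
    }

    open my $out, '>', $hash_file or die "Cannot write $hash_file: $!\n";
    print {$out} "$new_hash\n";
    close $out;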
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, as the programming/programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the Dept. of Motor Vehicles.
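If such a feed did exist, consuming it would be much simpler and more stable than scraping HTML. A purely hypothetical sketch (the feed URL and element names are invented for illustration only):

    #!/usr/bin/perl
    # Hypothetical: parse an XML crash feed, if the agency or its vendor provided one.
    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use XML::LibXML;

    my $feed_url = 'http://example.invalid/fhp/crashes.xml';   # hypothetical feed
    my $xml      = get($feed_url) or die "Could not fetch feed\n";
    my $doc      = XML::LibXML->load_xml(string => $xml);

    for my $crash ($doc->findnodes('//crash')) {               # hypothetical element name
        my $county   = $crash->findvalue('county');
        my $location = $crash->findvalue('location');
        print "$county: $location\n" if $county =~ /hillsborough/i;
    }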
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!