Writing a Program to Extract Data and Publish It to a Web Page
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, AND one that combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table sized to fit the webpage where you want it published.
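A minimal sketch of that kind of script, written in Perl with LWP::Simple, is below. The county/city names, the table-row regex, and the output path are all assumptions; they would need to be adjusted to match the actual markup of the live FHP page.

    #!/usr/bin/perl
    # Minimal sketch, not a finished scraper: the regex and the output path
    # are placeholders and must be adapted to the real markup of the FHP page.
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    my $url  = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
    my $html = get($url) or die "Could not fetch $url\n";

    # Keep only the table rows that mention the locations we care about
    # (hypothetical: the real page may lay its rows out differently).
    my @rows;
    for my $place ('HILLSBOROUGH', 'TAMPA') {
        while ($html =~ /(<tr\b[^>]*>.*?$place.*?<\/tr>)/sgi) {
            push @rows, $1;
        }
    }

    # Write a small HTML fragment for the server-side include to pull in.
    open my $out, '>', '/home/site/public_html/accidents.html'
        or die "Cannot write output: $!\n";
    print {$out} "<table>\n", join("\n", @rows), "\n</table>\n";
    close $out;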
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions so that the Perl program can execute and the small HTML file can be overwritten.)
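For example, on a typical Linux host the permissions might be set something like this (the file names are placeholders for whatever you call your script and its output file):

    chmod 755 /home/site/cgi-bin/fetch_accidents.pl     # script must be executable
    chmod 644 /home/site/public_html/accidents.html     # output must be writable by the script's user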
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
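A typical crontab entry to run the script at the top of every hour would look something like this (the path is a placeholder):

    0 * * * * /home/site/cgi-bin/fetch_accidents.pl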
-
Place a server-side include, sized and shaped to match your data table, on the webpage where you want the information to appear.
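On Apache hosts with server-side includes enabled (usually on .shtml pages), the include is a single directive placed where the table should appear; the path points at whatever file your script writes:

    <!--#include virtual="/accidents.html" -->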
This set-up will work until the URL or format of the target webpage changes; then your script will produce errors or write garbage. When that happens you will need to update the URL in the script and/or the way the page is parsed.
-
-
You need to get a developer who understands a lot about HTTP requests. You will need one who knows how to run a spidering program that polls the website, watches for changes, and scrapes the data off those sites. You will also need the program to check whether the coding on the page has changed, because if it does, the scraping program will need to be rewritten to account for it.
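One simple way such a program might detect that the page's coding has changed is to strip out the data, hash the remaining markup, and compare it against the hash from the previous run. A rough sketch in Perl, with the paths as placeholders:

    #!/usr/bin/perl
    # Rough sketch: flag a structural change in the target page by comparing
    # a hash of its markup (tags only) against the hash saved from the last run.
    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use Digest::MD5 qw(md5_hex);

    my $url      = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
    my $hashfile = '/home/site/data/last_hash.txt';   # placeholder path

    my $html = get($url) or die "Could not fetch $url\n";

    # Keep only the tags and ignore the text, so routine accident updates
    # do not trigger a false alarm.
    my $structure = join '', ($html =~ /<[^>]+>/g);
    my $new       = md5_hex($structure);

    my $old = '';
    if (open my $in, '<', $hashfile) {
        $old = <$in> // '';
        chomp $old;
        close $in;
    }

    print "Page structure changed since last check - review the scraper.\n"
        if $old && $old ne $new;

    open my $out, '>', $hashfile or die "Cannot write $hashfile: $!\n";
    print {$out} "$new\n";
    close $out;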
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, though, as the programming/programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the Department of Motor Vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!