Writing a Data Extraction to Web Page Program
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but one that shows car accidents specific to each page's location AND combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table that will fit on the webpage where you want it published.
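A minimal sketch of such a script, assuming the LWP::Simple module is installed; the URL is from the question, but the output path, county filter, and extraction regex are placeholders that would need to match the page's actual markup:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

my $url      = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
my $county   = 'HILLSBOROUGH';   # hypothetical filter term for this page
my $out_file = '/home/site/public_html/includes/accidents.html';   # hypothetical path

# a) Read the target webpage.
my $html = get($url) or die "Could not fetch $url\n";

# b) Extract the relevant rows. This crude regex keeps any table row that
#    mentions the county; the real pattern depends entirely on the page's markup.
my @rows = grep { /$county/i } ($html =~ m{<tr[^>]*>(.*?)</tr>}gis);

# c) Write a small HTML fragment that the webpage can include.
open my $out, '>', $out_file or die "Cannot write $out_file: $!\n";
print {$out} qq{<table class="accident-ticker">\n};
print {$out} "<tr>$_</tr>\n" for @rows;
print {$out} "</table>\n";
close $out;
```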
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions to allow the Perl program to execute and the small HTML file to be overwritten.)
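On a typical shared host, the permission changes might look something like this (hypothetical paths, adjust to your server's layout):

```
chmod 755 /home/site/cgi-bin/fetch_accidents.pl           # make the script executable
chmod 644 /home/site/public_html/includes/accidents.html  # owner-writable, so the script run under your account can overwrite it
```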
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule, such as hourly or daily; these are usually called "cron jobs". Find this in your server's control panel and set up a cron job that will execute your Perl program automatically.
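For example, a crontab entry that runs the scraper at the top of every hour might look like this (hypothetical path; many control panels offer a form that generates this line for you):

```
0 * * * * /home/site/cgi-bin/fetch_accidents.pl
```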
-
Place a server-side include, sized and shaped to fit your data table, on the webpage where you want the information to appear.
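Assuming your server has Apache-style server-side includes enabled, the directive would look something like this, pointing at wherever the Perl script writes its fragment (hypothetical path):

```html
<!--#include virtual="/includes/accidents.html" -->
```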
This setup will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage, and you will need to update the URL in the script and/or the parsing logic to match the new format.
-
-
You need to get a developer who understands a lot about HTTP requests: one who knows how to run a spidering program that pings the website, looks for changes, and scrapes data off those sites. You will also need the program to check whether the coding on the page has changed; if it has, the scraping program will need to be rewritten to account for this.
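A minimal sketch of that change check, assuming the script stores an MD5 checksum of the fetched page between runs (hypothetical paths; a real version would also sanity-check that the parse still returns rows before trusting its output):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use Digest::MD5 qw(md5_hex);

my $url      = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
my $hashfile = '/home/site/data/page.md5';   # hypothetical location

my $html     = get($url) or die "Could not fetch $url\n";
my $new_hash = md5_hex($html);

# Read the checksum saved by the previous run, if any.
my $old_hash = '';
if (open my $in, '<', $hashfile) {
    $old_hash = <$in> // '';
    chomp $old_hash;
    close $in;
}

# Nothing changed since last run: skip re-scraping.
exit 0 if $old_hash eq $new_hash;

# The page changed: re-run the scraper here. If it extracts zero rows,
# alert yourself -- the markup has probably been restructured and the
# parsing code needs to be rewritten.

open my $out, '>', $hashfile or die "Cannot write $hashfile: $!\n";
print {$out} "$new_hash\n";
close $out;
```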
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, as the programming/programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!