Writing A Data Extraction To Web Page Program
-
In my area, there are few different law enforcement agencies that post real time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading, they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/ has city and county specific pages. I need to figure out a way to pull the information from the FHP site and other real time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each pages location AND combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or other language script) that will: a) read the target webpage, b) extract the data relevant for your geographic locations, c) write a small html file to your server that formats the data into a table that will fit on the webpage where you want it published.
-
Save that Perl program in your /cgi-bin/ folder. (you will need to change file permissions to allow the perl program to execute and the small html file to be overwritten)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
-
Place a server-side include the size and shape of your data table on the webpage where you want the information to appear.
This set-up will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens you will need to change the URL in the script and/or the format that it is read in.
-
-
You need to get a developer who understands a lot about http requests. You will need to have one that knows how to basically run a spidering program to ping the website and look for changes and scrape data off of those sites. You will also need to have the program check and see if the coding on the page changes, as if it does, then the scraping program will need to be re-written to account for this.
Ideally, those sites would have some sort of data API or XML feed etc to pull off of, but odds are they do not. It would be worth asking, as then the programming/programmer would have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this as you would potentially be interfacing with the software they develop vs some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Do search engines see copy/keywords when it appears only at the bottom of a page?
My client is looking to improve their SEO, and to date I've written meta data and made some initial recommendations. Thing is, on some of their pages, the body copy appears at the bottom of the page, past links and big, splashy images. My question is, will search engines even see that copy to crawl it for keywords? Thanks!
Web Design | | MarcieHill0 -
Log-in page ranking but not homepage
Our homepage is outranked by log-in page for "primary keyword" in Google search results; for which actually our homepage was optimised. I have gone through the other answers for the same question here. But I couldn't find them related with our website. We are not over optimised. We have link from top navigation menu of blog to our homepage. Does this causing this?
Web Design | | vtmoz1 -
Is it still necessary to have a "home" page button/link in the top nav?
Or is it not necessary to have a "home" tab/link because everybody by this time knows you can get to the home page by clicking on the logo?
Web Design | | FindLaw0 -
For a web design firm, should i make a google plus local page or company page?
I have a web design firm located in India, At this moment we are focusing on local clients as the current competition in local market is very low. But in few months we will shift our focus to outsourcing. So I wanted to know if we should make a google plus local page and connect it with my google places account and website or should I make a google plus business page and connect it to website? Our major focus is on seo. Thanks
Web Design | | hard0 -
Page Redirection solution
Page Redirection solution needs, there are 2 sites in the same folder and one page of old site is bxxxxxseo.com/products.php new site bxxxxxseo.com/product_list.php .there are many old page indexed i wanna redirect all old pages to relevant pages of new site using SEO friendly way .Any help really appropriate. Thank you
Web Design | | innofidelity0 -
Why is this page removed from Google & Bing indices?
This page has been removed from indices at Bing and Google, and I can't figure out why. http://www.pingg.com/occasion/weddings This page used to be in those indices There are plenty of internal links to it The rest of the site is fine It's not blocked by meta robots, robots.txt or canonical URL There's nothing else to suggest that the page is being penalized
Web Design | | Ehren0 -
Changing my web design
When you redo your website, I assume that sometimes it might turn out worse. For example, users might just like your prior design better and thus your prior design might actually have better stats. My blog doesn't receive a lot of hits. I have about 1500 hits per months. I don't think that i have enough traffic for A/B testing. Is there a work around to see if my new blog design does better or worse?
Web Design | | jamesjd71 -
Dynamic pages and code within content
Hi all, I'm considering creating a dynamic table on my site that highlights rows / columns and cells depending on buttons that users can click. Each cell in the table links to a separate page that is created dynamically pulling information from a database. Now I'm aware of the google guidelines: "If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few." So we wondered whether we could put the dynamic pages in our sitemap so that google could index them - the pages can be seen with javascript off which is how the pages are manipulated to make them dynamic. Could anyone give us a overview of the dangers here? I also wondered if you still need to separate content from code on a page? My developer still seems very keen to use inline CSS and javascript! Thanks a bundle.
Web Design | | tgraham0