Writing A Data Extraction To Web Page Program
-
In my area, there are few different law enforcement agencies that post real time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading, they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/ has city and county specific pages. I need to figure out a way to pull the information from the FHP site and other real time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each pages location AND combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or other language script) that will: a) read the target webpage, b) extract the data relevant for your geographic locations, c) write a small html file to your server that formats the data into a table that will fit on the webpage where you want it published.
-
Save that Perl program in your /cgi-bin/ folder. (you will need to change file permissions to allow the perl program to execute and the small html file to be overwritten)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
-
Place a server-side include the size and shape of your data table on the webpage where you want the information to appear.
This set-up will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens you will need to change the URL in the script and/or the format that it is read in.
-
-
You need to get a developer who understands a lot about http requests. You will need to have one that knows how to basically run a spidering program to ping the website and look for changes and scrape data off of those sites. You will also need to have the program check and see if the coding on the page changes, as if it does, then the scraping program will need to be re-written to account for this.
Ideally, those sites would have some sort of data API or XML feed etc to pull off of, but odds are they do not. It would be worth asking, as then the programming/programmer would have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this as you would potentially be interfacing with the software they develop vs some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can we link back from help documents to product or features pages on website?
Hi, We have all our help documents on subdirectory linked for all the features or products we provide. Like we linked website.com/help/seo-guide from website.com/services/seo-product as that is relevant guide. Do we need to link back from all help guide pages to product pages? Thanks
Web Design | | vtmoz0 -
We're considering making notable changes to our website's navigation. Other than 301 redirects from old pages to new, what do I need to consider with this type of move or update?
We would like to make some navigation changes to our website: www.NetGainIT.com, specifically to the services section. I know that I will need a list of 301 redirects if I do not plan on keeping certain pages, but what else do I need to consider?
Web Design | | NetGainTech0 -
Is it important to keep your website home index page simple to rank better?
My website http://www.endeavourcottage.co.uk/ markets holiday cottages and it's grown from my own singular cottage into a small letting agency and I used to rank at best number 3 for the short tailed keywords like Whitby holiday cottages with its drop-down to position 10 on Google.co.uk. So this week I was looking for a UK business to help me improve my rankings and the first thing they said was my home page is detrimental with the listing too many conflicting info with it advertising all 12 properties on it. They suggested a door entry page into the site keeping it simple but when I run it through the analysing tool here on Moz for "Whitby holiday cottages" as an example it came out looking okay. I do the usual things of title tags and meta descriptions for my keywords etc any suggestions or advice would be very welcome thank you Alan
Web Design | | WhitbyHolidayCottages0 -
404's and a drop in Rank - Site maps? Data Highlighter?
I managed an old (2006 design) ticket site that was hosted and run by the same company that handled our point of sale. (Think, really crappy, customer had to click through three pages to get to the tickets, etc.) In Mid February, we migrated that old site to a new, more powerful site, built by a company that handles sites exclusively for ticket brokers. (My site: TheTicketKing. - dot - com) Before migration, I set up 301's for all the pages that we had currently ranked for, and had inbound links pointing to, etc. The CMS allowed me to set every one of those landing pages up with fresh content, so I created unique content for all of them, ran them through the Moz grader before launch, etc. We launched the site in Mid February, and it seemed like Google responded well. All the pages that we had 301's set up for stayed up fairly well in rank, and some even reached higher positions, while some took a few weeks to get back up to where they were before. Google was also giving us an average of 8-10K impressions per day, compared to 3000 per day with the old site. I started to notice a slow drop in impressions in mid April (after two months of love from Google,) and we lost rank on all our non branded pages around 4/23. Our branded terms are still fine, we didn't get a message from Google, and I reached out to the company that manages our site, asking if they had any issues with their other clients. They suggested that I resubmit our sitemaps. I did, and saw everything bump back up (impressions and rank) for just one week. Now we're back in the basement with all the non branded terms once again. I realize that Google could have penalized us without giving us a message, but what got me somewhat optimistic was the fact that resubmitting our sitemaps did bring us back up for around a week. One other thing that I was working on with the site just before the drop was Google's data highlighter. I submitted a set of pages that now come back with errors, after Google seemed to be fine with the data set before I submitted it. So now I'm looking at over 300 data highlighter errors when I'm in WMT. I deleted that set, but I still get the error listings in WMT, as if Google is still trying to understand those pages. Would that have an effect on our rank? Finally I do see that our 404's have risen steadily since the migration, to over 1000 now, and the people who manage the CMS tell me that it would have no effect on rank overall. And we're going to continue to get 404's as the nature of a ticket site would dictate? (Not sure on that, but that's what I was told.) Would anyone care to chime in on these thoughts, or any other clues as to my drop?
Web Design | | Ticket_King0 -
Is there a Joomla! Component For A Blog Page That Is Recommended?
A business partner currently has a page on a Joomla! website that is passing for the blog page. I am not a Joomla! guy so I dont' know much about it. I do know that I don't like a lot of things and prefer Drupal however making a change to Drupal on that site is not an option. We need to upgrade the blog page so that it is more like a blog and I know there has to be an SEO friendly component for a Joomla! blog page. Any ideas?
Web Design | | Atlanta-SMO1 -
301 Redirect ! Joomla Pages, Already ranking. ( just wanted to change the url
hey guys hope everyone had a new year. I am ranking for a page on my site that i want to ( not specifically move ), but just change the url name: It is too long i think and i want to move it from one portion of my architecture to another menu. I have never physically done a 301 redirect myself, always had someone do it for me. I wanted some pointers. Since it is a fairly new site 4 months old! What are my options. Do i need to 301 redirect the page, if i am changing the Structure and AI of my site, or can i just change the url as is and it will still get ranked? How do i keep that url put delete the page and redirect it ? Sorry its very simple but i wanted to get the communities help to continue on ! Best Wishes HAmpig
Web Design | | BizDetox0 -
Best way of conserving link juice from non important pages
If I have a bunch of non important pages on my website which are of little use in the SE's index - IE contact us pages, pages which are near duplicate and conflict with KW's targetting other pages etc, what is the best way of retaining the link juice that would normally be passed to these pages? Most recent discussion I have read has said that with nofollow you effectively just loose link juice, as opposed to conserving it, so that doesn't seem a great option. If I do "noindex" on these pages, would that conserve the link juice in the site, or again would it be just lost? It seems quite a tricky situation as many pages are legitimate for customer usability, but are not worth having in the SE's index and you better off consolidating link juice - so it seems you are getting penilised for making something "for users". Thanks
Web Design | | James770 -
Site-wide footer links or single "website credits" page?
I see that you have already answered this question before back in 2007 (http://www.seomoz.org/qa/view/2163), but wanted to ask your current opinion on the same question: Should I add a site-wide footer link to my client websites pointing to my website, or should I create a "website credits" page on my clients site, add this to the footer and then link from within this page out to my website?
Web Design | | eseyo0