Writing A Data Extraction To Web Page Program
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, and one that combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table that will fit on the webpage where you want it published.
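The three steps above can be sketched in Python just as easily as Perl. A minimal sketch: the URL is the real FHP page, but the "extraction" is only a placeholder keyword scan, since you'd need to inspect the page's actual HTML structure before writing a proper parser, and the output file name is an example, not anything prescribed:

```python
import re
import urllib.request

FHP_URL = "http://www.flhsmv.gov/fhp/traffic/crs_h501.htm"

def fetch_page(url=FHP_URL):
    """a) read the target webpage."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract_incidents(html, keywords):
    """b) pull out lines mentioning any of the target locations.
    A real version would parse the county/location markup properly;
    this plain-text scan is only a stand-in."""
    text = re.sub(r"<[^>]+>", " ", html)  # strip tags
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return [ln for ln in lines
            if any(k.lower() in ln.lower() for k in keywords)]

def build_table(rows):
    """c) format the matches as a small HTML table for the page."""
    cells = "\n".join("<tr><td>%s</td></tr>" % r for r in rows)
    return "<table class=\"accident-ticker\">\n%s\n</table>\n" % cells

# Example run (one output file per location page):
# rows = extract_incidents(fetch_page(), ["Hillsborough", "Tampa"])
# open("tampa-accidents.html", "w").write(build_table(rows))
```

You'd run one extraction per location page (Hillsborough, Tampa, etc.) and write one small HTML file for each.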
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions to allow the Perl program to execute and the small HTML file to be overwritten.)
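On a typical Linux host that permission change looks something like this (the paths and file names are examples only; adjust them to your server's layout):

```shell
# Let the web server execute the script...
chmod 755 /home/youruser/cgi-bin/fetch_accidents.pl
# ...and let the script overwrite the generated HTML file on each run
chmod 664 /home/youruser/public_html/accidents/tampa-accidents.html
```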
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
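If your control panel doesn't expose a scheduler, you can usually edit the crontab directly with `crontab -e`. An hourly entry might look like this (the paths are assumptions):

```shell
# min hour day month weekday  command
0 * * * * /usr/bin/perl /home/youruser/cgi-bin/fetch_accidents.pl >> /home/youruser/logs/accidents.log 2>&1
```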
-
Place a server-side include, the size and shape of your data table, on the webpage where you want the information to appear.
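With Apache-style server-side includes, the directive on the page might look like this (the virtual path is an assumption, pointing at the small HTML file your script writes):

```html
<!--#include virtual="/accidents/tampa-accidents.html" -->
```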
This set-up will work until the URL or format of the target webpage changes; then your script will produce errors or write garbage. When that happens you will need to update the URL in the script and/or the parsing logic.
-
-
You need to get a developer who understands HTTP requests well. You will need one who knows how to run a spidering program that pings the website, looks for changes, and scrapes data off of those sites. You will also need the program to check whether the coding on the page changes, because if it does, the scraping program will need to be rewritten to account for it.
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, since the programming/programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop vs. some minion at the help desk for the dept of motor vehicles.
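One simple way to "check and see if the coding on the page changes", as suggested above, is to fingerprint only the page's markup (the tags, ignoring the ever-changing accident text) and compare it against the fingerprint saved from the previous run. A sketch, with function names that are mine and not from any real library:

```python
import hashlib
import re

def page_fingerprint(html):
    """Hash only the HTML tags, so routine data updates don't
    raise a false alarm but a template/layout change does."""
    tags = "".join(re.findall(r"<[a-zA-Z/!][^>]*>", html))
    return hashlib.sha256(tags.encode("utf-8")).hexdigest()

# On each run: if page_fingerprint(html) != stored_fingerprint,
# stop publishing and flag the scraper for manual review.
```

This won't catch every breaking change (e.g. a column reordering inside the same tags), but it catches the common case of a redesigned template before the scraper starts writing garbage.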
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!