Writing A Data Extraction To Web Page Program
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, and one that combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table sized to fit the webpage where you want it published.
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions to allow the Perl program to execute and the small HTML file to be overwritten.)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
-
Place a server-side include, sized to fit your data table, on the webpage where you want the information to appear.
This set-up will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens you will need to change the URL in the script and/or the way the script parses the page.
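The read, extract, and write steps above can be sketched roughly as follows. Python stands in here for the Perl script, and only the standard library is used. The table layout, the sample rows, the `crash-feed` class name, and all function names are assumptions made for illustration; the real FHP page may be structured quite differently, so treat this as a starting point rather than a working scraper for that site.

```python
import html
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows with no <td> cells (e.g. headers) are skipped
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def fetch(url):
    """(a) Read the target page. Not called below, so the sketch runs offline."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_rows(page_html):
    parser = TableRowParser()
    parser.feed(page_html)
    parser.close()
    return parser.rows


def filter_rows(rows, place):
    """(b) Keep rows where any cell mentions the place, ignoring case."""
    needle = place.lower()
    return [r for r in rows if any(needle in c.lower() for c in r)]


def render_table(rows):
    """(c) Format rows as a small HTML table, escaping every cell."""
    body = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in r) + "</tr>"
        for r in rows
    )
    return f'<table class="crash-feed">\n{body}\n</table>\n'


def write_fragment(path, fragment):
    """Overwrite the small HTML file that the page includes."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(fragment)


# Invented sample standing in for the fetched page (hypothetical layout).
SAMPLE = """
<table>
<tr><th>County</th><th>Location</th><th>Remarks</th></tr>
<tr><td>HILLSBOROUGH</td><td>I-275 &amp; BUSCH BLVD [TAMPA]</td><td>Roadblock</td></tr>
<tr><td>PINELLAS</td><td>US-19 [CLEARWATER]</td><td>No injuries</td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
tampa = filter_rows(rows, "Tampa")
fragment = render_table(tampa)
print(fragment)
```

In a real run you would pass the result of `fetch(...)` to `extract_rows` and call `write_fragment` once per city or county page. On a typical Apache host, an hourly crontab entry along the lines of `0 * * * * /usr/bin/python3 /path/to/crash_feed.py` and an include such as `<!--#include virtual="/includes/crashes_tampa.html" -->` would cover the scheduling and display steps; both paths are hypothetical and depend on how your server is set up.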
-
-
You need a developer who understands HTTP requests well: someone who knows how to run a spidering program that polls the website, looks for changes, and scrapes data off of those sites. The program should also check whether the coding on the page has changed, because if it has, the scraping program will need to be rewritten to account for it.
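One simple way to notice that the page's coding has changed is to compare the table's column headers against the ones the scraper was written for, and skip the update when they differ. The sketch below assumes the page holds a single table with `<th>` header cells; the header names and the `layout_unchanged` helper are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: the columns the scraper was originally written against.
EXPECTED_HEADERS = ["County", "Location", "Remarks"]


class HeaderParser(HTMLParser):
    """Collect the text of every <th> cell in document order."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._in_th = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True
            self._buf = []

    def handle_data(self, data):
        if self._in_th:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._in_th:
            self.headers.append(" ".join("".join(self._buf).split()))
            self._in_th = False


def layout_unchanged(page_html, expected=EXPECTED_HEADERS):
    """Return True only if the page's headers match the expected list exactly."""
    parser = HeaderParser()
    parser.feed(page_html)
    parser.close()
    return parser.headers == list(expected)


# Invented examples: the second one simulates a redesigned page.
OLD = "<table><tr><th>County</th><th>Location</th><th>Remarks</th></tr></table>"
NEW = "<table><tr><th>Date</th><th>County</th><th>Location</th></tr></table>"

print(layout_unchanged(OLD))
print(layout_unchanged(NEW))
```

If the check fails, the safer choice is probably to leave the last good HTML file in place and alert someone, rather than overwrite it with garbage.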
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, as the programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, since you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!