Writing A Data Extraction To Web Page Program
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and under the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, combining all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, c) write a small HTML file to your server that formats the data into a table that will fit on the webpage where you want it published.
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions to allow the Perl program to execute and the small HTML file to be overwritten.)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
-
Place a server-side include, sized and shaped to fit your data table, on the webpage where you want the information to appear.
This setup will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens, you will need to update the URL in the script and/or the way it parses the page.
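To make the steps above concrete, here is a rough sketch in Python rather than Perl (the "other language script" option). It assumes the crash page is a plain HTML table whose first row holds the headings "Time", "County" and "Location"; those headings, the sample rows, and the function names are all made up for illustration, so check the real page's markup before relying on any of it. The sketch refuses to write anything if the headings are not what it expects, which avoids publishing garbage when the source page changes.

```python
import html
import os
import tempfile
import urllib.request
from html.parser import HTMLParser

# Hypothetical column headings; the real page must be inspected by hand.
EXPECTED_HEADERS = ["Time", "County", "Location"]


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def fetch(url):
    """Step a: read the target webpage (assumes UTF-8 text)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_rows(page_html, county):
    """Step b: keep only the rows for one county."""
    parser = TableRows()
    parser.feed(page_html)
    if not parser.rows or parser.rows[0] != EXPECTED_HEADERS:
        raise ValueError("page layout changed; refusing to overwrite the fragment")
    col = EXPECTED_HEADERS.index("County")
    return [r for r in parser.rows[1:]
            if len(r) == len(EXPECTED_HEADERS) and r[col].lower() == county.lower()]


def render_fragment(rows):
    """Step c, part 1: format the rows as a small HTML table."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in EXPECTED_HEADERS)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in r) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>\n"


def write_fragment(path, text):
    """Step c, part 2: write to a temp file, then rename it over the old one,
    so the web page never includes a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates the file owner-only; let the web server read it
    os.replace(tmp, path)


# Made-up sample input standing in for the real page.
SAMPLE = """
<table>
<tr><th>Time</th><th>County</th><th>Location</th></tr>
<tr><td>08:15</td><td>Hillsborough</td><td>I-275 &amp; Busch Blvd, Tampa</td></tr>
<tr><td>08:20</td><td>Pinellas</td><td>US-19, Clearwater</td></tr>
</table>
"""
```

The scheduled job would then run something like `write_fragment("crashes-hillsborough.html", render_fragment(extract_rows(fetch(url), "Hillsborough")))` once per county or city page, and each page would pull in its own fragment. On Apache with server-side includes enabled, that is a line such as `<!--#include virtual="/crashes-hillsborough.html" -->` (the file name here is only an example).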
-
-
You need a developer who understands HTTP requests well: someone who knows how to run a spidering program that polls the website, looks for changes, and scrapes data off those sites. The program will also need to check whether the markup of the page changes, because if it does, the scraper will have to be rewritten to account for it.
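As a rough illustration of that change check, the Python sketch below keeps two fingerprints per source page: one of the whole page (to tell whether there is anything new to scrape) and one of the column headings only (to tell whether the layout the scraper depends on has changed). It assumes the headings are marked up as `<th>` cells, which may not be true of the real pages; the function names and status strings are invented for this example.

```python
import hashlib
from html.parser import HTMLParser


class HeaderCells(HTMLParser):
    """Collect the text of <th> cells, assumed to label the table's columns."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._buf is not None:
            self.headers.append("".join(self._buf).strip())
            self._buf = None


def digest(text):
    """Hex SHA-256 of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify(page_html, last_content_digest, last_layout_digest):
    """Compare a freshly fetched page with the digests stored from the last run.

    Returns (status, content_digest, layout_digest), where status is
    "layout_changed", "new_data" or "unchanged". With no stored digests
    (first run) the status is "layout_changed", so the page gets inspected
    by hand once before the scraper is trusted.
    """
    parser = HeaderCells()
    parser.feed(page_html)
    layout = digest("|".join(parser.headers))
    content = digest(page_html)
    if layout != last_layout_digest:
        status = "layout_changed"
    elif content != last_content_digest:
        status = "new_data"
    else:
        status = "unchanged"
    return status, content, layout
```

On "new_data" the scraper runs as usual; on "layout_changed" it should stop and alert someone rather than publish whatever it manages to parse.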
Ideally, those sites would have some sort of data API, XML feed, etc. to pull from, but odds are they do not. It would be worth asking, as the programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!