Writing a Program to Extract Data and Publish It to a Web Page
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, AND one that combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table sized to fit the webpage where you want it published.
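A minimal sketch of that kind of script, written in Perl with LWP::Simple, is below. The county/city names, the table-row regex, and the output path are all assumptions; they would need to be adjusted to match the actual markup of the live FHP page.

    #!/usr/bin/perl
    # Minimal sketch, not a finished scraper: the regex and the output path
    # are placeholders and must be adapted to the real markup of the FHP page.
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    my $url  = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
    my $html = get($url) or die "Could not fetch $url\n";

    # Keep only the table rows that mention the locations we care about
    # (hypothetical: the real page may lay its rows out differently).
    my @rows;
    for my $place ('HILLSBOROUGH', 'TAMPA') {
        while ($html =~ /(<tr\b[^>]*>.*?$place.*?<\/tr>)/sgi) {
            push @rows, $1;
        }
    }

    # Write a small HTML fragment for the server-side include to pull in.
    open my $out, '>', '/home/site/public_html/accidents.html'
        or die "Cannot write output: $!\n";
    print {$out} "<table>\n", join("\n", @rows), "\n</table>\n";
    close $out;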
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions so that the Perl program can execute and the small HTML file can be overwritten.)
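For example, on a typical Linux host the permissions might be set something like this (the file names are placeholders for whatever you call your script and its output file):

    chmod 755 /home/site/cgi-bin/fetch_accidents.pl     # script must be executable
    chmod 644 /home/site/public_html/accidents.html     # output must be writable by the script's user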
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
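A typical crontab entry to run the script at the top of every hour would look something like this (the path is a placeholder):

    0 * * * * /home/site/cgi-bin/fetch_accidents.pl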
-
Place a server-side include, sized and shaped to match your data table, on the webpage where you want the information to appear.
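On Apache hosts with server-side includes enabled (usually on .shtml pages), the include is a single directive placed where the table should appear; the path points at whatever file your script writes:

    <!--#include virtual="/accidents.html" -->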
This set-up will work until the URL or format of the target webpage changes; then your script will produce errors or write garbage. When that happens you will need to update the URL in the script and/or the way the page is parsed.
-
-
You need to get a developer who understands a lot about HTTP requests. You will need one who knows how to run a spidering program that polls the website, watches for changes, and scrapes the data off those sites. You will also need the program to check whether the coding on the page has changed, because if it does, the scraping program will need to be rewritten to account for it.
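One simple way such a program might detect that the page's coding has changed is to strip out the data, hash the remaining markup, and compare it against the hash from the previous run. A rough sketch in Perl, with the paths as placeholders:

    #!/usr/bin/perl
    # Rough sketch: flag a structural change in the target page by comparing
    # a hash of its markup (tags only) against the hash saved from the last run.
    use strict;
    use warnings;
    use LWP::Simple qw(get);
    use Digest::MD5 qw(md5_hex);

    my $url      = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
    my $hashfile = '/home/site/data/last_hash.txt';   # placeholder path

    my $html = get($url) or die "Could not fetch $url\n";

    # Keep only the tags and ignore the text, so routine accident updates
    # do not trigger a false alarm.
    my $structure = join '', ($html =~ /<[^>]+>/g);
    my $new       = md5_hex($structure);

    my $old = '';
    if (open my $in, '<', $hashfile) {
        $old = <$in> // '';
        chomp $old;
        close $in;
    }

    print "Page structure changed since last check - review the scraper.\n"
        if $old && $old ne $new;

    open my $out, '>', $hashfile or die "Cannot write $hashfile: $!\n";
    print {$out} "$new\n";
    close $out;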
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, though, as the programming/programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the Department of Motor Vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!