Writing A Data Extraction To Web Page Program
-
In my area, there are few different law enforcement agencies that post real time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading, they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/ has city and county specific pages. I need to figure out a way to pull the information from the FHP site and other real time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each pages location AND combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or other language script) that will: a) read the target webpage, b) extract the data relevant for your geographic locations, c) write a small html file to your server that formats the data into a table that will fit on the webpage where you want it published.
-
Save that Perl program in your /cgi-bin/ folder. (you will need to change file permissions to allow the perl program to execute and the small html file to be overwritten)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
-
Place a server-side include the size and shape of your data table on the webpage where you want the information to appear.
This set-up will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens you will need to change the URL in the script and/or the format that it is read in.
-
-
You need to get a developer who understands a lot about http requests. You will need to have one that knows how to basically run a spidering program to ping the website and look for changes and scrape data off of those sites. You will also need to have the program check and see if the coding on the page changes, as if it does, then the scraping program will need to be re-written to account for this.
Ideally, those sites would have some sort of data API or XML feed etc to pull off of, but odds are they do not. It would be worth asking, as then the programming/programmer would have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this as you would potentially be interfacing with the software they develop vs some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Internal linking: Repeating same low level pages from high hierarchy level pages
Hi all, We have 3 editions of our product we are trying to rank better. Some of our features level pages from these editions are repeating in these 3 editions. Exactly like below example: clothes.com/cotton-fabrics/shirts clothes.com/wool-fabrics/shirts clothes.com/polyester-fabrics/shirts "Shirts" pages repeat in "cotton-fabrics", "wool-fabrics" and "polyester-fabrics". We have added rel=canonical to rank "shirts" in rank only one category. I wonder do we need to take any other measures to make sure that these pages don't affect us negatively. Thanks
Web Design | | vtmoz0 -
Will HTTPS Effect SERPS Depending on Different Page Content?
I know that HTTPS can have a positive influence on SERPS. Does anyone have any thoughts or evidence of this effect being different depending on the page content? For example, I would think that for e-commerce sites HTPS is a must, and I guess the change in rankings would be more significant. But what about other situations, AMP pages for example? Of if you run Adsense, or Affiliate links? Or if your page contains a form?
Web Design | | GrouchyKids1 -
Will numbers & data be considered as user generated content by Google OR naturally written text sentences only refer to user generated content.
Hi, Will numbers & data be considered as user generated content by Google OR naturally written text sentences only refer to user generated content. Regards
Web Design | | vivekrathore0 -
2 Menu links to same page. Is this a problem?
One of my clients wants to link to the same page from several places in the navigation menu. Does this create any crawl issues or indexing problems? It's the same page (same url) so there is no duplicate content problems. Since the page is promotional, the client wants the page accessible from different places in the nav bar. Thanks, Dino
Web Design | | Dino640 -
Dedicated landing pages vs responsive web design
I've been doing some research into web design and page layout as my company is considering a re-design. However, we have come to an argument around responsive webdesign vs SEO. The argument is around me (SEO specialist) arguing that I want dedicated pages for all my content as it's good for SEO since it focuses keywords and content properly, and it still adheres to good user journeys (providing it's done correctly), and my web designer arguing that mobile traffic is on the rise (which it is I know) so we should have more content under 1 URL and use responsive web design so that users can just scroll through content instead of having to keep be direct to different pages. What do I do... I can't find any blogs, questions, or whiteboards that really touches on this topic, so can anyone advise me on whether I should: Create dedicated landing pages for each bit of content which is good for SEO and taking users on a journey around my site OR All content that is relative to a landing page, put all under that one URL (e.g. "About us" may have info on the company, our team, our history, careers) and allow people to scroll down what could be a very long page on any device, but may effect SEO as I can't focus keywords/content under one URL properly, so it may effect rankings. Any advice SEO and user experience whizzes out there?
Web Design | | blackboxideas0 -
404's and a drop in Rank - Site maps? Data Highlighter?
I managed an old (2006 design) ticket site that was hosted and run by the same company that handled our point of sale. (Think, really crappy, customer had to click through three pages to get to the tickets, etc.) In Mid February, we migrated that old site to a new, more powerful site, built by a company that handles sites exclusively for ticket brokers. (My site: TheTicketKing. - dot - com) Before migration, I set up 301's for all the pages that we had currently ranked for, and had inbound links pointing to, etc. The CMS allowed me to set every one of those landing pages up with fresh content, so I created unique content for all of them, ran them through the Moz grader before launch, etc. We launched the site in Mid February, and it seemed like Google responded well. All the pages that we had 301's set up for stayed up fairly well in rank, and some even reached higher positions, while some took a few weeks to get back up to where they were before. Google was also giving us an average of 8-10K impressions per day, compared to 3000 per day with the old site. I started to notice a slow drop in impressions in mid April (after two months of love from Google,) and we lost rank on all our non branded pages around 4/23. Our branded terms are still fine, we didn't get a message from Google, and I reached out to the company that manages our site, asking if they had any issues with their other clients. They suggested that I resubmit our sitemaps. I did, and saw everything bump back up (impressions and rank) for just one week. Now we're back in the basement with all the non branded terms once again. I realize that Google could have penalized us without giving us a message, but what got me somewhat optimistic was the fact that resubmitting our sitemaps did bring us back up for around a week. One other thing that I was working on with the site just before the drop was Google's data highlighter. I submitted a set of pages that now come back with errors, after Google seemed to be fine with the data set before I submitted it. So now I'm looking at over 300 data highlighter errors when I'm in WMT. I deleted that set, but I still get the error listings in WMT, as if Google is still trying to understand those pages. Would that have an effect on our rank? Finally I do see that our 404's have risen steadily since the migration, to over 1000 now, and the people who manage the CMS tell me that it would have no effect on rank overall. And we're going to continue to get 404's as the nature of a ticket site would dictate? (Not sure on that, but that's what I was told.) Would anyone care to chime in on these thoughts, or any other clues as to my drop?
Web Design | | Ticket_King0 -
Page Size
Hello Mozers, What is the best page size ( or the max page size ie KB ) for a home page or a 2nd level page. Thank you - I appreciate you looking at this question. Vijay
Web Design | | vijayvasu0 -
Too Many On-Page Links
Most of my pages have "Too Many On-Page Links". If you view the website you will see this is mainly down to the top navigation drop down menu: http://www.cwesolutions.co.uk So if I wanted to reduce the number of links I would have to have category links with landing pages. How much does having "Too Many On-Page Links" effect my website ranking? Is it really important and would I notice a difference if I changed it?
Web Design | | petewinter0