Writing A Data Extraction To Web Page Program
-
In my area, there are few different law enforcement agencies that post real time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading, they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/ has city and county specific pages. I need to figure out a way to pull the information from the FHP site and other real time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each pages location AND combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or other language script) that will: a) read the target webpage, b) extract the data relevant for your geographic locations, c) write a small html file to your server that formats the data into a table that will fit on the webpage where you want it published.
-
Save that Perl program in your /cgi-bin/ folder. (you will need to change file permissions to allow the perl program to execute and the small html file to be overwritten)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
-
Place a server-side include the size and shape of your data table on the webpage where you want the information to appear.
This set-up will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens you will need to change the URL in the script and/or the format that it is read in.
-
-
You need to get a developer who understands a lot about http requests. You will need to have one that knows how to basically run a spidering program to ping the website and look for changes and scrape data off of those sites. You will also need to have the program check and see if the coding on the page changes, as if it does, then the scraping program will need to be re-written to account for this.
Ideally, those sites would have some sort of data API or XML feed etc to pull off of, but odds are they do not. It would be worth asking, as then the programming/programmer would have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this as you would potentially be interfacing with the software they develop vs some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Redesign Just Starting - Should I Leave The Previous Incomplete Site or Setup A Temporary Holding Page and Redirect Previous URL'S?
Hi All I've picked up a new website project and wanted to ask about the best way to proceed with the current site during the development process. The current site is incomplete although it has been live for a while and has over 80 pages in the sitemap. Link to site https://tinyurl.com/ychwftup The business owner wants to take down the current site and simply add a landing page stating "new website coming soon". From an SEO perspective, am I better to keep the current site live until the new site is ready? Or would it not make any difference if I setup the landing page and add 301 redirects from each page in the sitemap to the landing page. Many Thanks In Advance For Any Assistance
Web Design | | ruislip180 -
Log-in page ranking but not homepage
Our homepage is outranked by log-in page for "primary keyword" in Google search results; for which actually our homepage was optimised. I have gone through the other answers for the same question here. But I couldn't find them related with our website. We are not over optimised. We have link from top navigation menu of blog to our homepage. Does this causing this?
Web Design | | vtmoz1 -
Does having too many wordpress portfolio pages with little content hurt a site's SEO?
I have a site that is for a service company, not image based like a photographer or artist. We utilize the Portfolio feature to create a gallery of floor coating finishes (images of all the flooring finish options available) but this solution has created /portfolio/file-name pages for each image. These pages have no other content besides the image. I've run SEMrush audits on this site which shows a high percentage of pages with low text/code ratio and duplicate content (a lot of the finishes have very similar names). This site has been extremely slow to improve any visibility online (more than 9 months) and I'm wondering if this is a factor by possibly having a negative effect on our site. We initially chose the portfolio option because it was the best-looking solution for our users but we can certainly change it to another format if that is better. Thanks!
Web Design | | WillGMG0 -
How to deal with 100s of Wordpress media link pages, containing images, but zero content
I have a Wordpress website with well over 1000 posts. I had a SEO audit done and it was highlighted that every post had clickable images. If you click the image a new webpage opens containing nothing but the image. I was told these image pages with zero content are very bad for SEO and that I should get them removed. I have contacted several Wordpress specialists on People Per Hour. I have basically been offered two solutions. 1 - redirect all these image pages to a 404, so they are not found by Google 2 - redirect each image page to the main post page the image is from. What's my best option here? Is there a better option? I don't care if these pages remain, providing they are not crawled by Google and classified as spam etc. All suggestions greatly received!
Web Design | | xpers0 -
Web development - License CSS/Markup/Code
In development of a website, is it typical for the developer to retain rights to the CSS, Markup and other Coding? If so, why is this done?
Web Design | | DemiGR0 -
Site Ranks on Page 1 - Would launching new site hurt that
Hello, I currently have a website ranking in the top 7 for my main keyword. The website was built in 2004 and is definitely outdated, yet still ranks very high and brings in business. If i launched a new site on this domain, what would happen to my rankings? Would they drop? would they rise? If i don't launch the new site, will this site eventually drop due to being old and outdated? Any advice would be helpful...
Web Design | | Prime850 -
Landing pages vs internal pages.
Hey everyone I have run into a problem and would greatly appreciate anyone that could weigh in on it. I have a web client that went to an outside vendor for marketing. The client asked me to help them target some keywords and since I am new to the SEO world I have proceeded by researching the best keywords for the client. I found 6 that see excellent monthly searches. I then registered the .com and or .net domain names that match these words. I then started building landing pages that make reference to the keyword and then have links to his site to get more info. My customer sent the first of these sites to the marketer and he says I am doing things all wrong. He says rather then having landing pages like this I should just point the domain names at internal pages to the website. He also says that I should not have different looks for the landing pages from the main site and that I should have the full site menu on each landing page. I wanted to here what everyone here has to say about the pros and cons of the way to do this cause the guy giving the advice to me has a lower ranking site then I do and I have only started working on getting my site ranked this year. He has atleast according to him been doing this forever. Thanks, Ron
Web Design | | bsofttech0 -
Question about web site structure
Is there an SEO advantage for individual pages to be in sub folders vs not being in a folder? Of course site managemnt is easier with folders if you have 100;s of pages...clearly a shorter URL is easier for humans to naviagte. store.com/gadgets store.com/lasers vs. store.com/gadgets/lasers
Web Design | | johnshearer0