Writing A Data Extraction To Web Page Program
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then under the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it posts automatically on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like it to post immediately on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, combining the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, c) write a small HTML file to your server that formats the data into a table that will fit on the webpage where you want it published.
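Steps a) to c) could be sketched roughly as follows, in Python rather than Perl since the answer allows either. This is only a sketch under assumptions: it supposes the crash report is a plain HTML table whose header row includes "County" and "Location" columns, which may not match the real page. The helper names (`fetch_page`, `parse_rows`, `filter_by_county`, `render_table`) and the output path are made up for illustration.

```python
import html
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_page(url, timeout=30):
    """Step a: download the target page and return it as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def _close_cell(self):
        # Store the cell text with runs of whitespace collapsed to one space.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()


def parse_rows(page_html):
    """Turn the table into dicts keyed by the first (header) row."""
    parser = TableParser()
    parser.feed(page_html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def filter_by_county(rows, county):
    """Step b: keep only the rows for one county (case-insensitive)."""
    wanted = county.strip().lower()
    return [r for r in rows if r.get("County", "").strip().lower() == wanted]


def render_table(rows, columns):
    """Step c: format the rows as a small, escaped HTML table fragment."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(r.get(c, ''))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Example wiring (not run here; the output path is a placeholder):
# rows = parse_rows(fetch_page("http://www.flhsmv.gov/fhp/traffic/crs_h501.htm"))
# fragment = render_table(filter_by_county(rows, "Hillsborough"), ["County", "Location"])
# with open("/path/to/crashes-hillsborough.html", "w", encoding="utf-8") as f:
#     f.write(fragment)
```

Escaping the cell text before writing it out matters here, because the content comes from a third-party page and ends up inside your own HTML.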
-
Save that Perl program in your /cgi-bin/ folder. You will need to change the file permissions so that the Perl program can execute and the small HTML file can be overwritten.
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
-
Place a server-side include, sized to fit your data table, on the webpage where you want the information to appear. The include pulls in the small HTML file that the script writes.
This set-up will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens you will need to change the URL in the script and/or the way the script parses the page.
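One way to soften that failure mode is to have the script check the page's column headers before it overwrites anything, and to write the new file under a temporary name first so a failed run leaves the old table in place. A minimal sketch, assuming the header names "County" and "Location"; `EXPECTED_COLUMNS`, `FormatChanged`, `check_format`, `write_atomically` and `publish` are hypothetical names, not part of any existing tool.

```python
import os
import tempfile

EXPECTED_COLUMNS = ("County", "Location")  # assumed headers; adjust to the real page


class FormatChanged(Exception):
    """Raised when the scraped header row no longer matches what we expect."""


def check_format(header_row, expected=EXPECTED_COLUMNS):
    """Refuse to continue if any expected column is missing."""
    missing = [name for name in expected if name not in header_row]
    if missing:
        raise FormatChanged(f"missing columns: {', '.join(missing)}")


def write_atomically(path, text):
    """Write text to a temp file in the same folder, then swap it into place."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        # mkstemp creates the file readable only by its owner;
        # widen the mode so the web server can read the result.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def publish(header_row, fragment, path):
    """Only overwrite the published file when the page still looks familiar."""
    check_format(header_row)
    write_atomically(path, fragment)
```

With this arrangement, a layout change on the source site raises `FormatChanged` and the page keeps showing the last good table instead of garbage. The scheduled job's error output is then the signal that the script needs attention.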
-
-
You need a developer who understands HTTP requests well: someone who can run a spidering program that polls those sites, looks for changes, and scrapes the data off them. The program will also need to check whether the markup of the page changes, because if it does, the scraper will have to be rewritten to account for it.
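The "look for changes" part can be quite small. A sketch of one common approach, assuming the script keeps the previous fingerprint somewhere between runs (a file, for instance); `fingerprint` and `has_changed` are illustrative names.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 hex digest of the page text, ignoring whitespace runs."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(page_text, previous_fingerprint):
    """True when the page differs from the last version we processed."""
    return fingerprint(page_text) != previous_fingerprint
```

If `has_changed` returns False, the script can skip parsing and rewriting altogether for that run. Note that this detects any change in the page, new accidents as well as layout changes, so it does not replace a separate check on the markup itself.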
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, as the programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/. They may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the Department of Motor Vehicles.
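To show why a feed would make the job easier, here is a sketch of reading one. The feed format is entirely invented for illustration (the `<incident>`, `<county>` and `<location>` element names and the sample records are assumptions, not anything these agencies are known to publish), and `parse_feed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample data in an invented format, for illustration only.
SAMPLE_FEED = """<incidents>
  <incident>
    <county>Hillsborough</county>
    <location>I-275 at Busch Blvd, Tampa</location>
  </incident>
  <incident>
    <county>Pinellas</county>
    <location>US-19, Clearwater</location>
  </incident>
</incidents>"""


def parse_feed(xml_text):
    """Return one dict per <incident>, with missing fields as empty strings."""
    root = ET.fromstring(xml_text)
    return [
        {
            "county": (item.findtext("county") or "").strip(),
            "location": (item.findtext("location") or "").strip(),
        }
        for item in root.iter("incident")
    ]
```

Compared with scraping, the fields arrive already labeled, so a cosmetic redesign of the agency's web page would not break the script.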
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!