Writing A Data Extraction To Web Page Program
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, one that combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or other language script) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, c) write a small HTML file to your server that formats the data into a table sized to fit the webpage where you want it published.
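Those three steps could be sketched like this in Python (the Perl suggestion above would work the same way). The URL and county names come from the question; the row-matching regex, the output file name, and the table markup are assumptions about a page structure that would need to be checked against the live site.

```python
import re
import urllib.request

# Target page from the question; the table layout assumed below is a guess.
URL = "http://www.flhsmv.gov/fhp/traffic/crs_h501.htm"
COUNTIES = ["HILLSBOROUGH"]  # counties/cities you have landing pages for

def fetch_page(url):
    """Download the raw HTML of the crash-report page."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract_rows(html, counties):
    """Rough extraction: keep table rows that mention a target county."""
    rows = re.findall(r"<tr[^>]*>.*?</tr>", html, flags=re.S | re.I)
    return [r for r in rows if any(c in r.upper() for c in counties)]

def write_widget(rows, path="accidents.html"):
    """Write a small HTML fragment to be included on the landing page."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<table class='crash-ticker'>\n")
        for r in rows:
            f.write(r + "\n")
        f.write("</table>\n")

# Typical use (requires network access):
#   html = fetch_page(URL)
#   write_widget(extract_rows(html, COUNTIES))
```

The regex-based extraction is deliberately crude; it only holds up as long as the source page keeps its rows in plain `<tr>` elements, which is exactly the fragility discussed further down.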
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions to allow the Perl program to execute and the small HTML file to be overwritten.)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule, such as hourly or daily; these scheduled tasks are usually called "cron jobs". Find this in your server's control panel and set up a cron job that will execute your Perl program automatically.
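If your host exposes a raw crontab rather than a control-panel form, the entry might look like this. All paths here are placeholders; the script name, interpreter location, and log file depend entirely on your server.

```shell
# Run the scraper hourly, 5 minutes past the hour; paths are placeholders.
5 * * * * /usr/bin/perl /home/youruser/cgi-bin/fetch_crashes.pl >> /home/youruser/logs/crashes.log 2>&1
```

Logging stdout and stderr to a file makes it much easier to notice when the target page changes and the script starts failing.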
-
Place a server-side include, sized and shaped to match your data table, on the webpage where you want the information to appear.
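On an Apache-style server with SSI enabled, the include is a single directive. The file path here is a placeholder that should match wherever your script writes its HTML fragment.

```html
<!-- Pull in the generated crash-ticker fragment; adjust the path
     to match the file your cron-driven script writes. -->
<!--#include virtual="/data/accidents.html" -->
```

Because the script overwrites the fragment on each cron run, the page picks up fresh data with no further changes to the page itself.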
This setup will work until the URL or format of the target webpage changes; then your script will produce errors or write garbage. When that happens, you will need to update the URL in the script and/or the parsing logic to match the new format.
-
-
You need to get a developer who understands a lot about HTTP requests. You will need one who knows how to run a spidering program that pings the websites, looks for changes, and scrapes data off those sites. The program should also check whether the markup on the page has changed, because if it does, the scraping routine will need to be rewritten to account for it.
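One simple way to implement that "check whether the markup changed" idea is to fingerprint the page's tag skeleton while ignoring the text, so routine data updates don't raise false alarms. This is a hypothetical sketch, not part of any existing tool; the function names are invented for illustration.

```python
import hashlib
import re

def structure_fingerprint(html):
    """Hash only the tag skeleton of a page, ignoring text content,
    so new accident data doesn't look like a layout change."""
    tags = re.findall(r"<\s*/?\s*([a-zA-Z0-9]+)", html)
    return hashlib.sha256(" ".join(t.lower() for t in tags).encode()).hexdigest()

def layout_changed(old_html, new_html):
    """True if the tag skeleton differs, i.e. the scraper likely needs rework."""
    return structure_fingerprint(old_html) != structure_fingerprint(new_html)
```

A cron run could store the previous fingerprint and email you when `layout_changed` fires, instead of silently writing garbage to your pages.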
Ideally, those sites would have some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, as the programmer would then have a much easier time. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the department of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!