Writing A Data Extraction To Web Page Program
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading, they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but one that shows car accidents specific to each page's location AND combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or a script in another language) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, and c) write a small HTML file to your server that formats the data into a table that will fit on the webpage where you want it published.
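As a starting point, here is a minimal sketch of such a script in Perl. It assumes the crash reports sit in table rows that a simple pattern can match; the real regex depends entirely on the FHP page's markup, and the county name and output path are placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Placeholder URL and output path -- adjust to the real page and your server layout.
my $url     = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';
my $outfile = '/var/www/html/includes/hillsborough-accidents.html';

# a) read the target webpage
my $page = get($url) or die "Could not fetch $url\n";

# b) extract the rows that mention the county we care about
my @rows;
while ($page =~ m{<tr>(.*?)</tr>}gis) {
    my $row = $1;
    push @rows, $row if $row =~ /Hillsborough/i;
}

# c) write a small HTML fragment the webpage can include
open my $fh, '>', $outfile or die "Could not write $outfile: $!\n";
print $fh qq{<table class="accident-ticker">\n};
print $fh "<tr>$_</tr>\n" for @rows;
print $fh "</table>\n";
close $fh;
```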
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change file permissions to allow the Perl program to execute and the small HTML file to be overwritten.)
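On a typical Linux host, the permission changes might look like this (the file names are examples, and the exact modes depend on which user the script runs as):

```
chmod 755 /path/to/cgi-bin/fetch_accidents.pl
chmod 644 /var/www/html/includes/hillsborough-accidents.html
```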
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
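If your host exposes a raw crontab rather than a control-panel form, an hourly job might look like this (the interpreter and script paths are examples):

```
# Run the scraper at the top of every hour
0 * * * * /usr/bin/perl /path/to/cgi-bin/fetch_accidents.pl
```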
-
Place a server-side include, sized and shaped to fit your data table, on the webpage where you want the information to appear.
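With Apache's mod_include enabled, the include itself is a single directive in the page, pointing at the fragment the script writes (the path is an example):

```
<!--#include virtual="/includes/hillsborough-accidents.html" -->
```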
This setup will work until the URL or format of the target webpage changes; then your script will produce errors or write garbage. When that happens, you will need to update the URL in the script and/or the parsing logic to match the new format.
-
-
You need to get a developer who understands a lot about HTTP requests. You will need one who knows how to run a spidering program that pings the website, looks for changes, and scrapes data off those sites. You will also need the program to check whether the coding on the page has changed, because if it has, the scraping program will need to be rewritten to account for it.
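One simple way to implement that change check (a sketch of one possible approach, not the only one) is to hash a "skeleton" of the page, i.e. the markup with the volatile data stripped out, and compare it against the previous fetch; if the skeleton's hash moves, the structure probably changed and the scraper needs review. In Perl, with placeholder paths:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use Digest::MD5 qw(md5_hex);

my $url   = 'http://www.flhsmv.gov/fhp/traffic/crs_h501.htm';  # placeholder
my $state = '/var/tmp/fhp_page.md5';                           # placeholder

my $page = get($url) or die "Could not fetch $url\n";

# Blank out cell contents so the hash reflects the page's structure,
# not the accident data that changes on every fetch.
(my $skeleton = $page) =~ s{<td[^>]*>.*?</td>}{<td/>}gis;
my $digest = md5_hex($skeleton);

my $old = '';
if (open my $in, '<', $state) { chomp($old = <$in> // ''); close $in; }

if ($old && $digest ne $old) {
    warn "Page structure may have changed -- review the scraper!\n";
    # In practice you would email or otherwise alert yourself here.
}

open my $out, '>', $state or die "Could not write $state: $!\n";
print $out "$digest\n";
close $out;
```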
Ideally, those sites would offer some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, as that would make the programming (and the programmer's life) much easier. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!