Writing a Program to Extract Data and Publish It to a Web Page
-
In my area, there are a few different law enforcement agencies that post real-time data on car accidents. One is http://www.flhsmv.gov/fhp/traffic/crs_h501.htm. They post the accidents by county, and then in the location heading they add the intersection and the city. For most of these counties and cities, our website, http://www.kempruge.com/personal-injury/auto-and-car-accidents/, has city- and county-specific pages. I need to figure out a way to pull the information from the FHP site and other real-time crash sites so that it will automatically post on our pages. For example, if there's an accident in Hillsborough County on I-275 in Tampa, I'd like to have that immediately post on our "Hillsborough county car accident attorney" page and our "Tampa car accident attorney" page.
I want our pages to have something comparable to a stock ticker widget, but for car accidents specific to each page's location, and one that combines all the info from the various law enforcement agencies. Any thoughts on how to go about creating this?
As always, thank you all for taking time out of your work to assist me with whatever information or ideas you have. I really appreciate it.
-
-
Write a Perl program (or other language script) that will: a) read the target webpage, b) extract the data relevant to your geographic locations, c) write a small HTML file to your server that formats the data into a table that will fit on the webpage where you want it published.
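As a rough sketch, here are steps a) through c) in Python (rather than Perl). The URL comes from the question above, but the row-based extraction pattern is an assumption: you would have to inspect the real page's markup to write the correct rule.

```python
import re
import urllib.request

# URL from the original question; other agencies' pages would be added here.
FHP_URL = "http://www.flhsmv.gov/fhp/traffic/crs_h501.htm"

def extract_accidents(html, locations):
    """Return the <tr> rows from the page that mention one of our locations.

    Assumes the crash reports are laid out as table rows -- check the
    actual page source and adjust the pattern to match it.
    """
    rows = re.findall(r"<tr[^>]*>(.*?)</tr>", html, re.DOTALL | re.IGNORECASE)
    return [row for row in rows
            if any(loc.lower() in row.lower() for loc in locations)]

def build_table(rows):
    """Format the matched rows as a small HTML table for inclusion."""
    body = "\n".join("<tr>%s</tr>" % row for row in rows)
    return "<table class=\"accident-ticker\">\n%s\n</table>\n" % body

def main():
    # Fetch the live page, pull out our locations' rows, and overwrite the
    # small snippet file that the webpage will include.
    html = urllib.request.urlopen(FHP_URL).read().decode("utf-8", "replace")
    rows = extract_accidents(html, ["Hillsborough", "Tampa"])
    with open("accidents.html", "w") as f:
        f.write(build_table(rows))

# main()  # called by the scheduled job, not on import
```

Each county/city page would get its own locations list (or its own snippet file), so the Hillsborough page and the Tampa page can each show their own ticker.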
-
Save that Perl program in your /cgi-bin/ folder. (You will need to change the file permissions so that the Perl program can execute and the small HTML file can be overwritten.)
-
Most servers allow you to execute files from your /cgi-bin/ on a schedule such as hourly or daily. These are usually called "cron jobs". Find this in your server's control panel. Set up a cron job that will execute your Perl program automatically.
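For example, a crontab entry that runs the script every 15 minutes might look like this (the paths are placeholders; your server's layout will differ):

```shell
# Run the scraper every 15 minutes and log its output.
*/15 * * * * /usr/bin/perl /home/yoursite/cgi-bin/fetch_accidents.pl >> /home/yoursite/logs/accidents.log 2>&1
```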
-
Place a server-side include, sized and shaped to fit your data table, on the webpage where you want the information to appear.
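The include directive itself is one line. Assuming the script writes its table to a file such as /includes/accidents.html (a hypothetical path), a server-side include on a page with SSI enabled would look like:

```html
<!--#include virtual="/includes/accidents.html" -->
```

Because the cron job keeps overwriting that one small file, the page picks up fresh accident data without any changes to the page itself.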
This setup will work until the URL or format of the target webpage changes. Then your script will produce errors or write garbage. When that happens, you will need to update the URL in the script and/or the parsing logic to match the new format.
-
You need a developer who understands HTTP requests well: someone who can build a spidering program that polls the website, watches for changes, and scrapes data from those sites. The program should also check whether the coding on the page has changed, because if it has, the scraping program will need to be rewritten to account for it.
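One lightweight way to implement that "check if the coding changed" step (my own suggestion, not something these sites provide) is to strip the data out of the page and hash the markup skeleton that remains. If the skeleton's hash changes between runs, the page structure changed and the scraper's extraction rules should be reviewed:

```python
import hashlib
import re

def page_skeleton_hash(html):
    """Hash the page's markup with the text content removed.

    Two fetches with different accident data but identical markup produce
    the same hash; a redesign of the page produces a different one.
    """
    skeleton = re.sub(r">[^<]*<", "><", html)
    return hashlib.sha256(skeleton.encode("utf-8")).hexdigest()
```

The scheduled job would store the previous hash, compare it on each run, and email the developer instead of publishing when the hashes differ.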
Ideally, those sites would offer some sort of data API or XML feed to pull from, but odds are they do not. It would be worth asking, as that would make the programming (and the programmer's life) much easier. It looks like the site is using CMS software from http://www.cts-america.com/ - they may be the better group to talk to about this, as you would potentially be interfacing with the software they develop rather than with some minion at the help desk for the dept of motor vehicles.
Good luck and please do produce a post here or a YouMoz post to show the finished product - it should be pretty cool!