Nutanix and Meditech

Sep 18, 2016 nutanix meditech backup

How to make Meditech better by using Nutanix

Introduction

I have dealt with the Patient Administration System made by Meditech for 9 years. I don’t consider myself a guru on how to use this application, but I think I know a thing or two that might help others. I will not comment on how well or badly the application is designed, only on the infrastructure side of things.

Let’s say it’s very “traditional”, and getting Meditech to accept changes can be challenging. Here is what was done to make it better.

Environment(s)

Meditech has multiple versions and you can’t easily move from one to the other, so customers usually end up running them in parallel. Once you have finished rolling out the new version, you keep the old one for reference.

So it was not just one environment but three:

Meditech 5.5 for most sites
Meditech 6.07 for 2 newer sites
Meditech 6.1, pilot for 1 site

A bit of history …

Meditech 5.5

This implementation is the one I first worked on, starting with physical servers in 2007, virtualised in 2008.

The 18 File Servers (FS, which is what the Meditech DB backend is called) generate most of the IOPS: about 20k on average with bursts to 25k, mostly concentrated on a handful of servers, with little bandwidth (around 5 MB/s) and 70 to 80% writes.

Then there are a number of AS (Application Servers, which handle sessions) and BG servers (Background servers, which handle tasks). These are mostly identical, stateless servers.

There are also imaging servers that host scanned forms and bills.

Total server footprint is 180 servers.

All of these were originally hosted on a traditional SAN (an EMC VNX, after migrating from an HP EVA that couldn’t cope). But even the VNX wasn’t enough, and the FS servers were moved to XtremIO to offload the IOPS.

Backup is done via Meditech, which makes a gold copy of one partition of each FS server to another partition (the E drive is the source, the F drive the destination). NetBackup then backs up the F drive. We had to ask for special authorization to support VMDKs on XtremIO (because XtremIO carries an EMC badge, Meditech more or less agreed to it).

Meditech 6.07

This version is a slight departure in terms of design and smaller in size as it covers only 2 sites.

AS servers are gone and replaced by fewer Connection Manager servers.

FS servers are fronted by Transaction servers, depending on the Meditech module (things like billing, obstetrics, etc.).

As a result, FS servers are either in an NPR group (the old way) or an M-AT group (the new transactional way).

These were hosted on HP blade servers and VNX, as the backup required RDMs and clones for the FS servers (other roles use VMDKs). The backup is a complex beast that took about 2 years to put in place, with lots of redesign and help from EMC and Meditech to make it work. Backup relies on NetWorker and a module (NMMEDI) that lets NetWorker talk to Meditech (NMMEDI looks like a VB app fronting a bunch of scripts/commands sent to Meditech). The backup process runs as follows:

Meditech flushes some cache
NetWorker initiates Meditech quiescing (using a binary called MBF64.exe)
Once the NPR and MAT groups are quiesced, Meditech instructs NetWorker to break the clones via a “proxy” server (a physical server attached over FC)
Once the clones are broken, the backup starts

It takes about 40 minutes to quiesce and break the clones; total backup runtime is 4h15.

DR is covered by the same process, with an hourly quick quiesce and RecoverPoint bookmarking to provide application-consistent restore points.

It was advised not to go below a 1-hour interval, as quiescing and unquiescing more often would put too much pressure on storage and could therefore affect performance. One thing to note: because DR and backup use the same MBF64 tool, DR bookmarking is paused during backup (only crash-consistent replication is done via RecoverPoint during that time).
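
To put a rough number on that pause, here is a minimal Python sketch that lists the hourly application-consistent points in one day and skips those that fall while bookmarking is paused. The backup start time (01:00) and the date are made up for illustration, and the sketch assumes the pause lasts for the full 4h15 backup runtime; neither is stated above, so treat the output as an example rather than a measurement.

```python
from datetime import datetime, timedelta
from typing import List


def consistent_points(
    day_start: datetime,
    backup_start: datetime,
    pause: timedelta,
    interval: timedelta = timedelta(hours=1),
) -> List[datetime]:
    """Bookmark times over one day, skipping those inside the pause window."""
    pause_end = backup_start + pause
    points = []
    t = day_start
    while t < day_start + timedelta(days=1):
        # A bookmark is skipped if it falls while the backup holds MBF64.
        if not (backup_start <= t < pause_end):
            points.append(t)
        t += interval
    return points


def longest_gap(points: List[datetime]) -> timedelta:
    """Largest spacing between two consecutive bookmarks."""
    return max(b - a for a, b in zip(points, points[1:]))


# Hypothetical schedule: backup starts at 01:00 and pauses bookmarks for 4h15.
day = datetime(2016, 9, 18)
points = consistent_points(
    day, day + timedelta(hours=1), timedelta(hours=4, minutes=15)
)
print(len(points), longest_gap(points))
```

With these assumed times, 5 of the 24 hourly bookmarks are skipped and the longest stretch between two bookmarks is 6 hours. The backup copy itself is taken from quiesced data, so it is a recovery point of its own; the sketch only counts RecoverPoint bookmarks.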

The restore process is equally complicated, as NetWorker needs to restore at the LUN level and uses the “proxy” server. No proxy = no backup or restore.

Meditech 6.1 : this is where it gets interesting …

Same as 6.07, with more modules being converted to the MAT transactional model. We were allowed to access the MBF64 command and try to make it run on Nutanix. We successfully replicated the process used by NMMEDI in a Nutanix way …

One of the transaction servers has the Nutanix Cmdlets as well as MBF64 installed. Because quiescing needs to be triggered across the servers as a group rather than individually, one server acts as the scheduler (in this case the main TS server). At a set time, that server starts a script that does the following:

Quiesce NPR
Nutanix snapshot of the NPR group
Unquiesce NPR
Quiesce MAT
Nutanix snapshot of the MAT group
Unquiesce MAT
After a short wait for the snapshots to be fully registered, the snapshots are restored as VM copies (with a path and name prefix)
The VM copies are then automatically registered in vCenter, and NetBackup can back them up
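
The ordering of these steps can be sketched as follows. This is an illustration in Python, not the MTSnapNTNX script itself (which relies on the Nutanix Cmdlets and MBF64). The four callables (`quiesce`, `snapshot`, `unquiesce`, `restore_as_copy`), the `settle_seconds` wait and the `copy_prefix` value are hypothetical placeholders for whatever actually talks to Meditech and Nutanix. The `try`/`finally` is an added assumption too: it makes sure a group is unquiesced even if its snapshot fails.

```python
import time
from typing import Callable, Dict, List


def run_backup_cycle(
    groups: List[str],
    quiesce: Callable[[str], None],
    snapshot: Callable[[str], str],
    unquiesce: Callable[[str], None],
    restore_as_copy: Callable[[str, str], None],
    settle_seconds: float = 0.0,
    copy_prefix: str = "BKP-",
) -> Dict[str, str]:
    """Quiesce, snapshot and unquiesce each group, then restore the copies."""
    snapshots: Dict[str, str] = {}
    for group in groups:
        quiesce(group)
        try:
            # The snapshot is taken while the whole group is quiesced.
            snapshots[group] = snapshot(group)
        finally:
            # Release the group as soon as possible, even on failure.
            unquiesce(group)

    # Give the snapshots time to be fully registered before restoring.
    time.sleep(settle_seconds)

    for snapshot_id in snapshots.values():
        restore_as_copy(snapshot_id, copy_prefix)
    return snapshots


if __name__ == "__main__":
    # Stub callables that only record the order of operations.
    log: List[str] = []
    run_backup_cycle(
        ["NPR", "MAT"],
        quiesce=lambda g: log.append(f"quiesce {g}"),
        snapshot=lambda g: log.append(f"snapshot {g}") or f"snap-{g}",
        unquiesce=lambda g: log.append(f"unquiesce {g}"),
        restore_as_copy=lambda s, p: log.append(f"restore {p}{s}"),
    )
    print("\n".join(log))
```

Each group is only held quiesced for the duration of its own snapshot, and the restore step runs after both groups are back in service.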

Nothing too complicated; the script is available here: MTSnapNTNX on GitHub

This has been working well, with the entire script running in less than 30 seconds (admittedly only covering 4 FS servers). We were also able to leverage our traditional backup software instead of having to use NetWorker (not that critical if you are already using NetWorker).

On the Nutanix side, 2 Protection Domains were created to separate NPR and MAT. DR/Cloud copy would be just one command away …

Some of the benefits of running Meditech on Nutanix:

Ability to run full VM copies, with no restore time, to check data integrity or to serve as a Dev/Test environment
Quick quiescing time
Full VM backup instead of just one partition/LUN
Ability to dedupe and compress all the stateless servers (BG, AS, CM) with no performance impact and better storage efficiency

So the Nutanix goodness can apply to Meditech. As with any other vBCA (virtualised Business Critical Application), it’s all about reducing risk (better backup and management) while improving usage (making it easy to check data integrity and to create test environments that are exact replicas of production).

Hope this helps you make Meditech coexist with Nutanix…