Virtual Design Master: Conceptual, Logical, Physical

This year I am honored to be one of the Virtual Design Master (vDM) judges. If you are unfamiliar with vDM, it is a technology-driven reality competition that showcases virtualization community members and their talents as architects. Some competitors are seasoned architects, while others are just beginning their design journey. To find out more information, please click here. One of the things that I, along with the other judges, noticed is that many of the contestants did not correctly distinguish between conceptual, logical, and physical design in their documentation.

The best non-IT example of this concept that I have seen is the following image:

(Figure 1) Disclaimer: I’m not cool enough to have thought of this. I think all credit goes to @StevePantol.

The way I always describe and diagram design methodology is using the following image:

(Figure 2) Mapping it all together

I will continue to refer to both images as we move forward in this post.

Conceptual

During the assess phase, the architect reaches out to the business’ key stakeholders for the project and explores what each of them needs and wants to get out of the project. The job is to identify the key constraints and the business requirements that must be met for the design, deploy, and validation phases to be successful.

The assessment phase typically coincides with building the conceptual model of a design. Effectively, the conceptual model sorts the assessment findings into four categories: requirements, constraints, assumptions, and risks.

For example:

Requirements –

  1. technicloud.com should create art.
  2. The art should be durable and able to withstand years of appreciation.
  3. Art should be able to be appreciated by millions around the world.

Constraints –

  1. Art cannot be a monolithic installation piece taking up an entire floor of the museum.
  2. Art must not be so bourgeois that it cannot be appreciated by an untrained eye.
  3. Art must not be paint-by-numbers.

Risks –

  1. Lead IT architect at technicloud.com has no prior experience creating art.
    • Mitigation – will require art classes to be taken at local community college.
  2. Lead IT architect is left-handed which may lead to smearing of art.
    • Mitigation – IT architect will retrain as ambidextrous.

Assumptions –

  1. Art classes at community college make artists.
  2. Museum will provide security to ensure art appreciators do not damage artwork.

As you read through the requirements and constraints, the shape of the design should become clearer and clearer. More risks and assumptions will be added as design decisions are made and their impact is analyzed. Notice that the conceptual model is made up entirely of words? Emphasis on “concept” in the word “conceptual”!
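The model itself is just words, but it can help to keep those words in a structured form so later design decisions have something concrete to point back to. Below is a minimal Python (3.9+) sketch that records the art example; the class names and the R/C/K/A IDs are hypothetical labels added for illustration, not part of any vDM template.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    description: str
    mitigation: str = ""  # empty string means no mitigation recorded yet


@dataclass
class ConceptualModel:
    # Each category maps a short ID (hypothetical label) to a plain-language statement.
    requirements: dict[str, str] = field(default_factory=dict)
    constraints: dict[str, str] = field(default_factory=dict)
    risks: dict[str, Risk] = field(default_factory=dict)
    assumptions: dict[str, str] = field(default_factory=dict)

    def unmitigated_risks(self) -> list[str]:
        """IDs of risks that do not have a mitigation yet."""
        return [rid for rid, risk in self.risks.items() if not risk.mitigation.strip()]


art = ConceptualModel(
    requirements={
        "R1": "technicloud.com should create art.",
        "R2": "The art should be durable and able to withstand years of appreciation.",
        "R3": "Art should be able to be appreciated by millions around the world.",
    },
    constraints={
        "C1": "Art cannot be a monolithic installation piece taking up an entire floor.",
        "C2": "Art must be appreciable by an untrained eye.",
        "C3": "Art must not be paint-by-numbers.",
    },
    risks={
        "K1": Risk("Lead IT architect has no prior experience creating art.",
                   "Take art classes at the local community college."),
        "K2": Risk("Lead IT architect is left-handed, which may lead to smearing.",
                   "Retrain as ambidextrous."),
    },
    assumptions={
        "A1": "Art classes at community college make artists.",
        "A2": "Museum will provide security so appreciators do not damage artwork.",
    },
)

print(art.unmitigated_risks())  # []
```

Nothing in the model describes how the art will look; it only captures what the business needs and what could go wrong.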

Logical

Once the conceptual model is built out, the architect moves into the logical design phase (indicated by the arrows pointing back in Figure 2, which show its dependence on the conceptual model). Logical design is where the architect begins making decisions, but at a high level.

Logical art work design decisions –

  1. Art will be a painting.
  2. The painting will be of a person.
  3. The person will be a woman.

For those who are having a hard time following the art example, a tech example would be:

(Table 1) Logical design decision example

A logical diagram may look something like this:

(Figure 3) Logical storage diagram example

Notice that these are ‘higher’ level decisions and diagrams. We’re not quite to filling in the details yet when working on logical design. However, note that these design decisions should map back to the conceptual model.
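One way to keep yourself honest is a small traceability check: each logical decision lists the conceptual items it serves, and the check flags decisions that trace to nothing (or to IDs that don't exist) and requirements that no decision addresses yet. The IDs are hypothetical labels (R1 to R3 for the three requirements, C1 and C2 for the first two constraints), and the mapping from decisions to items is an illustrative reading of the art example, not an official answer.

```python
# Conceptual items, abbreviated (IDs are hypothetical labels for the lists above).
conceptual = {
    "R1": "Create art",
    "R2": "Durable for years of appreciation",
    "R3": "Appreciated by millions worldwide",
    "C1": "Not a monolithic, floor-sized installation",
    "C2": "Appreciable by an untrained eye",
}

# Logical decision -> (statement, conceptual items it serves). Illustrative mapping.
logical = {
    "LD1": ("Art will be a painting.", ["R1", "C1"]),
    "LD2": ("The painting will be of a person.", ["R3", "C2"]),
    "LD3": ("The person will be a woman.", ["R3"]),
}


def trace_gaps(conceptual, decisions):
    """Return (orphan decisions, requirements no decision addresses yet)."""
    orphans = [
        did for did, (_, refs) in decisions.items()
        if not refs or any(ref not in conceptual for ref in refs)
    ]
    referenced = {ref for _, refs in decisions.values() for ref in refs}
    unaddressed = sorted(
        cid for cid in conceptual if cid.startswith("R") and cid not in referenced
    )
    return orphans, unaddressed


print(trace_gaps(conceptual, logical))  # ([], ['R2'])
```

Here R2 (durability) is still open at the logical level. That can be fine as long as a physical decision picks it up; the check simply makes the gap visible.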

Physical

Once the logical design has been mapped out, the architect moves to physical design, where hardware and software vendors are chosen and configuration specifications are made. Simply put, this is the phase where the details are determined.

Physical art work design decisions –

  1. The painting will be a half-length portrait.
  2. The medium will be oil on a poplar panel.
  3. The woman will have brown hair.

Once again, if you hate the Mona Lisa, the IT design decision examples would be:

  1. XYZ vendor and model of storage array will be purchased.
  2. Storage policy based management will be used to place VMs on the correct storage tier.
  3. Tier-1 LUNs will be replicated hourly.

These are physical design decisions, which directly correlate with and extend the logical design decisions with more information. But, again, at the end of the day, this should all tie back to meeting the business requirements.

(Table 2) Physical design decision example

A physical design diagram would look something like this:

(Figure 4) Physical storage diagram example

Notice that in this diagram, we’re starting to see more details: vendor, model, how things are connected, etc. Remember that physical should expand on logical design decisions and fill in the blanks. At the end of the day, both logical and physical design decisions should map back to meeting the business requirements set forth in the conceptual model (as evidenced by Figure 2).
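To check that the physical layer really ties back to the business requirements, each physical decision can record which logical decisions it extends (plus any conceptual items it satisfies directly), and the requirements can then be resolved through that chain. This Python sketch uses the art example; the IDs and the decision-to-requirement mapping are hypothetical and only illustrate the idea.

```python
# Logical decision -> conceptual items it serves (hypothetical IDs; illustrative mapping).
logical = {
    "LD1": ["R1", "C1"],  # Art will be a painting.
    "LD2": ["R3", "C2"],  # The painting will be of a person.
    "LD3": ["R3"],        # The person will be a woman.
}

# Physical decision -> (logical decisions it extends, conceptual items it satisfies directly).
physical = {
    "PD1": (["LD2"], []),      # The painting will be a half-length portrait.
    "PD2": (["LD1"], ["R2"]),  # Oil on a poplar panel: also addresses durability.
    "PD3": (["LD3"], []),      # The woman will have brown hair.
}

requirements = {"R1", "R2", "R3"}


def check_design(logical, physical, requirements):
    """Return (physical decisions that extend nothing valid, requirements not met)."""
    dangling = [
        pid for pid, (extends, _) in physical.items()
        if not extends or any(ld not in logical for ld in extends)
    ]
    met = set()
    for extends, direct in physical.values():
        for ld in extends:
            met.update(logical.get(ld, []))  # unknown IDs are already reported as dangling
        met.update(direct)
    return dangling, sorted(requirements - met)


print(check_design(logical, physical, requirements))  # ([], [])
```

Remove PD2 and the check reports `['R1', 'R2']` as unmet, because nothing else in the physical design extends LD1 or addresses durability.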

Final Thoughts

Learning to quickly and easily distinguish between conceptual, logical, and physical design takes time and practice. I am hoping this clears up some of the mystery and confusion surrounding this idea. I’m looking forward to seeing more vDM submissions next week.

Introducing “Alta” – Rubrik CDM 4.0

Rubrik Cloud Data Management (CDM) 4.0 is Rubrik’s ninth and largest product release. The release, named Alta, expands the Rubrik platform to encompass all major hypervisors, adds Oracle support, furthers SQL Server support with Live Mount functionality, and introduces Cloud Instantiation. Additionally, Alta closes the gap with traditional backup architectures by introducing support for tape archival.

Rubrik CDM 4.0 Features

A few release highlights…


  1. Manage and protect all major hypervisors. Support is added for Microsoft Hyper-V and Nutanix Acropolis hypervisor (AHV); this adds to the already supported VMware vSphere. Enterprise organizations are now able to orchestrate application data management and availability across multi-hypervisor and cloud infrastructures.
  2. Spin up applications in a public cloud using Cloud Instantiation. Any data that has been protected on-premises and sent to Amazon S3 can now be powered on as a fully functioning AMI. This functionality will be available for any VMware virtual machines that have been archived to Amazon S3. There is no requirement for a Rubrik Cloud Cluster to be running in the target Amazon region.
  3. Automate protection and recovery of Oracle databases. Database owners and administrators can leverage Rubrik’s high performing multi-stream backups to massively reduce any impact to production and existing workflows for database backups, replication, archival, and compliance.
  4. Live Mount capabilities for Microsoft SQL. This is an awesome innovation that delivers near-zero recovery times, compared with the hours or even days that other methods of restoring Microsoft SQL Server can take. With this feature, administrators can mount a SQL Server database directly from Rubrik at any point in time. This delivers self-service access, along with a powerful suite of APIs that can be used to automate workflows.
  5. Archive data to tape. This is the least sexy major feature, but still an important one. Rubrik automates data archival to tape for enterprises that must meet governance specifications or any other type of compliance regulation.

What I’m excited about…

Nutanix AHV support – Nutanix Acropolis is a turnkey infrastructure platform delivering enterprise-class storage, compute, and native virtualization services capable of running nearly any application. In addition to supporting VMware vSphere and Microsoft Hyper-V, Acropolis includes its own built-in hypervisor, AHV. With the 4.0 Alta release, Rubrik is excited to extend its capabilities to protect AHV virtual machines, making the company one of the first to do so. It’s been a fun journey the past few years watching Nutanix grow its platform and ecosystem; AHV has matured a lot over the past few releases. I’m looking forward to seeing Rubrik used in large enterprises to protect multi-hypervisor and multi-cloud infrastructures.


Cloud Instantiation – There is a clear macro trend of IT workloads moving to the cloud. The use of cloud storage has been part of the Rubrik story since the initial GA release in 2015. Customers can capture data sources on-premises and leverage cloud resources, such as Amazon S3, as a long-term archival target while still maintaining the ability to search, manage, and restore the data in any location. Now, with this release, Rubrik extends the functionality of data in the cloud with Cloud Instantiation. There are countless use cases for this type of functionality, especially as its feature set grows in future releases. It can assist with DR-to-cloud strategies, on-premises-to-cloud migrations, or even cross-cloud migrations.


Mark your calendars: I’m co-presenting webinars on both topics this summer (AHV on 27 July and AWS Cloud Instantiation on 10 August). You can sign up here.

Nutanix One-Click Upgrades for ESXi Patching

This is the first guest post of what I hope to be many from the great Herb Estrella:

In my personal experience, Nutanix one-click upgrades work as advertised, but there are a few items that should be accounted for in preparation for installing ESXi patches on a Nutanix cluster. This post will cover a few prerequisites to look for, touch on the subtasks of the patching procedure, and finally close out with some troubleshooting tips and links to resources that I found helpful.

If you’ve seen the Dr. Strange movie you’ll find that going through the one-click upgrade process is loosely akin to reading from the “Book of Cagliostro” in that “the warnings come after you read the spell.”


A pre-upgrade check runs prior to patching and catches a few items, but here are a few prerequisites that I found will set you up nicely for success:

  • vSphere HA/DRS settings should be configured according to best practices (aka “recommended practices”), as these account for the CVM and a few other items that make a Nutanix cluster in vSphere different.
  • DRS rules (affinity/anti-affinity), if in use, can also cause problems. For example, if you have a 3-node cluster and 3 VMs in an anti-affinity rule, the rule cannot be satisfied while one host is in maintenance mode, so it is a good idea to temporarily disable it. Re-enable the rules when patching is complete.
  • Mounted ISOs (usually left over from VMware Tools installs) are major culprits when VMs do not move, whether DRS schedules the migration or you move them manually. I recommend unmounting any ISOs that aren’t accessible from all hosts within the cluster.
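If you want to catch the DRS rule and ISO problems before clicking Upgrade, the checks are simple enough to script. The sketch below works on plain Python data that you would have to gather yourself (for example, exported from vCenter); the class names, rule, VMs, hosts, and datastores are all hypothetical, and nothing here calls a Nutanix or VMware API.

```python
from dataclasses import dataclass


@dataclass
class AntiAffinityRule:
    name: str
    vms: list[str]
    enabled: bool = True


@dataclass
class MountedIso:
    vm: str
    datastore: str  # datastore that holds the ISO image


def preflight(hosts, rules, isos, datastore_hosts):
    """Return warnings for a rolling upgrade that takes one host down at a time.

    datastore_hosts maps a datastore name to the set of hosts that can see it.
    """
    warnings = []
    # While one host is in maintenance mode, only the remaining hosts can run VMs.
    available = len(hosts) - 1
    for rule in rules:
        if rule.enabled and len(rule.vms) > available:
            warnings.append(
                f"Rule {rule.name!r} keeps {len(rule.vms)} VMs apart but only "
                f"{available} hosts stay up; disable it until patching is done."
            )
    all_hosts = set(hosts)
    for iso in isos:
        # An ISO that some host cannot reach can stop the VM from migrating there.
        if not all_hosts <= datastore_hosts.get(iso.datastore, set()):
            warnings.append(
                f"VM {iso.vm!r} has an ISO on {iso.datastore!r}, which not every "
                f"host can see; unmount it before upgrading."
            )
    return warnings


hosts = ["esx01", "esx02", "esx03"]
rules = [AntiAffinityRule("app-tier-separation", ["app01", "app02", "app03"])]
isos = [MountedIso("app01", "esx02-local")]
datastore_hosts = {"esx02-local": {"esx02"}}
for warning in preflight(hosts, rules, isos, datastore_hosts):
    print(warning)
```

With the 3-node example from the list above, both checks fire; add a fourth host and move the ISO to a datastore every host can see, and the list comes back empty.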

Subtasks are the steps in the one-click upgrade sequence from start to finish. Below is a list of them, with some observations on each.


  • Downloading hypervisor bundle on this CVM
    • When the patch is initially uploaded, it is stored in the following directory on a CVM: /home/nutanix/software_uploads/hypervisor/. Which CVM receives this hypervisor bundle (aka patch) first depends on how you access Prism. This should be mostly transparent, but it is one of those “good to know” items. The one-click upgrade process then copies the bundle from that initial location to the CVM on the host being upgraded; if this copy fails, “no soup for you.”
  • Waiting to upgrade the hypervisor
    • …nothing to see here…
  • Starting upgrade process on this hypervisor
    • …keep it moving…
  • Copying bundle to hypervisor
    • …business as usual…
  • Migrating user VMs on the hypervisor
    • Huzzah! This is a good one to pay attention to especially if the pre-requisites previously covered are not addressed. The upgrade will most likely timeout/fail here and it may not give you any helpful information as to why.
    • This is also a good spot to watch the Tasks/Events tab on the ESXi host being patched to get better insight into the process.
  • Installing bundle on the hypervisor
    • If all VMs have been successfully migrated, the host should be in maintenance mode with the CVM shut down. This step also takes the longest, so patience is key.
  • Completed hypervisor upgrade on this node
    • At this stage the host is ready to run VMs as it should now be out of maintenance mode with the CVM powered on.
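If you are watching a node crawl through these steps, it can help to know where it sits in the sequence. The following Python sketch is only a reading aid: the subtask names are copied from the list above, while the helper name `where_am_i` and the hint wording are hypothetical paraphrases of the observations in this post, not output from Prism.

```python
SUBTASKS = [
    "Downloading hypervisor bundle on this CVM",
    "Waiting to upgrade the hypervisor",
    "Starting upgrade process on this hypervisor",
    "Copying bundle to hypervisor",
    "Migrating user VMs on the hypervisor",
    "Installing bundle on the hypervisor",
    "Completed hypervisor upgrade on this node",
]

# Paraphrased notes from the list above; steps without notes get a default.
HINTS = {
    SUBTASKS[0]: "Bundle must reach this CVM (/home/nutanix/software_uploads/hypervisor/).",
    SUBTASKS[4]: "Most likely place to time out: re-check DRS rules and mounted ISOs, "
                 "and watch Tasks/Events on the ESXi host.",
    SUBTASKS[5]: "Longest step: host in maintenance mode, CVM shut down. Be patient.",
    SUBTASKS[6]: "Host out of maintenance mode with its CVM powered on.",
}


def where_am_i(subtask: str) -> str:
    """Describe the position of a subtask in the sequence plus any known hint."""
    step = SUBTASKS.index(subtask) + 1  # raises ValueError for an unknown name
    hint = HINTS.get(subtask, "Nothing special here; keep it moving.")
    return f"Step {step} of {len(SUBTASKS)}: {hint}"


print(where_am_i("Migrating user VMs on the hypervisor"))
```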

In the test environment I was working with, I made a lot of assumptions and just dove in head first. The results, as you can imagine, were not good. Here are a few troubleshooting measures I used to help right my wrongs.

  • The upload process for getting the ESXi patch to the CVM is straightforward; however, there are two ways to do it: upload a JSON file downloaded directly from the Nutanix support portal, or enter the MD5 info from the patch’s associated KB article. I chose to upload a JSON file and purposely used the wrong patch, and now I can’t delete the JSON even after completing the upgrade. If I find out how to resolve this issue, I’ll update this post. This is where knowing the location of the patch on the CVM (/home/nutanix/software_uploads/hypervisor/) is helpful, because the patch can be deleted or replaced there.
  • Restarting Genesis! This one is a major key. For example, say the one-click upgrade is stuck because a VM didn’t migrate; even after you migrate the VM manually, the one-click upgrade won’t just continue where it left off. In my experience, resolving this takes a little nudge in the form of a genesis restart. Run genesis restart on the CVM that failed; if that doesn’t work, try restarting genesis on the other CVMs in the cluster. In my test environment I ran allssh genesis restart and was able to get the process moving, but results may vary. If you prefer to err on the side of caution, restart genesis manually on one CVM at a time.
  • Some helpful commands to find errors in logs
    • grep ERROR ~/data/logs/host_preupgrade.out
    • grep ERROR ~/data/logs/host_upgrade.out
  • For the admins that aren’t about that GUI life, you can run the one-click upgrade command from a CVM:
    • cluster --md5sum=<md5 from VMware portal> --bundle=<full path to the bundle location on the CVM> host_upgrade
  • To check on the status, run host_upgrade_status
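Whichever way you launch the upgrade, it is worth confirming that the bundle you uploaded is the one you meant to use before handing its MD5 to the upgrade. A minimal sketch using only Python's standard `hashlib`; the helper names `md5_of` and `bundle_matches` are hypothetical, and where you run it (any machine with Python 3 and a copy of the bundle) is up to you.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so large bundles never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as bundle:
        for chunk in iter(lambda: bundle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_matches(path: str, expected_md5: str) -> bool:
    """Compare against the published MD5, ignoring case and stray whitespace."""
    return md5_of(path) == expected_md5.strip().lower()
```

For example, `bundle_matches("/path/to/bundle.zip", "<md5 from VMware portal>")` should return `True` before you pass the same values to `--md5sum` and `--bundle`.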

Links:

  • One click upgrades via vmwaremine
  • Troubleshooting KB article via Nutanix Support Portal, may require Portal access to view.
  • vSphere settings via Nutanix Support Portal, may require Portal access to view.

Bonus thoughts: Do I need vSphere Update Manager if I’m using Nutanix? This could be a post on its own (and it still might be), but I have some thoughts I’d like to share.

  • Resources
    • In a traditional setup, you will most likely have vSphere Update Manager installed on a supported Windows VM (unless you run VCSA 6.5, which has Update Manager built in) with either SQL Express or a database created on an existing SQL Server. One-click upgrade, by contrast, is built into Prism.
  • Compliance
    • Prism has visibility into the ESXi hosts’ versions, so if a host were “not like the others,” it would show up in an NCC check or in the Alerts in Prism.
  • vCenter Plugin
    • This one is worth mentioning but really not a huge deal. It’s one less thing to worry about and ties back into the resources statements above.
  • My Answer
    • It depends on whether I’m all in with Nutanix: if my entire infrastructure were Nutanix hosts, then I would not deploy vSphere Update Manager.
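As a rough illustration of the “not like the others” idea (not how NCC actually performs its checks), a version-drift check only needs each host’s ESXi version. The host names and version strings below are made up for the example.

```python
from collections import Counter


def odd_hosts_out(builds):
    """Return hosts whose ESXi version differs from the most common one.

    builds maps host name -> version string. On a tie, Counter.most_common
    returns the version encountered first.
    """
    if not builds:
        return []
    baseline, _ = Counter(builds.values()).most_common(1)[0]
    return sorted(host for host, build in builds.items() if build != baseline)


builds = {"esx01": "6.0 U3", "esx02": "6.0 U3", "esx03": "6.0 U2", "esx04": "6.0 U3"}
print(odd_hosts_out(builds))  # ['esx03']
```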

Nutanix Vocabulary

This is the first post in a series in which I will share my study notes for the Nutanix Platform Professional (NPP) certification as a general reference and study tool for others. We’ll start with the basics: Nutanix vocabulary.

The Nutanix Xtreme Computing Platform (XCP) is a converged, scale-out compute and storage system that is purpose-built to host and store virtual machines.

XCP consists of two components:

Acropolis – data plane made up of App Mobility Fabric (AMF), Distributed Storage Fabric (DSF) and hypervisor integration.

  • App Mobility Fabric (AMF) – logical construct built into Nutanix solutions that allows applications and data to move freely between environments. The AMF abstracts the workloads (containers, VMs, etc.) from the hypervisor, which is what makes it easy to move applications and data around.
  • Distributed Storage Fabric (DSF) – distributed system that pools storage resources and provides storage platform capabilities such as snapshots, disaster recovery, compression, erasure coding, and more. Nodes work together across a 10 GbE network to form a Nutanix cluster and the DSF.
  • Hypervisor – ESXi, Hyper-V, and Acropolis Hypervisor (AHV)

Prism – provides the management UI that administrators use to configure and monitor the cluster. This web interface also provides access to the REST APIs and the nCLI.

A few more terms to be familiar with (since I used them in the section above!):

Node – the foundational unit of a Nutanix cluster. Each node runs a standard hypervisor (ESXi, Hyper-V, or AHV) and contains processors, memory, network interfaces, and local storage (SSDs and HDDs).

Block – a Nutanix rackable unit containing up to four nodes


Cluster – set of Nutanix blocks and nodes that forms the Acropolis Distributed Storage Fabric (DSF). A cluster must contain a minimum of three nodes to operate.
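The size rules above (up to four nodes per block, at least three nodes per cluster) can be captured in a small Python sketch. The classes are hypothetical illustrations of the vocabulary, not a Nutanix API, and the node and block names are made up.

```python
from dataclasses import dataclass, field

MAX_NODES_PER_BLOCK = 4    # a block holds up to four nodes
MIN_NODES_PER_CLUSTER = 3  # a cluster needs at least three nodes to operate


@dataclass
class Node:
    name: str
    hypervisor: str  # "ESXi", "Hyper-V", or "AHV"


@dataclass
class Block:
    serial: str
    nodes: list[Node] = field(default_factory=list)

    def __post_init__(self):
        # Validated at construction only; appending nodes later is not re-checked.
        if len(self.nodes) > MAX_NODES_PER_BLOCK:
            raise ValueError(
                f"block {self.serial} has {len(self.nodes)} nodes; "
                f"max is {MAX_NODES_PER_BLOCK}"
            )


@dataclass
class Cluster:
    blocks: list[Block]

    def node_count(self) -> int:
        return sum(len(block.nodes) for block in self.blocks)

    def can_operate(self) -> bool:
        return self.node_count() >= MIN_NODES_PER_CLUSTER


block_a = Block("BLOCK-A", [Node("node-1", "AHV"), Node("node-2", "AHV")])
block_b = Block("BLOCK-B", [Node("node-3", "AHV")])
print(Cluster([block_a, block_b]).can_operate())  # True
print(Cluster([block_a]).can_operate())           # False
```

Note that the minimum is counted in nodes, not blocks: two partially populated blocks can still form a valid cluster.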

The three objects that allow the Nutanix platform to manage storage are:

Storage Pool – a group of physical storage devices, including SSD and HDD devices, for the cluster. The storage pool can span multiple Nutanix nodes and is expanded as the cluster scales.

  • It’s recommended that a single storage pool be created to manage all physical disks within the cluster.

Container – a logical segmentation of the storage pool that contains a group of VMs or files (vDisks). Containers are usually mapped to hosts as shared storage in the form of an NFS datastore or an SMB share.

vDisk – a subset of available storage within a container that provides storage to virtual machines. If the container is mounted as an NFS volume, the creation and management of vDisks within that container is handled automatically by the cluster. Any file over 512 KB is a vDisk.

nutanix bible.png

(Image above taken from nutanixbible.com)
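The 512 KB vDisk rule is easy to express in code. A minimal Python sketch, assuming “512 KB” means 512 × 1024 bytes and using made-up file names and sizes for a single VM’s folder on an NFS-mounted container:

```python
VDISK_THRESHOLD_BYTES = 512 * 1024  # assumption: "512 KB" read as 512 KiB


def is_vdisk(size_bytes: int) -> bool:
    """A file strictly larger than the threshold counts as a vDisk."""
    return size_bytes > VDISK_THRESHOLD_BYTES


# Hypothetical contents of one VM's folder on an NFS-mounted container.
container_files = {
    "web01/web01.vmx": 3_000,
    "web01/web01.nvram": 8_684,
    "web01/web01-flat.vmdk": 40 * 1024**3,  # 40 GiB virtual disk
}
vdisks = sorted(name for name, size in container_files.items() if is_vdisk(size))
print(vdisks)  # ['web01/web01-flat.vmdk']
```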

Some more storage terms:

Datastore – logical container for files necessary for VM operations.

Storage Tiers – utilize MapReduce tiering technology to ensure that data is intelligently placed in the optimal storage tier (flash or HDD) to yield the fastest possible performance.
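To illustrate the general idea of heat-based placement only, here is a deliberately simplified Python sketch: it ranks extents by a made-up access count and gives the flash tier to the hottest ones. This is a toy greedy model, not Nutanix’s actual MapReduce-based tiering logic, and the extent names, counts, and slot count are hypothetical.

```python
def place_by_heat(access_counts: dict[str, int], flash_slots: int) -> dict[str, str]:
    """Assign the `flash_slots` most-accessed extents to flash and the rest to HDD.

    Ties keep their input order because sorted() is stable.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return {extent: ("flash" if rank < flash_slots else "HDD")
            for rank, extent in enumerate(ranked)}


counts = {"extent-a": 120, "extent-b": 3, "extent-c": 57, "extent-d": 0}
print(place_by_heat(counts, flash_slots=2))
# {'extent-a': 'flash', 'extent-c': 'flash', 'extent-b': 'HDD', 'extent-d': 'HDD'}
```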

The general process for provisioning storage is as follows:

  1. Create a storage pool that contains all physical disks in the cluster.
  2. Create a container that uses all of the available storage capacity in the storage pool.
  3. Mount the container as a datastore.
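The three steps can be modeled with a few plain Python objects. This is a sketch of the relationships only (pool built from all disks, container carved from the pool, datastore mounted on every host); the class names, disk sizes, and host names are hypothetical, and real provisioning happens in Prism or the nCLI, not through this code.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    serial: str
    kind: str          # "SSD" or "HDD"
    capacity_gib: int


@dataclass
class StoragePool:
    disks: list[Disk]

    def capacity_gib(self) -> int:
        return sum(disk.capacity_gib for disk in self.disks)


@dataclass
class Container:
    name: str
    pool: StoragePool

    def usable_capacity_gib(self) -> int:
        # No reservation or cap is modeled: the container may use the whole pool.
        return self.pool.capacity_gib()


@dataclass
class Datastore:
    name: str
    container: Container
    mounted_on: frozenset[str]


def provision(cluster_disks, container_name, hosts):
    pool = StoragePool(list(cluster_disks))                        # 1. one pool, every disk
    container = Container(container_name, pool)                    # 2. container spans the pool
    return Datastore(container_name, container, frozenset(hosts))  # 3. mount on all hosts


disks = [Disk("ssd-1", "SSD", 800), Disk("hdd-1", "HDD", 2000), Disk("hdd-2", "HDD", 2000)]
ds = provision(disks, "ctr-general", ["esx01", "esx02", "esx03"])
print(ds.container.usable_capacity_gib())  # 4800
```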

More to come over the next few weeks!