Get Mapped: Value Stream Mapping


Value stream mapping (VSM) is a DevOps framework (“borrowed” from manufacturing) that provides a structured way for cross-functional teams to collectively see where we are today (long release cycles, silos, damage control after the fact, etc.) and where we want to be in the future (short release cycles, infrastructure as code, iterative development, continuous delivery, etc.).

A VSM is a way of getting people to collaborate and see what is really happening. These exercises are often amazing “aha!” moment workshops that turn the three DevOps objectives (flow, feedback, and continual learning) into a sustainable engine of improvement.

Who should participate in a VSM?

  •    Service Stakeholders and Customers
  •    Those who execute the process tasks
  •    Management

…but not all at the same time.

The VSM process assembles everyone involved with a workflow in the same room to clarify their roles in the product delivery process and identify bottlenecks, friction points, and handoff concerns. Realistically, if we include everyone at the same time, the likelihood of honesty decreases. Let’s be real – if upper management were in the room with you, would you be 100% honest about where the bodies are buried or exactly what each step entails? VSM reveals steps in development, test, release, and operations support that waste time or are needlessly complicated, and this requires complete transparency.

Lead Time versus Time on Task

If you can’t measure it, you can’t improve it. Why do companies go for Continuous Delivery (CD)? Why do people care about DevOps? The main reason I hear is cycle time: the time it takes to get from an idea to a product or feature that customers can use. Measurement is one of the core foundations of DevOps, and the VSM is the measurement phase. If you do it right, it’s the sharing phase as well – share the measurements and proposed changes with the entire group. Doing that well allows you to start to change culture simultaneously.


With a solid foundation in place, it becomes easier to capture more sophisticated metrics around feature usage, customer journeys, and ensuring that service level agreements (SLAs) are met. This information comes in handy when it’s time for road mapping and spec’ing out the next big project.

“Lead time” is a term borrowed from manufacturing, but in the software domain, lead time can be described more abstractly as the time elapsed between the identification of a requirement and its fulfillment.

The goal of developing a VSM is to measure how time is spent on each task and to identify the processes each task requires. It then becomes easier to see which processes are inefficient and creating bottlenecks. In turn, this will reduce the lead time to deliver the finished release.
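
To make the distinction concrete, here is a toy Python sketch (the step names and timings are entirely made up) that computes lead time, time on task, and the resulting flow efficiency for a hypothetical release pipeline:

```python
# Hypothetical value stream: each step has hands-on process time and
# wait (queue) time, both in hours. The numbers are illustrative only.
steps = [
    ("code review", {"process": 2, "wait": 24}),
    ("build",       {"process": 1, "wait": 4}),
    ("QA testing",  {"process": 8, "wait": 48}),
    ("deploy",      {"process": 1, "wait": 8}),
]

# Lead time covers everything from request to delivery; time on task
# counts only the hands-on work.
lead_time = sum(s["process"] + s["wait"] for _, s in steps)
time_on_task = sum(s["process"] for _, s in steps)
flow_efficiency = time_on_task / lead_time

print(f"Lead time: {lead_time} h, time on task: {time_on_task} h, "
      f"flow efficiency: {flow_efficiency:.0%}")
```

In this made-up stream, only 12 of 96 hours are hands-on work – the waste hides in the waiting, which is exactly what the VSM is meant to expose.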

Current State

The following VSM demonstrates a current state analysis of a software release process. The main thing to note in this example is how linear it is – there are only two feedback loops: one at the very beginning and one towards the end, at new feature testing.

(Figure: current state VSM)

The apparent lack of feedback loops presents a potential problem area – there are 8 steps between the two feedback loops. Imagine getting all the way to the end before realizing there’s an issue and providing feedback. How far will the software release be set back if the problem is not detected and communicated until the new release testing phase?

Future State

Once you have the current state VSM mapped, the next step is to figure out a way to make the mapping more efficient. This is typically driven by the following:

  • How can we significantly increase the percent complete and accurate work for each step in our current state VSM?
  • How can we dramatically reduce, or even eliminate, the non-productive time in the lead time of each current state step?
  • How can we improve the performance of the value-added time in each current state step?
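
The first question concerns percent complete and accurate (%C/A): the share of each step’s output that the next step can use without rework. A quick sketch (with hypothetical per-step numbers) shows why even decent per-step quality compounds into a poor end-to-end result:

```python
# Hypothetical %C/A per step: the fraction of handed-off work that the
# next step can use without rework.
pct_ca = {"design": 0.90, "develop": 0.80, "test": 0.95, "release": 0.99}

# The rolled %C/A for the whole stream is the product of the steps.
rolled = 1.0
for step, ca in pct_ca.items():
    rolled *= ca

print(f"Rolled %C/A for the stream: {rolled:.1%}")
```

Four steps that each look fine in isolation still let barely two-thirds of work flow through clean, which is why raising per-step %C/A is the first lever in the future state.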

(Figure: future state VSM)

Realistically, no VSM is perfect. However, the future state above demonstrates a set of processes that create a nearly continuous feedback loop. This allows for ongoing communication about the processes and the release as it moves toward a qualified build.

Demonstrating Business Value

Manufacturing plants traditionally ran one pipeline, one production line at a time. As we know, the modern software development world is not like that.

A VSM is about more than just dissecting the software delivery lifecycle to find bottlenecks and pain points, although it is certainly helpful in that area. Analyzing value streams gives management confidence that the business is focusing on the right projects and initiatives. By taking a clearer look at the KPIs and metrics across the tooling and teams spanning the entire organization, these leaders can make informed decisions the way most business leaders prefer to – with data to back them up.

Architecting a vSphere Upgrade

At the time of writing, there are 197 days left before vSphere 5.5 is end of life and no longer supported. I am currently in the middle of an architecture project at work and was reminded of the importance of upgrading — not just for the coolest new features, but for the business value in doing so.


Last year at VMworld, I had the pleasure of presenting a session with the indomitable Melissa Palmer entitled “Upgrading to vSphere 6.5 – the VCDX Way.” We approached the question of upgrading by using architectural principles rather than clicking ‘next’ all willy-nilly.

Planning Your Upgrade

When it comes to business justification, simply saying “it’s awesome” or “latest and greatest” does not cut it.

Better justifications include:

  • Extended lifecycle
  • Compatibility (must upgrade to ESXi 6.5 for VSAN 6.5+)
  • vCenter Server HA to ensure RTO is met for all infrastructure components
  • VM encryption to meet XYZ compliance

It is important to approach the challenge of a large-scale upgrade using a distinct methodology. Every architect has their own take on methodology; it is unique and personal to the individual, but it should be repeatable. I recommend planning the upgrade project end-to-end before beginning the implementation. That includes an initial assessment (to determine new business requirements and compliance with existing requirements) as well as a post-upgrade validation (to ensure functionality and that all requirements are being met).

There are many ways to achieve a current state analysis, such as using vRealize Operations Manager, the vSphere Optimization Assessment, VMware {code} vCheck for vSphere, etc.

I tend to work through any design by walking through the conceptual model, logical design, and then physical. If you are unfamiliar with these concepts, please take a look at this post.

An example to demonstrate:

  • Conceptual –
    • Requirement: All virtual infrastructure components should be highly available.
  • Logical –
    • Design Decision: Management should be separate from production workloads.
  • Physical –
    • Design Decision: vCenter Server HA will be used and exist within the Management cluster.

However, keep in mind that this is not a journey you should embark on solo. It is important to include members of various teams, such as networking, storage, security, etc.

Future State Design

It is important to use the current state analysis to identify flaws in the current design or improvements that may be made. How can upgrading allow you to solve these problems? Consider the design and use of new features or products. Not every new feature will be applicable to your current infrastructure. Keep in mind that everything is a trade-off – improving security may lead to a decrease in availability or manageability.

When is it time to re-architect the infrastructure versus re-hosting?

  • Re-host – to move from one infrastructure platform to another
  • Re-architect – to redesign, make fundamental design changes

Re-hosting is effectively “lifting-and-shifting” your VMs to a newer vSphere version. I tend to lean toward re-architecting as I view upgrades as an opportunity to revisit the architecture and make improvements. I have often found myself working in a data center and wondering “why the hell did someone design and implement storage/networking/etc. that way?” Upgrades can be the time to fix it. This option may prove to be more expensive, but, it can also be the most beneficial. Now is a good time to examine the operational cost of continuing with old architectures.

Be sure to determine key success criteria before beginning the upgrade process. A proof of concept for new features can help prove business value. For example, if you have a test or dev cluster, upgrade it to the newest version and demo a new feature to determine its relevance and functionality.

Example Upgrade Plans

Rather than rehashing examples of upgrading, embedded is a copy of our slides from VMworld which contain two examples of upgrading:

  • Upgrading from vSphere 5.5 to vSphere 6.5 with NSX, vRA, and vROPs
  • Upgrading from vSphere 6.0 to vSphere 6.5 with VSAN and Horizon

These are intended to be examples to guide you through a methodology rather than something that should be copied exactly.

Happy upgrading!

Delving into Immutable Infrastructure

Before getting too far into the topic, let’s first take a quick look at the difference between mutability and immutability.

Mutable – Continually updated, patched, and tuned to meet the ongoing needs of the purpose it serves.

Immutable – State does not change or deviate once constructed. Any changes result in the deployment of a new version rather than modification of the existing one.
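
A small Python sketch of the difference (the class names are mine, purely for illustration, and the image names are placeholders):

```python
from dataclasses import dataclass, replace

# Mutable: the same server is patched in place over its lifetime.
class MutableServer:
    def __init__(self, image):
        self.image = image

    def patch(self, image):
        self.image = image              # state changes in place

# Immutable: a change produces a new, versioned instance instead.
@dataclass(frozen=True)
class ImmutableServer:
    image: str
    version: int = 1

    def patch(self, image):
        # Attempting to assign s1.image directly would raise FrozenInstanceError.
        return replace(self, image=image, version=self.version + 1)

m = MutableServer("base-image-v1")
m.patch("base-image-v2")                # mutated in place

s1 = ImmutableServer("base-image-v1")
s2 = s1.patch("base-image-v2")          # s1 is untouched; s2 replaces it
```

The frozen dataclass mirrors the operational model: you never modify a running immutable server, you deploy its successor and retire the original.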


Understanding Erasure Coding with Rubrik

It is imperative for any file system to be highly scalable, performant, and fault tolerant. Otherwise…why would you even bother to store data there? Realistically, fault tolerance is achieved through data redundancy. On the flipside, the cost of redundancy is increased storage overhead. There are two common encoding schemes for fault tolerance: triple mirroring (RF3) and erasure coding. To ensure the Scale Data Distributed Filesystem (SDFS, codenamed “Atlas”) is fault tolerant while increasing capacity and maintaining higher performance, Rubrik uses a scheme called erasure coding.


Understanding MongoDB’s Replica Sets

As a part of its native replication, MongoDB maintains multiple copies of data in a construct called a replica set.

Replica Sets

So, what is a replica set? A replica set in MongoDB is a group of mongod (the primary daemon process for the MongoDB system) processes that maintain the same data set. Put simply, it is a group of MongoDB servers operating in a primary/secondary failover fashion. Replica sets provide redundancy and high availability.
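
As a heavily simplified conceptual sketch (real MongoDB elections involve heartbeats, terms, and member priorities), the failover behavior looks something like this:

```python
# Conceptual model of replica-set failover: illustration only, not the
# actual MongoDB election protocol. Host names are placeholders.
class ReplicaSet:
    def __init__(self, members):
        self.members = {m: "healthy" for m in members}
        self.primary = members[0]       # the rest act as secondaries

    def fail(self, member):
        self.members[member] = "down"
        if member == self.primary:
            self._elect()

    def _elect(self):
        # A new primary requires a surviving majority of members.
        up = [m for m, state in self.members.items() if state == "healthy"]
        self.primary = up[0] if len(up) > len(self.members) // 2 else None

rs = ReplicaSet(["mongo1", "mongo2", "mongo3"])
rs.fail("mongo1")                       # primary goes down...
print("new primary:", rs.primary)       # ...a secondary is promoted
```

Note what happens when a second member dies: with only a minority left, the sketch ends up with no primary, mirroring how a real replica set stops accepting writes without a voting majority.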


#VirtualDesignMaster Wrap-Up

Part of me feels like it flew by, but then I remember the hours spent reviewing all the designs (*ahem* Adam) and it feels like it took an eternity to get through. Admittedly, Virtual Design Master was probably one of the coolest community-driven events in which I have participated. If you are unfamiliar with Virtual Design Master, I strongly encourage you to check out the site and catch up with the five seasons.

Virtual Design Master: Conceptual, Logical, Physical

This year I am honored to be one of the Virtual Design Master (vDM) judges. If you are unfamiliar with vDM, it is a technology-driven reality competition that showcases virtualization community members and their talents as architects. Some competitors are seasoned architects while others are just beginning their design journey. To find out more information, please click here. One of the things that I, along with the other judges, noticed is that many of the contestants did not correctly document conceptual, logical, and physical design.

The best non-IT example that I have seen of this concept is the following image:

(Figure 1) Disclaimer: I’m not cool enough to have thought of this. I think all credit goes to @StevePantol.

The way I always describe and diagram design methodology is using the following image:

(Figure 2) Mapping it all together

I will continue to refer to both images as we move forward in this post.

Conceptual

During the assess phase, the architect reaches out to the business’ key stakeholders for the project and explores what each needs and wants to get out of it. The job is to identify the key constraints and business requirements that must be met for the design, deploy, and validation phases to be successful.

The assessment phase typically coincides with building the conceptual model of a design. Effectively, the conceptual model categorizes the assessment findings into requirements, constraints, assumptions, and risks.

For example:

 Requirements –

  1. technicloud.com should create art.
  2. The art should be durable and able to withstand years of appreciation.
  3. Art should be able to be appreciated by millions around the world.

Constraints –

  1. Art cannot be a monolithic installation piece taking up an entire floor of the museum.
  2. Art must not be so bourgeoisie that it cannot be appreciated with an untrained eye.
  3. Art must not be paint-by-numbers.

Risks –

  1. Lead IT architect at technicloud.com has no prior experience creating art.
    • Mitigation – will require art classes to be taken at local community college.
  2. Lead IT architect is left-handed which may lead to smearing of art.
    • Mitigation – IT architect will retrain as ambidextrous.

Assumptions –

  1. Art classes at community college make artists.
  2. Museum will provide security to ensure art appreciators do not damage artwork.

As you read through the requirements and constraints, the idea of how the design should look should be getting clearer and clearer. More risks and assumptions will be added as design decisions are made and the impact is analyzed. Notice that the conceptual model was made up entirely of words? Emphasis on “concept” in the word “conceptual”!

Logical

Once the conceptual model is built out, the architect moves into the logical design phase (indicated by the arrows pointing backwards in Figure 2, demonstrating its dependence on the conceptual model). Logical design is where the architect begins making decisions, but at a higher level.

Logical art work design decisions –

  1. Art will be a painting.
  2. The painting will be of a person.
  3. The person will be a woman.

For those who are having a hard time following with the art example, a tech example would be:

(Table 1) Logical design decision example

A logical diagram may look something like this:

(Figure 3) Logical storage diagram example

Notice that these are higher-level decisions and diagrams. We’re not filling in the details yet when working on logical design. However, note that these design decisions should map back to the conceptual model.

Physical

Once the logical design has been mapped out, the architect moves to physical design, where hardware and software vendors are chosen and configuration specifications are made. Simply put, this is the phase where the details are determined.

Physical art work design decisions –

  1. The painting will be a half-length portrait.
  2. The medium will be oil on a poplar panel.
  3. The woman will have brown hair.

Once again, if you hate the Mona Lisa then the IT design decision example would be:

  1. XYZ vendor and model of storage array will be purchased.
  2. Storage policy based management will be used to place VMs on the correct storage tier.
  3. Tier-1 LUNs will be replicated hourly.

These are physical design decisions, which directly correlate with and extend the logical design decisions with more detail. But, again, at the end of the day, this should all tie back to meeting the business requirements.

(Table 2) Physical design decision example

An example of a physical design would be something like:

(Figure 4) Physical storage diagram example

Notice that in this diagram, we’re starting to see more details: vendor, model, how things are connected, etc. Remember that physical should expand on logical design decisions and fill in the blanks. At the end of the day, both logical and physical design decisions should map back to meeting the business requirements set forth in the conceptual model (as evidenced by Figure 2).

Final Thoughts

Being able to quickly and easily distinguish between conceptual, logical, and physical design takes time and practice. I am hoping this clarifies some of the mystery and confusion surrounding the idea. Looking forward to seeing more vDM submissions next week.

The Importance of Mentorship: Building the Mentoring Relationship (Part IV)

Previously discussed topics were the importance of mentoring and leadership, as well as the roles of a mentor and roles of a mentee. To view the entire series:

This post will dive into what is required to build the mentoring relationship. I’m going to take a different approach this time; instead of writing a bunch of paragraphs, I am going to summarize the phases of the mentoring relationship by using an outline. Let’s see how this goes!

At the beginning of the mentoring relationship, the mentor and mentee should discuss how the partnership should be structured. Regardless of whether the mentoring relationship takes a formal structure, there are typically a few phases that take place:


  • Building rapport
    • In this phase the mentor and mentee are exploring whether or not they can work together.
    • Simply get acquainted with one another (number of years in the industry, technical skillset, common skills, similar career paths, etc.)
    • Determine purpose of relationship and establish expectations.
    • Active listening, being respectful, being open and honest help build reciprocal trust in this stage.
  • Setting direction
    • This phase is all about setting goals. Once there is rapport and the relationship has established its sense of purpose, then determine what should be achieved.
    • Discuss the overall mentoring goals; for example, the following questions could be asked:
      • What are your visions and career aspirations?
      • Where is your career right now?
      • What are your strengths, weaknesses?
      • What is your behavioral style?
      • What is your leadership style?
      • What are your top three goals?
      • How can the mentoring relationship help to build new technical skills, explore new ideas, forge a new career path, expand your network, etc.?
    • If this is a formal mentoring process, a “Mentoring Partnership Agreement” could be established to determine clear goals, roles, and responsibilities, as well as setting a schedule.
  • Recap and progress
    • Recap each mentoring session at its end. Also consider reviewing the progress made between sessions at the beginning of the next one.
    • The progression piece is typically the longest of all the phases. This phase can be perpetual depending on the length of the mentoring relationship.
    • Work together to accomplish the established goals.

Qualities of a successful mentoring relationship:

  • Articulation – the mentor should be able to help the mentee articulate their feelings, thoughts and ideas. The mentee may still be learning this skill.
  • Listening – both the mentee and the mentor should exhibit active listening skills.
  • Respect – without respect, the relationship will not achieve a level of openness required for successful mentoring.
  • Goal clarity – both the mentee and mentor need to have a clear understanding of the mentee’s objectives. It would also be good for the mentee to know the mentor’s goals in order for the relationship to be more reciprocal.
  • Challenging – the mentee and mentor should both be challenged during this relationship.
  • Self-awareness – the mentor should be proactive and insightful in order to appropriately guide the mentee using his or her own experience. The mentee should be self-aware in order to be able learn from the mentor’s example and advice.
  • Commitment to learning – both the mentor and the mentee should be learning and growing as a part of this relationship.
  • Reflection / preparation – a common reason mentoring relationships fail is that one or both parties do not invest the time to prepare for sessions or follow through on commitments.

 

Nutanix One-Click Upgrades for ESXi Patching

This is the first guest post of what I hope to be many from the great Herb Estrella:

In my personal experience, Nutanix one-click upgrades work as advertised, but there are a few items that should be accounted for in preparation for installing ESXi patches on a Nutanix cluster. This post will cover a few prerequisites to look for, touch on the subtasks of the patching procedure, and finally close out with some troubleshooting tips and links to resources that I found helpful.

If you’ve seen the Dr. Strange movie you’ll find that going through the one-click upgrade process is loosely akin to reading from the “Book of Cagliostro” in that “the warnings come after you read the spell.”


There is a pre-upgrade process that runs prior to patching and catches a few items, but here are a few prerequisites that I found will set you up nicely for success:

  • vSphere HA/DRS settings need to be set according to best practices aka “recommended practices” as these account for the CVM and a few other items that make a Nutanix cluster in vSphere different.
  • DRS Rules (affinity/anti-affinity rules), if in use, can also cause problems. For example, if you have a 3 node cluster and 3 VMs part of anti-affinity rules, it is a good idea to temporarily disable the rules. Re-enable the rules when patching is complete.
  • ISOs mounted (namely due to VMware Tools installs) are major culprits for VMs not moving when scheduled to by DRS or when moved manually. I recommend unmounting any ISOs that aren’t accessible from all hosts within a cluster.

Subtasks are the steps in the one-click upgrade sequence from start to finish. Below is a listing of them with some observations on each.


  • Downloading hypervisor bundle on this CVM
    • When the patch is initially uploaded, it is stored on a CVM in the following directory: /home/nutanix/software_uploads/hypervisor/. How you access Prism determines which CVM the hypervisor bundle (aka the patch) will reside on first. This should be mostly transparent, but it is one of those “good to know” items. The bundle needs to be copied from the initial source location onto the CVM whose host is being upgraded by the one-click upgrade process; if this fails, “no soup for you.”
  • Waiting to upgrade the hypervisor
    • …nothing to see here…
  • Starting upgrade process on this hypervisor
    • …keep it moving…
  • Copying bundle to hypervisor
    • …business as usual…
  • Migrating user VMs on the hypervisor
    • Huzzah! This is a good one to pay attention to, especially if the prerequisites previously covered are not addressed. The upgrade will most likely time out or fail here, and it may not give you any helpful information as to why.
    • This is also a good spot to watch the Tasks/Events tab on the ESXi host being patched to get some better insight in the process.
  • Installing bundle on the hypervisor
    • If all VMs have been successfully migrated, the host should be in maintenance mode with the CVM shutdown. This step also takes the longest…so patience is key.
  • Completed hypervisor upgrade on this node
    • At this stage the host is ready to run VMs as it should now be out of maintenance mode with the CVM powered on.

In the test environment I was working with, I made a lot of assumptions and just dove in head first. The results, as you can imagine, were not good. Here are a few troubleshooting measures I used to help right my wrongs.

  • The upload process for getting the ESXi patch to the CVM is straightforward; however, there are two ways to do it: download a JSON directly from the Nutanix support portal, or enter the MD5 info from the patch’s associated KB article. I chose to upload a JSON and purposefully use the wrong patch, and now I can’t delete the JSON even after completing the upgrade. If I find out how to resolve this issue, I’ll update this post. This is where knowing the file location of the patch on the CVM (/home/nutanix/software_uploads/hypervisor/) can be helpful, because the patch can be deleted or replaced.
  • Restarting Genesis! This one is a major key. For example, the one-click upgrade is stuck, a VM didn’t migrate, and even after the VM is manually migrated the one-click upgrade won’t just continue where it left off. In my experience, to resolve this you’ll need to give it a little nudge in the form of a Genesis restart. Run this command (genesis restart) on the CVM that failed; if that doesn’t work, try restarting Genesis on the other hosts in the cluster. I was doing this in a test environment and did an allssh genesis restart and was able to get the process moving, but results may vary. If you’d rather err on the side of caution, restart Genesis on one CVM at a time.
  • Some helpful commands to find errors in logs
    • grep ERROR ~/data/logs/host_preupgrade.out
    • grep ERROR ~/data/logs/host_upgrade.out
  • For the admins that aren’t about that GUI life, you can run the one-click upgrade command from a CVM:
    • cluster --md5sum=<md5 from VMware portal> --bundle=<full path to the bundle location on the CVM> host_upgrade
  • To check on the status, run host_upgrade_status

Links:

  • One click upgrades via vmwaremine
  • Troubleshooting KB article via Nutanix Support Portal, may require Portal access to view.
  • vSphere settings via Nutanix Support Portal, may require Portal access to view.

     

Bonus thoughts: Do I need vSphere Update Manager if I’m using Nutanix? This could be a post on its own (and it still might be), but I have some thoughts I’d like to share.

  • Resources
    • In a traditional setup you will most likely have vSphere Update Manager installed on a supported Windows VM (unless VCSA 6.5) with either SQL Express or a DB created on an existing SQL server. One-click upgrade is built into Prism.
  • Compliance
    • Prism has visibility into the ESXi hosts for versioning so if a host was “not like the others” then it would pop up on a NCC check or in the Alerts in Prism.
  • vCenter Plugin
    • This one is worth mentioning but really not a huge deal. It’s one less thing to worry about and ties back into the resources statements above.
  • My Answer
    • It depends on whether I’m all in with Nutanix: if my entire infrastructure were Nutanix hosts, then I would not deploy vSphere Update Manager.