Visualizing the Conceptual Model for IT Architecture

I have previously written about putting together the conceptual model with logical and physical design; however, I want to dig a little deeper into the conceptual model. The conceptual model categorizes the assessment findings into requirements, constraints, assumptions, and risks:

  • Business requirements are provided by key stakeholders and the goal of every solution is to achieve each of these requirements.
  • Constraints are conditions that provide boundaries to the design.
    • These often get confused with requirements, but remember that a requirement should allow the architect to evaluate multiple options and make a design decision whereas a constraint dictates the answers and removes the ability for the architect to decide.
  • Assumptions list the conditions that are believed to be true, but are not confirmed:
    • By the time of deployment, all assumptions should be validated.
  • Risks are factors that might negatively affect the design.
    • All risks should be mitigated, if possible.

Requirements

A requirement describes what should be achieved in the project and what the solution will look like.

  • Example: The organization should comply with Sarbanes-Oxley regulations.
  • Example: The underlying infrastructure for any service defined as strategic should support a minimum of four 9s of uptime (99.99%).
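To make the "four 9s" figure concrete, the downtime budget it implies can be worked out with simple arithmetic. The sketch below assumes a 365-day year; the function name is illustrative and not part of any tool.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Compare a few common targets over one (assumed 365-day) year.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, four 9s leaves roughly 52.6 minutes of downtime per year, while 99% leaves roughly 3.65 days. That gap is why the number of 9s in a requirement drives so many later design decisions.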

The part that tends to trip people up is functional versus non-functional requirements.

Functional Requirements

A functional requirement specifies a function that a system or component should perform. These may include:

  • Business Rules
  • Authentication
  • Audit Tracking
  • Certification Requirements
  • Reporting Requirements
  • Historical Data
  • Legal or Regulatory Requirements

Non-Functional Requirements

A non-functional requirement is a statement of how a system should behave. These may include:

  • Performance – Response Time, Throughput, Utilization, Static Volumetric
  • Scalability
  • Capacity
  • Availability
  • Recoverability
  • Security
  • Manageability
  • Interoperability

Oftentimes, non-functional requirements will be laid out as constraints, which is the part that makes this concept murkier. In the context of a VCDX design, these should typically be defined as constraints, whereas requirements are more typically functional requirements. Be careful how you word a non-functional requirement: if it’s stated as a must and there is no room for the architect to make a decision, then it’s a constraint. But if it is a should statement that gives more than one choice for a design decision, then leave it as a requirement.
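The must-versus-should wording test can be sketched as a first-pass filter. This is only a rough heuristic based on the rule of thumb above: the function name and the "needs review" label are my own, and the wording is a hint, not a verdict. The architect still has to confirm whether any room for a design decision really exists.

```python
import re


def classify_statement(statement: str) -> str:
    """First-pass guess: 'must' wording suggests a constraint, 'should' a requirement."""
    text = statement.lower()
    if re.search(r"\bmust\b", text):
        return "constraint"
    if re.search(r"\bshould\b", text):
        return "requirement"
    # No modal verb to go on, so a human has to decide.
    return "needs review"


print(classify_statement("External access must be through the standard corporate VPN client."))
print(classify_statement("The design should provide a centralized management console."))
print(classify_statement("The SLA is 99% uptime."))
```

Statements without a clear "must" or "should" fall through to manual review, which is usually where the interesting conversations with stakeholders happen.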

Constraints

Anything that limits the design choice made by the architect. If multiple options are not available to make a design decision, then it’s a constraint.

  • Example: Due to a pre-existing vendor relationship, host hardware has already been selected.

If this is a bit difficult to grasp, don’t worry, you are in good company. This is a question that appears often.

Untitled.png

In this example, the business dictates that HP ProLiant blade servers must be used, so it is a constraint. This leaves no room for me, as the architect, to make a design decision: it has already been made for me.

Assumptions

Assumptions are design components that are assumed to be valid without proof. Documented assumptions should be validated during the design process. This means by the time the design is implemented, there should be no assumptions.

  • Example: The datacenter uses shared (core) networking hardware across production and nonproduction infrastructures.
  • Example: The organization has sufficient network bandwidth between sites to facilitate replication.
  • Example: Security policies dictate server hardware separation between DMZ servers and internal servers.

These examples are a bit of low-hanging fruit. Don’t be afraid to dig a little bit deeper. If there’s anything documented or stated without empirical proof, then it is an assumption and needs to be validated.

Risks

A risk is anything that may prevent achieving the project goals. All risks should be mitigated with clear SOPs.

  • Example: The organization’s main datacenter contains only a single core router, which is a single point of failure.
  • Example: The proposed infrastructure leverages NFS storage, with which the storage administrators have no experience.

No design is perfect and it is important to document as many risks as you can identify. This will give you the opportunity to be prepared and craft mitigation plans. Not paying close attention here may leave the design in a vulnerable state.
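The two housekeeping rules above (validate every assumption, mitigate every risk) lend themselves to a simple checklist. The following is a minimal sketch, assuming the conceptual model is tracked as plain records; the class and function names are hypothetical, and the mitigation text is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assumption:
    text: str
    validated: bool = False


@dataclass
class Risk:
    text: str
    mitigation: Optional[str] = None


def open_items(assumptions: List[Assumption], risks: List[Risk]) -> List[str]:
    """List assumptions still unvalidated and risks without a mitigation plan."""
    issues = [f"Unvalidated assumption: {a.text}" for a in assumptions if not a.validated]
    issues += [f"Unmitigated risk: {r.text}" for r in risks if not r.mitigation]
    return issues


assumptions = [
    Assumption("Sufficient network bandwidth exists between sites for replication."),
    Assumption("Active Directory is available in both sites.", validated=True),
]
risks = [
    Risk("Single core router is a single point of failure."),
    Risk("Storage administrators have no NFS experience.",
         mitigation="Schedule NFS training before deployment."),  # illustrative mitigation
]

for item in open_items(assumptions, risks):
    print(item)
```

An empty list from `open_items` is a reasonable signal that the conceptual model is ready for implementation; anything left on it is work still to do.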

Additional Examples

Can you specify which conceptual model category is correct for each example?

Category | Description
Requirement | The design should provide a centralized management console to manage both data centers.
Assumption | The customer provides sufficient storage capacity for building the environment.
Constraint | The storage infrastructure must use existing EMC storage arrays for this project.
Requirement | The platform should be able to function with project growth of 20% per year.
Assumption | Active Directory is available in both sites.
Requirement | Solution should leverage and integrate with existing directory services.
Risk | Both server racks are subject to the same environmental hazards.
Assumption | BC/DR plans will be updated to include new hardware and workloads.
Requirement | The SLA is 99% uptime.
Constraint | External access must be through the standard corporate VPN client.
Risk | Having vMotion traffic and VM data traffic on the same physical network can lead to security vulnerability because vMotion is clear text by default.

Resources

To learn more about enterprise architecture or the VCDX program, please join me, Brett Guarino, Paul McSharry, and Chris McCain at VMworld on Wednesday, 29 August 2018 from 11:00-11:45 to discuss “Preparing for Your VCDX Defense”.

Problem Solving with the Cynefin Framework

Effective leaders know that problem solving is not “one-size-fits-all”. The action taken depends on the situation and, because the circumstances are changing, better decisions can be made by using an adaptive approach. I have previously written about the 75% method that I learned in the military, but there’s another framework that I have consistently used with success.

Cynefin, pronounced “kih-neh-vihn” (don’t worry, I mispronounced it for longer than I’d like to admit), is a Welsh word that means “place”. The Cynefin framework was coined in 1999 by Dave Snowden. Simply put, the framework helps you recognize that not all situations are equal and that navigating different situations successfully requires different responses.

Picture1

The 5 Domains

Problems are categorized into five domains using the Cynefin framework (yes, five, don’t forget disorder!).

Ordered Systems

The domains on the right (obvious and complicated) are “ordered” because cause-and-effect are known or can be discovered.

Obvious (fka “Simple”)

This is the domain of best practice.

In this context, the problems are apparent cause-and-effect relationships that are well understood.

The methodology is to “sense – categorize – respond” to obvious problems. This means that the situation should be assessed, categorized by type, and then responded to based on an existing process or procedure. These tend to be repeating patterns and/or consistent events…or “known knowns”.

For example, these are problems faced at a helpdesk or call center – often predictable and there are established processes in place to handle the vast majority.

Be careful – some obvious contexts may be oversimplified. This happens when leaders (or organizations, for that matter) experience success and become complacent as a result. Ensure that there are feedback loops in place so that any situations that don’t exactly fit with an established category can be reported.

Another risk with complacency is that leaders may not be receptive to new ideas. Endeavor to stay willing to pursue a new or innovative suggestion.

Complicated

This is the domain of good practice. Sometimes referred to as the “domain of experts.”

Complicated problems may have multiple correct solutions. There is a relationship between cause and effect, but it may not be obvious to everyone because the problem is…well…complicated. There may be several symptoms but you are not sure how to fix them.

The methodology here is to “sense – analyze – respond”. Effectively you should assess the situation, analyze what is known (using the help of experts), and decide what the best response is using good practices. This is generally where we experience “known unknowns” where we know the questions that need to be answered, but may not know the actual answer. It is at this point that we consult the expert. With enough time, you could reasonably identify the known risk and develop a plan. Think evolutionary, not revolutionary.

The danger here is that a leader may lean too heavily on experts while ignoring good solutions from others. In tech, we tend to experience this where we rely on the experts and ignore the generalists – even though the generalist may have the winning answer. Additionally, the leader may experience analysis paralysis. This is where I recommend using the 75% method detailed here.

Unordered Systems

The domains on the left (complex and chaotic) are “unordered” because cause and effect can be deduced only with hindsight or potentially not at all.

Complex

This is the domain of emergent practice.

Sometimes it is impossible to identify a single correct solution or to spot the cause-and-effect relationship. You are likely in a complex context.

This context is typically unpredictable, making the best approach “probe – sense – respond”. Think “unknown unknowns”. You may not know the correct questions to be asking. Regardless of how much time is spent in analysis, it may not be possible to accurately identify the risks, predict the solution, or estimate the effort needed to solve the problem.

In this situation, it is best to patiently wait, look for patterns, develop, and experiment to gain more knowledge. As more knowledge is gained, then determine the next steps. Repeat as needed. The goal is to move into the “complicated” domain.

A potential risk is that leaders may fall back into habitual command-and-control modes which are futile in this context. Leaders lacking patience may try to force facts instead of waiting for patterns. It is imperative to have a feedback loop so that open discussion can occur to develop experiments for observing patterns. Think “what if we tried…” Use creativity to solve the problem.

Complicated and complex situations are similar in some ways, and are sometimes confused. If a decision based on incomplete data is being made, you are likely to be in a complex situation.

Chaotic

This is the domain of novel practice.

There is no relationship between cause-and-effect. This means that the primary goal here is to establish order and stability. This is likely a crisis or emergency situation.

The methodology is to “act – sense – respond”. It is necessary to be decisive in order to address the burning issues, determine where there is and isn’t stability, and then work to move the situation from chaos to complexity. Basically, shit has hit the fan – triage time: stop the bleeding and start the breathing… then determine what the real solution should be.

It may feel like in tech we live in this domain (hopefully not!). As an example, there may be an issue in production, say a bad patch that has been installed data center wide. Initially the focus will be on containing the issue and correcting it quickly. The initial solution may not be great, but it gets the job done. Once the bleeding has stopped then you can determine the better long-term solution.

In this situation, the leader must provide clear and direct communication while taking immediate action to re-establish order. A risk is an indecisive leader. This is the time to find “good enough” instead of the perfect answer.

Disorder

Disorder is the space in the middle.

There is no clarity here – decompose and move to another context. Basically, if you have no idea where you are, then you’re likely in “disorder”. The immediate goal is to gather information in order to move to a known domain.

In this situation, I tend to try to break the massive disorder into smaller problems and then tackle each one individually. Apply each problem to a domain and work on a solution.

Chaotic problems are dangerous, especially when left unaddressed, because there is no process to fix them. This is why it is important to move into a known category.
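The five domains and their matching responses can be summarized as a small lookup table. This is just a sketch for keeping the pairings straight; the enum and function names are my own shorthand, not part of the Cynefin framework itself.

```python
from enum import Enum
from typing import Tuple


class Domain(Enum):
    OBVIOUS = "obvious"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"
    DISORDER = "disorder"


# Response sequence for each of the four known domains, as described above.
APPROACH = {
    Domain.OBVIOUS: ("sense", "categorize", "respond"),   # best practice
    Domain.COMPLICATED: ("sense", "analyze", "respond"),  # good practice
    Domain.COMPLEX: ("probe", "sense", "respond"),        # emergent practice
    Domain.CHAOTIC: ("act", "sense", "respond"),          # novel practice
}


def next_steps(domain: Domain) -> Tuple[str, ...]:
    """Return the response sequence for a domain; disorder has none of its own."""
    if domain is Domain.DISORDER:
        # Break the problem down and place each piece in a known domain.
        return ("decompose", "assign each piece to a known domain")
    return APPROACH[domain]


print(" - ".join(next_steps(Domain.COMPLEX)))
print(" - ".join(next_steps(Domain.DISORDER)))
```

Note that disorder deliberately has no entry in the table: the only sensible move from there is to decompose the problem and get each piece into one of the other four domains.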

Final Thoughts

The Cynefin Framework is an excellent model to assist in approaching different situations. Once the situation is defined, then work to solve the problem.

The goal is to adequately lead your team through any of these five domains. Many leaders can only lead effectively in one or two domains (not in all of them) and few, if any, prepare their organizations for diverse contexts. The only way to successfully get through all five domains is to keep an open mind to new and creative solutions, build a feedback loop, and not get stuck in analysis paralysis.

Cynefin_framework_by_Edwin_Stoop

Additional Resources:

Cognitive Edge: The Cynefin Framework (explained by Snowden himself!)

Everyday Kanban: Understanding the Cynefin framework – a basic intro

Sherrieg: The Cynefin Framework

Harvard Business Review: A Leader’s Framework for Decision Making

Ch-ch-change Management

Change management has never been easy for the dev or the ops side of the house. Let’s face it; it’s usually a checklist item and a tool to CYA. However, we are moving to a world where change is a part of the culture and a frequent process. There is no excuse to not improve.

The ultimate goal of change management is to drive organizational results and outcomes by engaging the staff to encourage the adoption of a new way to work. Whether it is a process, system, job role, or organizational structure change (potentially…all of the above), a project can only be successful if the individual changes daily behaviors and begins doing the job in a new way. This is the nature of change management.

Therefore, staffing a change management board with a crew of change-averse individuals will get you nowhere.

Change Management

Often we look at change management as a way to spot problems after they happen. Thus it becomes a tool for responding to change, instead of leveraging change. In this world of DevOps that embraces change as a mechanism to iteratively improve on processes, change management is usually viewed as a blocker to avoid. But in most enterprises and verticals, it cannot be avoided.

Tooling and implementation can be detached from governance. This decoupling can result in lost communication and a reactive philosophy. Instead consider funneling all changes through the same channel so that nothing gets lost and the change advisory board (CAB) considers all changes. Begin by consolidating change, problem, and incident management into a modular platform that is a part of your DevOps tooling that can streamline everything into one pipeline.

feedback loop

This may seem outlandish at first, but integrating change into pipelines automates the capture of change records with a set of artifacts. The goal is to ultimately improve collaboration and to build an auditable history.

Companies often establish different modes of change to balance speed, quality, and risk. Consider automating the approval gate for some modes of change. This speeds change processing and increases adoption. It also shifts the responsibility for effectively making change happen back onto the individuals who conduct the implementations.
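An automated approval gate can be as simple as a rule keyed on the mode of change. The sketch below is hypothetical: the mode names ("standard", "normal", "emergency") are assumed ITIL-style labels rather than anything defined in this post, and the record fields are invented for illustration.

```python
from dataclasses import dataclass

# Assumption: "standard" changes are pre-approved, low-risk, and repeatable.
AUTO_APPROVED_MODES = {"standard"}


@dataclass
class ChangeRecord:
    change_id: str
    mode: str            # e.g. "standard", "normal", "emergency" (assumed labels)
    tests_passed: bool   # result reported by the pipeline


def approval_decision(change: ChangeRecord) -> str:
    """Auto-approve pre-approved modes with passing tests; send the rest to the CAB."""
    if change.mode in AUTO_APPROVED_MODES and change.tests_passed:
        return "auto-approved"
    return "route to CAB"


print(approval_decision(ChangeRecord("CHG-001", "standard", tests_passed=True)))
print(approval_decision(ChangeRecord("CHG-002", "normal", tests_passed=True)))
```

Either way the change record is still captured, so the audit trail stays intact even when no human approval step is involved.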

Change management should be a priority and used as a single source of truth of all changes. Doing so will increase visibility for risk and compliance management.

We can distill this down to three key ideas to assist in implementing efficient change management:

  • Do not decide a new direction and then dump it on your team. Involve them in the decision-making process.
  • Make work visible to all.
  • Embrace value stream mapping to find new ways to increase efficiency.

The bottom line is to be proactive about how change is managed.

Get Mapped: Value Stream Mapping

twain

Value stream mapping (VSM) does exactly that: it is a DevOps framework (“borrowed” from manufacturing) that provides a structured way for cross-functional teams to collectively see where we are today (long release cycles, silos, damage control afterwards, etc.) and where we want to be in the future (short release cycles, infrastructure as code, iterative development, continuous delivery, etc.).

A VSM is a way of getting people to collaborate and see what is really happening. These exercises are often amazing “aha!” moment workshops that make three objectives (flow, feedback, and continuous integration) turn into a sustainable engine of improvement.

Who should participate in a VSM?

  • Service Stakeholders and Customers
  • Executors of Process Tasks
  • Management

…but not all at the same time.

The VSM process assembles everyone involved with a workflow in the same room to clarify their roles in the product delivery process and identify bottlenecks, friction points and handoff concerns. Realistically, if we include everyone at the same time, the likelihood of honesty decreases. Let’s be for real – if upper management were in the room with you, would you be 100% honest as to where the bodies are buried or exactly what processes each step entails? VSM reveals steps in development, test, release and operations support that waste time or are needlessly complicated and this requires complete transparency.

Lead Time versus Time on Task

If you can’t measure it, you can’t improve it. Why do companies go for Continuous Delivery (CD)? Why do people care about DevOps? The main reason I hear is cycle time. This is the time it takes to get from an idea to a product or feature that your customers can use. Measurement is one of the core foundations of DevOps, and the VSM is the measurement phase. If you do it right, it’s the sharing phase as well – share the measurements and proposed changes with the entire group. Doing that well allows you to start to change culture simultaneously.

Lead time vs time on task

With a solid foundation in place, it becomes easier to capture more sophisticated metrics around feature usage, customer journeys, and ensuring that service level agreements (SLAs) are met. The information received becomes handy when it’s time for road mapping and spec’ing out the next big project.

“Lead time” is a term borrowed from manufacturing, but in the software domain, lead time can be described more abstractly as the time elapsed between the identification of a requirement and its fulfillment.

The goal of VSM development is to measure how time is spent on each task and identify processes required for each task. It becomes easier to see what processes are inefficient and creating a bottleneck. In turn, this will reduce the lead time to deliver the finished release.
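To illustrate the difference between lead time and time on task, the sketch below totals both for a made-up release process. The step names and hour values are hypothetical, and splitting each step into "time on task" and "wait" hours is an assumption about how the measurements were captured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    time_on_task_hours: float  # hands-on work
    wait_hours: float          # queueing, handoffs, approvals


def lead_time(steps: List[Step]) -> float:
    """Total elapsed time: work plus waiting across every step."""
    return sum(s.time_on_task_hours + s.wait_hours for s in steps)


def time_on_task(steps: List[Step]) -> float:
    """Total hands-on time across every step."""
    return sum(s.time_on_task_hours for s in steps)


def bottleneck(steps: List[Step]) -> Step:
    """The step where work sits idle the longest."""
    return max(steps, key=lambda s: s.wait_hours)


# Hypothetical measurements for illustration only.
steps = [
    Step("Develop", 16, 8),
    Step("Code review", 2, 24),
    Step("Test", 8, 40),
    Step("Release", 1, 72),
]

print(f"Lead time: {lead_time(steps)} h")
print(f"Time on task: {time_on_task(steps)} h")
print(f"Share of lead time spent working: {time_on_task(steps) / lead_time(steps):.0%}")
print(f"Longest wait: {bottleneck(steps).name}")
```

In this invented example only a small fraction of the lead time is actual work; the rest is waiting, which is exactly the kind of finding a VSM workshop is meant to surface.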

Current State

The following VSM demonstrates a current state analysis of the current software release process. The main thing to note in this example is how linear it is – there are only two feedback loops: at the very beginning and towards the end at new feature testing.

current state

The apparent lack of feedback loops presents a potential problem area – there are 8 steps between the two feedback loops. Imagine getting all the way to the end before realizing there’s an issue and providing feedback. How far will the software release be set back if the problem is not detected and communicated until the new release testing phase?

Future State

Once you have the current state VSM mapped, the next step is to figure out a way to make the mapping more efficient. This is typically driven by the following:

  • How can we significantly increase the percent complete and accurate work for each step in our current state VSM?
  • How can we dramatically reduce, or even eliminate the non-productive time in the lead time of each current state step?
  • How can we improve the performance of the value added time in each current state step?
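One way to put a number on the first question is a "rolled" percent complete and accurate (%C&A): multiply the per-step values together to estimate how often work passes through the whole stream without rework. This metric is an assumption borrowed from general value stream mapping practice rather than something defined in this post, and the step values below are invented.

```python
import math
from typing import Iterable


def rolled_percent_complete_accurate(step_ratios: Iterable[float]) -> float:
    """Multiply per-step %C&A values (as fractions 0..1) into one end-to-end figure."""
    return math.prod(step_ratios)


# Hypothetical per-step %C&A values for a three-step stream.
current_state = [0.9, 0.8, 0.5]
print(f"Rolled %C&A: {rolled_percent_complete_accurate(current_state):.0%}")
```

Because the values multiply, a single weak step drags down the whole stream, which is a good argument for fixing the worst step first.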

future

Realistically, no VSM is perfect. However, the future state that we see above demonstrates a set of processes that create a mostly ongoing feedback loop. This allows for continuous communication about the processes and release as it moves forward towards a qualified build.

Demonstrating Business Value

In the manufacturing plants, they would have one pipeline, one production line at a time. As we know, the modern software development world is not like that.

A VSM is about more than just dissecting the software delivery lifecycle to find bottlenecks and pain points, although it is certainly helpful in that area. Analyzing value streams gives management confidence that the business is focusing on the right projects and initiatives. By taking a clearer look at the KPIs and metrics across the tooling and scaling the entire organization, these leaders can make informed decisions the way most business leaders prefer to—with data to back them up.

Architecting a vSphere Upgrade

At the time of writing, there are 197 days left before vSphere 5.5 is end of life and no longer supported. I am currently in the middle of an architecture project at work and was reminded of the importance of upgrading — not just for the coolest new features, but for the business value in doing so.

Last year at VMworld, I had the pleasure of presenting a session with the indomitable Melissa Palmer entitled “Upgrading to vSphere 6.5 – the VCDX Way.” We approached the question of upgrading by using architectural principles rather than clicking ‘next’ all willy-nilly.

Planning Your Upgrade

When it comes to business justification, saying “it’s awesome” or “latest and greatest” simply does not cut it.

Better justification is:

  • Extended lifecycle
  • Compatibility (must upgrade to ESXi 6.5 for VSAN 6.5+)
  • vCenter Server HA to ensure RTO is met for all infrastructure components
  • VM encryption to meet XYZ compliance

It is important to approach the challenge of a large-scale upgrade using a distinct methodology. Every architect has their own take on methodology; it is unique and personal to the individual, but it should be repeatable. I recommend planning the upgrade project end-to-end before beginning the implementation. That includes an initial assessment (to determine new business requirements and compliance with existing requirements) as well as a post-upgrade validation (to ensure functionality and that all requirements are being met).

There are many ways to achieve a current state analysis, such as using vRealize Operations Manager, the vSphere Optimization Assessment, VMware {code} vCheck for vSphere, etc.

I tend to work through any design by walking through the conceptual model, logical design, and then physical. If you are unfamiliar with these concepts, please take a look at this post.

An example to demonstrate:

  • Conceptual –
    • Requirement: All virtual infrastructure components should be highly available.
  • Logical –
    • Design Decision: Management should be separate from production workloads.
  • Physical –
    • Design Decision: vCenter Server HA will be used and exist within the Management cluster.
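The chain in this example (a physical decision extends a logical decision, which satisfies a requirement) can be checked mechanically. The following is a minimal sketch, assuming each decision records what it traces back to; the IDs (R1, L1, P1) and the dictionary layout are hypothetical, and it assumes the trace links contain no cycles.

```python
from typing import Dict, Set

requirements = {
    "R1": "All virtual infrastructure components should be highly available.",
}

# Each decision lists the requirement or decision it traces back to.
decisions: Dict[str, dict] = {
    "L1": {"layer": "logical",
           "text": "Management should be separate from production workloads.",
           "traces_to": ["R1"]},
    "P1": {"layer": "physical",
           "text": "vCenter Server HA will be used and exist within the Management cluster.",
           "traces_to": ["L1"]},
}


def trace_to_requirements(item_id: str) -> Set[str]:
    """Follow 'traces_to' links upward until requirements are reached."""
    if item_id in requirements:
        return {item_id}
    found: Set[str] = set()
    for parent in decisions.get(item_id, {}).get("traces_to", []):
        found |= trace_to_requirements(parent)
    return found


for decision_id in decisions:
    print(decision_id, "->", sorted(trace_to_requirements(decision_id)))
```

A decision that traces back to an empty set is worth questioning: it either answers a requirement nobody wrote down, or it does not need to be in the design.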

However, keep in mind that this is not a journey that you may embark on solo. It is important to include members of various teams, such as networking, storage, security, etc.

Future State Design

It is important to use the current state analysis to identify the flaws in the current design or improvements that may be made. How can upgrading allow you to solve these problems? Consider the design and use of new features or products. Not every single new feature will be applicable to your current infrastructure. Keep in mind that everything is a trade off – improving security may lead to a decrease in availability or manageability.

When is it time to re-architect the infrastructure versus re-hosting?

  • Re-host – to move from one infrastructure platform to another
  • Re-architect – to redesign, make fundamental design changes

Re-hosting is effectively “lifting-and-shifting” your VMs to a newer vSphere version. I tend to lean toward re-architecting as I view upgrades as an opportunity to revisit the architecture and make improvements. I have often found myself working in a data center and wondering “why the hell did someone design and implement storage/networking/etc. that way?” Upgrades can be the time to fix it. This option may prove to be more expensive, but, it can also be the most beneficial. Now is a good time to examine the operational cost of continuing with old architectures.

Be sure to determine key success criteria before beginning the upgrade process. Doing a proof of concept for new features may prove business value. For example, if you have a test or dev cluster, perhaps upgrade it to the newest version and demo a new feature to determine its relevance and functionality.

Example Upgrade Plans

Rather than rehashing examples of upgrading, embedded is a copy of our slides from VMworld which contain two examples of upgrading:

  • Upgrading from vSphere 5.5 to vSphere 6.5 with NSX, vRA, and vROPs
  • Upgrading from vSphere 6.0 to vSphere 6.5 with VSAN and Horizon

These are intended to be examples to guide you through a methodology rather than something that should be copied exactly.

Happy upgrading!

Understanding Erasure Coding with Rubrik

It is imperative for any file system to be highly scalable, performant, and fault tolerant. Otherwise…why would you even bother to store data there? But realistically, achieving fault tolerance is done through data redundancy. On the flipside, the cost of redundancy is increased storage overhead. There are two possible encoding schemes for fault tolerance: triple mirroring (RF3) and erasure coding. To ensure the Scale Data Distributed Filesystem (SDFS, codenamed “Atlas”) is fault tolerant while increasing capacity and maintaining higher performance, Rubrik uses a schema called erasure coding.
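The storage overhead trade-off between the two schemes comes down to simple arithmetic. The sketch below compares triple mirroring with a hypothetical erasure coding layout of 4 data blocks plus 2 parity blocks; that 4+2 layout is chosen purely for illustration and is not a statement about Rubrik's actual configuration.

```python
def raw_per_usable(data_blocks: int, redundancy_blocks: int) -> float:
    """Raw capacity consumed for every unit of usable data stored."""
    return (data_blocks + redundancy_blocks) / data_blocks


# Triple mirroring: one original block plus two full copies.
rf3 = raw_per_usable(data_blocks=1, redundancy_blocks=2)

# Hypothetical erasure coding layout: 4 data blocks plus 2 parity blocks.
ec_4_2 = raw_per_usable(data_blocks=4, redundancy_blocks=2)

print(f"RF3 stores {rf3:.1f}x the data written")
print(f"EC 4+2 stores {ec_4_2:.1f}x the data written")
```

Both layouts in this comparison carry two redundant blocks per stripe, but the erasure-coded one consumes half the raw capacity of triple mirroring for the same usable data.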

#VirtualDesignMaster Wrap-Up

Part of me feels like it flew by but then I remember the hours spent reviewing all the designs (*ahem* Adam) and then it feels like it took an eternity to get through. Admittedly, Virtual Design Master was probably one of the coolest community driven events in which I have participated. If you are unfamiliar with Virtual Design Master, I strongly encourage you to check out the site and catch up with the five seasons.

Virtual Design Master: Conceptual, Logical, Physical

This year I am honored to be one of the Virtual Design Master (vDM) judges. If you are unfamiliar with vDM, it is a technology driven reality competition that showcases virtualization community members and their talents as architects. Some competitors are seasoned architects while others are just beginning their design journey. To find out more information, please click here. One of the things that I, along with the other judges, noticed is that many of the contestants did not correctly document conceptual, logical, and physical design.

The best non-IT example that I have seen of this concept is the following image:

C-3NztMXYAIRbF9.jpg-large
(Figure 1) Disclaimer: I’m not cool enough to have thought of this. I think all credit goes to @StevePantol.

The way I always describe and diagram design methodology is using the following image:

Screen Shot 2017-07-06 at 9.50.05 PM
(Figure 2) Mapping it all together

I will continue to refer to both images as we move forward in this post.

Conceptual

During the assess phase, the architect reaches out to the business’ key stakeholders for the project and explores what each needs and wants to get out of the project. The job is to identify key constraints and the business requirements that should be met for the design, deploy, and validation phases to be successful.

The assessment phase typically coincides with building the conceptual model of a design. Effectively, the conceptual model sorts the assessment findings into four categories: requirements, constraints, assumptions, and risks.

For example:

 Requirements –

  1. technicloud.com should create art.
  2. The art should be durable and able to withstand years of appreciation.
  3. Art should be able to be appreciated by millions around the world.

Constraints –

  1. Art cannot be a monolithic installation piece taking up an entire floor of the museum.
  2. Art must not be so bourgeois that it cannot be appreciated with an untrained eye.
  3. Art must not be paint-by-numbers.

Risks –

  1. Lead IT architect at technicloud.com has no prior experience creating art.
    • Mitigation – will require art classes to be taken at local community college.
  2. Lead IT architect is left-handed which may lead to smearing of art.
    • Mitigation – IT architect will retrain as ambidextrous.

Assumptions –

  1. Art classes at community college make artists.
  2. Museum will provide security to ensure art appreciators do not damage artwork.

As you read through the requirements and constraints, the idea of how the design should look should be getting clearer and clearer. More risks and assumptions will be added as design decisions are made and the impact is analyzed. Notice that the conceptual model was made up entirely of words? Emphasis on “concept” in the word “conceptual”!

Logical

Once the conceptual model is built out, the architect moves into the logical design phase (which is indicated by the arrows pointing backwards in Figure 2, demonstrating dependence on conceptual). Logical design is where the architect begins making decisions but at a higher level.

Logical art work design decisions –

  1. Art will be a painting.
  2. The painting will be of a person.
  3. The person will be a woman.

For those who are having a hard time following with the art example, a tech example would be:

Screen Shot 2017-07-06 at 10.27.57 PM
(Table 1) Logical design decision example

A logical diagram may look something like this:

Picture1
(Figure 3) Logical storage diagram example

Notice that these are ‘higher’ level decisions and diagrams. We’re not quite to filling in the details yet when working on logical design. However, note that these design decisions should map back to the conceptual model.

Physical

Once the logical design has been mapped out, the architect moves to physical design where hardware and software vendors are chosen and configuration specifications are made. Simply put, this is the phase where the details are determined.

Physical art work design decisions –

  1. The painting will be a half-length portrait.
  2. The medium will be oil on a poplar panel.
  3. The woman will have brown hair.

Once again, if you hate the Mona Lisa then the IT design decision example would be:

  1. XYZ vendor and model of storage array will be purchased.
  2. Storage policy based management will be used to place VMs on the correct storage tier.
  3. Tier-1 LUNs will be replicated hourly.

These are physical design decisions, which directly correlate and extend the logical design decisions with more information. But, again, at the end of the day, this should all tie back to meeting the business requirements.

Screen Shot 2017-07-06 at 10.32.54 PM
(Table 2) Physical design decision example

An example of a physical design would be something like:

phys
(Figure 4) Physical storage diagram example

Notice that in this diagram, we’re starting to see more details: vendor, model, how things are connected, etc. Remember that physical should expand on logical design decisions and fill in the blanks. At the end of the day, both logical and physical design decisions should map back to meeting the business requirements set forth in the conceptual model (as evidenced by Figure 2).

Final Thoughts

Being able to quickly and easily distinguish between conceptual, logical, and physical design takes time and practice. I am hoping this clarifies some of the mystery and confusion surrounding this idea. Looking forward to seeing more vDM submissions next week.