Category Archives: Big Data

The Paradox of Cloud Computing Adoption – Part II

In Part I of this post, I provided an overview of Cloud Computing adoption and discussed the importance of the two primary factors, scale and scope. What other considerations dictate Cloud strategy and have a bearing upon customer adoption? Some of these are listed below.

1. Variability of Workload

Workloads are often modeled as either fixed or variable; in reality, they are almost always variable. They can vary at low or high rates, and this variability can be predictable or unpredictable (H&R Block during tax season vs. social media spikes during major events). It is difficult to accommodate a highly variable workload in-house or in a private cloud – the two available options are to overbuild at a large cost or to suffer degraded performance during workload spikes. A public Cloud capitalizes upon the fact that workload spikes across its customers are largely uncorrelated; if they were correlated, the public Cloud would be susceptible to similar performance issues.
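To make the overbuild-or-degrade tradeoff concrete, here is a minimal Python sketch with purely hypothetical demand numbers; it shows how little of peak-sized capacity is actually used, and how often average-sized capacity falls short.

# Minimal sketch, hypothetical numbers: the cost of sizing in-house capacity
# for a spiky workload.

# Hourly demand in "server units" for one day: mostly quiet, one sharp spike.
hourly_demand = [40] * 20 + [400, 900, 700, 120]

peak = max(hourly_demand)
average = sum(hourly_demand) / len(hourly_demand)

# Sizing for the peak means most capacity sits idle most of the time.
print(f"Peak demand:    {peak} units")
print(f"Average demand: {average:.0f} units")
print(f"Utilization when sized for peak: {average / peak:.1%}")

# Sizing for the average means the spike overwhelms capacity.
degraded_hours = sum(1 for d in hourly_demand if d > average)
print(f"Hours of degraded performance when sized for average: {degraded_hours}")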

2. Data Intensity

Enterprises have different data-generation characteristics, based primarily upon the nature of their business. Some organizations generate large amounts of data (Social Media, Insurance, Pharmaceutical, and Manufacturing companies), while others generate relatively little (early-stage startups, mobile app companies). It is easier to migrate low data-intensity applications to Cloud providers, and such migrations are likely to incur lower operational costs as well.

3. Externally generated data vs. internal

Companies also differ in where their data is generated. Social media companies receive externally generated data, whereas Pharmaceutical or Manufacturing companies generate their data in-house via test or manufacturing equipment. Internally generated data is not easy to migrate to the Cloud because of the data volumes involved, and it is harder still when Compliance and IP considerations apply (see #5 and #7 below).

4. Interoperations between applications, and data movement within the enterprise

Companies could have several standalone applications with relatively limited interaction. For example, Twitter’s Operations, internal email, and financial systems interact minimally with each other, if at all. On the other hand, an insurance company’s Claims applications, Data Warehouse, Marketing applications, and email systems (workflow) are all linked with well-defined data flows. Here it would be a significant task to replace any of these components with a Cloud-based service.

Another hurdle that accompanies highly interoperable applications is large-scale data movement within the organization. Various functions and locations within an organization access data created by one another; internal network connections are built to implement and optimize these data flows, which is a complex task to accomplish across locations and service providers.

5. Legacy issues and IP considerations

Cloud service providers support most modern protocols, such as SOAP and REST, within their PaaS offerings. However, they typically do not support the legacy protocols still in use at many enterprises. This constitutes a significant hurdle that cannot be easily overcome. It is not an issue for new companies, but it is a significant one for established companies that have been in business for many years. Organizations will continue to replace such applications with new services where feasible, but this is a slow process.

6. SLAs and OLAs required by the business

Corporations are increasingly required to provide SLAs and OLAs to their customers for all operational aspects of their business. These flow back to the Business Applications and IT Operations as corresponding SLAs/OLAs. If an application or a set of applications is migrated to Cloud providers, corresponding SLAs/OLAs need to be established with those providers. This may or may not be possible based upon the options the provider offers, and it may be financially prohibitive or difficult to enforce. Amazon offers a 99.95% availability SLA for its EC2 infrastructure, yet there have been several outages that violated it. Such contracts are typically written in the provider’s favor, insulating the provider from serious financial impact, with remedies usually limited to the amount paid for the service feature that failed. The damage to the customer’s reputation and financials, however, can be immense.

For example, a provider that offers a 99.9% availability SLA would remain in compliance even after a single eight-hour outage during an entire calendar year (the annual downtime budget at 99.9% is roughly 8.8 hours). Would eBay or Amazon tolerate this one week before Thanksgiving?
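A short Python sketch of the annual downtime budget implied by common availability targets (the SLA levels below are illustrative):

# How much downtime per year does a given availability SLA actually permit?
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

for sla in (0.999, 0.9995, 0.9999, 0.99999):
    allowed_hours = HOURS_PER_YEAR * (1 - sla)
    print(f"{sla * 100:g}% availability -> {allowed_hours:6.2f} hours "
          f"({allowed_hours * 60:7.1f} minutes) of downtime per year")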

7. Regulatory and Compliance limitations

Regulatory and compliance issues are little understood by most players in the Cloud business; even industry experts can be misled by some of the provider certifications. The reality for companies is that they remain responsible to their customers, shareholders, and regulatory authorities for the privacy, security, and integrity of their customer data. Cloud provider certifications mean that these providers follow best practices; however, if customer data is lost or compromised, it is hard to see the Cloud provider stepping up to take financial responsibility. Even worse, smaller providers might simply close their doors, leaving all their customers in an extremely difficult situation.

8. Impact of network costs

Network costs are a major expense for organizations today, representing a significant share of most IT/Telecom budgets. They remain exceedingly high because network capacity requires large capital projects, and because a small number of providers operate as an effective oligopoly.

With migration to the Cloud, network traffic increases, and costs rise for companies on multiple accounts:

  • Client access that previously stayed in-house now moves data to and from external providers.
  • Data moves between multiple applications and providers.
  • Data must be exchanged with in-house legacy applications.
  • Internally generated data must be shipped to external providers.

Each of the items above represents data that was previously moved within the organization’s Local Area Network (LAN) and will now be routed over the Wide Area Network (WAN) of a telecom provider. LAN costs are comparatively low and provide a stable, high-performance network path, whereas WAN costs are quite high, even at much lower bandwidths. In addition, WANs introduce significant latencies that applications may or may not be able to tolerate, and they are more susceptible to failure than LANs. This represents a reduction in the reliability of the network and application infrastructure.
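A rough Python sketch of the LAN-to-WAN shift; all rates, volumes, and costs below are assumptions for illustration, not quotes from any provider.

# Rough sketch, assumed numbers: moving daily data flows from the LAN to a WAN link.

def transfer_hours(gigabytes, link_mbps):
    """Hours to move a dataset over a link, ignoring protocol overhead."""
    return (gigabytes * 8 * 1000) / link_mbps / 3600  # GB -> megabits -> hours

daily_data_gb = 500        # assumed daily data movement between applications
lan_mbps = 10_000          # 10 Gbps LAN
wan_mbps = 200             # 200 Mbps WAN link (assumed)
wan_cost_per_gb = 0.10     # assumed blended WAN/egress cost in $/GB

print(f"LAN transfer time: {transfer_hours(daily_data_gb, lan_mbps):5.2f} h/day")
print(f"WAN transfer time: {transfer_hours(daily_data_gb, wan_mbps):5.2f} h/day")
print(f"Monthly WAN cost:  ${daily_data_gb * 30 * wan_cost_per_gb:,.0f}")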

Let us examine the impact of these factors on a few types of companies. In the chart below, darker colors indicate a stronger impact along a given dimension, and lighter colors a weaker one.

Cloud Computing - Secondary Factor Impact


It is clear that each of these types of companies has a distinct profile based upon the combination of these eight dimensions. Existing companies have operations and IT systems that reflect these profiles, and they expect Cloud services to meet the corresponding requirements. When these requirements are not adequately met, this constitutes a barrier to migration. These profiles are likely to be similar for companies within a particular domain, irrespective of size.

In addition, there are a number of factors that inhibit rapid Cloud adoption by customers. In the concluding part of this paper, I will explore these, identify conditions that would speed up this process, and offer some recommendations for Cloud providers.


My Impressions from VMworld 2013

I attended VMworld this year, held August 25-29 in San Francisco. This is a brief report of what I saw and the impressions I came away with. The conference is getting bigger each year, with 23,000 attendees in 2013. There were lots of interesting sessions, but I could only attend a fraction of those I would have wished to. Why? Too many conflicts, with each session offered only once. In addition, there were many meetings, receptions, and lunches, and lots of networking. I suspect conference prices keep rising largely because of the perceived value of that networking.

For those looking for an overview of the content, here are the day-by-day highlights from the VMware vCloud blog, and excellent coverage of Day 1, Day 2, and Day 3 by David Davis. If you are looking for more detail, here is Scott Lowe’s blow-by-blow from the keynotes by Pat Gelsinger (Day 1) and Carl Eschenbach (Day 2). Scott also has a great report on interesting vendor meetings he had during VMworld here; these could be technologies and companies to watch over the coming months.

Picture from the “Ask the Expert vBloggers” Session at VMworld 2013

The Exhibit Hall was huge, with over 250 exhibitors. The influence of the major industry players was clearly evident from the size of their operations and the show they put on. Nearly half the exhibitors were storage companies or offered storage solutions. Covering all the booths of interest took me over two days, and got me wondering how new players and small startups could effectively stand out in this environment. Well, Jerry Chen had the very same impression; see here for his post on how it is becoming harder for startups to get noticed at VMworld. He also analyzes the future of VMware as a company, and the role of several key initiatives in VMware’s strategy.

Finally, a plug for the VMware User Group (VMUG). VMUG is a customer-led organization that relies on the dedication and commitment of its members and leaders (including yours truly) to support the community of VMware users. The VMUG booth was the most visited booth, with over 5,480 visitors. One-third of VMworld 2013 attendees were VMUG members, and this share is expected to grow. Join the VMUG community, and volunteer – it is a great way to be amongst a smart and passionate set of individuals!!

A view of the VMUG Booth at VMworld 2013

Robin Matlock, VMware CMO, gave a shout-out to the VMUG community and organization on the opening day. It was really great to meet other leaders; over 190 Leaders, VMware Employees, and Partners attended key VMUG Leader events. I will leave you with some press coverage of VMUG: interviews with the President, Mariano Maluf, are here and here, and an interview with the Executive Director, Victor Bohnert, can be viewed here.


VMworld is almost here

VMworld is less than 10 days away. Time to plan sessions and schedule meetings. Also, make new friends and have a fun time while learning the latest in technology!!



Join us as we celebrate the 10th Annual VMworld US!

We’ve grown from 1,600 attendees in 2004 to over 21,000 expected in 2013. As an industry we’ve moved from discussing the potential benefits of virtualization to leveraging virtualization best practices to radically simplify IT and drive business success. Through the last 10 years, VMware has extended the benefits of our market-leading technology across the entire data center—and VMworld year over year remains the best place to learn best practices to accelerate your business.

http://www.vmworld.com/community/conference/us/


Moving Big Data Workloads into a Private Cloud

Big Data has taken off as an increasingly mainstream technology, with adoption across verticals. Organizations are using analytics to mine nuggets of information from the vast amounts of unstructured data they have amassed. Most of these implementations run on physical hardware; does it make sense to move large workloads such as these into a private cloud?

The idea of moving resource-intensive analytics workloads into a Cloud-based environment would be unacceptable to purists. Some of the main objections center around the following issues.

  1. Resources consumed by the Hypervisor
  2. Interaction from other Cloud-based workloads
  3. Loss of control for the Big Data Administrator

Before addressing the above issues, let us take a look at the merits of hosting such workloads in a Cloud. We will use Hadoop as the reference, since it is the dominant platform for Big Data workloads. A recent talk by Richard McDougall (CTO, Storage and Application Services at VMware) addressed running Big Data workloads within Cloud environments. Some of the benefits that arise from moving such workloads into a Cloud are:

Agility

Moving from a physical to a virtual environment has greatly reduced the Time-to-Deploy for servers, a benefit that applies to these workloads as well. In a physical world, deploying another workload can take hours to days instead of just minutes, even with stored profiles and automated deployment. Granting additional resources to nodes is also quite simple within the cloud. For example, if Datanode7 needs 2 more cores or 4 GB of additional memory (and Datanode3 has surplus resources), how easy is this with physical servers? A sketch of that reconfiguration in a virtual environment follows.
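For illustration, here is a hedged pyVmomi sketch of that kind of reconfiguration; the vCenter address, credentials, and the VM name Datanode7 are placeholders, and the change assumes CPU/memory hot-add is enabled on the VM (or that the VM is powered off). It is a sketch of the idea, not a recommendation of any particular tooling.

# Sketch only: grant Datanode7 two more vCPUs and 4 GB more memory via vCenter.
# Assumes pyVmomi is installed and hot-add is enabled on the VM (or it is off).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()                   # lab use only
si = SmartConnect(host="vcenter.example.com",            # placeholder host
                  user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "Datanode7")

    spec = vim.vm.ConfigSpec(
        numCPUs=vm.config.hardware.numCPU + 2,            # +2 cores
        memoryMB=vm.config.hardware.memoryMB + 4096)      # +4 GB RAM
    vm.ReconfigVM_Task(spec=spec)                         # asynchronous task
finally:
    Disconnect(si)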

Elasticity

Hadoop combines compute and storage into the data node, which scales I/O throughput, but also limits its elasticity. Separating compute resources from storage, which is possible in a Virtual environment, enables compute elasticity. Compute resources can be allocated as needed to optimize performance. In addition, each workload can be scheduled to receive greater resources during specified times.
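As a toy illustration of time-based allocation (the pool size, windows, and shares below are all assumptions, and this is not VMware's scheduling mechanism), such a policy might look like:

# Toy sketch: give the nightly ETL workload a larger share of a shared compute
# pool during its batch window, and analytics/ad-hoc work a larger share by day.
from datetime import datetime

TOTAL_CORES = 256  # shared compute pool (assumed)

SCHEDULE = {
    "batch_window": {"etl": 0.70, "analytics": 0.20, "adhoc": 0.10},  # 22:00-06:00
    "business_day": {"etl": 0.10, "analytics": 0.50, "adhoc": 0.40},  # 06:00-22:00
}

def current_allocation(now=None):
    hour = (now or datetime.now()).hour
    window = "batch_window" if (hour >= 22 or hour < 6) else "business_day"
    return {wl: round(share * TOTAL_CORES) for wl, share in SCHEDULE[window].items()}

print(current_allocation())  # e.g. {'etl': 179, 'analytics': 51, 'adhoc': 26} at night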

Resource Optimization

Compute resources are not shared within physical environments, and unused resources (CPU cycles, memory, etc.) are wasted. Sharing these resources within a Cloud, on the other hand, enables true multi-tenancy and permits mixed workloads as well. This drives up utilization of host resources, as seen in the diagram below. Another benefit is reduced Hadoop cluster sprawl, which arises from deploying a single-purpose cluster for each workload.


Image – Courtesy, Richard McDougall and VMUG
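A back-of-the-envelope sketch of this utilization argument; the cluster counts, core counts, utilization figure, and ~3% hypervisor overhead below are all assumed for illustration (the overhead figure is discussed further down).

# Rough check, assumed numbers: hypervisor overhead vs. the idle capacity
# recovered by consolidating single-purpose Hadoop clusters.
clusters = 3                 # single-purpose physical clusters (assumed)
cores_per_cluster = 200      # assumed
avg_utilization = 0.35       # typical utilization of a dedicated cluster (assumed)
hypervisor_overhead = 0.03   # ~3% of CPU cycles

physical_cores = clusters * cores_per_cluster
useful_work = physical_cores * avg_utilization           # cores actually busy today
virtual_capacity = physical_cores * (1 - hypervisor_overhead)

print(f"Useful work today:        {useful_work:.0f} core-equivalents")
print(f"Capacity after overhead:  {virtual_capacity:.0f} core-equivalents")
print(f"Headroom recovered:       {virtual_capacity - useful_work:.0f} core-equivalents")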

Security

A Cloud-based environment already has a number of policies and templates to manage access. These can quickly be applied to a workload, whereas this is a manual process in a physical cluster. Using an existing cluster for another workload during specific hours is difficult and comes with security risks. Tasks such as making a copy of the Production dataset for a Development workload come with their own perils. These are greatly magnified when datasets are shared with partners or external companies in a PCI- or HIPAA-regulated environment.

Higher ROI

Ultimately, organizations deploy Big Data workloads to derive insights. The lower the costs, and the sooner these insights are achieved, the higher the ROI. Agility and Elasticity increase performance, while the increased utilization from multi-tenancy and reduced sprawl drives costs down.

Let us return to the objections mentioned earlier. The Hypervisor might consume 2-3% of CPU cycles and far smaller fractions of other resources; in return, it recovers the far larger amounts of resources that are wasted in single-purpose clusters. Interference from other workloads can be minimized by policy-based resource allocation. Moving to a Cloud does not diminish a Hadoop administrator’s role; on the contrary, it increases the administrator’s speed and effectiveness in setting up and managing workloads. On balance, using a Private Cloud for Big Data workloads seems quite compelling.
