A Review Of @SnapLogic By @TheEbizWizard | @CloudExpo [#BigData]

SnapLogic: From ETL to VVV

Squeezing value out of data in the enterprise has always pushed the limits of the available technology. Furthermore, when business needs exceed available capabilities, vendors push to innovate within the processor, storage, and network constraints of the day.

This inherent tension between enterprise demands and vendor innovation gave rise to the Extract, Transform, and Load (ETL) marketplace over twenty years ago. Businesses realized that running complex, ad hoc SQL queries against their increasingly large operational databases would grind those systems to a halt, and thus required an alternate approach to gaining essential business intelligence.

The best solution given the hardware limitations of the time required controlled, pre-planned extraction of data from various databases of record, followed by complex, time-consuming transformation steps, and then loading the transformed data into separate reporting data stores (dubbed data warehouses and data marts) specially optimized for a range of analytical queries.
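To make the pattern concrete, here is a minimal sketch of the classic ETL sequence in Python, using in-memory SQLite databases as stand-ins for the source system and the warehouse. The table and column names are illustrative, not drawn from any particular product.

```python
import sqlite3

# Stand-ins for a source database of record and a reporting warehouse.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, region TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 1250, "east"), (2, 300, "west"), (3, 990, "east")])

# Extract: pull the raw rows out of the operational system.
rows = source.execute("SELECT id, amount_cents, region FROM orders").fetchall()

# Transform: reshape and aggregate BEFORE loading, the pre-planned,
# time-consuming step that defines classic ETL.
totals = {}
for _id, amount_cents, region in rows:
    totals[region] = totals.get(region, 0) + amount_cents / 100.0

# Load: write the already-transformed data into a store optimized for queries.
warehouse.execute("CREATE TABLE sales_by_region (region TEXT, total_dollars REAL)")
warehouse.executemany("INSERT INTO sales_by_region VALUES (?, ?)", totals.items())

print(warehouse.execute("SELECT * FROM sales_by_region").fetchall())
```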

As available storage and memory ramped up, ad hoc data transformations became increasingly practical, allowing for the transform step to take place as needed, subsequent to the load step – and Extract, Load, and Transform (ELT) became a popular alternative to ETL.
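A hedged sketch of the same job in ELT style, again with illustrative names: the raw rows are loaded first, and the transformation runs later, on demand, inside the target store itself.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Extract and Load: copy the raw operational rows as-is into the target.
warehouse.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, region TEXT)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                      [(1, 1250, "east"), (2, 300, "west"), (3, 990, "east")])

# Transform: deferred until query time and expressed in the warehouse itself.
# Cheap storage and memory are what make this on-demand reshaping practical.
print(warehouse.execute(
    "SELECT region, SUM(amount_cents) / 100.0 AS total_dollars "
    "FROM raw_orders GROUP BY region").fetchall())
```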

The transition from ETL to ELT represents an important stepping stone to real-time data analysis. ELT still wasn’t truly real-time, as businesses had to extract and load their data ahead of time, but much of the analysis work depended on the now-accelerated and increasingly flexible transformation step.

Hadoop and ELT

Today, all the buzz is about Big Data and the most important technology innovation on the Big Data analysis scene: Hadoop. In spite of all the commotion around Hadoop, this open source platform and its various add-ons are little more than the next generation of the transform capability of ELT, albeit at cloud scale.

The core motivations that drove the Hadoop community to create this tool were the increasing size of data sets (leading to the awkward Big Data terminology), as well as the need to process data of diverse levels of structure – in particular, a mix of unstructured (content-centric) and semi-structured (generally XML-formatted), as well as structured (relational) information.

In other words, traditional ETL and ELT tools weren’t up to the challenge of dealing with the volume and variety of data that enterprises increasingly produced and wished to analyze. Hadoop addressed these challenges with a horizontally scalable, highly redundant file system (the Hadoop Distributed File System, or HDFS), as well as MapReduce, an algorithmic approach to analyzing data appropriate for processing the necessary volumes of data on HDFS.
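For readers unfamiliar with the model, here is a minimal sketch of MapReduce simulated in plain Python rather than on a real cluster: map emits key-value pairs, a shuffle groups them by key, and reduce aggregates each group. On Hadoop, the same two functions would run in parallel across the nodes holding the data in HDFS.

```python
from collections import defaultdict

def map_phase(record):
    # Emit (key, value) pairs: here, one (word, 1) pair per word.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Aggregate all values that the shuffle grouped under one key.
    return key, sum(values)

records = ["Big Data needs big storage", "big clusters process Big Data"]

# Shuffle: group mapped pairs by key, as the framework would across nodes.
groups = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        groups[key].append(value)

print(sorted(reduce_phase(k, v) for k, v in groups.items()))
```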

The first version of Hadoop, however, was essentially a batch analytics platform. Data analysts had to surmount the significant challenges of extracting data from its source locations and loading it properly into HDFS, only to run arcane MapReduce jobs to produce useful results. As a result, the hype surrounding Hadoop 1.0 exceeded its actual usefulness for most organizations brave enough to implement it.

As an open source project, however, Hadoop had enough backing from the community to drive the development of version 2, which introduced YARN, a cluster-wide resource negotiator that schedules MapReduce and other workloads, as well as fledgling real-time processing capabilities. Today, real-time Hadoop is at the cutting edge, as various tools in an expanding Hadoop ecosystem mature to address the velocity requirements for real-time data analytics.

Hadoop’s Missing Pieces

In terms of the maturation of ETL technologies, therefore, the current version of Hadoop can be thought of as a modern transformation engine running on a horizontally scalable file system that in theory offers the “three V’s” of Big Data: volume, variety, and velocity. In practice, however, many capabilities are missing from the open source distribution.

As a result, other open source projects as well as commercial software providers have an opportunity to fill the gaps Hadoop leaves in enterprise data integration across modern enterprise infrastructures. Today, such integration scenarios typically fall within hybrid cloud environments that combine on-premise and cloud-based capabilities.

In the enterprise context, the extract and load steps of ELT require organizations to leverage diverse data sources both on-premise and in the cloud. Those data sources may be a mix of relational, hierarchical, and content-centric formats. Furthermore, the business may require real-time (or near real-time) analysis of data from such diverse sources.

To address these challenges, SnapLogic has built a data and application integration platform that resolves many of Hadoop's shortcomings. As I wrote in a previous BrainBlog post, SnapLogic separates its technology into a Control Plane and a Data Plane. The Control Plane resides in the cloud and contains the Designer, Manager, and Dashboard subcomponents, which manage the Data Plane; the Data Plane, in turn, acts as a cloud-friendly abstraction of the data flows, or Pipelines, that users create with the SnapLogic Designer.

The data integrations themselves run as Pipelines, which are sequences of atomic integration steps that SnapLogic calls Snaps – because people literally snap them together. Snaps support the full gamut of data types and levels of structure, making it straightforward to send the full variety of enterprise data to Hadoop.
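SnapLogic's actual Snap interfaces aren't shown here, so the following is a purely hypothetical Python sketch of the idea: atomic steps that each consume and emit JSON-like documents, snapped together into a pipeline by simple composition. The function names and signatures are illustrative, not SnapLogic's real API.

```python
import json

# Hypothetical "snaps": each is an atomic step that transforms a stream
# of JSON-like documents (dicts) and passes it along to the next step.
def read_snap(lines):
    for line in lines:
        yield json.loads(line)

def filter_snap(docs, field, value):
    for doc in docs:
        if doc.get(field) == value:
            yield doc

def project_snap(docs, fields):
    for doc in docs:
        yield {f: doc[f] for f in fields if f in doc}

# Snap the steps together into a pipeline: each stage streams into the next.
raw = ['{"id": 1, "region": "east", "amount": 12.5}',
       '{"id": 2, "region": "west", "amount": 3.0}']
pipeline = project_snap(filter_snap(read_snap(raw), "region", "east"),
                        ["id", "amount"])
print(list(pipeline))
```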

SnapLogic has also recently rolled out Hadooplexes, which are Snaplexes (data processing components) that run as YARN apps in Hadoop, as well as SnapReduce, SnapLogic’s support for Big Data integrations that leverage Hadoop to process large amounts of data across large clusters.

SnapReduce enables Pipelines to generate MapReduce jobs and scale them across multiple nodes in a Hadoop cluster. Each Hadooplex then delegates MapReduce-based analytic operations automatically across all Hadoop nodes, thus abstracting the horizontally distributed nature of the Hadoop environment from the user.

The result is an elastic, horizontally scalable integration fabric that provides the extract and load capabilities that Hadoop lacks. Each data integration can be run manually, on a preset schedule, or via a trigger – and SnapLogic exposes such triggers as URLs (either on the Internet or a private network), allowing any authorized piece of software to kick off the integration.
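Because triggers are exposed as URLs, kicking off an integration from any authorized program reduces to an authenticated HTTP call. Here is a hedged sketch using Python's standard library; the endpoint path and the bearer-token header are placeholders, not SnapLogic's documented trigger format.

```python
import urllib.request

# Placeholder trigger URL and credential: illustrative only.
TRIGGER_URL = "https://example.com/pipelines/my-pipeline/trigger"
req = urllib.request.Request(
    TRIGGER_URL,
    method="POST",
    headers={"Authorization": "Bearer <token>"},  # hypothetical auth scheme
)

# Any authorized piece of software can fire the integration this way:
# a cron job, a webhook handler, or another pipeline.
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```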

End-to-End Modern ELT

In summary, SnapLogic modernizes each element of ELT for today's cloud-centric, Big Data world. Instead of traditional extraction of structured data, SnapLogic allows for diverse queries across the full variety of data types and structures by streaming all data as JSON documents. Instead of simplistic, point-to-point loading of data, SnapLogic offers elastic, horizontally scalable Pipelines that hide the underlying complexity of data integration from the user. And within Hadoop, Hadooplexes simplify the distribution of YARN-based MapReduce algorithms, allowing users to treat the Hadoop environment as though it were a traditional reporting database.

Furthermore, SnapLogic can perform each of these steps in real time, in those situations where the business requires real-time analytics. Each Pipeline simply streams the data from the acquisition point to the delivery point, handling the appropriate operations statelessly along the way. The end result is a user-friendly data integration and analysis tool that adroitly hides an extraordinary level of complexity behind the scenes, opening up the power of Big Data to an increasingly broad user base.

SnapLogic is an Intellyx client. At the time of writing, no other organizations mentioned in this article are Intellyx clients. Intellyx retains full editorial control over the content of this article.

More Stories By Jason Bloomberg

Jason Bloomberg is a leading IT industry analyst, Forbes contributor, keynote speaker, and globally recognized expert on multiple disruptive trends in enterprise technology and digital transformation. He is ranked #5 on Onalytica’s list of top Digital Transformation influencers for 2018 and #15 on Jax’s list of top DevOps influencers for 2017, the only person to appear on both lists.

As founder and president of Agile Digital Transformation analyst firm Intellyx, he advises, writes, and speaks on a diverse set of topics, including digital transformation, artificial intelligence, cloud computing, devops, big data/analytics, cybersecurity, blockchain/bitcoin/cryptocurrency, no-code/low-code platforms and tools, organizational transformation, internet of things, enterprise architecture, SD-WAN/SDX, mainframes, hybrid IT, and legacy transformation, among other topics.

Mr. Bloomberg’s articles in Forbes are often viewed by more than 100,000 readers. During his career, he has published over 1,200 articles (over 200 for Forbes alone), spoken at over 400 conferences and webinars, and he has been quoted in the press and blogosphere over 2,000 times.

Mr. Bloomberg is the author or coauthor of four books: The Agile Architecture Revolution (Wiley, 2013), Service Orient or Be Doomed! How Service Orientation Will Change Your Business (Wiley, 2006), XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996). His next book, Agile Digital Transformation, is due within the next year.

At SOA-focused industry analyst firm ZapThink from 2001 to 2013, Mr. Bloomberg created and delivered the Licensed ZapThink Architect (LZA) Service-Oriented Architecture (SOA) course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, which was acquired by Dovel Technologies in 2011.

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting), and several software and web development positions.
