The range and depth of applications dependent on IoT sensors continues to swell – from collecting real-time data on the floors of smart factories, to monitoring supply chains, to enabling smart cities, to tracking our health and wellness behaviors. Networks of IoT sensors can provide critical insights into the inner workings of vast systems, empowering engineers to take better-informed actions and ultimately introduce far greater efficiency, safety, and performance into these ecosystems.
One outsized example: IoT sensors can support predictive maintenance by detecting data anomalies that deviate from baseline behavior and suggest potential mechanical failures – enabling an IoT-fueled organization to repair or replace components before issues become serious or cause downtime. Because IoT sensors provide such a tremendous amount of data on each piece of equipment while it is in good working condition, anomalies in that same data can clearly indicate issues.
Looking at this from a data science perspective, anomalies are rare events that cannot be classified using currently available data examples; anomalies can also stem from cybersecurity threats or fraudulent transactions. It is therefore vital to the integrity of IoT systems to have solutions in place for detecting these anomalies and taking preventative action. Anomaly detection systems require a technology stack that folds in machine learning, statistical analysis, algorithm optimization, and data-layer technologies that can ingest, process, analyze, disseminate, and store streaming data from myriad IoT sources.
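To make the baseline-deviation idea concrete, here is a minimal illustrative sketch in Python – not the system described below, and with the window size and threshold chosen arbitrarily for illustration – that flags a reading as anomalous when it strays several standard deviations from a rolling baseline:

```python
from collections import deque
import statistics

class BaselineDetector:
    """Flags readings that deviate sharply from a rolling baseline."""

    def __init__(self, window=500, threshold=3.0):
        self.window = deque(maxlen=window)  # recent "healthy" readings
        self.threshold = threshold          # deviation limit, in std devs

    def is_anomaly(self, value):
        if len(self.window) < 30:           # build a minimal baseline first
            self.window.append(value)
            return False
        mean = statistics.fmean(self.window)
        stdev = statistics.pstdev(self.window)
        anomalous = stdev > 0 and abs(value - mean) > self.threshold * stdev
        if not anomalous:
            # Only healthy readings update the baseline, so a drifting
            # fault cannot quietly redefine "normal".
            self.window.append(value)
        return anomalous
```

A production IoT system would, of course, run such checks across a distributed streaming pipeline, which is exactly where the scale challenges below come in.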
That said, actually creating an IoT anomaly detection system remains especially challenging given the scale inherent to IoT environments, where millions or even billions of data events occur daily. To be successful, the data-layer technologies supporting an IoT anomaly detection system must be capable of meeting the scalability, computational, and performance needs fundamental to a successful IoT deployment.
I don’t work for a company that sells anomaly detection, but I – along with colleagues on our engineering team – recently built an experimental anomaly detection solution to see whether pure open source data-layer technologies (in their 100% open source form) could stand up to the specific needs of large-scale IoT environments. The testing used Apache Kafka and Apache Cassandra to produce an architecture capable of delivering the scalability, performance, and realistic cost-effectiveness required of IoT anomaly detection technology. Beyond those attributes, Kafka and Cassandra are highly compatible, complementary technologies that lend themselves to being used in tandem. Not fully knowing what to expect, we went to work.
In our experiment, Kafka, Cassandra, and our anomaly detection application are combined in a Lambda architecture, with Kafka and our streaming data pipeline serving as the speed layer and Cassandra acting as the batch and serving layer. (See full details on GitHub, here.) Kafka enables rapid and scalable ingestion of streaming data, and its "store and forward" design acts as a buffer that keeps Cassandra from being overwhelmed when data surges. At the same time, Cassandra provides a linearly scalable, write-optimized database well suited to storing the high-velocity streaming data produced by IoT environments. The experiment also leveraged Kubernetes on AWS EKS to automate the application's provisioning, deployment, and scaling.
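As a minimal sketch of that speed-layer pattern – assuming placeholder topic, keyspace, and column names rather than those of our actual application – a Kafka consumer can drain IoT events into a write-optimized Cassandra table like this:

```python
import json
from kafka import KafkaConsumer
from cassandra.cluster import Cluster

# Consume raw IoT events from Kafka; the topic name is a placeholder.
consumer = KafkaConsumer(
    "iot-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Connect to Cassandra; the keyspace and table are hypothetical.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("iot")
insert = session.prepare(
    "INSERT INTO sensor_events (sensor_id, ts, value) VALUES (?, ?, ?)"
)

for msg in consumer:
    event = msg.value
    # Kafka's retention acts as the store-and-forward buffer: if Cassandra
    # slows down, unread events simply wait in the topic.
    session.execute(insert, (event["sensor_id"], event["ts"], event["value"]))
```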
We progressed through development of the anomaly detection application using an incremental approach – continually optimizing capabilities, monitoring, debugging, and refining. Then we tested scale: the system processed 19 billion real-time events per day, enough to satisfy the requirements of nearly any IoT use case. Achieving this result meant scaling the application out from three to 48 Cassandra nodes and utilizing 574 CPU cores across the Cassandra, Kafka, and Kubernetes clusters. It also meant maintaining a peak of 2.3 million writes per second into Kafka, sustaining 220,000 anomaly checks per second.
In completing this experiment, we've demonstrated a method that IoT-centric organizations can use to build a highly scalable, performant, and affordable anomaly detection application for IoT use cases, fueled by the unique advantages of pure open source Apache Kafka and Cassandra at the all-important data layer.
The Internet of Things (IoT) has generated a ton of excitement and furious activity. However, I sense some discomfort and even dread in the IoT ecosystem about the future – typical when a field is not growing at a hockey-stick pace . . .
“History may not repeat itself but it rhymes”, Mark Twain may have said. What history does IoT rhyme with?
I have often used this diagram to crisply define IoT.
Even 10 years ago, the first two blocks in the diagram were major challenges; in 2017, sensors, connectivity, cloud, and Big Data are entirely manageable. But extracting insights and, more importantly, applying those insights in, say, an industrial environment is still a challenge. While there are examples of business value generated by IoT, the larger value proposition beyond these islands of success is still speculative. How do you make it real in the fastest possible manner?
In slogan form, the value proposition of IoT is "Do more at higher quality with better user experience". Let us consider a generic application scenario in industrial IoT.
IoT Data Science prescribes actions ("prescriptive analytics") which are then implemented; the outcomes are monitored and improved over time. Today, humans are involved in this chain, either as observers or as actors (picking a tool from the shelf and attaching it to the machine).
BTW, when I mentioned "Better UX" in the slogan, I was referring to these human-interaction elements being improved by "Artificial Intelligence" via natural language or visual processing.
Today and for the foreseeable future, IoT Data Science is achieved through Machine Learning which I think of as “competence without comprehension” (Dennett, 2017). We cannot even agree on what human intelligence or comprehension is and I want to distance myself from such speculative (but entertaining) parlor games!
Given such a description of the state of the IoT art in 2017, it appears to me that what is preventing hockey-stick growth is the state of IoT Data Science. The output of IoT Data Science has to serve two purposes: (1) insights for the humans in the loop and (2) a path to closed-loop automation, BOTH with the business objective of "Do More at Higher Quality" (or increased throughput and continuous improvement).
Machine Learning has to evolve, and evolve quickly, to meet these two purposes. One, IoT Data Science has to be more "democratized" so that it is easy to deploy for the humans in the loop – this work is underway at many startups and some larger incumbents. Two, Machine Learning has to become *continuous* learning for continuous improvement, which is also at hand (NEXT Machine Learning Paradigm: "DYNAMICAL" ML).
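As a minimal sketch of what *continuous* learning can look like in practice – using scikit-learn's incremental partial_fit API, with a synthetic stand-in for the real event stream – a model can keep updating as labeled data arrives rather than being retrained from scratch:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()               # online linear model
classes = np.array([0, 1])            # e.g., "healthy" vs. "failing"

def stream_of_batches(n_batches=100):
    """Stand-in for a live stream of labeled (features, label) batches."""
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        X = rng.normal(size=(32, 4))
        y = (X.sum(axis=1) > 0).astype(int)
        yield X, y

for X, y in stream_of_batches():
    # partial_fit updates the model incrementally on each mini-batch,
    # letting it track slowly drifting behavior over time.
    model.partial_fit(X, y, classes=classes)
```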
With IoT defined as above, when it comes to "rhyming with history", I make the point (in my Neural Plasticity & Machine Learning blog) that the current Machine Learning revolution is NOT like the Industrial Revolution (of the steam engine and electrical machines), which caused productivity to soar between 1920 and 1970; it is more like the Printing Press revolution of the 1400s!
The printing press and movable type played a key role in the development of the Renaissance, the Reformation, and the Age of Enlightenment. The printing press created a disruptive change in "information spread" via augmentation of "memory". Oral tradition depended on how much one could hold in memory; on the printed page, memories last forever (well, almost) and travel anywhere.
Similarly, IoT Data Science is in the early stages of creating disruptive change in “competence spread” via Machine Learning (which is *competence without comprehension*) based on Big Data analysis. Humans can process only a very limited portion of Big Data in their heads; Data Science can make sense of Big Data and provide competence for skilled actions.
To make the correspondence explicit: "information spread" in the present case becomes "competence spread"; the analog of "memory" is "learning"; and the analog of the "printed page" is "machine learning".
Just as Information Spread was enhanced by "augmented memory" (via the printed page), Competence Spread will be enhanced by Machine Learning. Information Spread and the Printing Press "revolution" resulted in Michelangelo's paintings, fractured religions, and a new scientific method. What will Competence Spread and the IoT Data Science "revolution" lead to?!
From an abstract point of view, memory involves more organization in the brain and hence a reduction in entropy. The printed page can hold a lot more "memories", and hence the Printing Press revolution gave us an external way to reduce the entropy of "the human system". Competence is also an exercise in entropy reduction: data get analyzed and organized; insights are drawn. IoT Data Science is very adept at handling tons of Big Data and extracting insights to increase competence; thus, IoT Data Science gives us an external way to reduce entropy.
What does such a reduction in entropy mean in practical terms? Recognizing that entropy reduction happens for Human+IoT as a *system*, the immediate opportunity will be in empowering the human element with competence augmentation. What I see emerging quickly is, instead of a "personal" assistant, a Work Assistant: an individualized "machine learner" enhancing our *work* competence, which, no doubt, will lead each of us to "do more at higher quality". Beyond that, there is no telling what amazing things "competence-empowered human comprehension" will create . . .
I am no Industrial IoT futurist; in the year 1440, Gutenberg could not have foreseen Michelangelo's paintings, fractured religions, or a new scientific method! Similarly, standing here in 2017, it is not apparent what new disruptions the IoT revolution will spawn that drop entropy precipitously. I, for one, am excited about the possibilities and surprises in store over the next few decades.
PG Madhavan, Ph.D. - “LEADER . . . of a life in pursuit of excellence . . . in IoT Data Science”
http://www.linkedin.com/in/pgmad
This post originally appeared here.
Analytics has taken the world by storm, and it is the powerhouse for the digital transformation happening in every industry. The major technology companies have all released their deep learning tools as open source:
- Google’s TensorFlow
- Facebook open-sourced modules for Torch
- Amazon released DSSTNE on GitHub
- Microsoft released CNTK, its open source deep learning toolkit, on GitHub
Today we see a lot of examples of deep learning all around us:
- Google Translate is using deep learning and image recognition to translate not only voice but written languages as well.
- With the CamFind app, simply take a picture of any object and its mobile visual search technology tells you what it is. It provides fast, accurate results with no typing necessary. Snap a picture, learn more. That’s it.
- All digital assistants like Siri, Cortana, Alexa & Google Now are using deep learning for natural language processing and speech recognition
- Amazon, Netflix & Spotify use deep-learning-powered recommendation engines for the next best offer, movies, and music
- Google PlaNet can look at a photo and tell where it was taken
- DCGANs are used for enhancing and completing images of human faces
- DeepStereo: Turns images from Street View into a 3D space that shows unseen views from different angles by figuring out the depth and color of each pixel
- DeepMind’s WaveNet can generate speech mimicking any human voice, sounding more natural than the best existing text-to-speech systems
- PayPal is using H2O-based deep learning to prevent fraud in payments
These techniques also power predictive maintenance, which delivers clear business benefits (a sketch of the workflow follows this list):
- Minimize maintenance costs – don’t waste money on over-cautious, time-bound maintenance. Only repair equipment when repairs are actually needed.
- Reduce unplanned downtime – predict equipment malfunctions and failures before they happen, and minimize the risk of unplanned disasters that put your business at risk.
- Root cause analysis – find the causes of equipment malfunctions and work with suppliers to eliminate the reasons for high failure rates. Increase the return on your assets.
- Efficient labor planning – no time wasted replacing or fixing equipment that doesn’t need it.
- Avoid warranty costs for failure recovery – for automakers this can mean thousands of recalls; in an assembly line, lost production.
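As promised above, here is an illustrative sketch of that workflow – the dataset, column names, and 30-day failure label are all hypothetical – training a classifier on historical sensor readings and ranking equipment by predicted failure risk:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical history: one row per machine per day, labeled with whether
# the machine failed within the following 30 days.
df = pd.read_csv("equipment_history.csv")
features = ["temperature", "vibration", "pressure", "runtime_hours"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["failed_within_30d"], test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Rank equipment by predicted failure risk, so repairs happen only where
# and when they are actually needed.
risk = model.predict_proba(X_test)[:, 1]
print("Highest-risk units:", risk.argsort()[::-1][:10])
```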
Trenitalia has invested 50M euros in an Internet of Things project that is expected to cut maintenance costs by up to 130M euros while increasing train availability and customer satisfaction.
Two trends are driving the rise of data science:
- An increased need and desire among businesses to gain greater value from their data
- The fact that over 80% of the data and information businesses generate and collect is unstructured or semi-structured, and needs special treatment
Data Scientists:
- Typically require a mix of skills: mathematics, statistics, computer science, machine learning, and, most importantly, business knowledge
- Employ the R or Python programming languages to clean data and remove irrelevant records (a minimal example follows this list)
- Create algorithms to solve business problems
- Finally, communicate the findings effectively to management
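For instance, the cleaning step mentioned above might look like this minimal pandas pass (the file and column names are hypothetical):

```python
import pandas as pd

df = pd.read_csv("raw_customer_data.csv")       # hypothetical input file

df = df.drop_duplicates()                       # remove repeated records
df = df.dropna(subset=["customer_id"])          # drop rows missing the key
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
df = df[df["age"].between(0, 120)]              # discard implausible values

df.to_csv("clean_customer_data.csv", index=False)
```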
Any company, in any industry, that crunches large volumes of numbers, possesses lots of operational and customer data, or can benefit from social media streams, credit data, consumer research or third-party data sets can benefit from having a data scientist or a data science team.
Notable data scientists to follow include:
- Kirk D. Borne of Booz Allen
- D. J. Patil, Chief Data Scientist at the White House
- Gregory Piatetsky of KDnuggets
- Vincent Granville of AnalyticBridge
- Jonathan Goldman of LinkedIn
- Ronald Van Loon
Data science will involve all the aspects of statistics, machine learning, and artificial intelligence, plus deep learning & cognitive computing, with the addition of storage from big data.
Although computers are better at data processing and making calculations, until recently they were not able to accomplish some of the most basic human tasks, like recognizing an apple or an orange in a basket of fruit.
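That is exactly what deep learning changed. As a small illustration – the image file here is hypothetical – a pretrained ImageNet classifier (whose classes include "Granny Smith" and "orange") can label a photo of a fruit in a few lines of Python:

```python
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, preprocess_input, decode_predictions
)
from tensorflow.keras.preprocessing import image

model = MobileNetV2(weights="imagenet")         # pretrained classifier

img = image.load_img("fruit.jpg", target_size=(224, 224))  # hypothetical file
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Print the top three predicted labels with confidence scores.
for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
    print(f"{label}: {score:.2f}")
```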
Here are a few thoughts from @dataguild on IoT as applied to Data Science. Thanks to @MacSlocum, @JonBruner and the @OReillySolid crew for a great show in San Francisco last month.