


Industry 4.0 Trends To Look For In 2023

Identifying the best technologies for advancement in the workplace is essential to creating a profitable, optimized enterprise. Industry 4.0 brings together a range of technologies and techniques with the potential to improve a business's bottom line. This article covers the Industry 4.0 trends and technologies that will matter in 2023.
Read more…

The range and depth of applications dependent on IoT sensors continues to swell – from collecting real-time data on the floors of smart factories, to monitoring supply chains, to enabling smart cities, to tracking our health and wellness behaviors. The networks utilizing IoT sensors are capable of providing critical insights into the inner workings of vast systems, empowering engineers to take better-informed actions and ultimately introduce far greater efficiency, safety, and performance into these ecosystems.

One outsized example of this: IoT sensors can support predictive maintenance by detecting data anomalies that deviate from baseline behavior and that suggest potential mechanical failures – thus enabling an IoT-fueled organization to repair or replace components before issues become serious or downtime occurs. Because IoT sensors provide such a tremendous amount of data pertaining to each particular piece of equipment when in good working condition, anomalies in that same data can clearly indicate issues.

Looking at this from a data science perspective, anomalies are rare events which cannot be classified using currently available data examples; anomalies can also come from cybersecurity threats, or fraudulent transactions. It is therefore vital to the integrity of IoT systems to have solutions in place for detecting these anomalies and taking preventative action. Anomaly detection systems require a technology stack that folds in solutions for machine learning, statistical analysis, algorithm optimization, and data-layer technologies that can ingest, process, analyze, disseminate, and store streaming data from myriad IoT sources.
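To make the data science concrete, here is a minimal sketch of one common baseline approach – rolling z-score detection on a single sensor stream. The window size, threshold, and readings are illustrative assumptions, not values from any production system.

```python
from collections import deque
import math

class RollingZScoreDetector:
    """Flag readings that deviate sharply from a rolling baseline."""

    def __init__(self, window=100, threshold=3.0):
        self.window = deque(maxlen=window)  # recent baseline readings
        self.threshold = threshold          # z-score cutoff (assumed value)

    def check(self, value):
        if len(self.window) >= 10:  # wait for a minimal baseline first
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var) or 1e-9    # avoid division by zero
            if abs(value - mean) / std > self.threshold:
                return True  # anomaly: keep it out of the baseline
        self.window.append(value)
        return False

detector = RollingZScoreDetector()
stream = [20.1, 20.3, 19.9, 20.2, 20.0, 20.1, 19.8, 20.2, 20.1, 20.0, 55.7]
for reading in stream:
    if detector.check(reading):
        print(f"anomaly detected: {reading}")
```

In a real deployment a check like this would run per sensor and per metric, with thresholds learned from historical data rather than fixed by hand.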

But that said, actually creating an IoT anomaly detection system remains especially challenging given the large-scale nature inherent to IoT environments, where millions or even billions of data events occur daily. To be successful, the data-layer technologies supporting an IoT anomaly detection system must be capable of meeting the scalability, computational, and performance needs fundamental to a successful IoT deployment.

I don’t work for a company that sells anomaly detection, but I – along with colleagues on our engineering team – recently created an experimental anomaly detection solution to see whether pure open source data-layer technologies, in their 100% open source form, could stand up to the specific needs of large-scale IoT environments. The testing utilized Apache Kafka and Apache Cassandra to produce an architecture capable of delivering the scalability, performance, and realistic cost-effectiveness required of IoT anomaly detection technology. Beyond matching up against these attributes, Kafka and Cassandra are highly compatible and complementary technologies that lend themselves to being used in tandem. Not fully knowing what to expect, we went to work.

In our experiment, Kafka, Cassandra, and our anomaly detection application are combined in a Lambda architecture, with Kafka and our streaming data pipeline serving as the speed layer, and Cassandra acting as the batch and serving layer. (See full details on GitHub, here.) Kafka enables rapid and scalable ingestion of streaming data, leveraging a “store and forward” technique that acts as a buffer to ensure Cassandra is not overwhelmed during surges in data volume. At the same time, Cassandra provides a linearly scalable, write-optimized database well suited to storing the high-velocity streaming data produced by IoT environments. The experiment also leveraged Kubernetes on AWS EKS to automate the experimental application’s provisioning, deployment, and scaling.
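The full experimental code lives in the GitHub repository mentioned above; the fragment below is only a minimal sketch of the speed-layer pattern, with the topic, keyspace, and table names invented for illustration (it assumes the kafka-python and cassandra-driver packages and local Kafka/Cassandra instances).

```python
import json
import time

from kafka import KafkaProducer          # pip install kafka-python
from cassandra.cluster import Cluster    # pip install cassandra-driver

# Speed layer: ingest sensor events into Kafka, which buffers ("store and
# forward") so downstream storage is never hit directly by traffic spikes.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("iot-events", {"sensor_id": 42, "value": 20.1, "ts": time.time()})
producer.flush()

# Batch/serving layer: a separate consumer drains the topic at its own pace
# and writes into Cassandra's write-optimized storage.
session = Cluster(["localhost"]).connect("iot_keyspace")
session.execute(
    "INSERT INTO events (sensor_id, ts, value) VALUES (%s, %s, %s)",
    (42, int(time.time() * 1000), 20.1),
)
```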

We progressed through the development of our anomaly detection application test using an incremental approach: continually optimizing capabilities, monitoring, debugging, refining, and so on. Then we tested scale: 19 billion real-time events per day were processed, enough to satisfy the requirements of almost any IoT use case out there. Achieving this result meant scaling the application out from three to 48 Cassandra nodes, utilizing 574 CPU cores across the Cassandra, Kafka, and Kubernetes clusters. It also meant maintaining a peak of 2.3 million writes per second into Kafka, for a sustainable 220,000 anomaly checks per second.

In completing this experiment, we’ve demonstrated a method that IoT-centric organizations can use to build a highly scalable, performant, and affordable anomaly detection application for IoT use cases, by leveraging the unique advantages of pure open source Apache Kafka and Cassandra at the all-important data layer.

Read more…

 

The Internet of Things (IoT) has generated a ton of excitement and furious activity. However, I sense some discomfort and even dread in the IoT ecosystem about the future – typical when a field is not growing at a hockey-stick pace . . .

“History may not repeat itself but it rhymes”, Mark Twain may have said. What history does IoT rhyme with?

I have often used this diagram to crisply define IoT.

Even 10 years ago, the first two blocks in the diagram were major challenges; in 2017, sensors, connectivity, cloud and Big Data are entirely manageable. But extracting insights and, more importantly, applying those insights in, say, an industrial environment is still a challenge. While there are examples of business value generated by IoT, the larger value proposition beyond these islands of success is still speculative. How do you make it real in the fastest possible manner?

In slogan form, the value proposition of IoT is “Do more at higher quality with better user experience”. Let us consider a generic application scenario in industrial IoT.

IoT Data Science prescribes actions (“prescriptive analytics”) which are implemented, the outcomes of which are monitored and improved over time. Today, humans are involved in this chain, either as observers or as actors (picking a tool from the shelf and attaching it to the machine).

BTW, when I mentioned “Better UX” in the slogan, I was referring to these human interaction elements being improved by “Artificial Intelligence” via natural language or visual processing.

Today and for the foreseeable future, IoT Data Science is achieved through Machine Learning, which I think of as “competence without comprehension” (Dennett, 2017). We cannot even agree on what human intelligence or comprehension is, and I want to distance myself from such speculative (but entertaining) parlor games!

Given such a description of the state of the IoT art in 2017, it appears to me that what is preventing hockey-stick growth is the state of IoT Data Science. The output of IoT Data Science has to serve two purposes: (1) provide insights for the humans in the loop and (2) lead us to closed-loop automation, BOTH with the business objective of “Do More at Higher Quality” (or increased throughput and continuous improvement).

Machine Learning has to evolve, and evolve quickly, to meet these two purposes. One, IoT Data Science has to be more “democratized” so that it is easy for the humans in the loop to deploy – this work is underway at many startups and some larger incumbents. Two, Machine Learning has to become *continuous* learning for continuous improvement, which is also at hand (NEXT Machine Learning Paradigm: “DYNAMICAL" ML).

With IoT defined as above, when it comes to “rhyming with history”, I make the point (in my Neural Plasticity & Machine Learning blog) that the current Machine Learning revolution is NOT like the Industrial Revolution (of the steam engine and electrical machines), which caused productivity to soar between 1920 and 1970; it is more like the Printing Press revolution of the 1400s!

The printing press and movable type played a key role in the development of the Renaissance, the Reformation and the Age of Enlightenment. The printing press created a disruptive change in “information spread” via the augmentation of “memory”. Oral tradition depended on how much one could hold in one’s memory; on the printed page, memories last forever (well, almost) and travel anywhere.

Similarly, IoT Data Science is in the early stages of creating disruptive change in “competence spread” via Machine Learning (which is *competence without comprehension*) based on Big Data analysis. Humans can process only a very limited portion of Big Data in their heads; Data Science can make sense of Big Data and provide competence for skilled actions.

 

To make the correspondence explicit: "information spread" in the present case is "competence spread"; the "memory" analog is "learning"; and the "printed page" is "machine learning".

 

Just like Information Spread was enhanced by “augmented memory” (via the printed page), Competence Spread will be enhanced by Machine Learning. Information Spread and the Printing Press “revolution” resulted in Michelangelo’s paintings, fractured religions and a new scientific method. What will Competence Spread and the IoT Data Science “revolution” lead to?!

From an abstract point of view, memory involves more organization in the brain and hence a reduction in entropy. The printed page can hold a lot more “memories”, and hence the Printing Press revolution gave us an external way to reduce the entropy of “the human system”. Competence is also an exercise in entropy reduction: data get analyzed and organized; insights are drawn. IoT Data Science is very adept at handling tons of Big Data and extracting insights to increase competence; thus, IoT Data Science gives us an external way to reduce entropy.

What does such a reduction in entropy mean in practical terms? Recognizing that entropy reduction happens for Human+IoT as a *system*, the immediate opportunity will be in empowering the human element with competence augmentation. What I see emerging quickly is, instead of a “personal” assistant, a Work Assistant: an individualized “machine learner” enhancing our *work* competence, which no doubt will lead each of us to “do more at higher quality”. Beyond that, there is no telling what amazing things “competence-empowered human comprehension” will create . . .

I am no Industrial IoT futurist; in the year 1440, Gutenberg could not have foreseen Michelangelo’s paintings, fractured religions or a new scientific method! Similarly, standing here in 2017, it is not apparent what new disruptions the IoT revolution will spawn that drop entropy precipitously. I for one am excited about the possibilities and surprises in store over the next few decades.

PG Madhavan, Ph.D. - “LEADER . . . of a life in pursuit of excellence . . . in IoT Data Science” 

http://www.linkedin.com/in/pgmad

This post originally appeared here.

Read more…

Do you want to hire a Data Scientist?

As Tom Davenport noted a few years back, Data Scientist is still the hottest job of the century.
Data scientists are those elite people who solve business problems by analyzing tons of data, communicate the results in a compelling way to senior leadership, and persuade them to take action.
They have the critical responsibility to understand the data and help the business become more knowledgeable about its customers.
The importance of Data Scientists has risen to the top due to two key issues:
·     An increased need and desire among businesses to gain greater value from their data in order to stay competitive
·     Over 80% of the data/information that businesses generate and collect is unstructured or semi-structured and needs special treatment
So it is extremely important to hire the right person for the job. The requirements for being a data scientist are pretty rigorous, and truly qualified candidates are few and far between.
Data Scientists are in very high demand, hard to attract, and come at a very high cost, so a wrong hire is all the more frustrating.
Here are some guidelines for evaluating them:
·     Check their logical reasoning ability
·     Problem-solving skills
·     Ability to collaborate and communicate with business folks
·     Practical experience working with Big Data tools
·     Statistical and machine learning experience
·     Should be able to clearly describe projects where they have solved business problems
·     Should be able to tell a story from the data
·     Should know the latest in cognitive computing and deep learning
I have seen some of the smartest data scientists in my career do the best work but fail to communicate the results to senior leaders effectively. Ideally they should know the data in depth and be able to explain its significance properly. Data visualization comes in very handy at this stage.
Today, with digital disrupting every field, data science is being affected as well.
Gartner has called this new breed citizen data scientists. Their primary job function is outside analytics, and they may not know much statistics, but they can work with the ready-to-use algorithms available through APIs like Watson, TensorFlow, Azure and other well-known tools.
Good data scientists can make use of them to spread awareness and expand their influence.
It has become all the more important to hire the right data scientist, as the results they deliver may make or break the company.
Read more…

A to Z of Analytics

Analytics has taken the world by storm, and it is the powerhouse for all the digital transformation happening in every industry.

Today everybody is generating tons of data – we as consumers leave digital footprints on social media, IoT generates millions of records from sensors, and mobile phones are used from morning till we sleep. All these varied data formats are stored in Big Data platforms. But merely storing this data is not going to take us anywhere unless analytics is applied to it. Hence it is extremely important to close the loop with analytics insights.
Here is my version of A to Z for Analytics:
Artificial Intelligence: AI is the capability of a machine to imitate intelligent human behavior. BMW, Tesla, Google are using AI for self-driving cars. AI should be used to solve real world tough problems like climate modeling to disease analysis and betterment of humanity.
Boosting and Bagging: Techniques used to generate more accurate models by ensembling multiple models together.
CRISP-DM: The Cross-Industry Standard Process for Data Mining. It was developed by a consortium of companies including SPSS, Teradata, Daimler and NCR Corporation in 1997 to bring order to the development of analytics models. Its six major steps are business understanding, data understanding, data preparation, modeling, evaluation and deployment.
Data Preparation: In analytics deployments, more than 60% of the time is spent on data preparation. The general rule is garbage in, garbage out; hence it is important to cleanse and normalize the data and make it available for consumption by the model.
Ensembling: The technique of combining two or more algorithms to get more robust predictions. It is like combining all the marks we obtain in exams to arrive at a final overall score. Random Forest, which combines multiple decision trees, is one such example.
Feature Selection: Simply put, this means keeping only those features or variables in the data which really make sense and removing the non-relevant ones. This uplifts model accuracy.
Gini Coefficient: Used to measure the predictive power of a model, typically in credit scoring tools that find out who will repay and who will default on a loan.
Histogram: A graphical representation of the distribution of a set of numeric data, usually a vertical bar graph, used in the exploratory analytics and data preparation steps.
Independent Variable: The variable that is changed or controlled in a scientific experiment to test its effects on the dependent variable, such as the effect of a price increase on sales.
Jubatus: An online machine learning library covering classification, regression, recommendation (nearest neighbor search), graph mining, anomaly detection and clustering.
KNN: The k-nearest neighbors algorithm, used in machine learning for classification problems based on distance or similarity between data points (see the short sketch after this list).
Lift Chart: Widely used in campaign targeting problems to determine which deciles of customers to target for a specific campaign. It also tells you how much response to expect from the new target base.
Model: There are more than 50 modeling techniques – regressions, decision trees, SVMs, GLMs, neural networks and so on – present in technology platforms like SAS Enterprise Miner, IBM SPSS or R. They are broadly categorized under supervised and unsupervised methods into classification, clustering and association rules.
Neural Networks: Typically organized in layers made up of nodes, these mimic learning the way the brain does. Today Deep Learning is an emerging field based on deep neural networks.
 
Optimization: The use of simulation techniques to identify the scenarios which will produce the best results within available constraints, e.g. sale price optimization, or identifying the optimal inventory for maximum fulfillment while avoiding stock-outs.
PMML: The Predictive Model Markup Language, an XML-based file format developed by the Data Mining Group to transfer models between technology platforms.
Quartile: Dividing the sorted output of a model into four groups for further action.
R: Today every university and even corporations are using R for statistical model building. It is freely available, and there are licensed versions like Microsoft R. More than 7,000 packages are now at the disposal of data scientists.
Sentiment Analytics: The process of determining whether information or a service provided by a business leads to positive, negative or neutral human feelings or opinions. Consumer product companies measure sentiment 24/7 and adjust their marketing strategies accordingly.
Text Analytics: Used to discover and extract meaningful patterns and relationships from text collected from social media sites such as Facebook, Twitter, LinkedIn, blogs and call center scripts.
Unsupervised Learning: Algorithms which receive only input data and are expected to find patterns in it. Clustering and association algorithms like k-means and apriori are the best examples.
Visualization: The method of enhanced exploratory data analysis and of presenting modeling results with highly interactive statistical graphics. Any model output has to be presented to senior management in the most compelling way. Tableau, QlikView and Spotfire are leading visualization tools.
What-If Analysis: The method of simulating various business scenario questions, such as: what if we increased our marketing budget by 20% – what would be the impact on sales? Monte Carlo simulation is very popular here.
What do you think should come for X, Y and Z?
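Since entries like KNN are easiest to grasp in code, here is a minimal sketch of k-nearest neighbors classification using scikit-learn; the fruit measurements are toy values invented purely for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: [weight_grams, sweetness_score] for two fruit classes.
X_train = [[150, 7], [160, 8], [170, 7], [300, 3], [320, 4], [310, 2]]
y_train = ["apple", "apple", "apple", "grapefruit", "grapefruit", "grapefruit"]

# k=3: a new point takes the majority label of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print(knn.predict([[155, 6]]))  # -> ['apple']
```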
Read more…

What is Deep Learning?

Remember how you started recognizing fruits, animals, cars and, for that matter, any other object by looking at them in childhood?
Our brain gets trained over the years to recognize these images and then further classify them as apple, orange, banana, cat, dog, horse, Toyota, Honda, BMW and so on.
Inspired by these biological processes of the human brain, artificial neural networks (ANNs) were developed. Deep learning refers to artificial neural networks that are composed of many layers. It is the fastest-growing field in machine learning. It uses many-layered Deep Neural Networks (DNNs) to learn levels of representation and abstraction that make sense of data such as images, sound, and text.
Why is ‘Deep Learning’ called deep? It is because of the structure of ANNs. Forty years ago, neural networks were only two layers deep, as it was not computationally feasible to build larger networks. Now it is common to have neural networks with 10+ layers, and even 100+ layer ANNs are being tried.
Using multiple levels of neural networks in Deep Learning, computers now have the capacity to see, learn, and react to complex situations as well or better than humans.
Normally data scientists spend a lot of time on data preparation – feature extraction, or selecting the variables which are actually useful for predictive analytics. Deep learning does this job automatically and makes life easier.
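To make “many layers” concrete, here is a minimal sketch of a small deep network in Keras; the input shape, layer sizes, and class count are invented for the example, not tuned for any real task.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small "deep" network: stacked layers, each learning a progressively
# more abstract representation of the raw input.
model = keras.Sequential([
    keras.Input(shape=(784,)),               # e.g. a flattened 28x28 image
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # 10 output classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # prints the stack of layers that makes the network "deep"
```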
Many technology companies have made their deep learning libraries open source:
  • Google’s TensorFlow
  • Facebook’s open source modules for Torch
  • Amazon released DSSTNE on GitHub
  • Microsoft released CNTK, its open source deep learning toolkit, on GitHub

Today we see lots of examples of deep learning around us:

  • Google Translate is using deep learning and image recognition to translate not only voice but written languages as well.
  • With the CamFind app, simply take a picture of any object and it uses mobile visual search technology to tell you what it is. It provides fast, accurate results with no typing necessary. Snap a picture, learn more. That’s it.
  • All digital assistants – Siri, Cortana, Alexa & Google Now – use deep learning for natural language processing and speech recognition.
  • Amazon, Netflix & Spotify use deep-learning-powered recommendation engines for next best offers, movies and music.
  • Google PlaNet can look at a photo and tell where it was taken.
  • DCGANs are used for enhancing and completing human faces.
  • DeepStereo turns images from Street View into a 3D space that shows unseen views from different angles by figuring out the depth and color of each pixel.
  • DeepMind’s WaveNet is able to generate speech mimicking any human voice, and it sounds more natural than the best existing text-to-speech systems.
  • PayPal is using H2O-based deep learning to prevent fraud in payments.
So far, deep learning has aided image classification, language translation and speech recognition, and it can be used to solve any pattern recognition problem – all of it happening without human intervention.
Deep learning is a disruptive digital technology that is being used by more and more companies to create new business models.
Read more…

Using Data Science for Predictive Maintenance

Remember, a few years ago there were two recall announcements from the National Highway Traffic Safety Administration for GM & Tesla – both related to problems that could cause fires. These cost tons of money to resolve.
Aerospace, the rail industry, equipment manufacturers and automakers often face the challenge of ensuring maximum availability of critical assembly line systems and keeping those assets in good working order, while simultaneously minimizing the cost of maintenance and of time- or count-based repairs.
Identification of the root causes of faults and failures must also happen without the need for a lab or testing. As more vehicles, industrial equipment and assembly robots begin to communicate their current status to a central server, detecting faults becomes easier and more practical.
Early identification of these potential issues helps organizations deploy maintenance teams more cost-effectively and maximize parts/equipment up-time. All the critical factors that help predict failure may be deeply buried in structured data, like equipment year, make, model and warranty details, and in unstructured data covering millions of log entries, sensor data, error messages, odometer readings, speed, engine temperature, engine torque, acceleration and repair & maintenance reports.
Predictive maintenance, a technique to predict when an in-service machine will fail so that maintenance can be planned in advance, encompasses failure prediction, failure diagnosis, failure type classification, and recommendation of maintenance actions after failure.
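As a hedged illustration of the failure-prediction piece, here is a minimal sketch that trains a classifier on historical sensor readings labeled by whether a failure followed; the feature names and toy values are invented for the example.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy history: [engine_temp_C, vibration_mm_s, hours_since_service] per
# machine, labeled 1 if it failed within the following week, else 0.
X = [[85, 2.1, 120], [90, 2.4, 300], [110, 6.8, 700],
     [88, 2.0, 150], [115, 7.5, 800], [92, 2.6, 220]]
y = [0, 0, 1, 0, 1, 0]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Score a machine currently in service: estimated probability of failure.
print(model.predict_proba([[108, 6.1, 650]])[0][1])
```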
Business benefits of Data Science with predictive maintenance:
  • Minimize maintenance costs - don’t waste money on over-cautious, time-bound maintenance; only repair equipment when repairs are actually needed.
  • Reduce unplanned downtime - implement predictive maintenance to predict future equipment malfunctions and failures, and minimize the risk of unplanned disasters putting your business at risk.
  • Root cause analysis - find the causes of equipment malfunctions and work with suppliers to eliminate the reasons for high failure rates; increase the return on your assets.
  • Efficient labor planning - no time wasted replacing or fixing equipment that doesn’t need it.
  • Avoid warranty costs for failure recovery - thousands of recalls in the case of automakers, and production losses on the assembly line.

TrainItalia has invested 50M euros in an Internet of Things project which it expects will cut maintenance costs by up to 130M euros, increasing train availability and customer satisfaction.

Rolls-Royce is teaming up with Microsoft on Azure cloud-based streaming analytics to predict engine failures and ensure maintenance happens at the right time.
Sudden machine failures can ruin the reputation of a business, resulting in potential contract penalties and lost revenue. Data Science can help in real time – and ahead of time – to save all this trouble.
Read more…
Digital Transformation has become a burning question for all businesses, and the foundation for riding the wave is being data driven.
DJ Patil & Thomas Davenport mentioned in a 2012 HBR article that Data Scientist is the sexiest job of the century – and how true! Even the latest Glassdoor ranking placed Data Scientist 1st among the top 25 best jobs in America.
Over the last decade there’s been a massive explosion in both the data generated and the data retained by companies. Uber, Airbnb, Netflix, Walmart, Amazon, LinkedIn and Twitter all process tons of data every minute and use it for revenue growth, cost reduction and increased customer satisfaction.
Most industries – such as retail, banking, travel, financial services, healthcare, and manufacturing – want to be able to make better decisions. With the speed of change and the profitability pressures on businesses, the window for making decisions has shrunk to real time. Data has become an asset for every company; hence they need someone who can comb through these data sets, apply logic, and use tools to find patterns and provide insights for the future.
Think about Facebook, Twitter and other social media platforms, smartphone apps, in-store purchase behavior data, online website analytics, and now all the connected devices of the Internet of Things – all generating a tsunami of new data streams.
All this data is useless if not analyzed for actions or new insights.
The importance of Data Scientists has risen to the top due to two key issues:
  • An increased need and desire among businesses to gain greater value from their data
  • Over 80% of the data/information that businesses generate and collect is unstructured or semi-structured and needs special treatment

Data Scientists:

  • Typically require a mix of skills: mathematics, statistics, computer science, machine learning and, most importantly, business knowledge
  • Need to employ the R or Python programming language to clean data and remove what is irrelevant (see the short sketch after this list)
  • Create algorithms to solve business problems
  • Finally, effectively communicate the findings to management
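As a small, hedged illustration of that cleaning step, here is a sketch in Python with pandas; the column names and cleaning rules are invented for the example.

```python
import pandas as pd

# Toy customer data with the kinds of problems cleaning removes.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, None, None, 29, 260],       # missing and impossible values
    "spend": [120.0, 80.5, 80.5, None, 95.0],
})

df = df.drop_duplicates(subset="customer_id")           # drop duplicate rows
df = df[df["age"].between(0, 120) | df["age"].isna()]   # drop impossible ages
df["spend"] = df["spend"].fillna(df["spend"].median())  # impute missing spend

print(df)
```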

Any company, in any industry, that crunches large volumes of numbers, possesses lots of operational and customer data, or can benefit from social media streams, credit data, consumer research or third-party data sets can benefit from having a data scientist or a data science team.

Top data scientists in the world today are:
  • Kirk D. Borne of Booz Allen
  • DJ Patil, Chief Data Scientist at the White House
  • Gregory Piatetsky of KDnuggets
  • Vincent Granville of AnalyticBridge
  • Jonathan Goldman of LinkedIn
  • Ronald Van Loon

Data science will involve all aspects of statistics, machine learning, artificial intelligence, deep learning & cognitive computing, with the addition of storage from big data.

Read more…

What is Cognitive Computing?

Although computers are better at data processing and making calculations, until now they have not been able to accomplish some of the most basic human tasks, like picking out an apple or an orange from a basket of fruit.

Computers can capture, move, and store the data, but they cannot understand what the data mean. Thanks to Cognitive Computing, machines are bringing human-like intelligence to a number of business applications.
Cognitive Computing is a term that IBM coined for machines that can interact and think like humans.
In today's Digital Transformation age, various technological advancements have given machines a greater ability to understand information, to learn, to reason, and to act upon it.
Today, IBM Watson and Google DeepMind are leading the cognitive computing space.
Cognitive Computing systems may include the following components:
·     Natural Language Processing - understanding meaning and context in a language, allowing a deeper, more intuitive level of discovery of, and even interaction with, information (see the short sketch after this list)
·     Machine Learning with Neural Networks - algorithms that help train the system to recognize images and understand speech
·     Algorithms that learn and adapt with Artificial Intelligence
·     Deep Learning - to recognize patterns
·     Image recognition - like humans, but faster
·     Reasoning and decision automation - based on limitless data
·     Emotional Intelligence
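To make the natural-language component less abstract, here is a minimal, hedged sketch of sentiment scoring with NLTK's VADER analyzer – a simple stand-in for the far richer language understanding inside systems like Watson, not a description of them.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the sentiment lexicon

sia = SentimentIntensityAnalyzer()
for text in ["The advisor was helpful and the app is great!",
             "My claim was denied and nobody called me back."]:
    scores = sia.polarity_scores(text)  # pos/neg/neu plus compound in [-1, 1]
    label = "positive" if scores["compound"] > 0 else "negative"
    print(label, round(scores["compound"], 2), "-", text)
```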
Cognitive computing can help banking and insurance companies identify risks and fraud. It analyzes information to predict weather patterns. In healthcare it is helping doctors treat patients based on historical data.
Some recent examples of Cognitive Computing:
·     ANZ Bank of Australia used Watson-based financial services apps to offer investment advice, reading through thousands of investment options and suggesting the best fit for each customer's specific profile, taking into consideration their age, life stage, financial position, and risk tolerance.
·     Geico is using Watson-based cognitive computing to learn its underwriting guidelines, read risk submissions, and effectively help underwrite.
·     Brazilian bank Banco Bradesco is using cognitive assistants at work to help build more intimate, personalized relationships.
·     Of the personal digital assistants – Siri, Google Now & Cortana – I feel Google Now is the easiest to use and adapts most quickly to your spoken language. There is a voice command for just about everything you need to do: texting, emailing, searching for directions, weather, and news. Speak it; don’t text it!
As Big Data gives us the ability to store huge amounts of data, Analytics gives us the ability to predict what is going to happen, and Cognitive gives us the ability to learn from further interactions and suggest the best actions.
Read more…
