Data Science Methodology

Understand the steps and tasks needed to design and build a data-driven AI engagement

Data Science grew out of our experience with Business Intelligence (BI), a field that became popular in the 1990s. However, the last 20 years have seen unprecedented improvement in our ability to take action using Artificial Intelligence. As we adapt BI methodologies to AI deployments, how must these methodologies change to account for machine learning and model deployment?

What you’ll learn

  • Articulate the data science process and methodology, including the differences between BI and AI.
  • Understand how to analyze data sources and identify the enhancements they need.
  • Explore how users and experts are engaged in model measurement and monitoring.
  • Understand the controls and governance aspects.

Course Content

  • Introduction –> 2 lectures • 17min.
  • Step 1 – Describe Use Case –> 3 lectures • 32min.
  • Step 2 – Describe Data –> 3 lectures • 40min.
  • Step 3 – Prepare Data –> 3 lectures • 44min.
  • Step 4 – Develop Model –> 5 lectures • 49min.
  • Step 5 – Evaluate Model –> 2 lectures • 30min.
  • Step 6 – Deploy Model –> 3 lectures • 34min.
  • Step 7 – Monitor Model –> 1 lecture • 4min.
  • Summary –> 2 lectures • 12min.

Requirements

  • None.

Today’s Data Science work deals with big data. It introduces three major challenges:

  1. How to deal with large volumes of data. Data understanding and data preparation must handle large-scale observations about the population. In the BI world of small samples, the art of data science was to find averages and trends in a sample and then project them to the overall population using universal population measures such as census data. Most big data sets provide samples large enough that such a projection may not be needed. However, bias and outliers become the real issues.
  2. Data is now available at high velocity. Using scoring engines, we can embed insights into high-velocity data streams, and data science offers real-time analytics techniques that make this possible. As you interact with a web site or a product, the marketing or services teams can offer you help as a user, because insight is embedded in the high-velocity data.
  3. Most of the data is in speech, unstructured text or video. This is high variety. How do we interpret an image of a driver's license and extract the license information from it? Understanding and interpreting such data is now a central part of data science.
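The sample-to-population projection mentioned in the first challenge can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group names, the spending values, and the census shares are invented for the example, and the helper `project_to_population` is hypothetical rather than part of the course material. It reweights each group's sample average by that group's share of the population, which also shows how an unbalanced sample biases a naive average.

```python
from collections import defaultdict


def project_to_population(sample, census_share):
    """Project a sample average to the population.

    sample: list of (group, value) observations.
    census_share: dict mapping group -> share of the population (sums to 1).
    Both inputs are hypothetical illustrations, not course data.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in sample:
        totals[group] += value
        counts[group] += 1

    missing = set(census_share) - set(counts)
    if missing:
        raise ValueError(f"no sample observations for groups: {sorted(missing)}")

    # Weight each group's sample mean by its population share.
    return sum(
        share * totals[group] / counts[group]
        for group, share in census_share.items()
    )


# Invented sample: urban respondents are over-represented (3 of 4),
# while the assumed census says the population is split 50/50.
sample = [("urban", 100.0), ("urban", 120.0), ("urban", 110.0), ("rural", 60.0)]
census_share = {"urban": 0.5, "rural": 0.5}

naive_mean = sum(value for _, value in sample) / len(sample)
projected_mean = project_to_population(sample, census_share)

print(naive_mean)      # average of the raw sample
print(projected_mean)  # average after reweighting to census shares
```

In this invented data the naive average is pulled toward the over-represented urban group, while the reweighted figure is not. With big data the reweighting step may be unnecessary, but the same comparison is a quick check for composition bias.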

As these deployed models learn in real time and adjust themselves, it is important to monitor their performance for biases and inaccuracies. Measurement and monitoring are no longer a project-based, one-time activity. They are continuous, automated and closely watched. The methodology must be extended to include continuous measurement and monitoring.
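One common way to automate this kind of monitoring is to compare the distribution of live model scores against a baseline. The sketch below uses the Population Stability Index (PSI) as an example metric; the course text does not name a specific metric, so the choice of PSI, the function name `population_stability_index`, the bin count, and the 0.2 alert threshold (a widely used rule of thumb) are all assumptions made for illustration.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample and a live sample of numeric scores.

    Bins are equal-width over the baseline range; live values outside that
    range are clamped into the first or last bin. Empty bins get a small
    share (eps) so the logarithm is always defined.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Invented scores: the baseline is spread over [0, 1),
# while the drifted live scores are concentrated in [0.5, 1).
baseline = [i / 100 for i in range(100)]
drifted = [0.5 + i / 200 for i in range(100)]

ALERT_THRESHOLD = 0.2  # assumed rule-of-thumb threshold, tune per use case

for name, live in [("same", baseline), ("drifted", drifted)]:
    psi = population_stability_index(baseline, live)
    print(name, "ALERT" if psi > ALERT_THRESHOLD else "ok")
```

Run on a schedule against each new batch of scores, a check like this turns monitoring from a one-time project task into a continuous, automated one.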

The course describes a 7-step methodology for conducting a data science / AI-driven engagement.

  • Step 1: Understand Use Case – We use illustrative examples and case studies to show the power of a data science engagement and provide strategies for defining the use case and the data science objectives.
  • Step 2: Understand Data – We define the characteristics of big data and show how to understand and select the right data sources for a use case from a data science perspective.
  • Step 3: Prepare Data – How should you select, clean and construct big data for modeling with analytics or AI techniques?
  • Step 4: Develop Model – Once you have ingested structured and unstructured data from many sources, how do you build models that yield insights using AI and analytics?
  • Step 5: Evaluate Model – How do you engage users and evaluate decisions? What measurements do you need on models?
  • Step 6: Deploy Model – How do you deploy your AI models and use what the AI system learns in production to enhance them?
  • Step 7: Monitor Model – What measurements and guardrails should be in place for continuous monitoring and learning of an AI system in production use?

If you are a developer and are interested in learning how to do a data science project using Python, we have designed another course titled “Data Science in Action using Python”.
