Sunday, April 10, 2016

01 Introduction to Analytics

Objectives

  • Understand what analytics is and how it differs from analysis
  • Know the popular tools used in analytics
  • Understand the role of a data scientist
  • Know the processes involved in analytics
  • Define a problem statement
  • Collect and summarize data
  • Detect and treat outliers in the data

Analytics versus Analysis

Analytics

Analytics is the science of analysis, in which statistics, data mining, computer technology, etc. are used to carry out the analysis.

Analysis

Analysis is the process of breaking down a complex object into its simpler forms

What is Analytics?

  • It’s the science of wisely acquiring meaningful results from given data using various methods and technologies.
  • It aims to discover patterns of variation in the given data.
  • It helps in anticipating the future from past data and in understanding the uncertainty a business faces.
  • It’s a sophisticated process that uses statistical, mathematical, and economic models to predict the future and prescribe strategies.

How analytics works

Gather Data ==> Organize Data ==> Analyse Data

Analytics Stages

Information Stage

  • Descriptive: What was the wear rate of MRF tyres over the last 8 months?

Insight Stage

  • Diagnostic: Why has the wear rate increased over the last 8 months?
  • Predictive: What kinds of issues (such as mileage) are MRF tyres most likely to face if the company doesn’t address the problem now?

Decision Stage

  • Prescriptive: What should MRF Tyres concentrate on to reduce the overall effect?
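
As a rough illustration of how the descriptive and predictive stages differ, here is a minimal Python sketch. The wear readings are made-up numbers chosen only for the example (they are not real tyre data), and the straight-line trend is just one simple assumption about how a forecast could be made.

```python
# Made-up wear readings (tread lost per month) for 8 months.
# Illustrative values only, not real measurements.
wear = [1.0, 1.1, 1.2, 1.2, 1.4, 1.5, 1.7, 1.8]
months = list(range(1, len(wear) + 1))

# Descriptive: what happened? Average wear over the period.
average_wear = sum(wear) / len(wear)

# Predictive: what is likely to happen next? Fit a least-squares line
# wear = intercept + slope * month and extend it one month ahead.
mean_x = sum(months) / len(months)
mean_y = average_wear
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, wear))
sxx = sum((x - mean_x) ** 2 for x in months)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
forecast_month_9 = intercept + slope * 9

print(f"Average monthly wear: {average_wear:.2f}")
print(f"Trend per month:      {slope:.3f}")
print(f"Forecast for month 9: {forecast_month_9:.2f}")
```

The diagnostic and prescriptive stages need domain knowledge (why the wear increased, what to change), so they are not something a few lines of code can answer on their own.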

Popular Tools:

  • R
  • Revolution R
  • RStudio
  • Tableau
  • SAP HANA
  • Weka
  • KXEN
  • SAS

Role of a Data Scientist

  • Be inquisitive: study the data and spot trends.
  • Uncover stories hidden in the data that create more useful insights and help solve business problems.
  • Work in sync with application developers to get relevant data for analysis.
  • Make an analytical plan in such a way that the results satisfy the business needs.
  • Come up with an effective data mining architecture and prepare suitable models.
  • Respond to and resolve data mining performance issues.
  • Generate reports that are affordable from a business perspective.

Data Analytics Methodology

Discovery ==> Data Preparation ==> Model Planning ==> Model Building ==> Deliver Results ==> Put into Use

Problem Definition

  • What is the problem?
  • What is it not?
  • We have this problem because?
  • We don't have a solution because?

Techniques involved in defining a problem

  • State the problem in a general way
  • Understand the nature of the problem
  • Survey the available literature
  • Go for discussions for developing ideas
  • Rephrase the research problem into a working proposition

Types of Data

Data can be of two types – qualitative and quantitative

Qualitative Data

  • Data expressed as groups or categories
  • Descriptive data
  • E.g. Dividing a population into high, medium and low height groups

Quantitative Data

  • Data expressed as numbers
  • Definitive Data
  • E.g. The height of a person
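
The two examples above are related: a quantitative measurement can be turned into a qualitative category by grouping. A minimal Python sketch, assuming made-up heights and hypothetical cut-off values (`low_max` and `medium_max` are names invented for this example):

```python
# Quantitative data: heights in centimetres (made-up values).
heights_cm = [148, 162, 171, 180, 155, 192]


def height_group(height_cm, low_max=160, medium_max=175):
    """Map a numeric height to a qualitative group.

    The cut-offs are hypothetical; pick ones that suit the population.
    """
    if height_cm <= low_max:
        return "low"
    if height_cm <= medium_max:
        return "medium"
    return "high"


# Qualitative data: the same people described by category.
groups = [height_group(h) for h in heights_cm]
print(groups)
```

Going the other way is not possible: once only the group is recorded, the exact height is lost.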

Summarizing Data

  • Summarizing is the process of converting huge amounts of raw data into a format that can be easily analyzed.
  • Summaries differ based on the type of data and can be descriptive or graphical.

Batsman   Frequency of not outs
Sachin    11
Sehwag    2
Dravid    36
Dhoni     32
Virat     7

Numeric - Descriptive

  • Mean
  • Median
  • Mode
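
A minimal sketch of these three measures using Python's standard `statistics` module, applied to the "not out" counts from the batsman table in these notes:

```python
import statistics

# "Not out" counts taken from the batsman table in these notes.
not_outs = {"Sachin": 11, "Sehwag": 2, "Dravid": 36, "Dhoni": 32, "Virat": 7}
values = list(not_outs.values())

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle value once sorted
modes = statistics.multimode(values)      # most frequent value(s)

print("Mean:  ", mean_value)
print("Median:", median_value)
print("Modes: ", modes)
```

Every count in this small table occurs exactly once, so `multimode` returns all five values; the mode is more informative on data with repeated values.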

Categorical - Descriptive

  • Frequency distribution tables
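
A frequency distribution table simply counts how often each category occurs. A small sketch with `collections.Counter`, assuming a made-up list of height groups for ten people:

```python
from collections import Counter

# Hypothetical categorical observations (height groups for ten people).
observations = ["low", "medium", "high", "medium", "medium",
                "low", "high", "medium", "low", "medium"]

frequency = Counter(observations)
total = sum(frequency.values())

# Print the table, most frequent category first.
print(f"{'Group':<8}{'Count':>6}{'Share':>8}")
for group, count in frequency.most_common():
    print(f"{group:<8}{count:>6}{count / total:>8.0%}")
```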

Numeric - Graphical

  • Box plot
  • Histograms

Categorical - Graphical

  • Bar charts

Data Collection

  • Data collection is the process of gathering relevant data that helps in solving the problem statement.
  • The data collection process needs to be well defined and systematic.
  • Observations need to be recorded and organized for optimal usefulness
  • Collect Relevant Data
  • Categorize the Data
  • Organize the Data

Data Collection Methods

  • Observation
  • Experiment
  • Census
  • Questionnaire
  • Survey
  • Reporting
  • Registration

Data Sources

  • Data collection methods fall broadly into two categories: primary and secondary.
  • Primary methods are those where the data is gathered directly by investigating, experimenting on, or observing various entities.
  • Secondary methods are those where the data was already gathered before the study and is available as published facts and reports.

Data Dictionary

  • A Data Dictionary is a file that describes the structure of the database itself.
  • It includes details such as:
    • Number of records
    • Name of each field
    • Characteristics of each field
    • Description of each field
    • Relationships between different fields
  • It helps in analyzing the different data variables and the relationships among them.
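
To make the idea concrete, here is a minimal sketch that builds a small data dictionary from a list of records. The records, field names, descriptions, and the `build_data_dictionary` helper are all hypothetical, invented for this example; real tools usually store more detail.

```python
# Hypothetical records; names and values are made up for illustration.
records = [
    {"name": "Asha", "height_cm": 162, "group": "medium"},
    {"name": "Ravi", "height_cm": 181, "group": "high"},
    {"name": "Meena", "height_cm": 150, "group": "low"},
]
descriptions = {
    "name": "Person's first name",
    "height_cm": "Height in centimetres",
    "group": "Height group derived from height_cm",
}


def build_data_dictionary(rows, field_descriptions):
    """Summarize the structure of a list of dict records."""
    fields = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        fields[field] = {
            "type": type(values[0]).__name__,
            "distinct_values": len(set(values)),
            "description": field_descriptions.get(field, ""),
        }
    return {"number_of_records": len(rows), "fields": fields}


data_dictionary = build_data_dictionary(records, descriptions)
for field, info in data_dictionary["fields"].items():
    print(field, info)
```

Here the relationship between fields is recorded only in the description text ("derived from height_cm"); that is a simplification chosen for the sketch.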

Outlier Treatment

  • An outlier is a point or observation that deviates significantly from the other observations.
  • Outliers arise from experimental errors or “special circumstances”.
  • Outlier detection tests are used to check for outliers.
  • Outlier treatment options:
    • Retention
    • Exclusion
    • Other treatment methods
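
The notes do not name a specific detection test, so the sketch below assumes one common choice, the interquartile-range (IQR) rule with the conventional multiplier of 1.5. The readings are made up, and "capping" is shown only as one example of an "other" treatment.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using the interquartile-range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up measurements with one suspicious reading.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
lower, upper = iqr_bounds(readings)

# Detection: points outside the fences are flagged as outliers.
outliers = [x for x in readings if x < lower or x > upper]

# Retention: keep `readings` unchanged.
# Exclusion: drop the flagged points.
excluded = [x for x in readings if lower <= x <= upper]
# One possible "other" treatment: cap flagged points at the nearest fence.
capped = [min(max(x, lower), upper) for x in readings]

print("Outliers:", outliers)
```

Which treatment is appropriate depends on why the point is unusual: a recording error is usually excluded or corrected, while a genuine extreme value may be worth retaining.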

Summary

  • What analytics and analysis are, and the differences between them
  • Popular tools used in analytics
  • What a data scientist does
  • The processes involved in the analytics life cycle
  • How to formally define a problem statement
  • Methods of collecting and summarizing data for analytics
  • Data dictionary and its contents
  • What outliers are and how to detect and treat them

00 Business Analytics Foundation With SAS and Excel Tools

Objectives

  • Understand what analytics is
  • Understand where analytics is applied and the process involved in it
  • Get an overview of the various topics that will be covered in different lessons
  • Describe the career path of a business analyst

Analytics

Analytics is a journey that involves a combination of skills, advanced technologies, applications, and processes used by a firm to gain business insights from data and statistics. These insights are then used for business planning.


Places Where Analytics is Applied

  • Financial services: Credit scoring, fraud detection, pricing, claims analysis
  • Retail: Promotions, replenishment, demand forecasting, merchandizing optimization
  • Manufacturing: Inventory replenishment, product customization, supply chain optimization
  • Health care: Drug interaction, preliminary diagnosis, disease management
  • Energy: Trading, supply, demand forecasting, compliance
  • Communications: Customer retention, capacity planning, network optimization

Topics Covered

Lesson 1: Introduction to analytics

  • Definition of analytics
  • The analytical process
  • Applications of analytics
  • Understanding the problem
  • How a data scientist should act
  • Data collection and preparation

Lesson 2: Statistical concepts and their application in business

  • Overview of statistical methods
  • Descriptive statistics
  • Probability theory
  • Concept of test of significance
  • Hypothesis testing
  • Business uses of statistical methods

Lesson 3: Basic analytic techniques

  • Basic introduction to SAS and Excel
  • Data exploration
  • Data visualization
  • Diagnostic analytics
  • SAS and Excel tools – Business usage

Lesson 4: Predictive modeling techniques

  • Regression analysis
  • Types of models in regression analysis
  • Logistic regression
  • Time series analysis
  • Cluster analysis
  • Predictive modeling – Business usage

Career Path

DATA MANAGER
  • Data preparation
  • Deployment services
  • Report administration
DATA MINER
  • Exploratory analysis
  • Descriptive segmentation
  • Predictive modeling
BUSINESS MANAGER
  • Campaign management
  • Domain expertise
  • Process and ROI evaluation