In the dynamic world of data science, statistics is the compass that helps professionals navigate the vast sea of information. Whether you’re a newcomer or a seasoned data scientist, these fundamental statistical ideas are key to unraveling data’s mysteries, making sound decisions, and predicting future outcomes. This cheatsheet compiles these essential concepts in one place as a quick-reference guide.
The article is organised into the following subsections:
- Probability Distributions
- Key Hypothesis Testing Concepts
- Non-Parametric Statistics
- Descriptive Statistics
- Inferential Statistics
- Data Sampling
- Regression Analysis
- Feature Selection
- Statistical Evaluation of Models
- Bayesian Statistics
1. Probability Distributions
Understanding probability distributions is crucial in data science because it lets data scientists model and analyse data, make predictions, and quantify uncertainty. Different distributions reveal the nature of the data, guide the choice of appropriate statistical methods, and support informed decisions.
⚡Normal Distribution (Gaussian Distribution): A symmetric, bell-shaped distribution often used to model continuous data. Its probability density function (PDF) is fully characterised by the mean (μ) and standard deviation (σ).
⚡Binomial Distribution: Used to model the number of successes (e.g., heads in coin tosses) in a fixed number (n) of independent Bernoulli trials. It is characterised by parameters n (number of trials) and p (probability of success).
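⚡Poisson Distribution: Models the number of events occurring in a fixed interval of time or space, assuming events occur independently at a constant average rate. It is often used for count data and is characterised by the rate parameter (λ), which equals both its mean and its variance.
⚡Exponential Distribution: Describes the waiting time between consecutive events in a Poisson process. It is characterised by the same rate parameter (λ), and its mean waiting time is 1/λ.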
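To make the four definitions above concrete, here is a minimal sketch in Python that uses only the standard library (no SciPy is assumed). The function names (`normal_pdf`, `binomial_pmf`, `poisson_pmf`, `exponential_pdf`, `simulate_poisson_counts`) are hypothetical helpers written for illustration, not a library API. Each one writes out the textbook PMF or PDF, and the small simulation illustrates the Poisson/exponential link from the last item: if the gaps between events are exponential with rate λ, the number of events per unit interval is Poisson with mean λ.

```python
import math
import random


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k): exactly k events in an interval with average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def exponential_pdf(t: float, lam: float) -> float:
    """Density of the waiting time t between events of a Poisson process with rate lam."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


def simulate_poisson_counts(lam: float, trials: int, seed: int = 0) -> list[int]:
    """Count arrivals in [0, 1) when inter-arrival gaps are drawn from Exponential(lam)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        t, n = rng.expovariate(lam), 0  # time of the first arrival
        while t < 1.0:
            n += 1
            t += rng.expovariate(lam)  # add the next exponential gap
        counts.append(n)
    return counts


if __name__ == "__main__":
    print(round(binomial_pmf(5, 10, 0.5), 4))  # 0.2461 (5 heads in 10 fair tosses)
    print(round(normal_pdf(0.0), 4))           # 0.3989 (peak of the standard normal)
    print(round(poisson_pmf(2, 3.0), 4))       # 0.224
    counts = simulate_poisson_counts(lam=3.0, trials=20_000)
    print(sum(counts) / len(counts))           # sample mean, should be close to 3.0
```

In practice a library such as SciPy provides these distributions ready-made; writing the formulas out by hand simply shows how each one depends on its parameters (μ and σ, n and p, or λ).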