Learn With the Data

Posts

Learn to Survive with Titanic Dataset

In this tutorial, we will work with one of the most popular datasets in data science. It will give you an idea of how to analyze data and relate it to real-world conditions.

The Challenge

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the RMS Titanic, widely considered “unsinkable”, sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we will build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.). This post will help you get started with data science and familiarize yourself with machine learning. The competition is simple: use machi...

Read Full »

Z-statistic for a Confidence Level and Estimating a Confidence Interval

Before looking at what a confidence level is, it is important to understand what a confidence interval is. It is an estimate computed from the statistics of the data: a range of plausible values for an unknown parameter (e.g. the mean or standard deviation of a distribution). In general, it is an interval for an unknown population parameter based on the sampling distribution of the estimator. Read the next few lines before we get to the calculation.

A confidence interval expresses how much uncertainty there is in a particular statistic. It is usually reported with a margin of error. It tells you how confident you can be that the results from a poll or survey reflect what you would expect to find if it were possible to survey the entire population. A confidence interval is intrinsically tied to its confidence level.

Confidence Level and Confidence Interval

Confidence level is expressed as a percentage (for example, a 90% confidence level). It means tha...
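As a rough illustration of the calculation this post builds toward, the sketch below turns a confidence level into a two-sided z value and then into an interval around a sample mean. The function names and the sample values are made up for this example, and using the sample standard deviation with a z value is an approximation that is usually reserved for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_for_confidence(level):
    """Two-sided critical z value for a confidence level such as 0.90."""
    alpha = 1.0 - level
    # Leave alpha / 2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_confidence_interval(sample, level=0.90):
    """Approximate interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    centre = mean(sample)
    margin = z_for_confidence(level) * stdev(sample) / sqrt(n)
    return centre - margin, centre + margin


# Hypothetical measurements, invented purely for illustration.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
low, high = z_confidence_interval(sample, level=0.90)
print(f"z = {z_for_confidence(0.90):.3f}, interval = ({low:.3f}, {high:.3f})")
```

For a 90% confidence level the z value comes out at about 1.645, and for 95% at about 1.96. A higher confidence level gives a larger z and therefore a wider interval.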

Read Full »

randn and normal in Numpy

People are often confused about why NumPy has two functions, randn and normal, that appear to give the same output. The key to the difference is understanding what a normal distribution and a standard normal distribution are.

Normal Distribution: a Gaussian distribution, or bell-shaped curve, whose values are distributed around a central value (i.e. the mean) with some standard deviation (i.e. the spread of the distribution). The call in Python is:

numpy.random.normal(loc=0.0, scale=1.0, size=100)

This draws 100 random samples from a normal (Gaussian) distribution, i.e. a GENERIC normal distribution whose centre and spread you choose, and returns them as a one-dimensional array of length 100.

loc: the central value around which the values are located
scale: the standard deviation of the distribution
size: the dimensions of the array returned

Standard Normal Distribution: a distribution, or bell-shaped curve, whose values are distributed around 0 wi...
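A minimal sketch of how the two functions relate, assuming NumPy is installed and using its legacy global seeding. The seed, sample sizes, and variable names are arbitrary choices for this example.

```python
import numpy as np

# randn takes the dimensions as positional arguments and always samples
# from the standard normal distribution (mean 0, standard deviation 1).
np.random.seed(0)
from_randn = np.random.randn(100)

# normal takes the mean (loc) and standard deviation (scale) explicitly;
# with loc=0.0 and scale=1.0 it describes the same distribution.
np.random.seed(0)
from_normal = np.random.normal(loc=0.0, scale=1.0, size=100)

print(from_randn.shape, from_normal.shape)
print(np.allclose(from_randn, from_normal))

# A generic normal distribution is a shifted and scaled standard normal.
np.random.seed(1)
shifted = 5.0 + 2.0 * np.random.randn(100_000)
print(round(shifted.mean(), 1), round(shifted.std(), 1))
```

Both arrays are one-dimensional with 100 entries. The last lines show that a normal distribution with mean 5 and standard deviation 2 can also be obtained by scaling and shifting the output of randn.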

Read Full »
