
randn and normal in NumPy

There is often confusion about why we have two functions, randn and normal, that seem to give the same output. The key to understanding the difference between them is understanding what normal and standard normal distributions are.

Normal Distribution:
It is a Gaussian distribution, or bell-shaped curve, whose values are distributed around a central value (i.e. the mean) with some standard deviation (i.e. the spread of the distribution).

Definition in Python is as below:
        numpy.random.normal(loc=0.0, scale=1.0, size=100)
This draws a random sample of 100 values (a 1-D array of shape (100,)) from a normal Gaussian distribution centered at loc with spread scale, i.e. a GENERIC normal distribution.

loc :- the central value (mean) around which the values are located
scale :- the standard deviation of the sample
size :- the shape of the returned array
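
As a quick sketch (the loc and scale values here are arbitrary, chosen just for illustration), you can check that the sample's empirical mean and standard deviation come out close to loc and scale:

        import numpy
        # draw 100 samples centered at loc=10 with spread scale=2
        samples = numpy.random.normal(loc=10, scale=2, size=100)
        # the sample statistics should land close to loc and scale
        print(samples.mean())  # roughly 10
        print(samples.std())   # roughly 2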

Standard Normal Distribution:
It is a normal distribution, or bell-shaped curve, whose values are distributed around 0 with a standard deviation of 1.

Definition in Python is as below:
        numpy.random.randn(100)
This draws a random sample of 100 values (a 1-D array of shape (100,)) from the standard normal distribution, i.e. a SPECIFIC normal distribution.

Now here, the values are located around 0 with standard deviation 1.
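
A similar quick check (again, just a sketch) shows the sample hovering around mean 0 and standard deviation 1:

        import numpy
        # 100 samples from the standard normal distribution
        z = numpy.random.randn(100)
        print(z.mean())  # roughly 0
        print(z.std())   # roughly 1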

In the code lines used above, you will find no difference between the outputs, because by default normal has mean 0 and standard deviation 1, i.e. the same as the standard normal distribution.

Now try playing around with the loc and scale in normal.

You can use the below lines and see the difference.
        numpy.random.normal(loc = 10, scale = 2, size = 100)
        numpy.random.normal(loc = -5, scale = 1, size = 100)
Below is a simple trick through which you can achieve the same functionality as normal using randn only.

        2 * numpy.random.randn(100) + 10

        1 * numpy.random.randn(100) + (-5)

Both methods produce samples from the same distribution. Still, it's good to know different ways of doing the same thing.
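
To see why the trick works: multiplying a standard normal sample by the desired scale and adding the desired loc shifts and stretches it into normal(loc, scale). This sketch uses a larger sample size than above (an assumption for illustration, so the statistics settle near the true values):

        import numpy
        # normal(loc, scale) vs. scale * randn() + loc
        a = numpy.random.normal(loc=10, scale=2, size=100000)
        b = 2 * numpy.random.randn(100000) + 10
        print(a.mean(), a.std())  # roughly 10 and 2
        print(b.mean(), b.std())  # roughly 10 and 2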

Also don't confuse the above functions with 
        numpy.random.rand(100)

Now I leave this final part for you to explore yourself.

This is it for today.


