Avada Kedavra (aka the Killing Curse) may be the deadliest curse of all, but in this Muggle world we have to deal with something much worse.

The Curse of Dimensionality

There is a particular species of Muggles, also known as Data Scientists, who have to deal with this curse in their day-to-day jobs.

(Okay, enough with the Harry Potter references; let's get right down to business.)

The curse of dimensionality is a problem that arises when we work with data that has a large number of features, or in other words, high…

Pandas is an open-source Python library used for data manipulation, cleaning, and analysis. It provides many functions that speed up the data analysis process. Pandas is built on top of the NumPy package and borrows many of its core ideas. Its two primary data structures are the Series, which is one-dimensional, and the DataFrame, which is two-dimensional.
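To make the two data structures concrete, here is a minimal sketch. The names and values (ages, people, the cities) are made up for illustration and are not from the original article.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([31, 25, 42], index=["ann", "bob", "cid"], name="age")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
people = pd.DataFrame(
    {"age": [31, 25, 42], "city": ["Pune", "Delhi", "Pune"]},
    index=["ann", "bob", "cid"],
)

print(ages.ndim)                      # 1
print(people.ndim)                    # 2
print(type(people["age"]).__name__)   # Series
```

Selecting a single column from the DataFrame gives back a Series, which is one way to see how the two structures relate.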

It is one of the most important and useful tools in the arsenal of a Data Scientist or a Data Analyst.

So, let's get started.

If you already have Python and pip installed on your system, then…

Photo by Luke Chesser on Unsplash


Exploratory Data Analysis (EDA) is the process of analyzing and understanding a dataset in order to extract its insights and main characteristics. EDA methods are generally classified as either graphical or non-graphical.

EDA is essential because it is good practice to understand the problem statement and the relationships between the data features before getting your hands dirty.

Exploratory Data Analysis

Technically, the primary goals of EDA include:

  • Examining the data distribution
  • Handling missing values in the dataset (the most common issue with every dataset)
  • Handling the outliers
  • Removing duplicate data
  • Encoding the categorical variables
  • Normalizing…
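
The first few steps in the list above could be sketched with pandas roughly as follows. This is only an illustration under stated assumptions: the toy data is made up, and median imputation, the 1.5 × IQR outlier rule, and one-hot encoding are common choices picked here, not necessarily the methods the article goes on to use.

```python
import numpy as np
import pandas as pd

# Made-up toy data for illustration only.
df = pd.DataFrame({
    "age": [22.0, 25.0, np.nan, 25.0, 31.0, 28.0, 95.0],
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai", "Pune", "Pune"],
})

# Examine the data distribution (count, mean, quartiles, ...).
summary = df["age"].describe()

# Handle missing values: here, fill them with the column median (an assumption).
df["age"] = df["age"].fillna(df["age"].median())

# Handle outliers: keep rows within 1.5 * IQR of the quartiles (an assumption).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Remove duplicate rows.
df = df.drop_duplicates()

# Encode the categorical variable as one-hot indicator columns.
encoded = pd.get_dummies(df, columns=["city"])

print(summary)
print(encoded)
```

The order of the steps matters in this sketch: missing values are filled before the quartiles are computed, so the imputed value takes part in the outlier check.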

Nikhil Raj

Engineer by profession, Artist by passion.

