Unlocking Insights: EDA with Descriptive Statistics
Descriptive statistics serve as a powerful lens, allowing us to glean valuable insights by condensing complex datasets into digestible summaries. These statistical measures, such as measures of central tendency (mean, median, and mode) and measures of dispersion (variance and standard deviation), are a quick way to get an idea of the data with just a few calculations.
For the past four days, I’ve been preparing to share a topic that’s not only easy to grasp but also essential for your journey into Exploratory Data Analysis (EDA). You see, I’m not one to focus only on theoretical concepts; I believe in applying them to real-world challenges.
In this article, I won’t just explain the concepts of central tendency and dispersion. Instead, I’ll take you on an analytical journey through Bangalore’s restaurant ratings.
I took the dataset from Kaggle. It has multiple columns with information relevant to the analysis.
To conduct effective data analysis and extract valuable insights, start by creating a list of questions based on your dataset. Ensure you have a clear understanding of what you want to discover from the data.
Before the analysis, I had already come up with some helpful questions that clarified what I actually needed to achieve with the data.
But before getting to those questions, let’s do the initial step of Descriptive Statistics.
In my analysis, I focused on calculating the average (mean) of the Rating and Review Counts columns, as well as their Standard Deviation. These steps are essential in Descriptive Statistics and help to get an overview of the dataset.
The mean (average) is a measure of central tendency in a dataset. It’s calculated by adding up all the values in a dataset and dividing by the total number of values. The mean represents the “typical” value in the dataset and provides insight into the dataset’s central tendency.
In our analysis, the mean Rating is 3.84, which tells us that the average rating across all restaurants in the dataset is 3.84.
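To make the calculation concrete, here is a minimal sketch using only Python’s standard library. The ratings below are made-up values for illustration and are not taken from the Bangalore dataset.

```python
from statistics import mean

# Hypothetical ratings, NOT taken from the Bangalore dataset.
ratings = [4.5, 3.0, 4.0, 3.5, 5.0]

# Mean = sum of all values divided by the number of values.
manual_mean = sum(ratings) / len(ratings)

# statistics.mean performs the same calculation.
library_mean = mean(ratings)

print("Manual mean: ", manual_mean)
print("Library mean:", library_mean)
```

Both approaches give the same number; the manual version is shown only to mirror the definition above.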
Standard Deviation (std)
The Standard Deviation measures the spread or dispersion of data points around the mean. It tells us how much individual data points deviate from the mean. A small standard deviation indicates that the data points are close to the mean, while a large standard deviation suggests that the data points are more spread out.
In simple words, it helps us understand the variability and consistency in the dataset.
In our analysis, the Standard Deviation of Rating is 0.924, which describes how far our data points typically lie from the mean. It also helps to identify outliers: ratings that sit several standard deviations away from the mean stand out.
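The idea can be sketched as follows, again with made-up ratings rather than the real data. Two assumptions are worth flagging: the sketch uses the sample standard deviation (`statistics.stdev`), and it flags values more than two standard deviations from the mean as potential outliers. The two-standard-deviation cutoff is a common rule of thumb chosen here for illustration, not necessarily the one used in the analysis above.

```python
from statistics import mean, stdev

# Hypothetical ratings, NOT taken from the Bangalore dataset.
ratings = [4.0, 4.1, 3.9, 4.0, 4.2, 3.8, 1.0]

avg = mean(ratings)
spread = stdev(ratings)  # sample standard deviation (divides by n - 1)

# Assumed rule of thumb: flag values more than 2 standard deviations
# away from the mean as potential outliers.
threshold = 2 * spread
outliers = [r for r in ratings if abs(r - avg) > threshold]

print("Mean:", round(avg, 3))
print("Standard deviation:", round(spread, 3))
print("Potential outliers:", outliers)
```

Here the single low rating is the only value flagged, because every other rating sits close to the mean.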
Now, let’s analyze the dataset based on the list of questions I came up with!
1st. Which Restaurant has the Highest Rating in Bangalore?
I was curious to know which restaurant has the highest rating, and it turned out to be Chai Point with an amazing 5.0. A rating like that suggests they serve incredible tea, have a friendly atmosphere, and really care about making customers happy, while receiving an average amount of reviews and feedback.
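Finding the top-rated restaurant comes down to taking the maximum over the rating column. A minimal sketch, assuming each row is a dictionary with hypothetical `name` and `rating` keys (the names and values below are invented for illustration):

```python
# Hypothetical rows; names and ratings are illustrative only.
restaurants = [
    {"name": "Restaurant A", "rating": 4.2},
    {"name": "Restaurant B", "rating": 5.0},
    {"name": "Restaurant C", "rating": 3.7},
]

# Pick the row with the largest rating.
top = max(restaurants, key=lambda row: row["rating"])

# Several restaurants may share the top rating, so collect all of them.
top_rated = [row["name"] for row in restaurants if row["rating"] == top["rating"]]

print("Highest rating:", top["rating"])
print("Restaurants with that rating:", top_rated)
```

Collecting every restaurant that shares the top rating matters in practice, since more than one outlet can reach the maximum score.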
Here, we plot every restaurant’s average rating, as given by the people who ordered from or visited it. The analysis shows an overall average of about 3.9 across all restaurants, which suggests that restaurants in Bangalore are, on the whole, a good choice for ordering in or eating out.
3rd. Is there any Relationship between Review Counts and Ratings?
We often assume that if a restaurant has the highest rating, it must also have the most reviews (feedback). However, my analysis revealed something interesting. There’s actually a weak connection between a restaurant’s rating and the number of reviews it gets. This means that a high rating doesn’t necessarily mean a lot of reviews, and vice versa.
So, in the world of restaurant ratings, things aren’t always as they seem!
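One common way to quantify such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1, with values near 0 indicating a weak linear relationship. The sketch below assumes Pearson correlation as the measure (the text above does not say which measure was used) and runs on made-up numbers, not the real dataset.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    # Sum of products of deviations from each mean.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical values, NOT taken from the Bangalore dataset.
ratings = [4.5, 3.0, 4.0, 3.5, 5.0]
review_counts = [100, 200, 300, 150, 250]

r = pearson(ratings, review_counts)
print("Correlation between rating and review count:", round(r, 2))
```

A coefficient close to 0, as in this toy example, is what a weak connection between the two columns looks like numerically.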
4th. How many Restaurants are there in each Category and Sub-Category?
Through this, we get an idea of how many outlets are available in Bangalore for each Category.
We can make sense of the data more easily by using a bar graph. This graph shows the number of restaurants in each category, giving us a clear picture of how many restaurants fall into each Category. It’s like counting the different kinds of restaurants in Bangalore.
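The counting step behind such a bar graph can be sketched with `collections.Counter`. The category labels below are invented for illustration, and the bars are drawn as plain text so the example needs no plotting library.

```python
from collections import Counter

# Hypothetical category labels, one per restaurant.
categories = ["Cafe", "Fast Food", "Cafe", "Bakery", "Fast Food", "Cafe"]

# Count how many restaurants fall into each category.
counts = Counter(categories)

# Print a simple text bar chart, largest category first.
for category, n in counts.most_common():
    print(f"{category:<10} {'#' * n} ({n})")
```

The same counts can be passed to any plotting tool to draw the actual bar graph.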
In summary, our analysis uncovered a surprising reality:
High restaurant ratings in Bangalore do not necessarily equate to a higher number of reviews.
The weak relationship between these two factors challenges conventional wisdom. This exploration of descriptive statistics, category insights, and exceptional outliers showcases the power of data analysis.
We covered only a few questions here, but you can analyze the data with many more to get further insights. All you need is clarity about what you want to achieve with the dataset.
Based on this analysis, you can also build your own end-to-end ML model, for example with the Random Forest algorithm.
Once you are done with the analysis and have gained valuable insights from the data, you can move on to Feature Engineering based on the features you have already analyzed, such as ratings and review_counts.
Then, split the data into training and testing sets.
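The split step on its own can be sketched with the standard library. The helper name `split_rows`, the 80/20 ratio, the seed, and the rows are all assumptions made for illustration; in practice, a library such as scikit-learn provides `train_test_split` for this, and the Random Forest model itself is not shown here.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]                      # leave the original list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (rating, review_count) rows, NOT from the Bangalore dataset.
rows = [
    (4.5, 100), (3.0, 200), (4.0, 300), (3.5, 150), (5.0, 250),
    (3.8, 90), (4.1, 410), (2.9, 60), (4.7, 520), (3.3, 130),
]

train, test = split_rows(rows)
print("Training rows:", len(train))
print("Testing rows: ", len(test))
```

Shuffling before splitting keeps the test set from simply being the last rows of the file, and fixing the seed makes the split repeatable.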
As you can see, Descriptive Statistics methods can open up numerous opportunities and significantly contribute to gaining clarity about the data.
Hope you enjoyed the article and learned a lot. Happy analyzing, and may your data adventures be filled with insights! Goodbye for now.
Thank you. Please clap or share this blog.