Exploratory Data Analysis (EDA) using Python | by Vidush Agarwal

The purpose of visualisation is to make data easy to understand, even for people without a technical background. That said, it is important to make sure that the graphs and charts we make are meaningful and relevant.

1. Bar Chart

A bar chart, also known as a bar graph, is a data visualization tool used to represent categorical data with rectangular bars. It is commonly used to present and compare the values of different categories or groups.
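The snippets in this article assume a pandas DataFrame named df that is already loaded. The dataset itself is not shown, so the following is only a minimal stand-in: the column names are the ones used in the article's code, while every row is made up so that the examples can be run.

```python
import pandas as pd

# Hypothetical rows: the real dataset is not shown in the article.
# Only the column names are taken from the article's code.
df = pd.DataFrame(
    {
        "Company Name": ["Acme", "Acme", "Globex", "Initech", "Globex", "Acme"],
        "Job Roles": ["SDE", "Testing", "SDE", "Mobile", "SDE", "Testing"],
        "Location": ["Bangalore", "Pune", "Bangalore", "Chennai", "Bangalore", "Pune"],
        "Employment Status": [
            "Full Time", "Contractor", "Full Time",
            "Contractor", "Intern", "Full Time",
        ],
        "Salary": [1_200_000, 600_000, 1_500_000, 900_000, 300_000, 700_000],
    }
)

# Quick structural checks before plotting anything.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
```

With a real dataset, the DataFrame would typically come from a file reader such as pd.read_csv instead of being typed in by hand.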

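import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be a pandas DataFrame that is already loaded
fig = plt.figure(figsize=(10, 10))
sns.set(font_scale=1.25)
# Grouped bars: one group per job role, one bar per employment status.
# errorbar=None (seaborn 0.12+) hides the error bars.
b = sns.barplot(data=df, x="Job Roles", y="Salary", hue="Employment Status", errorbar=None)
plt.xticks(rotation=30)  # tilt the labels so they do not overlap
plt.show()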

The code above produces a grouped bar chart like this:

We can see that it gives a clear comparison of salaries across the different job roles. Because it is a grouped bar chart, it also shows how salary varies with employment status within the same job role.

According to the dataset, people working on iOS are mostly contract-based, and their salary is higher than in any other job role.

Counts of all these could also tell the story in numerical form, but a visualization like this makes the data easier and more fun to work with.
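The numerical form mentioned above can be sketched with a groupby. By default, sns.barplot draws the mean of the y column for each group, so the first table below holds the same numbers as the bar heights. The small DataFrame is hypothetical sample data, not the article's dataset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the article's dataset.
sample = pd.DataFrame(
    {
        "Job Roles": ["SDE", "SDE", "SDE", "Testing", "Testing"],
        "Employment Status": [
            "Full Time", "Full Time", "Contractor", "Full Time", "Contractor",
        ],
        "Salary": [1000, 2000, 1500, 800, 1200],
    }
)

# Mean salary per (job role, employment status): the bar heights.
mean_salary = (
    sample.groupby(["Job Roles", "Employment Status"])["Salary"]
    .mean()
    .unstack("Employment Status")
)

# Number of rows behind each bar, useful for spotting thin groups.
row_counts = (
    sample.groupby(["Job Roles", "Employment Status"])
    .size()
    .unstack("Employment Status", fill_value=0)
)

print(mean_salary)
print(row_counts)
```

Looking at the row counts next to the means helps avoid reading too much into a bar that is backed by only one or two rows.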

Overall, bar charts are a versatile and widely used data visualization tool that helps convey information in a clear and easily understandable manner, making them valuable for data analysis, reporting, and decision-making processes.

2. Pie Chart

A pie chart is a circular data visualization tool that is used to represent the composition or distribution of a whole into different parts or categories. It is particularly effective in illustrating proportions and percentages.

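# Unique job roles, as a list so they can be used as pie labels
a = list(set(df['Job Roles']))
print(a)
job_roles = df['Job Roles'].tolist()
q = []
for i in a:
    # Number of rows that have this job role
    q.append(job_roles.count(i))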

a holds the unique job roles.

The code above counts the rows for every job role and stores the counts in a list named ‘q’, in the same order as the roles in a.

fig = plt.figure(figsize=(7, 7))
# autopct prints each slice's share as a whole-number percentage
plt.pie(q, labels=a, autopct='%.0f%%')
plt.show()

The code above plots a pie chart:

Pie chart for composition of different Job Roles

The chart above gives a visual representation of how the different job roles make up the whole. It allows viewers to quickly grasp the proportions and identify the largest and smallest categories: SDE takes up the largest share and Mobile the smallest.
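The counts and percentages behind the pie chart can also be obtained without a manual loop. This is a minimal sketch using the pandas value_counts method on hypothetical data; the role names and their frequencies are only illustrative.

```python
import pandas as pd

# Hypothetical job roles standing in for df["Job Roles"].
roles = pd.Series(
    ["SDE", "SDE", "SDE", "Testing", "Testing", "Mobile"], name="Job Roles"
)

# Count per role, sorted from most to least frequent.
counts = roles.value_counts()

# Share of the whole in percent, rounded the way autopct='%.0f%%' prints it.
percentages = (roles.value_counts(normalize=True) * 100).round(0)

print(counts)
print(percentages)

# counts.values and counts.index can be passed to plt.pie as sizes and labels.
```

Because value_counts returns the labels and the counts together, there is no risk of the two getting out of order.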

3. Horizontal Bar Chart

Let’s look at another type of bar chart, which we will use for a different purpose.

A horizontal bar chart is a type of data visualization that presents categorical data using horizontal bars. In this chart, the categories or groups are displayed along the vertical axis (y-axis), while the length or width of the bars represents the values or frequencies associated with each category.

It allows for quick and intuitive comparison between categories, making it especially useful when the category labels are lengthy.

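# Number of listings per company, keeping the 50 most frequent companies
companies = df["Company Name"].value_counts().head(50)
plt.figure(figsize=(12, 15))
# Categories on y and counts on x make the bars horizontal.
# Note: seaborn 0.13+ warns when palette is passed without hue.
sns.barplot(y=companies.index, x=companies.values, palette='plasma')
plt.show()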

The code above draws a horizontal bar chart for the 50 companies with the most listings in the dataset. We limit the chart to 50 companies only to keep it readable; the number can be changed as needed.

Here the labels are too long to fit properly in a vertical bar chart, so a horizontal bar chart is our go-to option.

Here, the bar chart is not fully accurate because we have taken a random sample, which has resulted in some anomalies, but it still gives a rough idea. If an accurate visualization is required, we can use the whole dataset.

Since we used value_counts() in the code, the plot shows the number of jobs at each company. We can also use a count plot instead.
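One way to draw the same kind of chart with a count plot is sketched below. It relies on the y and order parameters of seaborn's countplot; the helper name plot_top_companies and the sample rows are hypothetical and not part of the original code.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_top_companies(frame, n):
    """Draw a horizontal count plot of the n most frequent companies."""
    # value_counts sorts by frequency, so its index gives the bar order.
    order = frame["Company Name"].value_counts().head(n).index.tolist()
    fig, ax = plt.subplots(figsize=(8, 4))
    # Passing the column as y makes the bars horizontal;
    # companies missing from `order` are left out of the plot.
    sns.countplot(data=frame, y="Company Name", order=order, ax=ax)
    return ax, order


# Hypothetical sample rows.
sample = pd.DataFrame(
    {"Company Name": ["Acme", "Acme", "Acme", "Globex", "Globex", "Initech"]}
)
ax, order = plot_top_companies(sample, n=2)
print(order)
```

Calling plt.show() afterwards displays the figure. The counting happens inside countplot, so only the ordering has to be computed by hand.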

4. Count Plot

Seaborn provides a count plot, which makes our work easier because we do not have to count the values separately as we did above.

A count plot is a type of data visualization that displays the number of occurrences of different categories or values in a dataset. It is essentially a bar chart where the height of each bar represents the count or frequency of a particular category or value.

Let’s say we want to know the number of jobs at each location in the dataset. A count plot fits this need. A bar chart would also work, but we would first have to aggregate the data ourselves.

plt.figure(figsize=(20, 10))
plt.title('Location')
# countplot counts the rows for each location and draws one bar per location
sns.countplot(x=df["Location"])
plt.show()

From the plot above, we can see the distribution of jobs across the different locations.
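The distribution shown by a count plot can also be read off as numbers, which is handy for confirming which location comes out on top. A minimal sketch with hypothetical location values:

```python
import pandas as pd

# Hypothetical locations standing in for df["Location"].
locations = pd.Series(
    ["Bangalore", "Pune", "Bangalore", "Chennai", "Bangalore", "Pune"]
)

jobs_per_location = locations.value_counts()

# idxmax returns the label with the largest count.
top_location = jobs_per_location.idxmax()
top_share = jobs_per_location.max() / jobs_per_location.sum()

print(jobs_per_location)
print(f"{top_location}: {top_share:.0%} of listings")
```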

Through the analysis of various graphs and plots, several insights can be derived. Firstly, it is observed that Bangalore has the highest number of job opportunities compared to other locations.

When examining different job roles, it is evident that for full-time workers, positions other than Testing and Mobile offer higher salaries. However, in the case of contract workers, Testing and Mobile roles provide the highest remuneration.

These findings highlight the unique dynamics of job roles and salary distributions. Bangalore emerges as a thriving hub for employment, while the nature of compensation varies depending on the employment type and specific job roles.

In a similar way, we can harness the power of data to derive meaningful insights from it.

In conclusion, Exploratory Data Analysis (EDA) is a powerful approach that enables us to uncover valuable insights and extract meaningful information from data. By employing various statistical techniques, visualizations, and exploratory techniques, we can gain a deep understanding of the data and its underlying patterns, relationships, and trends. Through EDA, we can identify important features, detect outliers, explore correlations, and make data-driven decisions. The iterative nature of EDA allows us to refine our analysis, ask new questions, and discover hidden insights that can drive innovation, optimize strategies, and guide decision-making. With EDA as a fundamental step in the data analysis process, we can unlock the full potential of data and leverage its power to gain a competitive edge in today’s data-driven world.
