Visualizing Proportions

Get to know your mushrooms 🍄

Mushrooms are very interesting. Let’s import a dataset to study them:

import matplotlib.pyplot as plt
import pandas as pd

mushrooms = pd.read_csv("../../data/mushrooms.csv")
mushrooms.head()
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
0 Poisonous Convex Smooth Brown Bruises Pungent Free Close Narrow Black ... Smooth White White Partial White One Pendant Black Scattered Urban
1 Edible Convex Smooth Yellow Bruises Almond Free Close Broad Black ... Smooth White White Partial White One Pendant Brown Numerous Grasses
2 Edible Bell Smooth White Bruises Anise Free Close Broad Brown ... Smooth White White Partial White One Pendant Brown Numerous Meadows
3 Poisonous Convex Scaly White Bruises Pungent Free Close Narrow Brown ... Smooth White White Partial White One Pendant Black Scattered Urban
4 Edible Convex Smooth Green No Bruises NaN Free Crowded Broad Black ... Smooth White White Partial White One Evanescent Brown Abundant Grasses

5 rows × 23 columns

Right away, you notice that all the data is textual. You will have to convert it before you can use it in a chart. In fact, every one of the 23 columns is stored with the object dtype:

print(mushrooms.select_dtypes(["object"]).columns)
Index(['class', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',
       'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',
       'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',
       'stalk-surface-below-ring', 'stalk-color-above-ring',
       'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',
       'ring-type', 'spore-print-color', 'population', 'habitat'],
      dtype='object')

Convert the text columns, including ‘class’, to the category dtype, then count the values for each class:

cols = mushrooms.select_dtypes(["object"]).columns
mushrooms[cols] = mushrooms[cols].astype("category")
# observed=True keeps only categories that occur and avoids the pandas FutureWarning
edibleclass = mushrooms.groupby(["class"], observed=True).count()
edibleclass
cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color stalk-shape ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
class
Edible 4208 4208 4208 4208 800 4208 4208 4208 4208 4208 ... 4208 4208 4208 4208 4208 4208 4208 4208 4208 4208
Poisonous 3916 3916 3916 3916 3796 3916 3916 3916 3916 3916 ... 3916 3916 3916 3916 3916 3880 3880 3916 3916 3916

2 rows × 22 columns

Now, if you print out edibleclass, you can see the data grouped by the poisonous/edible class, with one row per class and a count of the non-missing values in each column.
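The counts in the table above are not the same in every column (odor shows 800 for Edible while most columns show 4208) because count() tallies only the non-missing values in each column. A minimal sketch on a made-up five-row frame (the toy values are hypothetical, not taken from the dataset) contrasts it with size(), which counts rows:

```python
import pandas as pd

# Hypothetical toy data: two of the three edible rows have no odor recorded.
toy = pd.DataFrame(
    {
        "class": ["Edible", "Edible", "Edible", "Poisonous", "Poisonous"],
        "odor": ["Almond", None, None, "Pungent", "Foul"],
        "habitat": ["Grasses", "Meadows", "Grasses", "Urban", "Urban"],
    }
)

# count() reports non-missing values per column, so "odor" comes up short.
non_missing = toy.groupby("class").count()
print(non_missing)

# size() reports rows per group, regardless of missing values.
rows_per_class = toy.groupby("class").size()
print(rows_per_class)
```

For a proportion chart, it is safest to read the counts from a column with no missing values, such as population in the table above.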

Pie 🥧

labels = ["Edible", "Poisonous"]
plt.pie(edibleclass["population"], labels=labels, autopct="%.1f %%")
plt.title("Edible?")
plt.show()
../_images/245b0a00ecda77a7eb4949935284b9a467ca9d09cbec93043a28406d32701116.png

Voilà, a pie chart showing the proportions of this data according to these two classes of mushrooms. It’s quite important to get the order of the labels correct, especially here: groupby returns the classes in sorted order (Edible, then Poisonous), so be sure to verify that the label array is built in that same order!
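One way to sidestep the ordering problem is to take the labels from the grouped result's index rather than typing them by hand. The sketch below rebuilds the two counts printed earlier as a small Series (a stand-in for edibleclass["population"], so the example runs without the CSV) and passes its index as the labels:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for edibleclass["population"], using the counts printed earlier.
class_counts = pd.Series({"Edible": 4208, "Poisonous": 3916}, name="population")

fig, ax = plt.subplots()
# Labels come from the index, so they always line up with the values.
ax.pie(class_counts, labels=list(class_counts.index), autopct="%.1f %%")
ax.set_title("Edible?")
# Call plt.show() here when running as a script.
```

With the real data, the equivalent call would use edibleclass["population"] and list(edibleclass.index).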

Donuts 🍩

A somewhat more visually interesting pie chart is a donut chart, which is a pie chart with a hole in the middle. Let’s look at our data using this method.

Take a look at the various habitats where mushrooms grow:

# observed=True keeps only categories that occur and avoids the pandas FutureWarning
habitat = mushrooms.groupby(["habitat"], observed=True).count()
habitat
class cap-shape cap-surface cap-color bruises odor gill-attachment gill-spacing gill-size gill-color ... stalk-surface-above-ring stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population
habitat
Grasses 2148 2148 2148 2148 2148 1056 2148 2148 2148 2148 ... 2148 2148 2148 2148 2148 2148 2148 2148 2148 2148
Leaves 832 832 832 832 832 576 832 832 832 832 ... 832 832 832 832 832 832 832 832 832 832
Meadows 292 292 292 292 292 256 292 292 292 292 ... 292 292 292 292 292 292 292 292 292 292
Paths 1144 1144 1144 1144 1144 1104 1144 1144 1144 1144 ... 1144 1144 1144 1144 1144 1144 1144 1144 1144 1144
Urban 368 368 368 368 368 272 368 368 368 368 ... 368 368 368 368 368 368 368 368 368 368
Waste 192 192 192 192 192 0 192 192 192 192 ... 192 192 192 192 192 192 192 192 192 192
Wood 3148 3148 3148 3148 3148 1332 3148 3148 3148 3148 ... 3148 3148 3148 3148 3148 3148 3112 3112 3148 3148

7 rows × 22 columns

Here, you are grouping your data by habitat. There are 7 listed, so use those as labels for your donut chart, in the same order as the rows above:

labels = ["Grasses", "Leaves", "Meadows", "Paths", "Urban", "Waste", "Wood"]

plt.pie(habitat["class"], labels=labels, autopct="%1.1f%%", pctdistance=0.85)

center_circle = plt.Circle((0, 0), 0.40, fc="white")
fig = plt.gcf()

fig.gca().add_artist(center_circle)

plt.title("Mushroom Habitats")

plt.show()
../_images/e99e9a86a3a76dfc7fc1d107f61a091417d8f4150029ccc1c3bea36450d90c00.png

This code draws a pie chart, creates a white center circle, and then adds that circle to the chart. Edit the size of the hole by changing the circle’s radius from 0.40 to another value.
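As an alternative sketch, Matplotlib can draw the ring directly, without a separate circle patch: passing wedgeprops={"width": ...} to pie sets how thick each wedge is, measured inward from the outer edge. The habitat counts below are copied from the grouped table so the example runs on its own; a width of 0.6 on the default radius of 1 leaves the same 0.40 hole.

```python
import matplotlib.pyplot as plt

# Habitat counts copied from the grouped table (the "class" column).
habitat_counts = {
    "Grasses": 2148,
    "Leaves": 832,
    "Meadows": 292,
    "Paths": 1144,
    "Urban": 368,
    "Waste": 192,
    "Wood": 3148,
}

fig, ax = plt.subplots()
ax.pie(
    list(habitat_counts.values()),
    labels=list(habitat_counts.keys()),
    autopct="%1.1f%%",
    pctdistance=0.85,
    wedgeprops={"width": 0.6},  # ring thickness; hole radius = 1 - 0.6
)
ax.set_title("Mushroom Habitats")
# Call plt.show() here when running as a script.
```

Because the hole is left empty rather than painted white, this version also works on a figure with a non-white background.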

Donut charts can be tweaked in several ways; the labels in particular can be highlighted for readability. Learn more in the docs.

Now that you know how to group your data and then display it as a pie or donut, you can explore other types of charts. Try a waffle chart, which is just a different way of exploring quantity.

Waffles 🧇

A ‘waffle’ type chart is a different way to visualize quantities as a 2D array of squares. Try visualizing the different quantities of mushroom cap colors in this dataset. To do this, you need to install a helper library called PyWaffle and use Matplotlib:

pip install pywaffle

Select a segment of your data to group:

# observed=True keeps only categories that occur and avoids the pandas FutureWarning
capcolor = mushrooms.groupby(["cap-color"], observed=True).count()
capcolor
class cap-shape cap-surface bruises odor gill-attachment gill-spacing gill-size gill-color stalk-shape ... stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring veil-type veil-color ring-number ring-type spore-print-color population habitat
cap-color
Brown 2284 2284 2284 2284 1100 2284 2284 2284 2284 2284 ... 2284 2284 2284 2284 2284 2272 2272 2284 2284 2284
Buff 168 168 168 168 96 168 168 168 168 168 ... 168 168 168 168 168 168 168 168 168 168
Cinnamon 44 44 44 44 12 44 44 44 44 44 ... 44 44 44 44 44 32 32 44 44 44
Green 1856 1856 1856 1856 808 1856 1856 1856 1856 1856 ... 1856 1856 1856 1856 1856 1856 1856 1856 1856 1856
Pink 144 144 144 144 64 144 144 144 144 144 ... 144 144 144 144 144 144 144 144 144 144
Purple 16 16 16 16 0 16 16 16 16 16 ... 16 16 16 16 16 16 16 16 16 16
Red 1500 1500 1500 1500 876 1500 1500 1500 1500 1500 ... 1500 1500 1500 1500 1500 1488 1488 1500 1500 1500
White 1040 1040 1040 1040 592 1040 1040 1040 1040 1040 ... 1040 1040 1040 1040 1040 1040 1040 1040 1040 1040
Yellow 1072 1072 1072 1072 1048 1072 1072 1072 1072 1072 ... 1072 1072 1072 1072 1072 1072 1072 1072 1072 1072

9 rows × 22 columns

Create a waffle chart by pairing a label with each grouped count:

import matplotlib.pyplot as plt
import pandas as pd
from pywaffle import Waffle

data = {
    "color": [
        "brown",
        "buff",
        "cinnamon",
        "green",
        "pink",
        "purple",
        "red",
        "white",
        "yellow",
    ],
    "amount": capcolor["class"],
}

df = pd.DataFrame(data)

fig = plt.figure(
    FigureClass=Waffle,
    rows=100,
    values=df.amount,
    labels=list(df.color),
    figsize=(30, 30),
    colors=[
        "brown",
        "tan",
        "maroon",
        "green",
        "pink",
        "purple",
        "red",
        "whitesmoke",
        "yellow",
    ],
)
../_images/2c4fa3dcf66fd3775f84f8ee548a2b5ca0ff97dbddcc9e0af3e79bf786ab0650.png

Using a waffle chart, you can plainly see the proportions of cap colors in this mushroom dataset. Interestingly, there are many green-capped mushrooms!
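To put numbers on what the waffle shows, the counts can be turned into percentage shares. This sketch re-enters the cap-color counts from the grouped table as a Series (standing in for capcolor["class"], so it runs without the CSV) and divides by the total:

```python
import pandas as pd

# Cap-color counts copied from the grouped table (the "class" column).
cap_counts = pd.Series(
    {
        "Brown": 2284,
        "Buff": 168,
        "Cinnamon": 44,
        "Green": 1856,
        "Pink": 144,
        "Purple": 16,
        "Red": 1500,
        "White": 1040,
        "Yellow": 1072,
    }
)

# Share of each cap color as a percentage of all mushrooms, one decimal place.
share = (cap_counts / cap_counts.sum() * 100).round(1)
print(share.sort_values(ascending=False))
```

On these counts, brown caps form the largest group and green caps the second largest.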

✅ PyWaffle supports icons within charts, using any icon available in Font Awesome. Do some experiments to create an even more interesting waffle chart using icons instead of squares.

In this lesson, you learned three ways to visualize proportions. First, you need to group your data into categories and then decide which is the best way to display the data: pie, donut, or waffle. All are delicious and gratify the user with an instant snapshot of a dataset.