Introduction
Matplotlib is an important and convenient plotting library for Python, and it makes visual data analysis straightforward. This article explains in detail how pandas uses matplotlib for plotting.
Basic drawing
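To use matplotlib, we first need to import it, together with NumPy and pandas, which the examples below rely on:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd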
Suppose we want to generate 365 days of random data starting from January 1, 2020, and draw it as a line chart. It can be written like this:
ts = pd.Series(np.random.randn(365), index=pd.date_range("1/1/2020", periods=365))
ts.plot()
A DataFrame plots all of its columns (each one a Series) at the same time:
df3 = pd.DataFrame(np.random.randn(365, 4), index=ts.index, columns=list("ABCD"))
df3 = df3.cumsum()
df3.plot()
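As a self-contained sketch of the two examples above, the following script can be run outside a notebook. The Agg backend, the fixed seed, the name df_demo and the in-memory PNG buffer are assumptions made for illustration, not part of the original article.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) for a reproducible figure

# Four random walks over 365 days, one column per walk.
index = pd.date_range("2020-01-01", periods=365)
df_demo = pd.DataFrame(
    rng.standard_normal((365, 4)), index=index, columns=list("ABCD")
).cumsum()

# DataFrame.plot() returns the matplotlib Axes it drew on.
ax = df_demo.plot(title="Cumulative random walks")
ax.set_ylabel("cumulative sum")
n_lines = len(ax.lines)  # one line per column

# Render to an in-memory PNG instead of opening a window.
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
```

Because plot() returns an ordinary matplotlib Axes, any further customization can be done with the usual matplotlib methods.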
You can specify which columns are used for the x-axis and the y-axis:
df3 = pd.DataFrame(np.random.randn(365, 2), columns=["B", "C"]).cumsum()
df3["A"] = pd.Series(list(range(len(df3))))
df3.plot(x="A", y="B");
Other plot types
plot() supports many plot types, including bar, hist, box, density, area, scatter, hexbin, pie, etc. Let's see how to use them with examples.
Bar
df3.iloc[5].plot(kind="bar");
Bar chart with multiple columns:
df2 = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
df2.plot.bar();
Stacked bar
df2.plot.bar(stacked=True);
barh
barh draws a horizontal bar chart:
df2.plot.barh(stacked=True);
Histograms
df2.plot.hist(alpha=0.5);
Box
df = pd.DataFrame(np.random.rand(10, 5), columns=["A", "B", "C", "D", "E"])
df.plot.box();
The colors of the individual box elements can be customized:
color = {
    "boxes": "DarkGreen",
    "whiskers": "DarkOrange",
    "medians": "DarkBlue",
    "caps": "Gray",
}
df.plot.box(color=color, sym="r+");
The boxes can also be drawn horizontally:
df.plot.box(vert=False);
In addition to box, you can also use DataFrame.boxplot to draw box plots:
In [42]: df = pd.DataFrame(np.random.rand(10, 5))
In [44]: bp = df.boxplot()
boxplot accepts a by argument to group the data:
df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
df
Out[90]:
Col1 Col2
0 0.047633 0.150047
1 0.296385 0.212826
2 0.562141 0.136243
3 0.997786 0.224560
4 0.585457 0.178914
5 0.551201 0.867102
6 0.740142 0.003872
7 0.959130 0.581506
8 0.114489 0.534242
9 0.042882 0.314845
df.boxplot()
Now add a column to df:
df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
df
Out[92]:
Col1 Col2 X
0 0.047633 0.150047 A
1 0.296385 0.212826 A
2 0.562141 0.136243 A
3 0.997786 0.224560 A
4 0.585457 0.178914 A
5 0.551201 0.867102 B
6 0.740142 0.003872 B
7 0.959130 0.581506 B
8 0.114489 0.534242 B
9 0.042882 0.314845 B
bp = df.boxplot(by="X")
Area
Use Series.plot.area() or DataFrame.plot.area() to draw area graphs.
In [60]: df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
In [61]: df.plot.area();
Area plots are stacked by default. To draw unstacked areas, specify stacked=False:
In [62]: df.plot.area(stacked=False);
Scatter
DataFrame.plot.scatter() creates scatter plots.
In [63]: df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])
In [64]: df.plot.scatter(x="a", y="b");
A third column can be shown as the color of each point by passing c:
df.plot.scatter(x="a", y="b", c="c", s=50);
Alternatively, the third column can control the size of each point through s:
df.plot.scatter(x="a", y="b", s=df["c"] * 200);
Hexagonal bin
Use DataFrame.plot.hexbin() to create a hexagonal bin plot:
In [69]: df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])
In [70]: df["b"] = df["b"] + np.arange(1000)
In [71]: df.plot.hexbin(x="a", y="b", gridsize=25);
By default, the color intensity of each hexagon represents the number of points that fall into that bin. You can aggregate the values of another column instead by passing C together with reduce_C_function, for example mean, max, sum, or std.
In [72]: df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])
In [73]: df["b"] = df["b"] + np.arange(1000)
In [74]: df["z"] = np.random.uniform(0, 3, 1000)
In [75]: df.plot.hexbin(x="a", y="b", C="z", reduce_C_function=np.max, gridsize=25);
Pie
Use DataFrame.plot.pie() or Series.plot.pie() to build a pie chart:
In [76]: series = pd.Series(3 * np.random.rand(4), index=["a", "b", "c", "d"], name="series")
In [77]: series.plot.pie(figsize=(6, 6));
With a DataFrame, pass subplots=True to draw one pie chart per column:
In [78]: df = pd.DataFrame(
....: 3 * np.random.rand(4, 2), index=["a", "b", "c", "d"], columns=["x", "y"]
....: )
....:
In [79]: df.plot.pie(subplots=True, figsize=(8, 4));
Labels, colors, percentage format and font size can be customized:
In [80]: series.plot.pie(
....: labels=["AA", "BB", "CC", "DD"],
....: colors=["r", "g", "b", "c"],
....: autopct="%.2f",
....: fontsize=20,
....: figsize=(6, 6),
....: );
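If the values passed in sum to less than 1.0, the result depends on the Matplotlib version: older versions draw a partial pie and leave the rest of the circle empty, while newer versions rescale the values so that they sum to 1 by default: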
In [81]: series = pd.Series([0.1] * 4, index=["a", "b", "c", "d"], name="series2")
In [82]: series.plot.pie(figsize=(6, 6));
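If a full pie is wanted regardless of the Matplotlib version, one option is to normalize the values explicitly before plotting. The sketch below does this; the Agg backend and the names shares and normalized are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend

import matplotlib.pyplot as plt
import pandas as pd

# Four values that sum to 0.4 rather than 1.0.
shares = pd.Series([0.1] * 4, index=["a", "b", "c", "d"], name="shares")

# Dividing by the total makes the values sum to 1, so the pie is always full.
normalized = shares / shares.sum()

ax = normalized.plot.pie(figsize=(6, 6), autopct="%.1f%%")
plt.close(ax.figure)
```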
Handling NaN data when plotting
The table below shows how each plot type handles NaN values by default:
Plot type | NaN handling |
---|---|
Line | Leave gaps at NaNs |
Line (stacked) | Fill 0’s |
Bar | Fill 0’s |
Scatter | Drop NaNs |
Histogram | Drop NaNs (column-wise) |
Box | Drop NaNs (column-wise) |
Area | Fill 0’s |
KDE | Drop NaNs (column-wise) |
Hexbin | Drop NaNs |
Pie | Fill 0’s |
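When the defaults in the table are not what you want, the missing values can be handled explicitly before plotting. The following is a minimal sketch with a made-up Series; which strategy is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Three explicit strategies that can be applied before calling .plot():
dropped = s.dropna()            # remove the missing points entirely
zero_filled = s.fillna(0)       # treat missing values as zero
interpolated = s.interpolate()  # linear interpolation between neighbours

print(dropped.tolist())       # [1.0, 3.0, 5.0]
print(zero_filled.tolist())   # [1.0, 0.0, 3.0, 0.0, 5.0]
print(interpolated.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Each result is a regular Series, so for example interpolated.plot() draws a line without gaps.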
Other plotting tools
Scatter matrix
You can use scatter_matrix in pandas.plotting to draw a scatter matrix chart:
In [83]: from pandas.plotting import scatter_matrix
In [84]: df = pd.DataFrame(np.random.randn(1000, 4), columns=["a", "b", "c", "d"])
In [85]: scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="kde");
Density plot
Use Series.plot.kde() and DataFrame.plot.kde() to draw a density plot:
In [86]: ser = pd.Series(np.random.randn(1000))
In [87]: ser.plot.kde();
Andrews curves
Andrews curves allow multivariate data to be drawn as a large number of curves, where each curve is created by using the attributes of one sample as the coefficients of a Fourier series. By coloring these curves differently for each class, clustering in the data can be visualized. The curves of samples belonging to the same class are usually closer together and form larger structures.
In [88]: from pandas.plotting import andrews_curves
In [89]: data = pd.read_csv("data/iris.data")
In [90]: plt.figure();
In [91]: andrews_curves(data, "Name");
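The Fourier-series construction described above can be sketched with NumPy. This follows the standard Andrews formula f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ...; the helper name andrews_curve is hypothetical, and the sketch is meant to illustrate the idea rather than reproduce pandas' implementation line by line.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate the Andrews curve of one sample x at the points t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    # The first attribute is the constant term.
    result = np.full_like(t, x[0] / np.sqrt(2.0))
    # The remaining attributes alternate between sin and cos terms,
    # with the harmonic increasing after every sin/cos pair.
    for i, coeff in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2
        if i % 2 == 1:
            result = result + coeff * np.sin(harmonic * t)
        else:
            result = result + coeff * np.cos(harmonic * t)
    return result


t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_curve([5.1, 3.5, 1.4, 0.2], t)  # one sample with four attributes
```

Samples with similar attribute values produce similar coefficients and therefore similar curves, which is why classes tend to form visible bands.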
Parallel coordinates
Parallel coordinates is a plotting technique for multivariate data. It allows people to see clusters in the data and to estimate other statistics visually. Each vertical line represents one attribute, and each data point is drawn as a set of connected line segments crossing those vertical lines. Points that tend to cluster will appear closer together.
In [92]: from pandas.plotting import parallel_coordinates
In [93]: data = pd.read_csv("data/iris.data")
In [94]: plt.figure();
In [95]: parallel_coordinates(data, "Name");
Lag plot
A lag plot is a scatter plot of a time series against a lagged copy of itself. It can be used to check whether the data is random: random data shows no structure in the lag plot, while structure indicates autocorrelation.
In [96]: from pandas.plotting import lag_plot
In [97]: plt.figure();
In [98]: spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
In [99]: data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(spacing))
In [100]: lag_plot(data);
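To make the idea behind the lag plot concrete, the sketch below builds the (y(t), y(t + 1)) pairs by hand and compares the lag-1 correlation of a periodic series with that of pure noise. The helper lag_pairs, the fixed seed and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility

spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
periodic = pd.Series(0.1 * rng.random(1000) + 0.9 * np.sin(spacing))
noise = pd.Series(rng.random(1000))


def lag_pairs(series, lag=1):
    """Return the (y(t), y(t + lag)) pairs that a lag plot displays."""
    values = series.to_numpy()
    return pd.DataFrame({"y(t)": values[:-lag], f"y(t + {lag})": values[lag:]})


pairs = lag_pairs(periodic)

# Series.autocorr gives the correlation between the series and its lagged copy.
periodic_corr = periodic.autocorr(lag=1)
noise_corr = noise.autocorr(lag=1)
```

A scatter plot of the two columns in pairs shows the same structure as lag_plot(periodic); for the noise series the points form a structureless cloud.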
Autocorrelation plot
Autocorrelation plots are often used to check for randomness in a time series. The horizontal axis shows the time lag and the vertical axis shows the autocorrelation coefficient at that lag. If the series is random, the autocorrelations should be close to zero for all lags.
In [101]: from pandas.plotting import autocorrelation_plot
In [102]: plt.figure();
In [103]: spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)
In [104]: data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(spacing))
In [105]: autocorrelation_plot(data);
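The coefficients shown in such a plot can be computed directly. The sketch below uses the common sample estimator r(h) = c(h) / c(0) and the usual 1.96 / sqrt(n) approximation for a 95% band around zero; the helper name autocorrelation and the fixed seed are assumptions made for illustration.

```python
import numpy as np


def autocorrelation(values, lag):
    """Sample autocorrelation coefficient r(lag) = c(lag) / c(0)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean = values.mean()
    c0 = np.sum((values - mean) ** 2) / n
    ch = np.sum((values[: n - lag] - mean) * (values[lag:] - mean)) / n
    return ch / c0


rng = np.random.default_rng(0)  # fixed seed (assumption)
spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)
data = 0.7 * rng.random(1000) + 0.3 * np.sin(spacing)

coefficients = [autocorrelation(data, lag) for lag in range(1, 51)]

# Coefficients outside this band are unlikely to come from a purely random series.
band_95 = 1.96 / np.sqrt(len(data))
```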
Bootstrap plot
The bootstrap plot is used to visually assess the uncertainty of a statistic, such as the mean, median, or midrange. A random subset of a specified size is selected from the data set, the statistic is computed for that subset, and the process is repeated a specified number of times. The resulting line plots and histograms make up the bootstrap plot.
In [106]: from pandas.plotting import bootstrap_plot
In [107]: data = pd.Series(np.random.rand(1000))
In [108]: bootstrap_plot(data, size=50, samples=500, color="grey");
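The resampling procedure described above can be sketched in a few lines of NumPy. The helper bootstrap_statistics is hypothetical, and drawing each subset without replacement is an assumption about the sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption)
data = rng.random(1000)


def bootstrap_statistics(values, size=50, samples=500, rng=rng):
    """Repeatedly draw random subsets and record mean, median and midrange."""
    means, medians, midranges = [], [], []
    for _ in range(samples):
        subset = rng.choice(values, size=size, replace=False)
        means.append(subset.mean())
        medians.append(np.median(subset))
        midranges.append((subset.min() + subset.max()) / 2)
    return np.array(means), np.array(medians), np.array(midranges)


means, medians, midranges = bootstrap_statistics(data)

# The spread of each array indicates how uncertain the statistic is.
mean_spread = means.std()
```

Plotting a histogram of each returned array gives the same kind of picture as the histograms in the bootstrap plot.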
RadViz
RadViz is a way of visualizing multivariate data, based on a simple spring tension minimization algorithm. Each feature (attribute) of the data set is assigned an anchor point, and the anchors are spaced evenly on a unit circle. Each sample is imagined to be attached to every anchor by a spring whose stiffness is proportional to the sample's normalized value for that feature. The sample is drawn at the point where the spring forces balance, so it is "pulled" toward the features for which it has large values.
In [109]: from pandas.plotting import radviz
In [110]: data = pd.read_csv("data/iris.data")
In [111]: plt.figure();
In [112]: radviz(data, "Name");
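The spring model has a closed-form equilibrium: each sample lands at the average of the anchor points, weighted by its normalized feature values. The sketch below computes these positions; the helper radviz_positions is hypothetical and assumes that no column is constant and that no row consists only of column minima, since either would cause a division by zero.

```python
import numpy as np


def radviz_positions(features):
    """Map each row of features to a 2D point inside the unit circle."""
    features = np.asarray(features, dtype=float)

    # Normalize every column to the interval [0, 1].
    mins = features.min(axis=0)
    ranges = features.max(axis=0) - mins
    normalized = (features - mins) / ranges

    # One anchor per feature, evenly spaced on the unit circle.
    m = features.shape[1]
    angles = 2 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])

    # Equilibrium position: weighted average of the anchors.
    return normalized @ anchors / normalized.sum(axis=1, keepdims=True)


samples = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
positions = radviz_positions(samples)
```

A sample dominated by a single feature sits on that feature's anchor, while a sample with equal normalized values for all features ends up at the center of the circle.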
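Plot formatting
Since version 1.5, matplotlib ships with a number of predefined styles, which can be applied with matplotlib.style.use(my_plot_style), where my_plot_style is the name of a style.
You can list all available styles with matplotlib.style.available (also reachable as plt.style.available through pyplot). The exact list depends on the installed Matplotlib version:
import matplotlib.pyplot as plt
plt.style.available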
Out[128]:
['seaborn-dark',
'seaborn-darkgrid',
'seaborn-ticks',
'fivethirtyeight',
'seaborn-whitegrid',
'classic',
'_classic_test',
'fast',
'seaborn-talk',
'seaborn-dark-palette',
'seaborn-bright',
'seaborn-pastel',
'grayscale',
'seaborn-notebook',
'ggplot',
'seaborn-colorblind',
'seaborn-muted',
'seaborn',
'Solarize_Light2',
'seaborn-paper',
'bmh',
'seaborn-white',
'dark_background',
'seaborn-poster',
'seaborn-deep']
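As a small sketch of applying one of these styles, the example below uses the ggplot style inside a context manager, so the style only affects the plots created within the with block. The Agg backend, the fixed seed and the name df_demo are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption)
df_demo = pd.DataFrame(rng.standard_normal((100, 3)), columns=list("ABC")).cumsum()

# Apply the style temporarily; the previous settings are restored afterwards.
with plt.style.context("ggplot"):
    ax = df_demo.plot()

n_lines = len(ax.lines)
plt.close(ax.figure)
```

Calling plt.style.use("ggplot") instead changes the style for all subsequent plots in the session.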
Hide the legend
By default, the plot includes a legend that identifies each column. It can be disabled with legend=False:
In [115]: df = pd.DataFrame(np.random.randn(1000, 4), index=pd.date_range("1/1/2000", periods=1000), columns=list("ABCD"))
In [116]: df = df.cumsum()
In [117]: df.plot(legend=False);
Set the axis labels
In [118]: df.plot();
In [119]: df.plot(xlabel="new x", ylabel="new y");
Log scale
If the values on an axis span several orders of magnitude, the small values are squeezed together and become hard to read. You can pass in logy=True to draw the Y-axis on a logarithmic scale:
In [120]: ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
In [121]: ts = np.exp(ts.cumsum())
In [122]: ts.plot(logy=True);
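To see the effect side by side, the following sketch draws the same series on a linear and on a logarithmic Y-axis. The Agg backend, the fixed seed and the name ts_demo are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption)

# exp of a random walk: strictly positive, spanning several orders of magnitude.
ts_demo = pd.Series(
    np.exp(rng.standard_normal(1000).cumsum()),
    index=pd.date_range("2000-01-01", periods=1000),
)

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
ts_demo.plot(ax=ax_linear, title="linear Y-axis")
ts_demo.plot(ax=ax_log, logy=True, title="logy=True")

linear_scale = ax_linear.get_yscale()
log_scale = ax_log.get_yscale()
plt.close(fig)
```

A logarithmic axis requires positive values, which is why the example exponentiates the random walk first.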
Secondary Y-axis
Pass secondary_y to plot selected columns against a second Y-axis on the right side of the plot:
In [125]: plt.figure();
In [126]: ax = df.plot(secondary_y=["A", "B"])
In [127]: ax.set_ylabel("CD scale");
In [128]: ax.right_ax.set_ylabel("AB scale");
By default, "(right)" is appended to the legend entries of the columns plotted on the secondary axis. If you want to remove it, set mark_right=False:
In [129]: plt.figure();
In [130]: df.plot(secondary_y=["A", "B"], mark_right=False);
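Adjusting the x-axis tick labels
For regular-frequency time series, pandas automatically adjusts the resolution of the x-axis ticks. If the resulting tick labels are not what you want, for example when the plot has to be aligned with other data, you can pass x_compat=True to suppress this behavior: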
In [133]: plt.figure();
In [134]: df["A"].plot(x_compat=True);
If several plots need the same adjustment, you can set the option once inside a with block:
In [135]: plt.figure();
In [136]: with pd.plotting.plot_params.use("x_compat", True):
.....: df["A"].plot(color="r")
.....: df["B"].plot(color="g")
.....: df["C"].plot(color="b")
.....:
Subplots
When plotting a DataFrame, each Series can be drawn in its own subplot:
In [137]: df.plot(subplots=True, figsize=(6, 6));
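You can modify the layout of the subplots with the layout argument, given as (rows, columns):
df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False);
You can also pass -1 for one dimension and let pandas infer it from the number of series to plot and the other dimension: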
In [139]: df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False);
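As a small check of how the -1 placeholder is resolved, the sketch below plots a four-column DataFrame with two rows and inspects the shape of the returned array of Axes. The Agg backend, the fixed seed and the name df_demo are assumptions made for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (assumption)
df_demo = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD")).cumsum()

# Two rows are requested; the number of columns is inferred from the four series.
axes = df_demo.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False)

grid_shape = axes.shape
plt.close("all")
```

With subplots=True the return value is a NumPy array of Axes, so individual subplots can be customized by indexing into it, for example axes[0, 1].set_title("B").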
A more complex example:
In [140]: fig, axes = plt.subplots(4, 4, figsize=(9, 9))
In [141]: plt.subplots_adjust(wspace=0.5, hspace=0.5)
In [142]: target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]
In [143]: target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]
In [144]: df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False);
In [145]: (-df).plot(subplots=True, ax=target2, legend=False, sharex=False, sharey=False);
Drawing a table
If you set table=True, the data is displayed as a table underneath the plot:
In [165]: fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))
In [166]: df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
In [167]: ax.xaxis.tick_top() # Display x-axis ticks on top.
In [168]: df.plot(table=True, ax=ax)
fig
A table can also be drawn inside the plot area with pandas.plotting.table:
In [172]: from pandas.plotting import table
In [173]: fig, ax = plt.subplots(1, 1)
In [174]: table(ax, np.round(df.describe(), 2), loc="upper right", colWidths=[0.2, 0.2, 0.2]);
In [175]: df.plot(ax=ax, ylim=(0, 2), legend=None);
Using colormaps
If a DataFrame has many columns, the default line colors repeat and the lines become hard to tell apart. In this case, a colormap can be passed in:
In [176]: df = pd.DataFrame(np.random.randn(1000, 10), index=ts.index)
In [177]: df = df.cumsum()
In [178]: plt.figure();
In [179]: df.plot(colormap="cubehelix");
This article has been included in http://www.flydean.com/09-python-pandas-plot/