<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Everythingist</title><link>http://blairhudson.github.io/blog/</link><description>The personal blog of Blair Hudson, The Data Everythingist</description><atom:link href="http://blairhudson.github.io/blog/rss.xml" type="application/rss+xml" rel="self"></atom:link><language>en</language><copyright>Contents © 2017 &lt;a href="http://blairhudson.com"&gt;Blair Hudson&lt;/a&gt; </copyright><lastBuildDate>Sun, 19 Nov 2017 11:34:41 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Using bivariate kernel density estimation for plotting multi-task classification results</title><link>http://blairhudson.github.io/blog/posts/using-bivariate-kernel-density-estimation-for-plotting-multi-task-classification-results/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;One common technique for interpreting the outputs of a single classification model is kernel density estimation (KDE). Similar to a histogram, a KDE plot allows us to estimate the underlying probability density of the probabilities our model predicts.&lt;/p&gt;
&lt;p&gt;This is particularly useful for visualising the impact of selecting different classification thresholds (i.e. deciding at what point to round a given probability to 1 or 0).&lt;/p&gt;
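&lt;p&gt;As a minimal sketch of what thresholding means, the snippet below rounds a handful of made-up probabilities at two different thresholds. The values and the &lt;code&gt;apply_threshold&lt;/code&gt; helper are illustrative assumptions, not part of the model built later in this post.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np

# Hypothetical predicted probabilities for the positive class (illustrative values only)
probabilities = np.array([0.05, 0.35, 0.50, 0.65, 0.95])

def apply_threshold(probabilities, threshold):
    # Round each probability to 1 when it reaches the threshold, otherwise to 0
    return np.greater_equal(probabilities, threshold).astype(int)

labels_default = apply_threshold(probabilities, 0.5)
labels_strict = apply_threshold(probabilities, 0.7)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a threshold of 0.5, three of the five cases are labelled 1, while a threshold of 0.7 leaves only one.&lt;/p&gt;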
&lt;p&gt;You can apply this visualisation technique to multi-task classification too. This uses bivariate KDE, which also generalises to multivariate KDE. Unfortunately we're constrained to two tasks, given the limitation of having only two axes on 2D plots.&lt;/p&gt;
&lt;p&gt;To achieve this, we're going to create a suitable test dataset based on the Digits classification data, train a Random Forest Classifier using two labels, and output a bivariate KDE plot using the Seaborn visualisation library.&lt;/p&gt;
&lt;h3 id="Preparing-data-for-multi-task-learning"&gt;Preparing data for multi-task learning&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/using-bivariate-kernel-density-estimation-for-plotting-multi-task-classification-results/#Preparing-data-for-multi-task-learning"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;To simulate multi-task learning, we're going to load three classes of the Digits data (i.e. digits 0, 1 and 2), and break this into labels for two binary tasks:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;the digit is one&lt;/li&gt;
&lt;li&gt;the digit is two&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;(In the case that the digit is zero, both tasks will have a False label.)&lt;/p&gt;
&lt;p&gt;Before training our model, we'll also consolidate the two label sets into a single N x 2 set (one row per observation, one column per task).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [1]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_digits&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_digits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_class&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;y_task1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;y_task2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="n"&gt;y_multitask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;y_task1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y_task2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h3 id="Training-a-multi-task-Random-Forest-classifier"&gt;Training a multi-task Random Forest classifier&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/using-bivariate-kernel-density-estimation-for-plotting-multi-task-classification-results/#Training-a-multi-task-Random-Forest-classifier"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;After splitting our data into train and test sets, we'll train the classifier and split the predicted probabilities into two sets:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;probability that the digit is one&lt;/li&gt;
&lt;li&gt;probability that the digit is two&lt;/li&gt;
&lt;/ol&gt;
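&lt;p&gt;For a multi-output classifier, scikit-learn's &lt;code&gt;predict_proba&lt;/code&gt; returns a list with one array per task rather than a single array, which is why we index the result twice when splitting it by task. A small sketch on made-up data (the toy arrays and the &lt;code&gt;toy_model&lt;/code&gt; name are assumptions for illustration only):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic data set (illustrative only): 8 rows, 2 features, 2 binary tasks
rng = np.random.RandomState(0)
X_toy = rng.rand(8, 2)
y_toy = np.array([[0, 0], [1, 0], [0, 1], [1, 0],
                  [0, 1], [0, 0], [1, 0], [0, 1]])

toy_model = RandomForestClassifier(n_estimators=10, random_state=0)
toy_model.fit(X_toy, y_toy)

# proba is a list with one array per task; each array has one column per class
proba = toy_model.predict_proba(X_toy)
n_tasks = len(proba)

# Column 1 of each array holds the probability of the positive (True) class
task1_positive = proba[0][:, 1]
task2_positive = proba[1][:, 1]&lt;/code&gt;&lt;/pre&gt;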

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [14]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_multitask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RandomForestClassifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y_task1_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;y_task2_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h3 id="Visualisation-bivariate-KDE-with-Seaborn"&gt;Visualising bivariate KDE with Seaborn&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/using-bivariate-kernel-density-estimation-for-plotting-multi-task-classification-results/#Visualisation-bivariate-KDE-with-Seaborn"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Seaborn (a set of extensions over Matplotlib) comes to the rescue with its built-in KDE plot function. The function accepts two data sets, one for the X-axis (in this case, task 1) and one for the Y-axis (task 2).&lt;/p&gt;
&lt;p&gt;We've also constrained both axes to the range (0, 1). Without this, the full range of the KDE function will be plotted, extending below 0% and above 100%, which doesn't make sense in the context of our problem (or any binary classification task).&lt;/p&gt;
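&lt;p&gt;To see why the estimate spills outside the valid range, here is a rough sketch using SciPy's &lt;code&gt;gaussian_kde&lt;/code&gt; rather than Seaborn itself (a stand-in chosen for illustration, with made-up probabilities). A Gaussian kernel places some mass on either side of every observation, so predictions at or near 0 and 1 contribute density beyond those bounds.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from scipy.stats import gaussian_kde

# Illustrative probabilities clustered near 0 and 1, as a confident classifier might produce
p = np.array([0.0, 0.0, 0.02, 0.05, 0.1, 0.9, 0.95, 0.98, 1.0, 1.0])

# Fit a univariate Gaussian KDE with SciPy's default bandwidth
kde = gaussian_kde(p)

# Evaluate the estimated density at points that are not valid probabilities
density_below_zero = kde(np.array([-0.1]))[0]
density_above_one = kde(np.array([1.1]))[0]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both densities come out positive even though no probability can take those values, which is the part of the curve the axis limits hide.&lt;/p&gt;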

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [13]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="k"&gt;matplotlib&lt;/span&gt; inline
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;sns&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;matplotlib&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="c1"&gt;# keyword x=/y= and fill= need seaborn 0.11+; older releases took positional data and shade=&lt;/span&gt;
&lt;span class="n"&gt;kdeplot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kdeplot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_task1_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y_task2_hat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'grey'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'dotted'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axhline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'grey'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linewidth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'dotted'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;kdeplot&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'P(Task 1)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'P(Task 2)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xlim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ylim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;



&lt;div class="output_png output_subarea "&gt;
&lt;img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAYcAAAEJCAYAAAB/pOvWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz%0AAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmwXWWZ7/FvEEPEQAYoQ9owCjwRZcYiINCtYMsgJuqN%0AKOhtomCDiGCjXXLtautWSaNXUWaQyGAhCkSvpAWCGro1TOEqQxmGPJiAQDTEMiSBOOSEwP1jnxM2%0AZ9hnD++73net9ftUpeDs8Tn77L1++3nftd415pVXXkFERKTZFqkLEBGR/CgcRERkCIWDiIgMoXAQ%0AEZEhFA4iIjKEwkFERIaIGg5mdrCZ/WKYy483s1+Z2X1mdmrMGkREpHPRwsHM/hX4DjBu0OWvB74F%0A/CPw98CnzGxKrDpERKRzMTuH5cAHh7n8rcAyd1/j7n3A3cAREesQEZEObRnrgd39R2a2yzBXbQus%0Aa/r5RWDCaI837vMHFHIo95unbNf1fafuOL2j20/bbs+2b/uWSbu2fdu9Jk3rqI7pE7tv3LbfatQ/%0AXccmjh0b/DFju++uRRxyuL7jSH4mjN1yTDf3SzEh/QKwTdPP2wBrE9QxrN+vWt31fVc+uzRgJcVZ%0AunZV6hJeY21fX+oSOrbb7nukLkEkqBTh8Diwh5lNNrOxNIaU7ktQx4h6CYhOrFj9RNu3Xb7mqbZv%0A+9iaFd2U05U/bVg3+o1qYGwJux2RVgoLBzM70cw+5e4bgX8BfkojFK5x998XVUdsZe0eclO27mH+%0AvJtTlyAS1JiyrMpa1JxDs27nH2LNPXQy7wCaexCRcs05VF6s7qGToaUqKFP3cN9di1KXIBKUwqGF%0AouYeRERyo3CQzXrZa6nuE9PajVWqRuGQgSrstRRLWYaWrrvyitQliASlcBAJYObsD6cuQSQohUMk%0AZd2lNcehpTJ0D30lqFGkEwoHkQDuvOP21CWIBKVwqLgqzDuUwYlzPpm6BJGgFA5SCrkPLS26c2Hq%0AEkSCUjhE0ulR0lVR111a3zh+fOoSRIJSOLTQy/LdZZbbKq1lcODBM1KXIBKUwmEEdQ2GnOU8tDT3%0AkotSlyASlMJBJICPnvyJ1CWIBKVwGEavXUNd5xsG1HHeYc3zz6cuQSQohcMgGk7KW65DS4vv1qqs%0AUi3RziFdNqFCoZuuoZNzSUueZp/08dQliARV+87hzVO2U7cgPVu44LbUJYgEVdvOIUYg1H2uoc6m%0A7DA1dQkiQZUmHAZvzDs9EU+u3YGGlDq3tq8vu1OI7r3/AalLEAmqNOEwWG4b+6K6hk7PIy3FuOyC%0Ar3PGOV9IXYZIMKUNh1z0EgpFdA17TZoW/TkETv3MZ1OXIBJU7Seke6E5Bhmw4pmnU5cgEpTCoUu9%0ABkM3XYOGlPK15OGHUpcgEpTCoQspgkFeK7eD4WbOPiF1CSJBKRw6MHXH6cmCoZuuQfMNxVkw/5bU%0AJYgEpQnpUYScVyhLxzB94pTUJZTOrrvvnroEkaAUDiMIPdncSzBoriF/09/29tQliARVmmGl2HsG%0ADQwZhRg6GixFMGhIqVgXnn9e6hJEgipV5zDaRnvls0t7un8M6hjiyelI6bPP/VLqEkSCKlU4jCan%0A4w5Szi/00jVovqE7Sx99RENLUimlGVYqkxDBoK6hXJ5atix1CSJBVapzSC1Ut9BLMGiuIY1jZs5K%0AXYJIUOocApi23Z5ZBEOvNKTUvfnzbkpdgkhQ6hy6FGNOoddgUNeQzt777Z+6BJGgFA4diDXJHKJb%0AUDCkNW2nnVOXIBJUaYaVQg7ddPqcMZ87l2AIOaS0/VYTgj1WWcy99OLUJYgEFa1zMLMtgMuBfYEN%0AwCnuvqzp+pOAc4BNwDXufkU7jzvaRnrF6ic6qjPlLqe5BIP0Tif6kaqJOaw0Cxjn7oeY2
QzgAmBm%0A0/XfAN4GrAceM7Mb3X1Nr09ahvWLQk06hwqGMk9E53Ig3JKHHtSpQqVSYg4rHQbcAeDui4GDBl3/%0AG2ACMA4YA7wSsZYsvGXSrgqGilr13MrUJYgEFTMctgXWNf28ycyaO5VHgAeAR4Fb3X1txFqSChkK%0AkPdQUh3nGwCOOua41CWIBBUzHF4Atml+Lnd/CcDM9gGOA3YFdgHeZGazI9ZSuIFACH3cQshgUNcQ%0Azrwbrk9dgkhQMcPhHuBYgP45hyVN160D/gr81d03AX8EJkWspRCxAmGAgiFfMw47InUJIkHFnJD+%0AMfAeM7uXxpzCHDM7ERjv7leZ2beBu82sD1gOXBexlmiKOKI552GkZnUdUgKYNHly6hJEghrzyivl%0AmAf+0q9vHVLo8jVPFVpDiqUtYgRDrK4hVTjksLfS3Esu4tQzz0pdhsgQE8ZuOaab+5U6HDrRKkhy%0AXAE1VrdQtWCAPMJBJFfdhkNtls/IMQCGE3MISfMM8Txw/2IOPHhG6jJEgqlNOOQu9rxCzGCo81zD%0AgD+vX5+6BJGgFA6JFTHZrI4hviOOPCp1CSJBlWbhvSrZa9K0zf9iix0M6hoavn/t1alLEAlKnUNB%0Ait4dVd1CsY48+tjUJYgEpXCIKNXxCUUFg7qGV43VHlNSMaUJh5E2tI+tWVFwJcPL5UC1ugVDLrux%0Azp93MyefdnrqMkSCKc1xDjcsf7irQkOFRy4b/5EUPYykcBApBx3nMILcN+q9SjG3kEsw5OS+uxZx%0AyOFaX0mqo/LhUFWpJpwVDCL1oHAomZR7ISkYRqauQapGxzmUxPSJUxQMGbvuyrZOgS5SGuocMpbL%0AsQoKhtHNnP3h1CWIBKVwyEwugTBAwdCevr6+1CWIBFWaYaXcNpqhDAwXpR42Gk7uwZDTbqx33nF7%0A6hJEgirNcQ4PrF45bKFL164qupSe5BYAI8k9GCCvcBDJVW2PcxhtY5siPMoSAMMpQyjkaNGdC7Uy%0Aq1RK6cNhNGXeUBdNwdC9N44fn7oEkaBKP6wkvStjKGhISaQ93Q4rlWZCWuIoYzDkaO4lF6UuQSQo%0AdQ41VIVAyK1zWP/ii4zfZpvUZYgMUdsJaWlfFUIB8gsGgDXPP69wkErRsFINbL/VhMoEQ64W370o%0AdQkiQWlYqcKqGgg5dg4iudKEtACvdgkKhmItXHBb6hJEgtKcQwVUNQjKZMoOU1OXIBJUaYaVnl7/%0AlyGF/mnDuhSlJFfXMMi1axDJWS2HlZqHUKo6nFL1368qLrvg66lLEAmq1J1DN3LtNrTRby33rqFv%0AwwbGbrVV6jJEhtBxDm3qZiPcTaBoY18vK555mt322DN1GSLB1C4cuqENfVq5dw0ASx5+SOEglVK7%0AYSUplzIEg0jOajkhLdVWpmBYMP+W1CWIBKVwkCyVKRgAdt1999QliAQVbVjJzLYALgf2BTYAp7j7%0Asqbr3wF8ExgDPAd8zN3/NtLjaVipPsoWDCI5y3FYaRYwzt0PAb4IXDBwhZmNAeYCc9z9MOAOYOeI%0AtUhJlDUYLjz/vNQliATVdjiY2bZm1sm5EAc2+rj7YuCgpuv2BFYDnzOzXwKT3d07eGypoLIGA8DZ%0A534pdQkiQbUMBzObbmbfMbM/As8CT5vZSjP7tplNH+WxtwWaDxDYZGYDu85uDxwKXAocBRxpZu/u%0A7leQKihzMAAsffSR1CWIBDViOJjZfwD/G/gJYO4+wd23A6YDtwNfNbOvtXjsF4Dms59s4e4v9f//%0AamCZuz/u7htpdBgHDX4AqYeyBwPAU8uWjX4jkRJpdRDcLe7+/wZf6O7rgPnAfDM7uMX97wGOB242%0AsxnAkqbrngTGm9nu/ZPUhwNXd1y9lFoVQmHAMTNnpS5BJKiWeyuZ2WRgirs/Pujyfdz9N60euGlv%0ApX1o7JE0BzgAGO/
uV/UPI321/7p73f2sVo+nvZWqo0qhMGD+vJuYOfuE1GWIDNHt3kojhoOZ/Q/g%0AYhrzBq8AHxoICTN70N0P6LLWrigcyq+KoTDgyd8+oeUzJEsxdmX9N2B/d38rcB7wczOz/uu6ejKp%0Ap4ljx1Y6GACm7aQ9saVaWoXDGHdfBeDuNwCfB+4ws6k0OgmRluoQCgPmXnpx6hJEgmo1rDQP+C1w%0Ambv/vv+ys4EzgDe4+7TCqgTW9b20udC1fX1FPrV0oC5hIFIWMYaVPkFj+GivgQvc/ULgXGBtN08W%0AysA30jp9M82Z/haw5KEHU5cgElRpluxu7hxGo84irjqHwEgWLriNo445LnUZIkME31spN52Ew2AK%0Ai94oDETKK8eF97IxeBhKG7vhDfc66bVqz7wbrk9dgkhQo4aDme0yzGWnR6mmQHXdEI70e9fhd49p%0AxmFHpC5BJKh2ziH9UzM7xt2fNLO9aCy1vRG4Im5paYy2kcx9iEob+TQmTZ6cugSRoNoJh1OAW81s%0AAfAR4N/c/dq4ZeVLG18Zzg+uu4ZTz2y5AoxIqbQ1IW1m+wMLgI+4+y9iFzWcXiakRUTqKsbaSht5%0A7ZHQr+v/78vAK+5e6FdohYPk7IH7F3PgwTNSlyEyRLfh0GpYaVyXtYjUzp/Xr09dgkhQI+6t5O6b%0A3H0TsCMwu///LwbuBlqdx0Gkdo448qjUJYgE1c5xDt8FMLP3A3sD/wu4IGZRImXz/Wt1riqplnbC%0A4Q3ufiONs7rd4O7/DWwVtyyRcjny6GNTlyASVDvh8LKZzaQRDj8xs/cBm+KWJVIuY7WLs1RMO+Fw%0AOvAh4Cx3/wNwMo1jHwp1312LuO+uRQBcd+UVrFm9mlUrV25u5xfduZAH7l8MwNxLLmL9iy/y7NNP%0Ab17WYOGC2zavnHnZBV+nb8MGnvztE8yfdxMAC+bfwtJHHwHgwvPPA2Dpo4+wYP4tQOM0kE/+9gn6%0ANmzgsgu+DjRW4ly44DagsXzCs08/zfoXX2TuJRcBjT1YFt25EGgMO6xauZI1q1dz3ZVX6Heq2O90%0A0/XfrdzvVMW/Ux1/p261e5zDBGBrGkt4vw7Y1d0Xdf2sXdCurCIinYu28J6ZfQV4FngSeAD4HfDN%0Abp5MpKoGvt2JVEU7w0onATsBNwLvBN4LrIxZlIiIpNVOODzn7muBR4F93X0hsEPcskTK5ZDDtSqr%0AVEs74bDOzE4EHgRONLODgElxyxIpl4EJRJGqaCccTgF2dPf/Av5A46C4f49alUjJzJz94dQliATV%0AauG9f3L37xZcz4i0t5LkbNXKlUyZOjV1GSJDxNhbSYvTi7TpzjtuT12CSFC1OIe0SGwnzvlk6hJE%0Agmq1ZPfbzOzJYS4fQ+N8DrtFqkmkdBbduVArs0qltAqHZYBWExNpwxvHj09dgkhQrcKhz92fLqwS%0AkRLTWeCkalrNOdxTWBUiJTewkJpIVbTqHO4Y7c5m9n53/8+A9YiU0kdP/kTqErK3tq8v6uNP1LLp%0AQbUKh13N7GfAD4FFwArgJWBn4N3ACcCPo1coUgJrnn+e8dtsk7qMwsTe0Hejl5oULEO1XLLbzN4E%0AnAG8H9iDxkl+lgM/AS5391VFFAk6CE7yNu+G65l90sdTlxFMjhv/HJQxRLo9CK6t8znkQOEgEoeC%0AIIxcg6PbcBhxWMnM/g64lEbHcDdwbv/qrCIyyMIFt3HUMcelLmNUCoJ4Rnptcw2N0bSac7iWxsl9%0ArqIxv/AtYE67D2xmWwCXA/sCG4BT3H3ZMLe7Cnje3b/Y6vFivqnL+seTfEzZId91lRQIaZU1NFqF%0Aw5vd/b0AZnYn8HCHjz0LGOfuh5jZDOACYGbzDczsn4G9gV92+NhBtfrw5P4HlDzsvf8BqUvYTGFQ%0ADsP9nXLa3rQ6zmFz5e6+sfnnNh1G/+6w7r4YOKj5
SjM7FDgY+HaHj1uotX19Q/6JDDZwQvmU9P4s%0Av5y2N606h8E6nRDeFljX9PMmM9vS3V8ys6nAl4EPAKVbCD/3xJfinfqZzyZ5XoVB9Q3+Gxe1relk%0A4b039//c7sJ7LwDNO35v4e4v9f//bGB74HYapxzd2syWuvt1HVWfkeY/oIKiflY88zS77bFnYc+n%0AUKivosKiVTj0+k6/BzgeuLl/zmHJwBXufjFwMYCZnQxML3MwDJYq6SWdJQ8/VEg4KBRksFgjGdGO%0Ac2jaW2kfGt3GHOAAYLy7X9V0u5NphEPLvZWeXv+XShznoKCQbigUpFs7j9+62gfBVSUcBigkqmXB%0A/Fs4ZuasKI+tYJBedBsOnUxIS0Cao6iWXXffPfhjKhQkJXUOmVFQCCgYJJxuOwedQzozqfdtlu5c%0AeP55wR5Lf3/JgTqHzKmTqA+FgsSgOYeK0txEOSx99BGmv+3tXd9fwdCdP21YN/qNerT9VhOiP0eO%0AFA4lMrABUUjk56lly7oOBwXDUEVs9NvVaS1VCRMNK5WYQqL86h4MOYVAEVIEh4aVakidRD7mz7uJ%0AmbNP6Og+dQuGugXBcEZ6DXLsNhQOFaCQSG/v/fbv6PZ1CAaFQftyDA2FQ4Ws7etTQCQybaedU5eQ%0ABQVCWMO9nkUFRimPc/jThnVD/kmDjpNIY+6lF7d926r9ffQ5LFZR27/STEg/sHpl8EJzHOcLTZ1E%0AXqoUDAqDfDVv2zQh3YUcx/lC03xEMZY89OCopwqtQjAoEMqh+e+08/itu3qMWofDSFKO88Wi+Yi4%0AVj23kr1TFxGRQqF+aj2s1Isyh4VConhl7RoUCuV34HZTNaxUpMEfmjKFhYaawpt3w/XMPunjw15X%0AxmCoYigsXbsq6uNPnzgl6uMXTeEQSPOHqSxBoaGmcGYcdkTqEoIpYzDE3vCHrKEsIaJwiKBMQaEu%0AIoxJkycPe3mZuoYyhEIOIdCrVr9DTsGhcIisLEGhLqI3P7juGk4986zUZXQt12CoQhh0YqTfN0Vo%0AaEI6gZxDAtRFhFKGriG3UKhbGPSi3cDQhHSJ5N5NqIvo3AP3L+bAg2ekLqMjuQSDAqE7g1+30N1F%0AKcNhuDdTTmN1nRj4gOYWEpqL6Myf169/zc+5dw2pg0GBEF7osCjNsNINyx8OWmiuYZJbSIACohs5%0Ah0PKYFAoFO+kt+ynYaVO5DTx0yzHTkJdxOi+f+3VnDjnk4CCYTgKhfKpbTiMJPY4XrtyDQkFxPCO%0APPrY1CWMKkUw5B4Kj61ZUejz7TVpWqHP1wuFwyhSh0VuIaGAGN7YzF+TooMhp1AoOgBaaaeWXAJE%0A4dChVGGRU0homGmo+fNu5uTTTs9ySKlOwZBTEHRrpN+h6NCo7YR0DEV2FTmEBCggBsstHIoMhhSh%0AUIUw6EU7gaEJ6QwMfDiKCIk/bViXRUBomKnhvrsW8daSHecQUpHBUPdAaDb4tQjZXSgcImj+oMQM%0AilyGmhQQeSqiaygqFBQI7QkZFqUZVvrSr2/tuNBcJnagmG4idUhAvYeZchpSqkowKBR6d95B79Ow%0A0mCjvbGKDI8ihpxyGGqqaxdx3ZVXMOsTn0xdRmFiB0MZQmH5mqfaut1bJu0auZI4Kh0OoxnuDRg7%0AMGIPOSkg0njXrA+kLmGz2F1DzGBIHQrtbvBjPmYuYVLrcBhOzAmewZauXRUtICDtMFPdAmLjxnyG%0AlGKKFQxFh0KMEAhlpNqKDg2Fwyia37QxgiLmcFPqLqJOAXHfz37G8R//n6nLiNo1lDkYcg6Ddg33%0AO8QMjGjhYGZbAJcD+wIbgFPcfVnT9R8FzgZeApYAn3b3l2PVE0LMoIgVEqm7iLoERA7BUDaxQ6EK%0AgTCamIGxRZBHGd4sYJy7HwJ8Ebhg4AozewPwFeBd7v5OYALwvoi1BPfYmhWb/4W0d
O2qKN/QUq7E%0AmdNePDGs7evjV7/479RllKpriBUMy9c8tflfXTW/Br28DjHD4TDgDgB3Xwwc1HTdBuBQd/9L/89b%0AAn+LWEtUsUIiNAVEPG8YPz51CdGUIRjqHggxxJxz2BZo3hptMrMt3f2l/uGjVQBmdiYwHvh5qwfr%0A5Q9f1ERO6GGnGENNKechqjzE9PaD3pG6hChyD4ZcA2HF6ieGXDZtuz0TVNK9mOHwArBN089buPtL%0AAz/0z0n8H2BP4EPuHu1ovBSz/wMfglAhoYDI201XXs4Jp3062fOnPrNbO0IGQ9GhMNzGPvZjpA6T%0AmOFwD3A8cLOZzaAx6dzs2zSGl2almoge/AaLERahQiJ0F6GACOv4j1VvQjpk11CmYAgRBCG0qqOI%0A4Ii2fEbT3kr7AGOAOcABNIaQft3/7y5goICL3P3HIz3eR35+SeHrfIQOi1B7OIXsIlLu6lqVgFjb%0A18fKZ59h6o47JashRucQKhxCBUPMUMglELrVKixufM+ZeS2f0d8NnDbo4qVN/x9zMjyI5jdjiKAI%0A2UWog8jLw/fey9QT0oVDaFUPhrKHwWCDf58QnUVpFt5L0TkMJ2Q30WtIqINIL4e9sHLtGkIEg0Kh%0Ad3d/5JKuOofsv73nJuR+1L1+eEIeE6HdXHtz789+mrqEygkZDCtWP1HLYOhF6ZfPWLH6iWSz+gNv%0A3l66iRBDTaGGmTTE1L3tdtghdQnB5NA1hAqGnAJh5bNLh1w2dcfpCSppT2nCodUfud03QKwQCRUS%0AdQ+IMrN99k1dQmWECIaiQmG4DX6M+6cIkdKEQwgxJm2aLV/zlAKiB2XuHr538YV87LNnpy4jC6mX%0A3Y4VDL0GQejnjh0YpZmQPuzGM6MXGjIsep247iUkQk1Up+ogyhYQa/v62NjXx+sT1R1yvij1kFKv%0AXUPoYEgZCN0YLjA0IR3AwKRViDdYr5PWvXzAqjBJXTbPPftM6hKyUJVgWPns0tIFA7xad4jaFQ4j%0ACBkS3cohIFIo495L/pvfpC6h1Hr5nIT6rIbcsOag199F4TCKEG+8XrqI1OO3qbqHsgXEUR/4YOoS%0ApAdVCYSQFA5tChUS3eg2IDS8VJxf3nZr6hKSS/FFptfPZJU6hdBqtbdSCANvxm4nr7vdo6nbPZli%0Anae6CGXae2nH3XZLXUJpdfulKUQwFOH3q1Z3fd83T9kuYCWdKU049PKHjLHLVy8hUcaA0PEPre32%0A1r1Sl1ArvQRDzFDoJQg6ebwiQqM04dCLwW+GkGHR7RHavR4TkUKKgChD9zBx7FguPP88Tj7nC6lL%0AKZ2iz8sQOhhCh0Evzxs6MGo55xB6r4Ruv8V088FIPf8gw1MwFKfbz1vIYPj9qtXJgmEkoWuqZTg0%0ACxUS3U5Yly0gUkxOl2HPpScffyx1CVKAHENhsIEae62z9uEwIFQ3kXtASBzPPvlk6hKSKur9mKpr%0AKEMoDKeXmhUOwyhLQHRDw0txzEx4nENZdxQo6j0fIhjqSOEwgl67iCJWhUzVPWhoaaj5825KXYJE%0AUNdgAIXDqIoMCHUP5bX3fvtnv1dVTKHOjx5aL5/fOgcDKBzaknNA1Kl7yNm0nXZOXULlFXninroH%0AA9TkOIcQBgIi5zM3daLMR07naO6lF3PGOV9g4tixSYbAtt9qggJ7kG6/1MUOhg2/W9vT/bfaZWKg%0ASlorVTiM9kdLeah5K50eKNfpAXK9niRIendGRY5zmD5xioYdI+k1FIZ7nJhBUZphpXbSvHn/3li7%0AnnX7bSSnc9mGom+qr1ry0IOb/7/Ocw9lF3qbseF3azf/iyHmY5cmHLoVIySKWLCriLmHsn1DzHmP%0ApVXPrUxdQul2aS3b8jGdirXRHum5Qj9f5cNhQOiQ6CYgqtg9SMNRxxz3mp/VPdRbkcEw+HlDPXdt%0AwmFAmfZCKHp
RMunevBuuT11CMN3uqKB5r4ZUwRC6htqFA4TrInSSEBkw47AjhlyWonso29CSxNNr%0AF1HLcEgp5tBSimMeNCndMGny5NQlVF63J9gqUg5dQyi1Dgd1D+WblM7VD667ZtjLy9o91PEYmDIN%0AOReh1uEAekNIGKeeedaI19VpcrrTeYeq77FUZrUPh9xpUvpVOW9kH7h/ceoSXkPdQ+erGeR6EG0q%0ACgeRAP68fn3L63MOttBi7rVUhnmHqlA4iARwxJFHjXqbogOiLN2DhpbypHBIQAfDVc/3r706dQki%0AQSkcElBrXD1HHn1sW7erS/cQc2K6k89P0fMORa2YWgSFg5RC7mP2YzuoL/ffJZSyHjGtiemGaOFg%0AZluY2ZVmdp+Z/cLMdh90/fFm9qv+60+NVUfZ5T4eqyNyG+bPu7mj2xcZEFWce4jZPfSqKt1DzM5h%0AFjDO3Q8BvghcMHCFmb0e+Bbwj8DfA58ysyT7zelbgoRw8mmnd3yfsgVEN3LpHjS81LmY4XAYcAeA%0Auy8GDmq67q3AMndf4+59wN3A0MVpSqDsZ4Yr+77subjvrkWpS4iuzN1DCmUPiJhngtsWaF54Z5OZ%0AbenuLw1z3YtAy682f/vGg2PClygSxtFHvrur+00YW9zJGHcev3XPj3HgdlO7uNd+PT+vFC9m5/AC%0AsE3zc/UHw3DXbQNUZ8UqEZGSixkO9wDHApjZDGBJ03WPA3uY2WQzG0tjSOm+iLWIiEgHxrzyyitR%0AHtjMtgAuB/YBxgBzgAOA8e5+lZkdD/w7jYC6xt0vi1KIiIh0LFo4iIhIeekgOBERGULhICIiQxS3%0AH12bmuYq9gU2AKe4+7Km6wfmKl6iMVcxN0mhBWjjtfgocDaN12IJ8Gl3fzlFrTGN9jo03e4q4Hl3%0A/2LBJRamjffEO4Bv0pjnew74mLv/LUWtsbXxWpwEnANsorGtuCJJoQUys4OBr7n7Pwy6vOPtZo6d%0AQymOrC5Iq9fiDcBXgHe5+ztpHCfyviRVxjfi6zDAzP4Z2LvowhJo9Z4YA8wF5rj7wEGoOyepshij%0AvS++ARwFvBM4x8wmFVxfoczsX4HvAOMGXd7VdjPHcKjFkdVtavVabAAOdfe/9P+8JVDJb4i0fh0w%0As0OBg4FvF19a4Vq9FnsCq4HPmdkvgcnu7sWXWJiW7wvgNzS+NI2j0UlVfe+b5cAHh7m8q+1mjuEw%0A7JHVI1w36pHVJTfia+HuL7v7KgAzOxMYD/y8+BILMeLrYGZTgS8Dn0lRWAKtPh/bA4cCl9L4xnyk%0AmXV36HY5tHotAB4BHgAeBW5190ofaOvuPwI2DnNVV9vNHMNBR1a/qtVrMbDy7TeA9wAfcveqfjNq%0A9TrMprFRvJ3G0MKJZnZyseUVqtVrsZrGN8TH3X0jjW/Vg79NV8mIr4WZ7QMcB+wK7AK8ycxmF15h%0AHrrabuZRVSUUAAADNUlEQVQYDjqy+lWtXgtoDKOMA2Y1DS9V0Yivg7tf7O4H9k/AfRX4vrtfl6LI%0AgrR6TzwJjG9aHv9wGt+aq6rVa7EO+CvwV3ffBPwRqPScQwtdbTezOwhOR1a/qtVrAfy6/99dvDqW%0AepG7/zhBqVGN9p5out3JwPSa7K000ufj3TRCcgxwr7uflazYyNp4LU4DPgH00RiPP7V/zL2yzGwX%0A4EZ3n2FmJ9LDdjO7cBARkfRyHFYSEZHEFA4iIjKEwkFERIZQOIiIyBAKBxERGSK7hfdEYujfxe8J%0A4DEau/6OBf5AYx2iFf23+SzwO+C9NNbjGQvs3n8faOwqfG0Hz7kl8Dd3H/VzZmb7Ate7+z79Px9M%0A4/iVc9t9PpGQFA5SJ39w981nuzez84FLgA/0L0T2fnc/CvjP/ut3AX7RfJ8YzGwOcB6w+UBGd7/f%0AzD5vZnu5+2Mj31skDoWD1Nki4P39/38G8MPR
7mBmO9JY+XIisAPwPXf/kpntD1wJvI7Gkbn/BDzT%0AdL/DgauBo939yabLJ9M4yvfE/sdtdgONJac/2c0vJ9ILzTlILfUvY3wCjSUYoBESi9q460k0hn8O%0ABvYDPtu/FPS/AF9194OAK4AZTc91AHAVcFxzMAC4+/PuPhtYMcxzNYeXSKEUDlInf2dmD5vZwzSW%0Acx5DY7E+gD0YfgM92NeAlWb2BRpr5I8FtgZuA640s+8AfwZu7L/9FjQWwLvD3X/bSbHu/jwwzswm%0AdnI/kRA0rCR18ocW8wcv0zhL1mguBKYBPwD+L3A0MMbdbzSzu4Hjgc/3X35m/30+AnzPzK5290c6%0ArHljf20ihVLnINKwnPbOmvYeGqdh/CGN5aCnAK8zsx8B+/efivLLNBaAA3jZ3f8L+Dfgqv7F4trS%0A3zFsdPcXOvg9RIJQOIg0/AR4Vxu3+w/gB2b2APA54CEaIfEV4Mtm9hBwPo3uodm1NDqT0zuo6R/o%0A33NKpGhalVUEMLMdgJvdPZvTzprZfOBc7coqKahzEAHc/Tngx2Y2K3UtsPnkNU8oGCQVdQ4iIjKE%0AOgcRERlC4SAiIkMoHEREZAiFg4iIDKFwEBGRIRQOIiIyxP8Hq8Aht8409lQAAAAASUVORK5CYII="&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;The bivariate KDE plot makes the probability densities for our two learning tasks clear and, as you would expect, demonstrates that a given case is unlikely to have a high probability for both tasks simultaneously.&lt;/p&gt;
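&lt;p&gt;If you want a number to go with the picture, one option is to count how often both probabilities exceed 0.5. The sketch below uses made-up arrays in place of &lt;code&gt;y_task1_hat&lt;/code&gt; and &lt;code&gt;y_task2_hat&lt;/code&gt;, so the result is illustrative only.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np

# Hypothetical per-task probabilities standing in for y_task1_hat and y_task2_hat
p_task1 = np.array([0.9, 0.1, 0.0, 0.8, 0.2, 0.6])
p_task2 = np.array([0.0, 0.9, 0.1, 0.1, 0.7, 0.6])

# Flag the cases that sit in the top-right quadrant of the plot
both_high = np.logical_and(np.greater(p_task1, 0.5), np.greater(p_task2, 0.5))

# Share of cases where the model is confident in both tasks at once
share_both_high = both_high.mean()&lt;/code&gt;&lt;/pre&gt;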

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/using-bivariate-kernel-density-estimation-for-plotting-multi-task-classification-results/</guid><pubDate>Sun, 19 Nov 2017 11:32:39 GMT</pubDate></item><item><title>Automating error analysis with RuleFit models</title><link>http://blairhudson.github.io/blog/posts/automating-error-analysis-with-rulefit-models/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;When building machine learning models, the goal is generally to improve the performance of a model based on some performance metric. One of the simplest metrics is &lt;em&gt;error&lt;/em&gt;. Error is simply the complement of model accuracy (error = 1 - accuracy), so a model with 95% accuracy has 5% error.&lt;/p&gt;
&lt;p&gt;There are many ways to improve the performance of a model and subsequently decrease the model error. These include adding more training observations (rows), enriching training observations with more features (columns), modifying the model algorithm or &lt;a href="http://blairhudson.github.io/blog/posts/optimising-hyper-parameters-efficiently-with-scikit-optimize/"&gt;optimising the algorithm parameters&lt;/a&gt;.&lt;/p&gt;
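&lt;p&gt;As a quick illustration of the relationship between accuracy and error, here is a small sketch using scikit-learn's &lt;code&gt;accuracy_score&lt;/code&gt;. The labels are made up for the example and are not drawn from the data used later in this post.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels; exactly one of the ten predictions is wrong
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]

# Accuracy is the share of correct predictions; error is whatever is left over
accuracy = accuracy_score(y_true, y_pred)
error = 1 - accuracy&lt;/code&gt;&lt;/pre&gt;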
&lt;h3 id="Reducing-error-by-introducing-new-features"&gt;Reducing error by introducing new features&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/automating-error-analysis-with-rulefit-models/#Reducing-error-by-introducing-new-features"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;In the post linked above we looked at how to optimise the parameters of a given algorithm, so for now we're interested in what we can do with the data itself.&lt;/p&gt;
&lt;p&gt;While there are many ways to create more training observations, this is often infeasible due to considerations of cost (imagine the cost of high-end medical studies) and time (such as waiting for enough events to occur).&lt;/p&gt;
&lt;p&gt;The next option we have is to introduce new features to the observations already in our model. There are often many different approaches here too, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;engineering new features based on existing features&lt;/li&gt;
&lt;li&gt;creating new features from available data not already used&lt;/li&gt;
&lt;li&gt;making more data available (such as from external providers)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal of this post is not to decide which approach to take, but to explore a method for identifying where the gaps in the model are, so that you can assess all of the options available to you.&lt;/p&gt;
&lt;h3 id="Modelling-error-analysis"&gt;Modelling error analysis&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/automating-error-analysis-with-rulefit-models/#Modelling-error-analysis"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;To get started, let's install an implementation of RuleFit from GitHub using pip:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install git+https://github.com/christophM/rulefit&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we're going to load up a sample data set to work on, partitioning it into data for training our initial model, and data for testing its performance. Note that feature names will be important for this exercise.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [1]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="c1"&gt;# load our data - we also care about feature names&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'_'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;

&lt;span class="c1"&gt;# split data for training and testing&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;train_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# more reproducibility&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;With our data ready, let's build a quick logistic regression model on the training data. We're also going to generate predictions for our test data (as positive probabilities, or the likelihood of the class label being True).&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [2]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# define our model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# fit our model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# generate some predictions&lt;/span&gt;
&lt;span class="n"&gt;y_hat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;At the start of this post we discussed model error, so let's now calculate this for our model to see how much room for improvement there is.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [3]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# calculate error on each observation in the test set&lt;/span&gt;
&lt;span class="n"&gt;y_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;absolute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y_hat&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# is there much room for improvement?&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'model error:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_error&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;model error: 0.0705335674785
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;It looks like there is just over 7% &lt;em&gt;mean absolute error&lt;/em&gt;. Maybe we can find some good leads for improving on this?&lt;/p&gt;
&lt;p&gt;To do so, we're going to create a new model using the &lt;code&gt;RuleFit&lt;/code&gt; class, but instead of targeting the original class label &lt;em&gt;y&lt;/em&gt;, we're going to target the &lt;em&gt;absolute error&lt;/em&gt; of each observation.&lt;/p&gt;
&lt;p&gt;The absolute error is the absolute difference between the discrete actual value of &lt;em&gt;y&lt;/em&gt; (0 or 1) and the continuous positive probability we predicted (0.0 to 1.0).&lt;/p&gt;
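&lt;p&gt;As a minimal sketch of that calculation, the snippet below uses made-up labels and probabilities (illustrative assumptions, not values from the model in this post) to show how the per-observation absolute error and its mean are computed.&lt;/p&gt;

```python
import numpy as np

# hypothetical labels and predicted positive probabilities (illustrative only)
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.7])

# absolute error per observation: 0 is a perfect prediction, 1 is confidently wrong
abs_error = np.absolute(y_true - y_prob)

# the mean of these values is the mean absolute error
mean_abs_error = abs_error.mean()
print(abs_error, mean_abs_error)
```

&lt;p&gt;Observations with an error close to 1 are the ones the model got confidently wrong, which makes them the most interesting candidates for error analysis.&lt;/p&gt;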

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [4]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;rulefit&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RuleFit&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GradientBoostingRegressor&lt;/span&gt;

&lt;span class="c1"&gt;# define and fit our shiny new RuleFit model&lt;/span&gt;
&lt;span class="n"&gt;generator_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s1"&gt;'max_depth'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# control the complexity of our rules&lt;/span&gt;
    &lt;span class="s1"&gt;'n_estimators'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
    &lt;span class="s1"&gt;'learning_rate'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'random_state'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1234&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;GradientBoostingRegressor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;generator_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;rf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RuleFit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[4]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;RuleFit(tree_generator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.003, loss='ls', max_depth=5,
             max_features=None, max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=1000,
             presort='auto', random_state=1234, subsample=1.0, verbose=0,
             warm_start=False))&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;With the &lt;code&gt;RuleFit&lt;/code&gt; model fitted to our errors, we can generate a set of rules that might help us to isolate areas of our data that need enriching with new features.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;RuleFit&lt;/code&gt; also generates rules for regions of the data where the error is lower (negative coefficients), but we can ignore these for error analysis by keeping only the rules with &lt;code&gt;coef&lt;/code&gt; &amp;gt; 0.&lt;/p&gt;
&lt;p&gt;If we multiply the coefficient and support values calculated by &lt;code&gt;RuleFit&lt;/code&gt;, we can use that as a rough estimate for how much error is due to that subset of the data.&lt;/p&gt;
&lt;p&gt;By summing these estimates, we get an approximate amount of error explained by these rules. This will differ from the mean absolute error above simply because our rules may not perfectly fit our errors (that is, our error model has its own error).&lt;/p&gt;
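&lt;p&gt;To make the arithmetic concrete, here is a small sketch on a stand-in rules table. Two of the coefficient and support pairs are copied from the table of top rules in this post; the third row, the placeholder rule names and the &lt;code&gt;rules_demo&lt;/code&gt; variable are assumptions added purely for illustration.&lt;/p&gt;

```python
import pandas as pd

# two (coef, support) pairs copied from the top rules table in this post,
# plus one hypothetical negative-coefficient rule to show the filtering step
rules_demo = pd.DataFrame({
    'rule': ['rule_a', 'rule_b', 'rule_c'],   # placeholder names
    'coef': [0.056166, 0.265549, -0.020000],
    'support': [0.216783, 0.041958, 0.300000],
})

# keep only rules that increase the predicted error
rules_demo = rules_demo[rules_demo['coef'] > 0].copy()

# effect: coefficient weighted by the share of observations the rule covers
rules_demo['effect'] = rules_demo['coef'] * rules_demo['support']

# summing the effects gives a rough estimate of the error the rules explain
modelled_error = rules_demo['effect'].sum()
print(rules_demo)
print('modelled error:', modelled_error)
```

&lt;p&gt;A rule with a large coefficient but tiny support can contribute about as much as a weak rule covering a fifth of the data, which is why the product is a more useful ranking than either value alone.&lt;/p&gt;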

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [5]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# get the outputs&lt;/span&gt;
&lt;span class="n"&gt;rules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_rules&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# remove the rules we're not interested in. if the coefficient isn't above 0&lt;/span&gt;
&lt;span class="c1"&gt;# the rule is not a good indicator of an area for improvement&lt;/span&gt;
&lt;span class="n"&gt;rules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;coef&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s1"&gt;'linear'&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# copy so a new column can be added safely&lt;/span&gt;

&lt;span class="c1"&gt;# we can estimate an effect for each rule on the error score from above by &lt;/span&gt;
&lt;span class="c1"&gt;# multiplying the coefficient and support values&lt;/span&gt;
&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'effect'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'coef'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'support'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'modelled error:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'effect'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'unexplained error:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_error&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'effect'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;modelled error: 0.046811390404179753
unexplained error: 0.0237221770743
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Let's take a look at the top 10 rules:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [6]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# display the top 10 rules by effect&lt;/span&gt;
&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'display.max_colwidth'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# None = no truncation (pandas 1.0+; older versions used -1)&lt;/span&gt;
&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nlargest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'effect'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[6]:&lt;/div&gt;


&lt;div class="output_html rendered_html output_subarea output_execute_result"&gt;
&lt;div&gt;
&lt;style&gt;
    .dataframe thead tr:only-child th {
        text-align: right;
    }

    .dataframe thead th {
        text-align: left;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }
&lt;/style&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;rule&lt;/th&gt;
      &lt;th&gt;type&lt;/th&gt;
      &lt;th&gt;coef&lt;/th&gt;
      &lt;th&gt;support&lt;/th&gt;
      &lt;th&gt;effect&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;883&lt;/th&gt;
      &lt;td&gt;worst_area &amp;lt;= 976.25 &amp;amp; worst_area &amp;gt; 553.299987793 &amp;amp; radius_error &amp;gt; 0.275350004435&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.056166&lt;/td&gt;
      &lt;td&gt;0.216783&lt;/td&gt;
      &lt;td&gt;0.012176&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1176&lt;/th&gt;
      &lt;td&gt;compactness_error &amp;gt; 0.0203649997711 &amp;amp; worst_fractal_dimension &amp;lt;= 0.113150000572 &amp;amp; area_error &amp;gt; 23.2399997711 &amp;amp; radius_error &amp;lt;= 0.339100003242&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.265549&lt;/td&gt;
      &lt;td&gt;0.041958&lt;/td&gt;
      &lt;td&gt;0.011142&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;769&lt;/th&gt;
      &lt;td&gt;worst_area &amp;gt; 548.650024414 &amp;amp; area_error &amp;gt; 23.2350006104 &amp;amp; worst_area &amp;lt;= 976.25&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.028937&lt;/td&gt;
      &lt;td&gt;0.223776&lt;/td&gt;
      &lt;td&gt;0.006475&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;458&lt;/th&gt;
      &lt;td&gt;worst_radius &amp;gt; 15.345000267 &amp;amp; worst_concavity &amp;gt; 0.207249999046 &amp;amp; worst_symmetry &amp;gt; 0.203749999404 &amp;amp; worst_symmetry &amp;lt;= 0.560700058937 &amp;amp; worst_radius &amp;lt;= 16.8100013733&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.101535&lt;/td&gt;
      &lt;td&gt;0.041958&lt;/td&gt;
      &lt;td&gt;0.004260&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1548&lt;/th&gt;
      &lt;td&gt;worst_symmetry &amp;gt; 0.203749999404 &amp;amp; area_error &amp;lt;= 33.375 &amp;amp; worst_symmetry &amp;lt;= 0.560700058937 &amp;amp; worst_perimeter &amp;gt; 100.555000305&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.024403&lt;/td&gt;
      &lt;td&gt;0.153846&lt;/td&gt;
      &lt;td&gt;0.003754&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5361&lt;/th&gt;
      &lt;td&gt;radius_error &amp;gt; 0.344399988651 &amp;amp; symmetry_error &amp;lt;= 0.017725000158 &amp;amp; area_error &amp;gt; 23.2399997711 &amp;amp; worst_perimeter &amp;lt;= 105.199996948&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.133092&lt;/td&gt;
      &lt;td&gt;0.027972&lt;/td&gt;
      &lt;td&gt;0.003723&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;357&lt;/th&gt;
      &lt;td&gt;worst_radius &amp;gt; 15.5699996948 &amp;amp; worst_symmetry &amp;gt; 0.203749999404 &amp;amp; mean_texture &amp;gt; 19.1049995422 &amp;amp; worst_symmetry &amp;lt;= 0.560700058937 &amp;amp; radius_error &amp;lt;= 0.418200016022&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.032694&lt;/td&gt;
      &lt;td&gt;0.069930&lt;/td&gt;
      &lt;td&gt;0.002286&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;57&lt;/th&gt;
      &lt;td&gt;worst_area &amp;gt; 548.650024414 &amp;amp; worst_area &amp;lt;= 1086.0&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.002326&lt;/td&gt;
      &lt;td&gt;0.405594&lt;/td&gt;
      &lt;td&gt;0.000943&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1433&lt;/th&gt;
      &lt;td&gt;worst_area &amp;gt; 548.650024414 &amp;amp; mean_perimeter &amp;gt; 79.2050018311 &amp;amp; mean_texture &amp;gt; 21.1399993896 &amp;amp; area_error &amp;lt;= 36.4049987793&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.012087&lt;/td&gt;
      &lt;td&gt;0.069930&lt;/td&gt;
      &lt;td&gt;0.000845&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;5547&lt;/th&gt;
      &lt;td&gt;worst_area &amp;gt; 744.0 &amp;amp; worst_symmetry &amp;gt; 0.203749999404 &amp;amp; mean_texture &amp;gt; 19.1049995422 &amp;amp; worst_symmetry &amp;lt;= 0.560700058937 &amp;amp; radius_error &amp;lt;= 0.418200016022&lt;/td&gt;
      &lt;td&gt;rule&lt;/td&gt;
      &lt;td&gt;0.004637&lt;/td&gt;
      &lt;td&gt;0.069930&lt;/td&gt;
      &lt;td&gt;0.000324&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;To wrap up, let's produce a report of the top 3 rules, including up to 10 examples from the data to which the rules apply.&lt;/p&gt;
&lt;p&gt;This report can be used in conjunction with subject-matter expertise on the data to isolate areas for feature enrichment and improve your model!&lt;/p&gt;
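&lt;p&gt;The report relies on each rule being a string of column comparisons that pandas can evaluate directly with &lt;code&gt;DataFrame.query&lt;/code&gt;. Here is a minimal sketch on a toy frame; the values, the &lt;code&gt;toy&lt;/code&gt; variable and the rule string are all invented for illustration, and the rule is written with &lt;code&gt;and&lt;/code&gt; rather than the ampersand form that appears in the generated rules.&lt;/p&gt;

```python
import pandas as pd

# toy data with invented values; column names mimic the underscore style used in this post
toy = pd.DataFrame({
    'worst_area': [500.0, 600.0, 900.0, 1200.0],
    'radius_error': [0.30, 0.20, 0.40, 0.50],
    'y_error': [0.05, 0.10, 0.80, 0.02],
})

# a hypothetical rule string in the same spirit as the generated rules
rule = 'worst_area > 553.3 and radius_error > 0.2753'

# query evaluates the rule against the columns and returns the matching rows
subset = toy.query(rule)

# local mean absolute error for the observations covered by the rule
print('rows matched:', len(subset))
print('rule MAE:', subset['y_error'].mean())
```

&lt;p&gt;Because &lt;code&gt;query&lt;/code&gt; parses column names as identifiers, names containing spaces would need backtick quoting, so underscore-style feature names keep the rule strings directly usable.&lt;/p&gt;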

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [7]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;IPython.display&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# prepare a dataframe for use below (we really care about the `query` function)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_names&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'y_error'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_error&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'y_sq_error'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_error&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nlargest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'effect'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'rule:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'rule'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'support:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'support'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'coef:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'coef'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'estimated error effect (support x coef):'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'effect'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    
    &lt;span class="c1"&gt;# it might be useful to compare the local error to the estimated model effect&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'rule MAE:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'rule'&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="s1"&gt;'y_error'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'rule RMSE:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'rule'&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="s1"&gt;'y_sq_error'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    
    &lt;span class="c1"&gt;# we can use the rule to filter the data&lt;/span&gt;
    &lt;span class="n"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'rule'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nlargest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'y_error'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;rule: worst_area &amp;lt;= 976.25 &amp;amp; worst_area &amp;gt; 553.299987793 &amp;amp; radius_error &amp;gt; 0.275350004435
support: 0.21678321678321677
coef: 0.05616597023836435
estimated error effect (support x coef): 0.012175839702023041
rule MAE: 0.2658110370900938
rule RMSE: 0.4155759177730745
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;


&lt;div class="output_html rendered_html output_subarea "&gt;
&lt;div&gt;
&lt;style&gt;
    .dataframe thead tr:only-child th {
        text-align: right;
    }

    .dataframe thead th {
        text-align: left;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }
&lt;/style&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;mean_radius&lt;/th&gt;
      &lt;th&gt;mean_texture&lt;/th&gt;
      &lt;th&gt;mean_perimeter&lt;/th&gt;
      &lt;th&gt;mean_area&lt;/th&gt;
      &lt;th&gt;mean_smoothness&lt;/th&gt;
      &lt;th&gt;mean_compactness&lt;/th&gt;
      &lt;th&gt;mean_concavity&lt;/th&gt;
      &lt;th&gt;mean_concave_points&lt;/th&gt;
      &lt;th&gt;mean_symmetry&lt;/th&gt;
      &lt;th&gt;mean_fractal_dimension&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;worst_perimeter&lt;/th&gt;
      &lt;th&gt;worst_area&lt;/th&gt;
      &lt;th&gt;worst_smoothness&lt;/th&gt;
      &lt;th&gt;worst_compactness&lt;/th&gt;
      &lt;th&gt;worst_concavity&lt;/th&gt;
      &lt;th&gt;worst_concave_points&lt;/th&gt;
      &lt;th&gt;worst_symmetry&lt;/th&gt;
      &lt;th&gt;worst_fractal_dimension&lt;/th&gt;
      &lt;th&gt;y_error&lt;/th&gt;
      &lt;th&gt;y_sq_error&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;91&lt;/th&gt;
      &lt;td&gt;11.76&lt;/td&gt;
      &lt;td&gt;18.14&lt;/td&gt;
      &lt;td&gt;75.00&lt;/td&gt;
      &lt;td&gt;431.1&lt;/td&gt;
      &lt;td&gt;0.09968&lt;/td&gt;
      &lt;td&gt;0.05914&lt;/td&gt;
      &lt;td&gt;0.02685&lt;/td&gt;
      &lt;td&gt;0.03515&lt;/td&gt;
      &lt;td&gt;0.1619&lt;/td&gt;
      &lt;td&gt;0.06287&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;85.10&lt;/td&gt;
      &lt;td&gt;553.6&lt;/td&gt;
      &lt;td&gt;0.1137&lt;/td&gt;
      &lt;td&gt;0.07974&lt;/td&gt;
      &lt;td&gt;0.0612&lt;/td&gt;
      &lt;td&gt;0.07160&lt;/td&gt;
      &lt;td&gt;0.1978&lt;/td&gt;
      &lt;td&gt;0.06915&lt;/td&gt;
      &lt;td&gt;0.974325&lt;/td&gt;
      &lt;td&gt;0.949308&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;83&lt;/th&gt;
      &lt;td&gt;15.37&lt;/td&gt;
      &lt;td&gt;22.76&lt;/td&gt;
      &lt;td&gt;100.20&lt;/td&gt;
      &lt;td&gt;728.2&lt;/td&gt;
      &lt;td&gt;0.09200&lt;/td&gt;
      &lt;td&gt;0.10360&lt;/td&gt;
      &lt;td&gt;0.11220&lt;/td&gt;
      &lt;td&gt;0.07483&lt;/td&gt;
      &lt;td&gt;0.1717&lt;/td&gt;
      &lt;td&gt;0.06097&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;107.50&lt;/td&gt;
      &lt;td&gt;830.9&lt;/td&gt;
      &lt;td&gt;0.1257&lt;/td&gt;
      &lt;td&gt;0.19970&lt;/td&gt;
      &lt;td&gt;0.2846&lt;/td&gt;
      &lt;td&gt;0.14760&lt;/td&gt;
      &lt;td&gt;0.2556&lt;/td&gt;
      &lt;td&gt;0.06828&lt;/td&gt;
      &lt;td&gt;0.929329&lt;/td&gt;
      &lt;td&gt;0.863652&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;89&lt;/th&gt;
      &lt;td&gt;14.60&lt;/td&gt;
      &lt;td&gt;23.29&lt;/td&gt;
      &lt;td&gt;93.97&lt;/td&gt;
      &lt;td&gt;664.7&lt;/td&gt;
      &lt;td&gt;0.08682&lt;/td&gt;
      &lt;td&gt;0.06636&lt;/td&gt;
      &lt;td&gt;0.08390&lt;/td&gt;
      &lt;td&gt;0.05271&lt;/td&gt;
      &lt;td&gt;0.1627&lt;/td&gt;
      &lt;td&gt;0.05416&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;102.20&lt;/td&gt;
      &lt;td&gt;758.2&lt;/td&gt;
      &lt;td&gt;0.1312&lt;/td&gt;
      &lt;td&gt;0.15810&lt;/td&gt;
      &lt;td&gt;0.2675&lt;/td&gt;
      &lt;td&gt;0.13590&lt;/td&gt;
      &lt;td&gt;0.2477&lt;/td&gt;
      &lt;td&gt;0.06836&lt;/td&gt;
      &lt;td&gt;0.856296&lt;/td&gt;
      &lt;td&gt;0.733242&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;47&lt;/th&gt;
      &lt;td&gt;13.80&lt;/td&gt;
      &lt;td&gt;15.79&lt;/td&gt;
      &lt;td&gt;90.43&lt;/td&gt;
      &lt;td&gt;584.1&lt;/td&gt;
      &lt;td&gt;0.10070&lt;/td&gt;
      &lt;td&gt;0.12800&lt;/td&gt;
      &lt;td&gt;0.07789&lt;/td&gt;
      &lt;td&gt;0.05069&lt;/td&gt;
      &lt;td&gt;0.1662&lt;/td&gt;
      &lt;td&gt;0.06566&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;110.30&lt;/td&gt;
      &lt;td&gt;812.4&lt;/td&gt;
      &lt;td&gt;0.1411&lt;/td&gt;
      &lt;td&gt;0.35420&lt;/td&gt;
      &lt;td&gt;0.2779&lt;/td&gt;
      &lt;td&gt;0.13830&lt;/td&gt;
      &lt;td&gt;0.2589&lt;/td&gt;
      &lt;td&gt;0.10300&lt;/td&gt;
      &lt;td&gt;0.832785&lt;/td&gt;
      &lt;td&gt;0.693531&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;137&lt;/th&gt;
      &lt;td&gt;14.22&lt;/td&gt;
      &lt;td&gt;27.85&lt;/td&gt;
      &lt;td&gt;92.55&lt;/td&gt;
      &lt;td&gt;623.9&lt;/td&gt;
      &lt;td&gt;0.08223&lt;/td&gt;
      &lt;td&gt;0.10390&lt;/td&gt;
      &lt;td&gt;0.11030&lt;/td&gt;
      &lt;td&gt;0.04408&lt;/td&gt;
      &lt;td&gt;0.1342&lt;/td&gt;
      &lt;td&gt;0.06129&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;102.50&lt;/td&gt;
      &lt;td&gt;764.0&lt;/td&gt;
      &lt;td&gt;0.1081&lt;/td&gt;
      &lt;td&gt;0.24260&lt;/td&gt;
      &lt;td&gt;0.3064&lt;/td&gt;
      &lt;td&gt;0.08219&lt;/td&gt;
      &lt;td&gt;0.1890&lt;/td&gt;
      &lt;td&gt;0.07796&lt;/td&gt;
      &lt;td&gt;0.707592&lt;/td&gt;
      &lt;td&gt;0.500686&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;73&lt;/th&gt;
      &lt;td&gt;11.80&lt;/td&gt;
      &lt;td&gt;16.58&lt;/td&gt;
      &lt;td&gt;78.99&lt;/td&gt;
      &lt;td&gt;432.0&lt;/td&gt;
      &lt;td&gt;0.10910&lt;/td&gt;
      &lt;td&gt;0.17000&lt;/td&gt;
      &lt;td&gt;0.16590&lt;/td&gt;
      &lt;td&gt;0.07415&lt;/td&gt;
      &lt;td&gt;0.2678&lt;/td&gt;
      &lt;td&gt;0.07371&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;91.93&lt;/td&gt;
      &lt;td&gt;591.7&lt;/td&gt;
      &lt;td&gt;0.1385&lt;/td&gt;
      &lt;td&gt;0.40920&lt;/td&gt;
      &lt;td&gt;0.4504&lt;/td&gt;
      &lt;td&gt;0.18650&lt;/td&gt;
      &lt;td&gt;0.5774&lt;/td&gt;
      &lt;td&gt;0.10300&lt;/td&gt;
      &lt;td&gt;0.703641&lt;/td&gt;
      &lt;td&gt;0.495111&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;130&lt;/th&gt;
      &lt;td&gt;14.99&lt;/td&gt;
      &lt;td&gt;22.11&lt;/td&gt;
      &lt;td&gt;97.53&lt;/td&gt;
      &lt;td&gt;693.7&lt;/td&gt;
      &lt;td&gt;0.08515&lt;/td&gt;
      &lt;td&gt;0.10250&lt;/td&gt;
      &lt;td&gt;0.06859&lt;/td&gt;
      &lt;td&gt;0.03876&lt;/td&gt;
      &lt;td&gt;0.1944&lt;/td&gt;
      &lt;td&gt;0.05913&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;110.20&lt;/td&gt;
      &lt;td&gt;867.1&lt;/td&gt;
      &lt;td&gt;0.1077&lt;/td&gt;
      &lt;td&gt;0.33450&lt;/td&gt;
      &lt;td&gt;0.3114&lt;/td&gt;
      &lt;td&gt;0.13080&lt;/td&gt;
      &lt;td&gt;0.3163&lt;/td&gt;
      &lt;td&gt;0.09251&lt;/td&gt;
      &lt;td&gt;0.646361&lt;/td&gt;
      &lt;td&gt;0.417782&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;111&lt;/th&gt;
      &lt;td&gt;13.27&lt;/td&gt;
      &lt;td&gt;14.76&lt;/td&gt;
      &lt;td&gt;84.74&lt;/td&gt;
      &lt;td&gt;551.7&lt;/td&gt;
      &lt;td&gt;0.07355&lt;/td&gt;
      &lt;td&gt;0.05055&lt;/td&gt;
      &lt;td&gt;0.03261&lt;/td&gt;
      &lt;td&gt;0.02648&lt;/td&gt;
      &lt;td&gt;0.1386&lt;/td&gt;
      &lt;td&gt;0.05318&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;104.50&lt;/td&gt;
      &lt;td&gt;830.6&lt;/td&gt;
      &lt;td&gt;0.1006&lt;/td&gt;
      &lt;td&gt;0.12380&lt;/td&gt;
      &lt;td&gt;0.1350&lt;/td&gt;
      &lt;td&gt;0.10010&lt;/td&gt;
      &lt;td&gt;0.2027&lt;/td&gt;
      &lt;td&gt;0.06206&lt;/td&gt;
      &lt;td&gt;0.471311&lt;/td&gt;
      &lt;td&gt;0.222134&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;119&lt;/th&gt;
      &lt;td&gt;16.25&lt;/td&gt;
      &lt;td&gt;19.51&lt;/td&gt;
      &lt;td&gt;109.80&lt;/td&gt;
      &lt;td&gt;815.8&lt;/td&gt;
      &lt;td&gt;0.10260&lt;/td&gt;
      &lt;td&gt;0.18930&lt;/td&gt;
      &lt;td&gt;0.22360&lt;/td&gt;
      &lt;td&gt;0.09194&lt;/td&gt;
      &lt;td&gt;0.2151&lt;/td&gt;
      &lt;td&gt;0.06578&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;122.10&lt;/td&gt;
      &lt;td&gt;939.7&lt;/td&gt;
      &lt;td&gt;0.1377&lt;/td&gt;
      &lt;td&gt;0.44620&lt;/td&gt;
      &lt;td&gt;0.5897&lt;/td&gt;
      &lt;td&gt;0.17750&lt;/td&gt;
      &lt;td&gt;0.3318&lt;/td&gt;
      &lt;td&gt;0.09136&lt;/td&gt;
      &lt;td&gt;0.467160&lt;/td&gt;
      &lt;td&gt;0.218238&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;25&lt;/th&gt;
      &lt;td&gt;13.90&lt;/td&gt;
      &lt;td&gt;19.24&lt;/td&gt;
      &lt;td&gt;88.73&lt;/td&gt;
      &lt;td&gt;602.9&lt;/td&gt;
      &lt;td&gt;0.07991&lt;/td&gt;
      &lt;td&gt;0.05326&lt;/td&gt;
      &lt;td&gt;0.02995&lt;/td&gt;
      &lt;td&gt;0.02070&lt;/td&gt;
      &lt;td&gt;0.1579&lt;/td&gt;
      &lt;td&gt;0.05594&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;104.40&lt;/td&gt;
      &lt;td&gt;830.5&lt;/td&gt;
      &lt;td&gt;0.1064&lt;/td&gt;
      &lt;td&gt;0.14150&lt;/td&gt;
      &lt;td&gt;0.1673&lt;/td&gt;
      &lt;td&gt;0.08150&lt;/td&gt;
      &lt;td&gt;0.2356&lt;/td&gt;
      &lt;td&gt;0.07603&lt;/td&gt;
      &lt;td&gt;0.344350&lt;/td&gt;
      &lt;td&gt;0.118577&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;10 rows × 32 columns&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;rule: compactness_error &amp;gt; 0.0203649997711 &amp;amp; worst_fractal_dimension &amp;lt;= 0.113150000572 &amp;amp; area_error &amp;gt; 23.2399997711 &amp;amp; radius_error &amp;lt;= 0.339100003242
support: 0.04195804195804196
coef: 0.26554881554597404
estimated error effect (support x coef): 0.011141908344586324
rule MAE: 0.7144778689742627
rule RMSE: 0.7290404836131931
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;


&lt;div class="output_html rendered_html output_subarea "&gt;
&lt;div&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;mean_radius&lt;/th&gt;
      &lt;th&gt;mean_texture&lt;/th&gt;
      &lt;th&gt;mean_perimeter&lt;/th&gt;
      &lt;th&gt;mean_area&lt;/th&gt;
      &lt;th&gt;mean_smoothness&lt;/th&gt;
      &lt;th&gt;mean_compactness&lt;/th&gt;
      &lt;th&gt;mean_concavity&lt;/th&gt;
      &lt;th&gt;mean_concave_points&lt;/th&gt;
      &lt;th&gt;mean_symmetry&lt;/th&gt;
      &lt;th&gt;mean_fractal_dimension&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;worst_perimeter&lt;/th&gt;
      &lt;th&gt;worst_area&lt;/th&gt;
      &lt;th&gt;worst_smoothness&lt;/th&gt;
      &lt;th&gt;worst_compactness&lt;/th&gt;
      &lt;th&gt;worst_concavity&lt;/th&gt;
      &lt;th&gt;worst_concave_points&lt;/th&gt;
      &lt;th&gt;worst_symmetry&lt;/th&gt;
      &lt;th&gt;worst_fractal_dimension&lt;/th&gt;
      &lt;th&gt;y_error&lt;/th&gt;
      &lt;th&gt;y_sq_error&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;83&lt;/th&gt;
      &lt;td&gt;15.37&lt;/td&gt;
      &lt;td&gt;22.76&lt;/td&gt;
      &lt;td&gt;100.20&lt;/td&gt;
      &lt;td&gt;728.2&lt;/td&gt;
      &lt;td&gt;0.09200&lt;/td&gt;
      &lt;td&gt;0.1036&lt;/td&gt;
      &lt;td&gt;0.11220&lt;/td&gt;
      &lt;td&gt;0.07483&lt;/td&gt;
      &lt;td&gt;0.1717&lt;/td&gt;
      &lt;td&gt;0.06097&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;107.50&lt;/td&gt;
      &lt;td&gt;830.9&lt;/td&gt;
      &lt;td&gt;0.1257&lt;/td&gt;
      &lt;td&gt;0.1997&lt;/td&gt;
      &lt;td&gt;0.2846&lt;/td&gt;
      &lt;td&gt;0.14760&lt;/td&gt;
      &lt;td&gt;0.2556&lt;/td&gt;
      &lt;td&gt;0.06828&lt;/td&gt;
      &lt;td&gt;0.929329&lt;/td&gt;
      &lt;td&gt;0.863652&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;47&lt;/th&gt;
      &lt;td&gt;13.80&lt;/td&gt;
      &lt;td&gt;15.79&lt;/td&gt;
      &lt;td&gt;90.43&lt;/td&gt;
      &lt;td&gt;584.1&lt;/td&gt;
      &lt;td&gt;0.10070&lt;/td&gt;
      &lt;td&gt;0.1280&lt;/td&gt;
      &lt;td&gt;0.07789&lt;/td&gt;
      &lt;td&gt;0.05069&lt;/td&gt;
      &lt;td&gt;0.1662&lt;/td&gt;
      &lt;td&gt;0.06566&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;110.30&lt;/td&gt;
      &lt;td&gt;812.4&lt;/td&gt;
      &lt;td&gt;0.1411&lt;/td&gt;
      &lt;td&gt;0.3542&lt;/td&gt;
      &lt;td&gt;0.2779&lt;/td&gt;
      &lt;td&gt;0.13830&lt;/td&gt;
      &lt;td&gt;0.2589&lt;/td&gt;
      &lt;td&gt;0.10300&lt;/td&gt;
      &lt;td&gt;0.832785&lt;/td&gt;
      &lt;td&gt;0.693531&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;137&lt;/th&gt;
      &lt;td&gt;14.22&lt;/td&gt;
      &lt;td&gt;27.85&lt;/td&gt;
      &lt;td&gt;92.55&lt;/td&gt;
      &lt;td&gt;623.9&lt;/td&gt;
      &lt;td&gt;0.08223&lt;/td&gt;
      &lt;td&gt;0.1039&lt;/td&gt;
      &lt;td&gt;0.11030&lt;/td&gt;
      &lt;td&gt;0.04408&lt;/td&gt;
      &lt;td&gt;0.1342&lt;/td&gt;
      &lt;td&gt;0.06129&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;102.50&lt;/td&gt;
      &lt;td&gt;764.0&lt;/td&gt;
      &lt;td&gt;0.1081&lt;/td&gt;
      &lt;td&gt;0.2426&lt;/td&gt;
      &lt;td&gt;0.3064&lt;/td&gt;
      &lt;td&gt;0.08219&lt;/td&gt;
      &lt;td&gt;0.1890&lt;/td&gt;
      &lt;td&gt;0.07796&lt;/td&gt;
      &lt;td&gt;0.707592&lt;/td&gt;
      &lt;td&gt;0.500686&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;73&lt;/th&gt;
      &lt;td&gt;11.80&lt;/td&gt;
      &lt;td&gt;16.58&lt;/td&gt;
      &lt;td&gt;78.99&lt;/td&gt;
      &lt;td&gt;432.0&lt;/td&gt;
      &lt;td&gt;0.10910&lt;/td&gt;
      &lt;td&gt;0.1700&lt;/td&gt;
      &lt;td&gt;0.16590&lt;/td&gt;
      &lt;td&gt;0.07415&lt;/td&gt;
      &lt;td&gt;0.2678&lt;/td&gt;
      &lt;td&gt;0.07371&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;91.93&lt;/td&gt;
      &lt;td&gt;591.7&lt;/td&gt;
      &lt;td&gt;0.1385&lt;/td&gt;
      &lt;td&gt;0.4092&lt;/td&gt;
      &lt;td&gt;0.4504&lt;/td&gt;
      &lt;td&gt;0.18650&lt;/td&gt;
      &lt;td&gt;0.5774&lt;/td&gt;
      &lt;td&gt;0.10300&lt;/td&gt;
      &lt;td&gt;0.703641&lt;/td&gt;
      &lt;td&gt;0.495111&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;130&lt;/th&gt;
      &lt;td&gt;14.99&lt;/td&gt;
      &lt;td&gt;22.11&lt;/td&gt;
      &lt;td&gt;97.53&lt;/td&gt;
      &lt;td&gt;693.7&lt;/td&gt;
      &lt;td&gt;0.08515&lt;/td&gt;
      &lt;td&gt;0.1025&lt;/td&gt;
      &lt;td&gt;0.06859&lt;/td&gt;
      &lt;td&gt;0.03876&lt;/td&gt;
      &lt;td&gt;0.1944&lt;/td&gt;
      &lt;td&gt;0.05913&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;110.20&lt;/td&gt;
      &lt;td&gt;867.1&lt;/td&gt;
      &lt;td&gt;0.1077&lt;/td&gt;
      &lt;td&gt;0.3345&lt;/td&gt;
      &lt;td&gt;0.3114&lt;/td&gt;
      &lt;td&gt;0.13080&lt;/td&gt;
      &lt;td&gt;0.3163&lt;/td&gt;
      &lt;td&gt;0.09251&lt;/td&gt;
      &lt;td&gt;0.646361&lt;/td&gt;
      &lt;td&gt;0.417782&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;119&lt;/th&gt;
      &lt;td&gt;16.25&lt;/td&gt;
      &lt;td&gt;19.51&lt;/td&gt;
      &lt;td&gt;109.80&lt;/td&gt;
      &lt;td&gt;815.8&lt;/td&gt;
      &lt;td&gt;0.10260&lt;/td&gt;
      &lt;td&gt;0.1893&lt;/td&gt;
      &lt;td&gt;0.22360&lt;/td&gt;
      &lt;td&gt;0.09194&lt;/td&gt;
      &lt;td&gt;0.2151&lt;/td&gt;
      &lt;td&gt;0.06578&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;122.10&lt;/td&gt;
      &lt;td&gt;939.7&lt;/td&gt;
      &lt;td&gt;0.1377&lt;/td&gt;
      &lt;td&gt;0.4462&lt;/td&gt;
      &lt;td&gt;0.5897&lt;/td&gt;
      &lt;td&gt;0.17750&lt;/td&gt;
      &lt;td&gt;0.3318&lt;/td&gt;
      &lt;td&gt;0.09136&lt;/td&gt;
      &lt;td&gt;0.467160&lt;/td&gt;
      &lt;td&gt;0.218238&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;6 rows × 32 columns&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;rule: worst_area &amp;gt; 548.650024414 &amp;amp; area_error &amp;gt; 23.2350006104 &amp;amp; worst_area &amp;lt;= 976.25
support: 0.22377622377622378
coef: 0.02893728555291459
estimated error effect (support x coef): 0.006475476487365502
rule MAE: 0.2578116453233828
rule RMSE: 0.40903437023906347
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;


&lt;div class="output_html rendered_html output_subarea "&gt;
&lt;div&gt;
&lt;table border="1" class="dataframe"&gt;
  &lt;thead&gt;
    &lt;tr style="text-align: right;"&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;mean_radius&lt;/th&gt;
      &lt;th&gt;mean_texture&lt;/th&gt;
      &lt;th&gt;mean_perimeter&lt;/th&gt;
      &lt;th&gt;mean_area&lt;/th&gt;
      &lt;th&gt;mean_smoothness&lt;/th&gt;
      &lt;th&gt;mean_compactness&lt;/th&gt;
      &lt;th&gt;mean_concavity&lt;/th&gt;
      &lt;th&gt;mean_concave_points&lt;/th&gt;
      &lt;th&gt;mean_symmetry&lt;/th&gt;
      &lt;th&gt;mean_fractal_dimension&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;worst_perimeter&lt;/th&gt;
      &lt;th&gt;worst_area&lt;/th&gt;
      &lt;th&gt;worst_smoothness&lt;/th&gt;
      &lt;th&gt;worst_compactness&lt;/th&gt;
      &lt;th&gt;worst_concavity&lt;/th&gt;
      &lt;th&gt;worst_concave_points&lt;/th&gt;
      &lt;th&gt;worst_symmetry&lt;/th&gt;
      &lt;th&gt;worst_fractal_dimension&lt;/th&gt;
      &lt;th&gt;y_error&lt;/th&gt;
      &lt;th&gt;y_sq_error&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;91&lt;/th&gt;
      &lt;td&gt;11.76&lt;/td&gt;
      &lt;td&gt;18.14&lt;/td&gt;
      &lt;td&gt;75.00&lt;/td&gt;
      &lt;td&gt;431.1&lt;/td&gt;
      &lt;td&gt;0.09968&lt;/td&gt;
      &lt;td&gt;0.05914&lt;/td&gt;
      &lt;td&gt;0.02685&lt;/td&gt;
      &lt;td&gt;0.03515&lt;/td&gt;
      &lt;td&gt;0.1619&lt;/td&gt;
      &lt;td&gt;0.06287&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;85.10&lt;/td&gt;
      &lt;td&gt;553.6&lt;/td&gt;
      &lt;td&gt;0.1137&lt;/td&gt;
      &lt;td&gt;0.07974&lt;/td&gt;
      &lt;td&gt;0.0612&lt;/td&gt;
      &lt;td&gt;0.07160&lt;/td&gt;
      &lt;td&gt;0.1978&lt;/td&gt;
      &lt;td&gt;0.06915&lt;/td&gt;
      &lt;td&gt;0.974325&lt;/td&gt;
      &lt;td&gt;0.949308&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;83&lt;/th&gt;
      &lt;td&gt;15.37&lt;/td&gt;
      &lt;td&gt;22.76&lt;/td&gt;
      &lt;td&gt;100.20&lt;/td&gt;
      &lt;td&gt;728.2&lt;/td&gt;
      &lt;td&gt;0.09200&lt;/td&gt;
      &lt;td&gt;0.10360&lt;/td&gt;
      &lt;td&gt;0.11220&lt;/td&gt;
      &lt;td&gt;0.07483&lt;/td&gt;
      &lt;td&gt;0.1717&lt;/td&gt;
      &lt;td&gt;0.06097&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;107.50&lt;/td&gt;
      &lt;td&gt;830.9&lt;/td&gt;
      &lt;td&gt;0.1257&lt;/td&gt;
      &lt;td&gt;0.19970&lt;/td&gt;
      &lt;td&gt;0.2846&lt;/td&gt;
      &lt;td&gt;0.14760&lt;/td&gt;
      &lt;td&gt;0.2556&lt;/td&gt;
      &lt;td&gt;0.06828&lt;/td&gt;
      &lt;td&gt;0.929329&lt;/td&gt;
      &lt;td&gt;0.863652&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;89&lt;/th&gt;
      &lt;td&gt;14.60&lt;/td&gt;
      &lt;td&gt;23.29&lt;/td&gt;
      &lt;td&gt;93.97&lt;/td&gt;
      &lt;td&gt;664.7&lt;/td&gt;
      &lt;td&gt;0.08682&lt;/td&gt;
      &lt;td&gt;0.06636&lt;/td&gt;
      &lt;td&gt;0.08390&lt;/td&gt;
      &lt;td&gt;0.05271&lt;/td&gt;
      &lt;td&gt;0.1627&lt;/td&gt;
      &lt;td&gt;0.05416&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;102.20&lt;/td&gt;
      &lt;td&gt;758.2&lt;/td&gt;
      &lt;td&gt;0.1312&lt;/td&gt;
      &lt;td&gt;0.15810&lt;/td&gt;
      &lt;td&gt;0.2675&lt;/td&gt;
      &lt;td&gt;0.13590&lt;/td&gt;
      &lt;td&gt;0.2477&lt;/td&gt;
      &lt;td&gt;0.06836&lt;/td&gt;
      &lt;td&gt;0.856296&lt;/td&gt;
      &lt;td&gt;0.733242&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;47&lt;/th&gt;
      &lt;td&gt;13.80&lt;/td&gt;
      &lt;td&gt;15.79&lt;/td&gt;
      &lt;td&gt;90.43&lt;/td&gt;
      &lt;td&gt;584.1&lt;/td&gt;
      &lt;td&gt;0.10070&lt;/td&gt;
      &lt;td&gt;0.12800&lt;/td&gt;
      &lt;td&gt;0.07789&lt;/td&gt;
      &lt;td&gt;0.05069&lt;/td&gt;
      &lt;td&gt;0.1662&lt;/td&gt;
      &lt;td&gt;0.06566&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;110.30&lt;/td&gt;
      &lt;td&gt;812.4&lt;/td&gt;
      &lt;td&gt;0.1411&lt;/td&gt;
      &lt;td&gt;0.35420&lt;/td&gt;
      &lt;td&gt;0.2779&lt;/td&gt;
      &lt;td&gt;0.13830&lt;/td&gt;
      &lt;td&gt;0.2589&lt;/td&gt;
      &lt;td&gt;0.10300&lt;/td&gt;
      &lt;td&gt;0.832785&lt;/td&gt;
      &lt;td&gt;0.693531&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;137&lt;/th&gt;
      &lt;td&gt;14.22&lt;/td&gt;
      &lt;td&gt;27.85&lt;/td&gt;
      &lt;td&gt;92.55&lt;/td&gt;
      &lt;td&gt;623.9&lt;/td&gt;
      &lt;td&gt;0.08223&lt;/td&gt;
      &lt;td&gt;0.10390&lt;/td&gt;
      &lt;td&gt;0.11030&lt;/td&gt;
      &lt;td&gt;0.04408&lt;/td&gt;
      &lt;td&gt;0.1342&lt;/td&gt;
      &lt;td&gt;0.06129&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;102.50&lt;/td&gt;
      &lt;td&gt;764.0&lt;/td&gt;
      &lt;td&gt;0.1081&lt;/td&gt;
      &lt;td&gt;0.24260&lt;/td&gt;
      &lt;td&gt;0.3064&lt;/td&gt;
      &lt;td&gt;0.08219&lt;/td&gt;
      &lt;td&gt;0.1890&lt;/td&gt;
      &lt;td&gt;0.07796&lt;/td&gt;
      &lt;td&gt;0.707592&lt;/td&gt;
      &lt;td&gt;0.500686&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;73&lt;/th&gt;
      &lt;td&gt;11.80&lt;/td&gt;
      &lt;td&gt;16.58&lt;/td&gt;
      &lt;td&gt;78.99&lt;/td&gt;
      &lt;td&gt;432.0&lt;/td&gt;
      &lt;td&gt;0.10910&lt;/td&gt;
      &lt;td&gt;0.17000&lt;/td&gt;
      &lt;td&gt;0.16590&lt;/td&gt;
      &lt;td&gt;0.07415&lt;/td&gt;
      &lt;td&gt;0.2678&lt;/td&gt;
      &lt;td&gt;0.07371&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;91.93&lt;/td&gt;
      &lt;td&gt;591.7&lt;/td&gt;
      &lt;td&gt;0.1385&lt;/td&gt;
      &lt;td&gt;0.40920&lt;/td&gt;
      &lt;td&gt;0.4504&lt;/td&gt;
      &lt;td&gt;0.18650&lt;/td&gt;
      &lt;td&gt;0.5774&lt;/td&gt;
      &lt;td&gt;0.10300&lt;/td&gt;
      &lt;td&gt;0.703641&lt;/td&gt;
      &lt;td&gt;0.495111&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;130&lt;/th&gt;
      &lt;td&gt;14.99&lt;/td&gt;
      &lt;td&gt;22.11&lt;/td&gt;
      &lt;td&gt;97.53&lt;/td&gt;
      &lt;td&gt;693.7&lt;/td&gt;
      &lt;td&gt;0.08515&lt;/td&gt;
      &lt;td&gt;0.10250&lt;/td&gt;
      &lt;td&gt;0.06859&lt;/td&gt;
      &lt;td&gt;0.03876&lt;/td&gt;
      &lt;td&gt;0.1944&lt;/td&gt;
      &lt;td&gt;0.05913&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;110.20&lt;/td&gt;
      &lt;td&gt;867.1&lt;/td&gt;
      &lt;td&gt;0.1077&lt;/td&gt;
      &lt;td&gt;0.33450&lt;/td&gt;
      &lt;td&gt;0.3114&lt;/td&gt;
      &lt;td&gt;0.13080&lt;/td&gt;
      &lt;td&gt;0.3163&lt;/td&gt;
      &lt;td&gt;0.09251&lt;/td&gt;
      &lt;td&gt;0.646361&lt;/td&gt;
      &lt;td&gt;0.417782&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;111&lt;/th&gt;
      &lt;td&gt;13.27&lt;/td&gt;
      &lt;td&gt;14.76&lt;/td&gt;
      &lt;td&gt;84.74&lt;/td&gt;
      &lt;td&gt;551.7&lt;/td&gt;
      &lt;td&gt;0.07355&lt;/td&gt;
      &lt;td&gt;0.05055&lt;/td&gt;
      &lt;td&gt;0.03261&lt;/td&gt;
      &lt;td&gt;0.02648&lt;/td&gt;
      &lt;td&gt;0.1386&lt;/td&gt;
      &lt;td&gt;0.05318&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;104.50&lt;/td&gt;
      &lt;td&gt;830.6&lt;/td&gt;
      &lt;td&gt;0.1006&lt;/td&gt;
      &lt;td&gt;0.12380&lt;/td&gt;
      &lt;td&gt;0.1350&lt;/td&gt;
      &lt;td&gt;0.10010&lt;/td&gt;
      &lt;td&gt;0.2027&lt;/td&gt;
      &lt;td&gt;0.06206&lt;/td&gt;
      &lt;td&gt;0.471311&lt;/td&gt;
      &lt;td&gt;0.222134&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;119&lt;/th&gt;
      &lt;td&gt;16.25&lt;/td&gt;
      &lt;td&gt;19.51&lt;/td&gt;
      &lt;td&gt;109.80&lt;/td&gt;
      &lt;td&gt;815.8&lt;/td&gt;
      &lt;td&gt;0.10260&lt;/td&gt;
      &lt;td&gt;0.18930&lt;/td&gt;
      &lt;td&gt;0.22360&lt;/td&gt;
      &lt;td&gt;0.09194&lt;/td&gt;
      &lt;td&gt;0.2151&lt;/td&gt;
      &lt;td&gt;0.06578&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;122.10&lt;/td&gt;
      &lt;td&gt;939.7&lt;/td&gt;
      &lt;td&gt;0.1377&lt;/td&gt;
      &lt;td&gt;0.44620&lt;/td&gt;
      &lt;td&gt;0.5897&lt;/td&gt;
      &lt;td&gt;0.17750&lt;/td&gt;
      &lt;td&gt;0.3318&lt;/td&gt;
      &lt;td&gt;0.09136&lt;/td&gt;
      &lt;td&gt;0.467160&lt;/td&gt;
      &lt;td&gt;0.218238&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;25&lt;/th&gt;
      &lt;td&gt;13.90&lt;/td&gt;
      &lt;td&gt;19.24&lt;/td&gt;
      &lt;td&gt;88.73&lt;/td&gt;
      &lt;td&gt;602.9&lt;/td&gt;
      &lt;td&gt;0.07991&lt;/td&gt;
      &lt;td&gt;0.05326&lt;/td&gt;
      &lt;td&gt;0.02995&lt;/td&gt;
      &lt;td&gt;0.02070&lt;/td&gt;
      &lt;td&gt;0.1579&lt;/td&gt;
      &lt;td&gt;0.05594&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;104.40&lt;/td&gt;
      &lt;td&gt;830.5&lt;/td&gt;
      &lt;td&gt;0.1064&lt;/td&gt;
      &lt;td&gt;0.14150&lt;/td&gt;
      &lt;td&gt;0.1673&lt;/td&gt;
      &lt;td&gt;0.08150&lt;/td&gt;
      &lt;td&gt;0.2356&lt;/td&gt;
      &lt;td&gt;0.07603&lt;/td&gt;
      &lt;td&gt;0.344350&lt;/td&gt;
      &lt;td&gt;0.118577&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;10 rows × 32 columns&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
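&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;The figures printed for each rule can be checked by hand. What follows is a minimal sketch in plain Python, not the notebook's own code. It assumes that &lt;code&gt;y_error&lt;/code&gt; holds absolute errors, that MAE is the mean of &lt;code&gt;y_error&lt;/code&gt;, and that RMSE is the square root of the mean squared error. The &lt;code&gt;support&lt;/code&gt; and &lt;code&gt;coef&lt;/code&gt; values are copied from the first rule's output, and the error list contains only the ten rows displayed for that rule.&lt;/p&gt;
&lt;pre&gt;
```python
import math

# Values copied from the first rule's printed output.
support = 0.21678321678321677
coef = 0.05616597023836435

# The "estimated error effect" is the product of the two.
effect = support * coef
print(round(effect, 6))

# Absolute errors (y_error) of the ten rows displayed for that rule.
# Assumption: MAE is the mean of y_error, RMSE the root of the mean square.
y_error = [0.974325, 0.929329, 0.856296, 0.832785, 0.707592,
           0.703641, 0.646361, 0.471311, 0.467160, 0.344350]

mae = sum(y_error) / len(y_error)
rmse = math.sqrt(sum(e ** 2 for e in y_error) / len(y_error))
print(round(mae, 4), round(rmse, 4))
```
&lt;/pre&gt;
&lt;p&gt;Because the table shows only the ten largest errors for the rule, the MAE and RMSE computed here come out well above the rule-wide values of roughly 0.27 and 0.42 printed in the output.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;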
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;✨&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/automating-error-analysis-with-rulefit-models/</guid><pubDate>Wed, 27 Sep 2017 13:01:51 GMT</pubDate></item><item><title>Introduction to Classification using Logistic Regression with Scikit-Learn</title><link>http://blairhudson.github.io/blog/posts/introduction-to-classification-using-logistic-regression-with-scikit-learn/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;This post, including the source &lt;code&gt;.ipynb&lt;/code&gt; notebook file, will be used as a basis for other topics. You can obtain a copy of the source by clicking the &lt;em&gt;Source&lt;/em&gt; link at the top of this post.&lt;/p&gt;
&lt;p&gt;To keep things simple, we're going to utilise one of the &lt;a href="http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets"&gt;many toy datasets&lt;/a&gt; built into Scikit-Learn! (And yes, it is a &lt;a href="https://goo.gl/U2Uwz2"&gt;real dataset&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;We're also not going to explain &lt;em&gt;how&lt;/em&gt; Scikit-Learn's &lt;code&gt;LogisticRegression&lt;/code&gt; is implemented in this post.&lt;/p&gt;
&lt;p&gt;To structure our code, we will define our model in two parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The code we need to fit our model&lt;/li&gt;
&lt;li&gt;The code we need to use our fitted model to generate predictions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When it comes to model building, these are the two main functional components. For reasons that will be explained in other posts, we're going to build a Python class called &lt;code&gt;CustomModel&lt;/code&gt;, with a method for each of these components:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [1]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CustomModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

        &lt;span class="c1"&gt;# LogisticRegression implements a number of parameters, you can read about them here:&lt;/span&gt;
        &lt;span class="c1"&gt;# http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html&lt;/span&gt;
        &lt;span class="c1"&gt;#&lt;/span&gt;
        &lt;span class="c1"&gt;# With the exception of `random_state`, each of these are the defaults.&lt;/span&gt;
        
        &lt;span class="n"&gt;model_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="s1"&gt;'penalty'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'l2'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'dual'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'tol'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'C'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'fit_intercept'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'intercept_scaling'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'class_weight'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'random_state'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Fixed to 1234 for reproducibility&lt;/span&gt;
            &lt;span class="s1"&gt;'solver'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'liblinear'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'max_iter'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'multi_class'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'ovr'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'verbose'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'warm_start'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'n_jobs'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;model_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt; &lt;span class="c1"&gt;# fun fact: returning self enables method chaining i.e. .fit().predict()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        
        &lt;span class="c1"&gt;# We only want to output the positive case (the second column returned by `predict_proba`:&lt;/span&gt;
    
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict_proba&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)[:,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Now we're ready to use our model!&lt;/p&gt;
&lt;p&gt;In the next section we're going to load the sample data discussed above, and divide it into two portions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;75% for model fitting&lt;/li&gt;
&lt;li&gt;25% for predictions&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [2]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;train_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# more reproducibility&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
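&lt;p&gt;As a quick sanity check on the split above, the sketch below repeats the same call and prints the resulting shapes. It assumes the copy of the breast cancer dataset bundled with scikit-learn (569 rows, 30 features), for which a 75/25 split gives 426 training rows and 143 test rows.&lt;/p&gt;

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same arguments as the split above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, test_size=0.25, random_state=1234)

# The full dataset, then the two portions.
print(X.shape)        # (569, 30)
print(X_train.shape)  # (426, 30)
print(X_test.shape)   # (143, 30)

# Share of positive labels in each portion. The split is not stratified,
# so the two values will only be roughly equal.
print(y_train.mean(), y_test.mean())
```

&lt;p&gt;The 143 test rows are why the prediction array further down has 143 entries.&lt;/p&gt;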
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Now that we have everything we need, let's load up our model, fit it with the training data, and generate some predictions:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [4]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# load our model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CustomModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# fit our model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# generate some predictions&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[4]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;array([  9.25168417e-01,   9.99922130e-01,   9.53635418e-01,
         9.88416588e-01,   9.97542577e-01,   9.95232506e-01,
         4.60659258e-02,   9.98390194e-01,   6.59002902e-10,
         2.76899836e-06,   8.30718694e-10,   9.63993586e-01,
         9.94157890e-01,   9.50980576e-01,   9.96974859e-01,
         6.97038792e-10,   9.99809391e-01,   9.96431765e-01,
         9.99363563e-01,   8.43800531e-06,   9.95502414e-01,
         7.77576547e-03,   1.12727716e-09,   3.40904102e-17,
         3.68627970e-09,   6.55649762e-01,   3.51723839e-03,
         9.97326888e-01,   9.98785233e-01,   9.97552026e-01,
         9.86350517e-01,   9.98844211e-01,   5.70842717e-04,
         9.87742427e-01,   9.19814189e-01,   9.78443649e-01,
         9.92882821e-01,   1.14676290e-02,   1.48817234e-01,
         9.98733024e-01,   4.13813658e-05,   9.93177003e-01,
         1.72319657e-10,   8.54534408e-01,   8.81187668e-01,
         9.97568264e-01,   9.98086681e-01,   8.32784885e-01,
         4.49929586e-11,   8.89087737e-01,   9.28259947e-01,
         9.91244116e-01,   9.94876558e-01,   1.51106510e-08,
         2.60668778e-01,   9.99597520e-01,   9.98940073e-01,
         9.99968817e-01,   9.91318570e-01,   8.29369844e-03,
         9.93238377e-01,   9.92431535e-01,   9.29775117e-01,
         9.99271713e-01,   9.96474598e-01,   2.41572863e-04,
         1.51376226e-11,   9.97330558e-01,   9.98831771e-01,
         4.79400697e-01,   9.99798779e-01,   3.57307727e-07,
         9.99656809e-01,   7.03641088e-01,   9.98247027e-01,
         9.96093354e-01,   9.99588791e-01,   2.58369708e-08,
         9.98136922e-01,   7.97865310e-03,   9.99065333e-01,
         9.98470351e-01,   9.94581260e-01,   9.29328694e-01,
         1.41996390e-02,   1.43214384e-04,   3.71155631e-05,
         4.45838811e-06,   9.13207438e-01,   8.56295696e-01,
         9.99467328e-01,   9.74324559e-01,   9.99328632e-01,
         2.91312374e-12,   1.00998256e-01,   9.86992421e-01,
         9.97149193e-01,   9.13815924e-01,   9.98807818e-01,
         9.84005486e-01,   3.17865443e-08,   2.30937811e-11,
         9.98036358e-01,   9.99532884e-01,   1.24075526e-03,
         9.98819765e-01,   9.99752279e-01,   8.53677349e-04,
         1.53192255e-01,   9.30832406e-01,   1.49723823e-05,
         5.28688983e-01,   1.48786146e-03,   9.92804571e-51,
         8.86447353e-01,   9.95516043e-01,   9.98554149e-01,
         1.75078944e-03,   9.99922978e-01,   4.67159833e-01,
         9.99825913e-01,   9.57716419e-01,   9.95069689e-01,
         9.98728887e-01,   7.49375338e-14,   9.92513330e-01,
         1.49918676e-02,   1.63977226e-02,   9.95785292e-01,
         9.56124754e-01,   3.53639065e-01,   9.96011137e-01,
         7.27728677e-33,   9.97779030e-01,   7.77872222e-02,
         9.90058068e-01,   9.80367925e-01,   2.92408222e-01,
         9.98164180e-01,   1.67926421e-01,   9.99996297e-01,
         6.35631576e-10,   1.06440027e-01])&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
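&lt;p&gt;The values above are probabilities of the positive class, not class labels. A minimal sketch of turning them into labels follows. The helper name &lt;code&gt;to_labels&lt;/code&gt; and the thresholds are assumptions for illustration, not part of &lt;code&gt;CustomModel&lt;/code&gt;. The five probabilities are copied from the output above.&lt;/p&gt;

```python
import numpy as np

# A handful of probabilities copied from the output above.
proba = np.array([0.925168417, 0.0460659258, 0.655649762,
                  0.479400697, 0.260668778])

def to_labels(proba, threshold=0.5):
    """Hypothetical helper: 1 where proba is at least threshold, else 0."""
    return np.greater_equal(proba, threshold).astype(int)

print(to_labels(proba))        # [1 0 1 0 0]
print(to_labels(proba, 0.25))  # [1 0 1 1 1]
```

&lt;p&gt;Lowering the threshold labels more rows as positive, which is the trade-off to weigh when choosing a classification threshold.&lt;/p&gt;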
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/introduction-to-classification-using-logistic-regression-with-scikit-learn/</guid><pubDate>Wed, 20 Sep 2017 11:09:58 GMT</pubDate></item><item><title>Building a utility function wrapper for Scikit-Learn models</title><link>http://blairhudson.github.io/blog/posts/building-a-utility-function-wrapper-for-scikit-learn-models/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;In a previous post we learned how to &lt;a href="http://blairhudson.github.io/blog/posts/accessing-jupyter-notebooks-programatically"&gt;access a notebook programmatically&lt;/a&gt; using the &lt;code&gt;ipynb&lt;/code&gt; package.&lt;/p&gt;
&lt;p&gt;This is very powerful as it allows a data scientist to focus on implementing a model which is re-usable, specifying a &lt;code&gt;fit&lt;/code&gt; and &lt;code&gt;predict&lt;/code&gt; method to provide some structure to their code.&lt;/p&gt;
&lt;p&gt;In this post, we're going to build a utility wrapper which takes the previous code and adds the following functionality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Serialization, so we don't have to re-fit models if we don't need to&lt;/li&gt;
&lt;li&gt;Scoring, so we can determine how well our model is performing&lt;/li&gt;
&lt;li&gt;Feature importance, so we can determine the predictive power of individual features - and provide insight into feature selection&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="Building-the-wrapper"&gt;Building the wrapper&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/building-a-utility-function-wrapper-for-scikit-learn-models/#Building-the-wrapper"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Here is the code:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [1]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.externals&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;roc_auc_score&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.exceptions&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NotFittedError&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os.path&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelUtils&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serialize_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;span class="sd"&gt;            If serialize_path is specified and valid, load the model from disk.&lt;/span&gt;
&lt;span class="sd"&gt;        """&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;serialize_path&lt;/span&gt;
        
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Loaded from'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_fitted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_fitted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;span class="sd"&gt;            Fit our model, saving the model to disk if serialize_path is specified.&lt;/span&gt;
&lt;span class="sd"&gt;        """&lt;/span&gt;
        &lt;span class="c1"&gt;# fit our model&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_fitted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;
        
        &lt;span class="c1"&gt;# serialise to path&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Saved to'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serialize_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;
    
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_fitted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;NotFittedError&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;span class="sd"&gt;            Generates a score for the model based on predicting on X and comparing &lt;/span&gt;
&lt;span class="sd"&gt;            to y_true.&lt;/span&gt;
&lt;span class="sd"&gt;        """&lt;/span&gt;
        &lt;span class="n"&gt;y_pred&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;roc_auc_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;feature_importance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;span class="sd"&gt;            To calculate feature importance, we iterate through each feature i, &lt;/span&gt;
&lt;span class="sd"&gt;            generating a model score with all other features zeroed.&lt;/span&gt;
&lt;span class="sd"&gt;            &lt;/span&gt;
&lt;span class="sd"&gt;            If normalize is True, divide the results by the minimum score, such that&lt;/span&gt;
&lt;span class="sd"&gt;            each score represents "N times better than the worst feature".&lt;/span&gt;
&lt;span class="sd"&gt;        """&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__zero_except&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt;
        
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;
    
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__zero_except&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sd"&gt;"""&lt;/span&gt;
&lt;span class="sd"&gt;            A helper function to replace all but the ith column with zeroes, and &lt;/span&gt;
&lt;span class="sd"&gt;            return the result. (There is probably a cleaner way to do this.)&lt;/span&gt;
&lt;span class="sd"&gt;        """&lt;/span&gt;
        &lt;span class="n"&gt;X_copy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;X_i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;X_copy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;X_copy&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_i&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;X_copy&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
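&lt;p&gt;The docstring of the column-zeroing helper suggests there is probably a cleaner way to do it. One possible alternative, sketched here as a standalone function on a toy array, is to start from an all-zero array and copy in the one column we want to keep. The function name, &lt;code&gt;X_demo&lt;/code&gt; and the example scores are made up for illustration.&lt;/p&gt;

```python
import numpy as np

def zero_except(X, i):
    """Return a copy of X with every column except column i set to zero."""
    X_zeroed = np.zeros_like(X)
    X_zeroed[:, i] = X[:, i]
    return X_zeroed

# Toy input, for illustration only.
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(zero_except(X_demo, 1))
# [[0. 2. 0.]
#  [0. 5. 0.]]

# Normalising by the minimum, as feature_importance does when normalize=True.
# These scores are hypothetical values, not results from the model.
scores = np.asarray([0.5, 0.75, 1.0])
relative = scores / scores.min()
print(relative)  # [1.  1.5 2. ]
```

&lt;p&gt;Converting the scores to an array before dividing makes the normalisation step explicit, rather than relying on NumPy to convert the Python list implicitly.&lt;/p&gt;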
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h3 id="Using-the-wrapper"&gt;Using the wrapper&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/building-a-utility-function-wrapper-for-scikit-learn-models/#Using-the-wrapper"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Now we have our &lt;code&gt;ModelUtils&lt;/code&gt; wrapper class, lets import &lt;code&gt;CustomModel&lt;/code&gt; &lt;a href="http://blairhudson.github.io/blog/posts/accessing-jupyter-notebooks-programatically"&gt;as before&lt;/a&gt; and put it to work.&lt;/p&gt;
&lt;p&gt;As we instantiate the wrapper, we're specifying &lt;code&gt;test.pkl&lt;/code&gt; in the current directory as the location to serialize the model.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;serialize_path&lt;/code&gt; is configured and valid, the pre-fitted model will be loaded from there, and the &lt;code&gt;predict&lt;/code&gt; function will be immediately available. If configured but the file does not exist, &lt;code&gt;ModelUtils&lt;/code&gt; will serialize to this location after fitting the model.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [2]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ipynb.fs.defs.model&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CustomModel&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ModelUtils&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CustomModel&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;serialize_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'test.pkl'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
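&lt;p&gt;To make the load-or-save behaviour concrete, here is a minimal sketch of the underlying &lt;code&gt;joblib&lt;/code&gt; round trip. It assumes the standalone &lt;code&gt;joblib&lt;/code&gt; package is installed, and it uses a plain &lt;code&gt;LogisticRegression&lt;/code&gt; on toy data as a stand-in for &lt;code&gt;CustomModel&lt;/code&gt;. The file goes in a temporary directory so nothing is left on disk.&lt;/p&gt;

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Stand-in estimator and toy data, purely for illustration.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]
clf = LogisticRegression().fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'test.pkl')

    # Nothing has been serialized yet, so a wrapper would need to fit first.
    existed_before = os.path.exists(path)

    # Save the fitted estimator, then load it back from disk.
    joblib.dump(clf, path)
    existed_after = os.path.exists(path)
    restored = joblib.load(path)

print(existed_before, existed_after)

# The restored estimator makes the same predictions as the original.
same = (restored.predict(X_toy) == clf.predict(X_toy)).all()
print(same)
```

&lt;p&gt;As with any pickle-based format, a file written by &lt;code&gt;joblib.dump&lt;/code&gt; should only be loaded if it comes from a trusted source.&lt;/p&gt;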
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Let's load up the sample data again, fit our model, and then use it to create some predictions. Note the &lt;code&gt;Saved to test.pkl&lt;/code&gt; output.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [3]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;train_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# more reproducibility&lt;/span&gt;

&lt;span class="c1"&gt;# fit our model (as before)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# generate some predictions (as before)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;# fun fact: the ; character suppresses notebook output&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;Saved to test.pkl
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;h3 id="Model-and-feature-performance"&gt;Model and feature performance&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/building-a-utility-function-wrapper-for-scikit-learn-models/#Model-and-feature-performance"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Finally, here are our two new functions.&lt;/p&gt;
&lt;p&gt;First, let's score the performance of our model. This uses a metric called ROC AUC. We won't explain it in detail in this post, but essentially it measures how well the model separates the classes in &lt;code&gt;y&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Then we will calculate the relative feature importance for each of the 30 features in the sample dataset. Each value compares a feature's individual score with that of the lowest-performing feature: the first feature is ~2.12x more powerful than the lowest-performing feature, and the best feature is ~35.86x more powerful.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [4]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# score our model&lt;/span&gt;
&lt;span class="n"&gt;auc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'AUC:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# calculate relative feature importance&lt;/span&gt;
&lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;feature_importance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Top feature relative performance:'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;importance&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;importance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;AUC: 0.989669421488
Top feature relative performance: 35.855513308
[  2.121673     8.6730038   34.85551331   2.06844106  25.14068441
  29.74904943  33.68060837  34.68441065  24.88973384  15.51711027
   4.66920152  18.64638783   4.85931559  34.7148289   16.36121673
  24.82129278  27.2851711   27.96958175  15.74904943  16.24334601   1.
  29.3269962   35.85551331  35.83269962  26.1634981   30.85931559
  33.77186312  35.12927757  27.24714829  23.27376426]
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;👏&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/building-a-utility-function-wrapper-for-scikit-learn-models/</guid><pubDate>Sun, 17 Sep 2017 11:04:40 GMT</pubDate></item><item><title>Accessing Jupyter notebooks programatically</title><link>http://blairhudson.github.io/blog/posts/accessing-jupyter-notebooks-programatically/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;In a previous post we &lt;a href="http://blairhudson.github.io/blog/posts/introduction-to-classification-using-logistic-regression-with-scikit-learn"&gt;created a simple classifier&lt;/a&gt; using Scikit-Learn's &lt;code&gt;LogisticRegression&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;As we pieced together our model, we structured the code into a class called &lt;code&gt;CustomModel&lt;/code&gt;, with two methods: &lt;code&gt;fit&lt;/code&gt; and &lt;code&gt;predict&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To start working programmatically with the notebook created in that post, you will first need to install the &lt;code&gt;ipynb&lt;/code&gt; package:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install git+https://github.com/blairhudson/ipynb&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;(Note: This is actually a fork of an &lt;a href="https://github.com/ipython/ipynb"&gt;IPython repo&lt;/a&gt;. Unfortunately, the master branch has a bug when parsing tuple-based assignments (e.g. &lt;code&gt;X, y = ...&lt;/code&gt;). A &lt;a href="https://github.com/ipython/ipynb/pull/34"&gt;pull request&lt;/a&gt; has been submitted.)&lt;/p&gt;
&lt;p&gt;Now you're ready to go!&lt;/p&gt;
&lt;h3 id="Using-the-ipynb-package"&gt;Using the ipynb package&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/accessing-jupyter-notebooks-programatically/#Using-the-ipynb-package"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;To simplify things considerably, make sure that you have a copy of &lt;a href="http://blairhudson.github.io/blog/posts/introduction-to-classification-using-logistic-regression-with-scikit-learn/Introduction%20to%20Classification%20using%20Logistic%20Regression%20with%20Scikit-Learn.ipynb"&gt;the source&lt;/a&gt; in the current working directory, and rename it to &lt;code&gt;model.ipynb&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now, thanks to the &lt;code&gt;ipynb&lt;/code&gt; package you can access the &lt;code&gt;CustomModel&lt;/code&gt; class like this:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [5]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;ipynb.fs.defs.model&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CustomModel&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;To prove it, let's generate predictions on the same sample data:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [6]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;

&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;train_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                                    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# more reproducibility&lt;/span&gt;

&lt;span class="c1"&gt;# load our model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CustomModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# fit our model&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# generate some predictions&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[6]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;array([  9.25168417e-01,   9.99922130e-01,   9.53635418e-01,
         9.88416588e-01,   9.97542577e-01,   9.95232506e-01,
         4.60659258e-02,   9.98390194e-01,   6.59002902e-10,
         2.76899836e-06,   8.30718694e-10,   9.63993586e-01,
         9.94157890e-01,   9.50980576e-01,   9.96974859e-01,
         6.97038792e-10,   9.99809391e-01,   9.96431765e-01,
         9.99363563e-01,   8.43800531e-06,   9.95502414e-01,
         7.77576547e-03,   1.12727716e-09,   3.40904102e-17,
         3.68627970e-09,   6.55649762e-01,   3.51723839e-03,
         9.97326888e-01,   9.98785233e-01,   9.97552026e-01,
         9.86350517e-01,   9.98844211e-01,   5.70842717e-04,
         9.87742427e-01,   9.19814189e-01,   9.78443649e-01,
         9.92882821e-01,   1.14676290e-02,   1.48817234e-01,
         9.98733024e-01,   4.13813658e-05,   9.93177003e-01,
         1.72319657e-10,   8.54534408e-01,   8.81187668e-01,
         9.97568264e-01,   9.98086681e-01,   8.32784885e-01,
         4.49929586e-11,   8.89087737e-01,   9.28259947e-01,
         9.91244116e-01,   9.94876558e-01,   1.51106510e-08,
         2.60668778e-01,   9.99597520e-01,   9.98940073e-01,
         9.99968817e-01,   9.91318570e-01,   8.29369844e-03,
         9.93238377e-01,   9.92431535e-01,   9.29775117e-01,
         9.99271713e-01,   9.96474598e-01,   2.41572863e-04,
         1.51376226e-11,   9.97330558e-01,   9.98831771e-01,
         4.79400697e-01,   9.99798779e-01,   3.57307727e-07,
         9.99656809e-01,   7.03641088e-01,   9.98247027e-01,
         9.96093354e-01,   9.99588791e-01,   2.58369708e-08,
         9.98136922e-01,   7.97865310e-03,   9.99065333e-01,
         9.98470351e-01,   9.94581260e-01,   9.29328694e-01,
         1.41996390e-02,   1.43214384e-04,   3.71155631e-05,
         4.45838811e-06,   9.13207438e-01,   8.56295696e-01,
         9.99467328e-01,   9.74324559e-01,   9.99328632e-01,
         2.91312374e-12,   1.00998256e-01,   9.86992421e-01,
         9.97149193e-01,   9.13815924e-01,   9.98807818e-01,
         9.84005486e-01,   3.17865443e-08,   2.30937811e-11,
         9.98036358e-01,   9.99532884e-01,   1.24075526e-03,
         9.98819765e-01,   9.99752279e-01,   8.53677349e-04,
         1.53192255e-01,   9.30832406e-01,   1.49723823e-05,
         5.28688983e-01,   1.48786146e-03,   9.92804571e-51,
         8.86447353e-01,   9.95516043e-01,   9.98554149e-01,
         1.75078944e-03,   9.99922978e-01,   4.67159833e-01,
         9.99825913e-01,   9.57716419e-01,   9.95069689e-01,
         9.98728887e-01,   7.49375338e-14,   9.92513330e-01,
         1.49918676e-02,   1.63977226e-02,   9.95785292e-01,
         9.56124754e-01,   3.53639065e-01,   9.96011137e-01,
         7.27728677e-33,   9.97779030e-01,   7.77872222e-02,
         9.90058068e-01,   9.80367925e-01,   2.92408222e-01,
         9.98164180e-01,   1.67926421e-01,   9.99996297e-01,
         6.35631576e-10,   1.06440027e-01])&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Magic ✨&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/accessing-jupyter-notebooks-programatically/</guid><pubDate>Sun, 17 Sep 2017 10:16:20 GMT</pubDate></item><item><title>Using Jupyter notebooks with Anaconda</title><link>http://blairhudson.github.io/blog/posts/using-jupyter-notebooks-with-anaconda/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Jupyter is a popular data science environment, and Jupyter notebooks (such as &lt;a href="http://blairhudson.github.io/blog/posts/using-jupyter-notebooks-with-anaconda/Using%20Jupyter%20notebooks%20with%20Anaconda.ipynb"&gt;the notebook this post&lt;/a&gt; was written with) are a great way to create and share data science work with inline documentation (using Markdown syntax).&lt;/p&gt;
&lt;p&gt;Jupyter is capable of running kernels in many different programming languages, but in this post we're focussed just on Python.&lt;/p&gt;
&lt;h3 id="Installing-Anaconda"&gt;Installing Anaconda&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/using-jupyter-notebooks-with-anaconda/#Installing-Anaconda"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Download Anaconda for your operating system from the &lt;a href="https://www.anaconda.com/download"&gt;Anaconda website&lt;/a&gt;. For best compatibility with modern data science packages, I suggest the Python 3.6 version or newer.&lt;/li&gt;
&lt;li&gt;Run the downloaded installer and follow the prompts.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="Launching-Jupyter"&gt;Launching Jupyter&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/using-jupyter-notebooks-with-anaconda/#Launching-Jupyter"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;Run the following command to launch the Jupyter environment in your current directory:
 &lt;code&gt;jupyter notebook&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;By default, this will open the web interface in your default web browser at &lt;a href="http://localhost:8888/"&gt;http://localhost:8888/&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Now you can select an existing &lt;code&gt;.ipynb&lt;/code&gt; file from the file navigator to open it, or create a new notebook.&lt;/li&gt;
&lt;/ol&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [1]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"🚀"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;🚀
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/using-jupyter-notebooks-with-anaconda/</guid><pubDate>Sat, 16 Sep 2017 10:08:20 GMT</pubDate></item><item><title>Optimising hyper-parameters efficiently with Scikit-Optimize</title><link>http://blairhudson.github.io/blog/posts/optimising-hyper-parameters-efficiently-with-scikit-optimize/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;One of the most well-known techniques for experimenting with various model configurations is &lt;em&gt;Grid Search&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;With grid search, you specify a discrete search space (a parameter grid) of all of the parameter values you would like to test. The search iterates through the grid, testing every combination until all are exhausted. Based on a specified performance metric (e.g. error), you can select the best parameter combination for your model.&lt;/p&gt;
&lt;p&gt;What's wrong with this?&lt;/p&gt;
&lt;p&gt;If you have a large parameter grid, this doesn't work too well:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [1]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s1"&gt;'param_a'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'param_b'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'param_c'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;num_searches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prod&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
    
&lt;span class="n"&gt;num_searches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[1]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;27&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Now suppose we want to search over four possible values for &lt;code&gt;param_a&lt;/code&gt; instead, and add two new parameters:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [2]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s1"&gt;'param_a'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'param_b'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'param_c'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'param_d'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'param_e'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;num_searches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[2]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;216&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;As you can see from the first grid, there are already 27 combinations to try. This jumps to 216 for our larger grid. Depending on the complexity of the model and the amount of data to process, this can very easily become infeasible.&lt;/p&gt;
&lt;p&gt;There are a few approaches to solving this, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;breaking down the search into multiple smaller steps (such as searching &lt;code&gt;param_a&lt;/code&gt; and &lt;code&gt;param_b&lt;/code&gt; first, with defaults for the others, then using the best values to search the remaining parameters - this can be tricky in practice)&lt;/li&gt;
&lt;li&gt;searching the parameter space &lt;a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html"&gt;at random&lt;/a&gt; (which has the additional benefit of discovering better parameter values when random samples are drawn from a continuous range)&lt;/li&gt;
&lt;/ul&gt;
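&lt;p&gt;To make the random approach concrete, here is a minimal sketch using only the Python standard library. It is not how &lt;code&gt;RandomizedSearchCV&lt;/code&gt; is implemented: the helper names &lt;code&gt;all_combinations&lt;/code&gt; and &lt;code&gt;sample_combinations&lt;/code&gt; are made up for illustration, and it only samples from the discrete grid above rather than from a continuous range.&lt;/p&gt;

```python
import itertools
import random

# the larger grid from above
param_grid = {
    'param_a': [0.01, 0.03, 0.1, 0.3],
    'param_b': [0, 1, 2],
    'param_c': [1, 50, 100],
    'param_d': ['a', 'b'],
    'param_e': [0, 1, 2],
}


def all_combinations(grid):
    """Enumerate every combination in the grid (what grid search would test)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


def sample_combinations(grid, n_iter, seed=1234):
    """Pick n_iter distinct combinations at random (hypothetical helper)."""
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.sample(all_combinations(grid), n_iter)


combos = all_combinations(param_grid)
candidates = sample_combinations(param_grid, n_iter=30)

# the full grid has 216 combinations; we would only evaluate 30 of them
print(len(combos), len(candidates))
```

&lt;p&gt;Each entry in &lt;code&gt;candidates&lt;/code&gt; is a dictionary of parameter values that you would then fit and score, keeping the best one. The cost is fixed by &lt;code&gt;n_iter&lt;/code&gt; rather than by the size of the grid.&lt;/p&gt;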
&lt;p&gt;While Scikit-Learn doesn't provide many more options, some &lt;a href="https://github.com/scikit-optimize/scikit-optimize/blob/master/AUTHORS.md"&gt;clever people&lt;/a&gt; have developed a drop-in replacement for Scikit-Learn's &lt;code&gt;GridSearchCV&lt;/code&gt; and &lt;code&gt;RandomizedSearchCV&lt;/code&gt; called &lt;code&gt;BayesSearchCV&lt;/code&gt; in a package called &lt;em&gt;Scikit-Optimize&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Let's install Scikit-Optimize and implement &lt;code&gt;BayesSearchCV&lt;/code&gt; with a simple example!&lt;/p&gt;
&lt;h3 id="Installing-Scikit-Optimize"&gt;Installing Scikit-Optimize&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/optimising-hyper-parameters-efficiently-with-scikit-optimize/#Installing-Scikit-Optimize"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Assuming you have already &lt;a href="http://blairhudson.github.io/blog/posts/using-jupyter-notebooks-with-anaconda/"&gt;installed Anaconda and Jupyter&lt;/a&gt;, you will need to do the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pip install scikit-optimize&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have trouble installing, you may first need to run the following to install one of Scikit-Optimize's dependencies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pip install scikit-garden&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="Implementing-BayesSearchCV"&gt;Implementing BayesSearchCV&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/optimising-hyper-parameters-efficiently-with-scikit-optimize/#Implementing-BayesSearchCV"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;p&gt;Here's an example implementation using a sample dataset and &lt;a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html"&gt;Logistic Regression&lt;/a&gt;.&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [3]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;warnings&lt;/span&gt;
&lt;span class="n"&gt;warnings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filterwarnings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ignore'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;skopt&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BayesSearchCV&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.datasets&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;sklearn.linear_model&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;

&lt;span class="c1"&gt;# prep some sample data&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_breast_cancer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;return_X_y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# we're using a logistic regression model&lt;/span&gt;
&lt;span class="n"&gt;clf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# this is our parameter grid&lt;/span&gt;
&lt;span class="n"&gt;param_grid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s1"&gt;'solver'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'liblinear'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'saga'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  
    &lt;span class="s1"&gt;'penalty'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'l1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'l2'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="s1"&gt;'tol'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1e-5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1e-3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'log-uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'C'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1e-5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'log-uniform'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'fit_intercept'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# set up our optimiser to find the best params in 30 searches&lt;/span&gt;
&lt;span class="n"&gt;opt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BayesSearchCV&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;param_grid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_iter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [4]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Best params achieve a test score of'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_test&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s1"&gt;':'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;

&lt;div class="output_subarea output_stream output_stdout output_text"&gt;
&lt;pre&gt;Best params achieve a test score of 0.958041958042 :
&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[4]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;{'C': 100.0,
 'fit_intercept': True,
 'penalty': 'l1',
 'solver': 'liblinear',
 'tol': 0.00094035472283658726}&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;By increasing the value of &lt;code&gt;n_iter&lt;/code&gt;, you can let the search run longer, which may find better parameter combinations. You can also use the optimiser directly for prediction by calling &lt;code&gt;.predict()&lt;/code&gt; (or &lt;code&gt;.predict_proba()&lt;/code&gt; for probabilities), or extract the best estimator and use it standalone:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [5]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_estimator_&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[5]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;LogisticRegression(C=100.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l1', random_state=1234, solver='liblinear',
          tol=0.00094035472283658726, verbose=0, warm_start=False)&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
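&lt;p&gt;You may also find it useful to re-use the best parameters programmatically to define an equivalent model. Note that only the searched parameters are carried over, so fixed settings such as &lt;code&gt;random_state&lt;/code&gt; revert to their defaults:&lt;/p&gt;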

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [6]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;LogisticRegression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;opt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;best_params_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt output_prompt"&gt;Out[6]:&lt;/div&gt;



&lt;div class="output_text output_subarea output_execute_result"&gt;
&lt;pre&gt;LogisticRegression(C=100.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l1', random_state=None, solver='liblinear',
          tol=0.00094035472283658726, verbose=0, warm_start=False)&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
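&lt;p&gt;In the output above, &lt;code&gt;random_state&lt;/code&gt; is &lt;code&gt;None&lt;/code&gt; rather than &lt;code&gt;1234&lt;/code&gt;, because &lt;code&gt;best_params_&lt;/code&gt; only holds the parameters that were searched. One way to keep the fixed settings is to merge them back in before building the model. The sketch below uses plain dictionaries so it runs without scikit-learn; the names &lt;code&gt;fixed_params&lt;/code&gt; and &lt;code&gt;model_kwargs&lt;/code&gt; are illustrative assumptions, and the parameter values are copied from the earlier output.&lt;/p&gt;

```python
# Values copied from the best-params output earlier in this post;
# in the notebook this dictionary would be opt.best_params_.
best_params = {
    'C': 100.0,
    'fit_intercept': True,
    'penalty': 'l1',
    'solver': 'liblinear',
    'tol': 0.00094035472283658726,
}

# Settings that were passed to the original estimator but never searched.
# (Hypothetical name, used only for this illustration.)
fixed_params = {'random_state': 1234, 'verbose': 0}

# When keys collide, the dictionary unpacked last wins,
# so the fixed settings go last to guarantee they are kept.
model_kwargs = {**best_params, **fixed_params}

print(model_kwargs)
# In the notebook, the model would then be: LogisticRegression(**model_kwargs)
```

&lt;p&gt;If you only need the fitted model, &lt;code&gt;opt.best_estimator_&lt;/code&gt; already carries these settings, as its output earlier shows.&lt;/p&gt;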
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/optimising-hyper-parameters-efficiently-with-scikit-optimize/</guid><pubDate>Sat, 16 Sep 2017 05:59:59 GMT</pubDate></item><item><title>Creating a blog with Jupyter notebooks</title><link>http://blairhudson.github.io/blog/posts/creating-a-blog-with-jupyter-notebooks/</link><dc:creator>Blair Hudson</dc:creator><description>&lt;div tabindex="-1" id="notebook" class="border-box-sizing"&gt;
    &lt;div class="container" id="notebook-container"&gt;

&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;Assuming you have already &lt;a href="http://blairhudson.github.io/blog/posts/using-jupyter-notebooks-with-anaconda/"&gt;installed Jupyter notebook&lt;/a&gt;, you will need to do the following:&lt;/p&gt;
&lt;h3 id="Installing-and-configuring-a-Nikola-blog"&gt;Installing and configuring a Nikola blog&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/creating-a-blog-with-jupyter-notebooks/#Installing-and-configuring-a-Nikola-blog"&gt;¶&lt;/a&gt;&lt;/h3&gt;&lt;ol&gt;
&lt;li&gt;&lt;p&gt;First you'll need to create a directory structure as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt; - /blog
 -- /posts
 -- /output&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/blog&lt;/code&gt; is the root directory for everything you'll be doing with your blog&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/posts&lt;/code&gt; is where you'll store your Jupyter notebooks&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/output&lt;/code&gt; will contain the code generated for your blog&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the following command to install Nikola (the static website generator which will do most of the heavy lifting)&lt;sup&gt;[1]&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pip install --upgrade "Nikola[extras]"&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Change directory to your blog root:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cd blog&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Start up Nikola, following the prompts to configure your new blog:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;nikola init .&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Open &lt;code&gt;/blog/conf.py&lt;/code&gt; and edit the &lt;code&gt;POSTS&lt;/code&gt; and &lt;code&gt;PAGES&lt;/code&gt; sections so that they include the &lt;code&gt;.ipynb&lt;/code&gt; lines shown below. This allows Nikola to treat &lt;code&gt;.ipynb&lt;/code&gt; files as blog posts and pages.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt; POSTS = (
     ("posts/*.rst", "posts", "post.tmpl"),
     ("posts/*.md", "posts", "post.tmpl"),
     ("posts/*.txt", "posts", "post.tmpl"),
     ("posts/*.html", "posts", "post.tmpl"),
     ("posts/*.ipynb", "posts", "post.tmpl"),
 )
 PAGES = (
     ("pages/*.rst", "pages", "page.tmpl"),
     ("pages/*.md", "pages", "page.tmpl"),
     ("pages/*.txt", "pages", "page.tmpl"),
     ("pages/*.html", "pages", "page.tmpl"),
     ("pages/*.ipynb", "pages", "page.tmpl"),
 )&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Write your blog post in Jupyter, saving the &lt;code&gt;.ipynb&lt;/code&gt; file to &lt;code&gt;/posts&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You will need to explicitly add the following metadata to your notebook (in the Jupyter menu, select &lt;em&gt;Edit &amp;gt; Edit Notebook Metadata&lt;/em&gt;). Change the metadata to match your post.&lt;sup&gt;[2]&lt;/sup&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt; "nikola": {
     "title": "Creating a blog with Jupyter notebooks",
     "slug": "creating-a-blog-with-jupyter-notebooks",
     "date": "2017-09-09 21:09:01 UTC+10:00"
 }&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run &lt;code&gt;nikola build&lt;/code&gt; each time you update your &lt;code&gt;/posts&lt;/code&gt;, which will generate your site and store it in &lt;code&gt;/output&lt;/code&gt;!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you're going to be publishing your blog on GitHub (like me), you can push the content of &lt;code&gt;/output&lt;/code&gt; to your website repo (&lt;a href="https://github.com/blairhudson/blog"&gt;example&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
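&lt;p&gt;If you'd rather not edit the notebook metadata by hand, the metadata step above can be scripted, since an &lt;code&gt;.ipynb&lt;/code&gt; file is plain JSON with a top-level &lt;code&gt;metadata&lt;/code&gt; object. The following is a minimal sketch using only the standard library; the helper name &lt;code&gt;set_nikola_metadata&lt;/code&gt; and the throwaway notebook it writes are assumptions for illustration, not part of Nikola or Jupyter.&lt;/p&gt;

```python
import json
import os
import tempfile


def set_nikola_metadata(notebook_path, title, slug, date):
    """Write a 'nikola' block into the notebook's top-level metadata."""
    with open(notebook_path, encoding='utf8') as handle:
        notebook = json.load(handle)

    # Create the 'metadata' object if it is missing, then set the block.
    notebook.setdefault('metadata', {})['nikola'] = {
        'title': title,
        'slug': slug,
        'date': date,
    }

    with open(notebook_path, 'w', encoding='utf8') as handle:
        json.dump(notebook, handle, indent=1)
    return notebook['metadata']['nikola']


# Demonstrate on a throwaway, empty notebook rather than a real post.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.ipynb')
    empty_notebook = {'cells': [], 'metadata': {}, 'nbformat': 4, 'nbformat_minor': 2}
    with open(path, 'w', encoding='utf8') as handle:
        json.dump(empty_notebook, handle)

    set_nikola_metadata(
        path,
        title='Creating a blog with Jupyter notebooks',
        slug='creating-a-blog-with-jupyter-notebooks',
        date='2017-09-09 21:09:01 UTC+10:00',
    )

    # Read the file back to confirm the block was saved.
    with open(path, encoding='utf8') as handle:
        saved = json.load(handle)

print(saved['metadata']['nikola'])
```

&lt;p&gt;Close the notebook in Jupyter before running something like this against a real post, otherwise the next save from the browser may overwrite the change.&lt;/p&gt;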
&lt;h4 id="[1]Problems-installing-Nikola?"&gt;&lt;sup&gt;[1]&lt;/sup&gt;Problems installing Nikola?&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/creating-a-blog-with-jupyter-notebooks/#%5B1%5DProblems-installing-Nikola?"&gt;¶&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;I ran into some issues installing Nikola on OS X with Anaconda. Specifically, &lt;code&gt;gcc&lt;/code&gt; in Anaconda was the culprit. Resolution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;conda remove gcc&lt;/code&gt; to uninstall &lt;code&gt;gcc&lt;/code&gt; provided by Anaconda&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This will default to the system &lt;code&gt;gcc&lt;/code&gt;, which you can check by running &lt;code&gt;which gcc&lt;/code&gt; (which should output &lt;code&gt;/usr/bin/gcc&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;If this doesn't resolve the issue, you may need to install a more up-to-date &lt;code&gt;gcc&lt;/code&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install &lt;a href="https://brew.sh"&gt;Homebrew&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;brew install gcc&lt;/code&gt; (you may be prompted to install Developer Tools)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;brew unlink gcc&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;brew link --overwrite gcc&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;which gcc&lt;/code&gt; should now show &lt;code&gt;/usr/local/Cellar/gcc/7.2.0&lt;/code&gt;. 👍&lt;/p&gt;
&lt;h4 id="[2]Inferring-Nikola-post-metadata"&gt;&lt;sup&gt;[2]&lt;/sup&gt;Inferring Nikola post metadata&lt;a class="anchor-link" href="http://blairhudson.github.io/blog/posts/creating-a-blog-with-jupyter-notebooks/#%5B2%5DInferring-Nikola-post-metadata"&gt;¶&lt;/a&gt;&lt;/h4&gt;&lt;p&gt;Like me, you probably want as little as possible to come between your latest notebook hack and your awesome new blog.&lt;/p&gt;
&lt;p&gt;Nikola parses Jupyter notebooks with a plugin. With a small modification, we can have that plugin infer all of the Nikola post metadata automatically. For me, the plugin file was here (though it may differ for you):&lt;/p&gt;
&lt;p&gt;&lt;code&gt;~/anaconda/lib/python3.5/site-packages/nikola/plugins/compile/ipynb.py&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;To automagically infer the required metadata, you can replace the &lt;code&gt;read_metadata()&lt;/code&gt; function in the file above with the following code:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [5]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;read_metadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_metadata_regexp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unslugify_titles&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sd"&gt;"""Read metadata directly from ipynb file.&lt;/span&gt;

&lt;span class="sd"&gt;    As ipynb files support arbitrary metadata as JSON, the metadata used by Nikola&lt;/span&gt;
&lt;span class="sd"&gt;    is assumed to be in the 'nikola' subfield.&lt;/span&gt;
&lt;span class="sd"&gt;    """&lt;/span&gt;
    &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_req_missing_ipynb&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;lang&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;lang&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LocaleBorg&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_lang&lt;/span&gt;
    &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;translated_source_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"utf8"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;in_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;nb_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nbformat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;in_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_nbformat&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Metadata might not exist in two-file posts or in hand-crafted&lt;/span&gt;
    &lt;span class="c1"&gt;# .ipynb files.&lt;/span&gt;

    &lt;span class="c1"&gt;# infer metadata&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;splitext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;basename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;' '&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'-'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;datetime&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="n"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fromtimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getctime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'%Y-%m-&lt;/span&gt;&lt;span class="si"&gt;%d&lt;/span&gt;&lt;span class="s1"&gt; %H:%M:%S'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;implicit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;'title'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'slug'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;explicit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nb_json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'metadata'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'nikola'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    
    &lt;span class="c1"&gt;# replace inference with explicit if available&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;implicit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;explicit&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
&lt;div class="cell border-box-sizing text_cell rendered"&gt;
&lt;div class="prompt input_prompt"&gt;
&lt;/div&gt;
&lt;div class="inner_cell"&gt;
&lt;div class="text_cell_render border-box-sizing rendered_html"&gt;
&lt;p&gt;With this small modification, we instruct Nikola to infer the &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;slug&lt;/code&gt; values from the filename, and the &lt;code&gt;date&lt;/code&gt; value from the file's timestamp on the filesystem. Any values set explicitly in the notebook's &lt;code&gt;nikola&lt;/code&gt; metadata still take precedence. Magical! ✨&lt;/p&gt;
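&lt;p&gt;To see the inference on its own, outside of Nikola, here is a standalone sketch of the same logic. The function name &lt;code&gt;infer_metadata&lt;/code&gt; and the temporary file are assumptions made for this example; it also uses &lt;code&gt;%H&lt;/code&gt; for the hour, which is the portable &lt;code&gt;strftime&lt;/code&gt; directive.&lt;/p&gt;

```python
import os
import tempfile
from datetime import datetime


def infer_metadata(source, explicit=None):
    """Infer title, slug and date from a file path, then apply explicit overrides."""
    # Title is the filename without its extension.
    title = os.path.splitext(os.path.basename(source))[0]
    # Slug is the lower-cased title with spaces turned into hyphens.
    slug = title.lower().replace(' ', '-')
    # Date comes from the file's ctime as reported by the operating system.
    date = datetime.fromtimestamp(os.path.getctime(source)).strftime('%Y-%m-%d %H:%M:%S')

    implicit = {'title': title, 'slug': slug, 'date': date}
    # Explicit values are unpacked last, so they replace inferred ones.
    return {**implicit, **(explicit or {})}


# Demonstrate on a throwaway file; the file contents are never read.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Creating a blog with Jupyter notebooks.ipynb')
    with open(path, 'w', encoding='utf8') as handle:
        handle.write('{}')

    inferred = infer_metadata(path)
    overridden = infer_metadata(path, {'title': 'A custom title'})

print(inferred['slug'])
print(overridden['title'], '/', overridden['slug'])
```

&lt;p&gt;Note that overriding only the &lt;code&gt;title&lt;/code&gt; leaves the &lt;code&gt;slug&lt;/code&gt; derived from the filename, since each key is merged independently.&lt;/p&gt;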
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; The makers of Nikola have suggested some official methods for achieving this that are built right into the existing workflow:&lt;/p&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="cell border-box-sizing code_cell rendered"&gt;
&lt;div class="input"&gt;
&lt;div class="prompt input_prompt"&gt;In [9]:&lt;/div&gt;
&lt;div class="inner_cell"&gt;
    &lt;div class="input_area"&gt;
&lt;div class=" highlight hl-ipython3"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;%%&lt;/span&gt;&lt;span class="k"&gt;html&lt;/span&gt;
&amp;lt;blockquote class="twitter-tweet" data-conversation="none" data-lang="en"&amp;gt;&amp;lt;p lang="en" dir="ltr"&amp;gt;Titles and slugs can be done via FILE_METADATA_REGEXP, and auto dates are prone to issues.&amp;lt;br&amp;gt;Better: import files with `nikola new_post -i`&amp;lt;/p&amp;gt;&amp;amp;mdash; Nikola Generator (@GetNikola) &amp;lt;a href="https://twitter.com/GetNikola/status/907570254611484672"&amp;gt;September 12, 2017&amp;lt;/a&amp;gt;&amp;lt;/blockquote&amp;gt; &amp;lt;script async src="//platform.twitter.com/widgets.js" charset="utf-8"&amp;gt;&amp;lt;/script&amp;gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div class="output_wrapper"&gt;
&lt;div class="output"&gt;


&lt;div class="output_area"&gt;
&lt;div class="prompt"&gt;&lt;/div&gt;


&lt;div class="output_html rendered_html output_subarea "&gt;
&lt;blockquote class="twitter-tweet" data-conversation="none" data-lang="en"&gt;&lt;p lang="en" dir="ltr"&gt;Titles and slugs can be done via FILE_METADATA_REGEXP, and auto dates are prone to issues.&lt;br&gt;Better: import files with `nikola new_post -i`&lt;/p&gt;— Nikola Generator (@GetNikola) &lt;a href="https://twitter.com/GetNikola/status/907570254611484672"&gt;September 12, 2017&lt;/a&gt;&lt;/blockquote&gt; &lt;script async src="//platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;
&lt;/div&gt;

&lt;/div&gt;

&lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
</description><guid>http://blairhudson.github.io/blog/posts/creating-a-blog-with-jupyter-notebooks/</guid><pubDate>Wed, 13 Sep 2017 11:29:29 GMT</pubDate></item></channel></rss>