
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ensemble/plot_feature_transformation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_ensemble_plot_feature_transformation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ensemble_plot_feature_transformation.py:


===============================================
Feature transformations with ensembles of trees
===============================================

Transform your features into a higher dimensional, sparse space. Then train a
linear model on these features.

First fit an ensemble of trees (totally random trees, a random forest, or
gradient boosted trees) on the training set. Then each leaf of each tree in the
ensemble is assigned a fixed arbitrary feature index in a new feature space.
These leaf indices are then encoded in a one-hot fashion.

Each sample goes through the decisions of each tree of the ensemble and ends up
in one leaf per tree. The sample is encoded by setting feature values for these
leaves to 1 and the other feature values to 0.

The resulting transformer has then learned a supervised, sparse,
high-dimensional categorical embedding of the data.
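
The encoding described above can be sketched on a toy problem. This is a minimal illustration, not the code of this example: the dataset size, the 4 trees and the depth of 2 are arbitrary choices, and the names ``X_demo``, ``forest`` and ``encoder`` are introduced only for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Small illustrative dataset (sizes are arbitrary).
X_demo, y_demo = make_classification(n_samples=200, random_state=0)

forest = RandomForestClassifier(n_estimators=4, max_depth=2, random_state=0)
forest.fit(X_demo, y_demo)

# apply() returns, for every sample, the index of the leaf reached in each tree.
leaves = forest.apply(X_demo)  # shape (n_samples, n_estimators)

# One-hot encode the leaf indices: one column per (tree, leaf) pair.
encoder = OneHotEncoder(handle_unknown="ignore")
embedding = encoder.fit_transform(leaves)  # sparse matrix

# Each sample lands in exactly one leaf per tree, so each row has 4 ones.
ones_per_row = np.asarray(embedding.sum(axis=1)).ravel()
print(leaves.shape, embedding.shape, ones_per_row[:5])
```

Every row of ``embedding`` contains exactly as many ones as there are trees, which is what makes the representation sparse.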

.. GENERATED FROM PYTHON SOURCE LINES 22-32

.. code-block:: default



    # Author: Tim Head <betatim@gmail.com>
    #
    # License: BSD 3 clause

    from sklearn import set_config

    set_config(display="diagram")








.. GENERATED FROM PYTHON SOURCE LINES 33-42

First, we will create a large dataset and split it into three sets:

- a set to train the ensemble methods, which are later used as a feature
  engineering transformer;
- a set to train the linear model;
- a set to test the linear model.

It is important to split the data in such a way to avoid overfitting by
leaking data.
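
As a quick sanity check of the resulting sizes, the two successive 50% splits used in this example can be replayed on plain indices. This is only a sketch: ``indices`` is a stand-in for the 80000 samples, and the variable names are chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in indices for the 80000 samples used in this example.
indices = np.arange(80000)

full_train, test = train_test_split(indices, test_size=0.5, random_state=10)
train_ensemble, train_linear = train_test_split(
    full_train, test_size=0.5, random_state=10
)

# The three subsets are pairwise disjoint, so no sample is seen twice.
overlap = (
    set(train_ensemble) & set(train_linear)
    | set(train_ensemble) & set(test)
    | set(train_linear) & set(test)
)
print(len(train_ensemble), len(train_linear), len(test), len(overlap))
# prints: 20000 20000 40000 0
```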

.. GENERATED FROM PYTHON SOURCE LINES 42-55

.. code-block:: default


    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=80000, random_state=10)

    X_full_train, X_test, y_full_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=10
    )
    X_train_ensemble, X_train_linear, y_train_ensemble, y_train_linear = train_test_split(
        X_full_train, y_full_train, test_size=0.5, random_state=10
    )








.. GENERATED FROM PYTHON SOURCE LINES 56-58

For each of the ensemble methods, we will use 10 estimators and a maximum
depth of 3 levels.
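
These two settings bound the size of the embedding: a binary tree of depth 3 has at most ``2**3 = 8`` leaves, so 10 trees yield at most 80 one-hot features. The sketch below checks this bound on a small dataset; the dataset and the name ``forest`` are assumptions made for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

n_estimators = 10
max_depth = 3

# A binary tree of depth d has at most 2**d leaves.
max_leaves_per_tree = 2**max_depth
max_embedding_dim = n_estimators * max_leaves_per_tree  # 80

# Check the bound on a small illustrative dataset.
X_demo, y_demo = make_classification(n_samples=500, random_state=0)
forest = RandomForestClassifier(
    n_estimators=n_estimators, max_depth=max_depth, random_state=0
).fit(X_demo, y_demo)

# Total number of leaves over all fitted trees.
n_leaves = sum(tree.get_n_leaves() for tree in forest.estimators_)
print(n_leaves, "<=", max_embedding_dim)
```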

.. GENERATED FROM PYTHON SOURCE LINES 58-62

.. code-block:: default


    n_estimators = 10
    max_depth = 3








.. GENERATED FROM PYTHON SOURCE LINES 63-65

We start by training the random forest and the gradient boosting models on
the subset reserved for the ensemble methods.

.. GENERATED FROM PYTHON SOURCE LINES 65-78

.. code-block:: default


    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    random_forest = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=10
    )
    random_forest.fit(X_train_ensemble, y_train_ensemble)

    gradient_boosting = GradientBoostingClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=10
    )
    _ = gradient_boosting.fit(X_train_ensemble, y_train_ensemble)








.. GENERATED FROM PYTHON SOURCE LINES 79-81

The :class:`~sklearn.ensemble.RandomTreesEmbedding` is an unsupervised method
and thus does not need to be trained independently.
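
Because the method is unsupervised, it can be fitted without any target. The following sketch, run on random data chosen only for illustration, shows that ``fit_transform`` needs only ``X`` and returns a sparse one-hot encoding of the leaves.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

rng = np.random.RandomState(0)
X_demo = rng.normal(size=(100, 5))  # no target is needed

embedder = RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0)
X_embedded = embedder.fit_transform(X_demo)  # sparse one-hot leaf encoding

# One active feature per tree for every sample.
ones_per_row = np.asarray(X_embedded.sum(axis=1)).ravel()
print(X_embedded.shape, ones_per_row[:3])
```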

.. GENERATED FROM PYTHON SOURCE LINES 81-88

.. code-block:: default


    from sklearn.ensemble import RandomTreesEmbedding

    random_tree_embedding = RandomTreesEmbedding(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )








.. GENERATED FROM PYTHON SOURCE LINES 89-94

Now, we will create three pipelines that will use the above embedding as
a preprocessing stage.

The random trees embedding can be directly pipelined with the logistic
regression because it is a standard scikit-learn transformer.
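
To make this concrete, here is a small sketch of such a pipeline on a toy dataset (the data and the name ``model`` are assumptions for the illustration). It also shows how to look at the intermediate embedding by slicing off the final step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X_demo, y_demo = make_classification(n_samples=300, random_state=0)

model = make_pipeline(
    RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0),
    LogisticRegression(max_iter=1000),
)
model.fit(X_demo, y_demo)

# make_pipeline names each step after its lowercased class name.
step_names = list(model.named_steps)

# Slicing off the final step gives the fitted transformer part only.
X_embedded = model[:-1].transform(X_demo)
print(step_names, X_embedded.shape)
```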

.. GENERATED FROM PYTHON SOURCE LINES 94-101

.. code-block:: default


    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    rt_model = make_pipeline(random_tree_embedding, LogisticRegression(max_iter=1000))
    rt_model.fit(X_train_linear, y_train_linear)






.. code-block:: none

    Pipeline(steps=[('randomtreesembedding',
                     RandomTreesEmbedding(max_depth=3, n_estimators=10,
                                          random_state=0)),
                    ('logisticregression', LogisticRegression(max_iter=1000))])

.. GENERATED FROM PYTHON SOURCE LINES 102-106

Then, we can pipeline the random forest or the gradient boosting model with a
logistic regression. However, these models expose the feature transformation
through the method `apply`, whereas a scikit-learn pipeline expects a call to
`transform`. Therefore, we wrap the call to `apply` within a
`FunctionTransformer`.
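
The sketch below illustrates what ``apply`` returns for both kinds of ensembles and how a ``FunctionTransformer`` forwards the fitted model through ``kw_args``. The toy dataset and the helper ``leaves_of`` are hypothetical names used only here. Note the extra trailing axis returned by gradient boosting, which has size 1 for binary classification.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.preprocessing import FunctionTransformer

X_demo, y_demo = make_classification(n_samples=200, random_state=0)

forest = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0)
forest.fit(X_demo, y_demo)
boosting = GradientBoostingClassifier(n_estimators=5, max_depth=3, random_state=0)
boosting.fit(X_demo, y_demo)

# Random forest: one leaf index per sample and per tree.
rf_leaves = forest.apply(X_demo)  # shape (200, 5)

# Gradient boosting adds a trailing axis (size 1 for binary classification).
gbdt_leaves = boosting.apply(X_demo)  # shape (200, 5, 1)


def leaves_of(X, model):
    """Hypothetical helper: expose ``model.apply`` as a plain function."""
    return model.apply(X)


# FunctionTransformer passes kw_args to the function on every transform call.
wrapper = FunctionTransformer(leaves_of, kw_args={"model": forest})
wrapped_leaves = wrapper.fit_transform(X_demo)
print(rf_leaves.shape, gbdt_leaves.shape, (wrapped_leaves == rf_leaves).all())
```

The wrapper itself learns nothing: the ensemble passed in ``kw_args`` is already fitted, and only the downstream steps are trained when the pipeline is fitted.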

.. GENERATED FROM PYTHON SOURCE LINES 106-125

.. code-block:: default


    from sklearn.preprocessing import FunctionTransformer
    from sklearn.preprocessing import OneHotEncoder


    def rf_apply(X, model):
        return model.apply(X)


    rf_leaves_yielder = FunctionTransformer(rf_apply, kw_args={"model": random_forest})

    rf_model = make_pipeline(
        rf_leaves_yielder,
        OneHotEncoder(handle_unknown="ignore"),
        LogisticRegression(max_iter=1000),
    )
    rf_model.fit(X_train_linear, y_train_linear)







.. code-block:: none

    Pipeline(steps=[('functiontransformer',
                     FunctionTransformer(func=<function rf_apply at 0x7f38d53a0550>,
                                         kw_args={'model': RandomForestClassifier(max_depth=3,
                                                                                  n_estimators=10,
                                                                                  random_state=10)})),
                    ('onehotencoder', OneHotEncoder(handle_unknown='ignore')),
                    ('logisticregression', LogisticRegression(max_iter=1000))])

.. GENERATED FROM PYTHON SOURCE LINES 126-141

.. code-block:: default

    def gbdt_apply(X, model):
        return model.apply(X)[:, :, 0]


    gbdt_leaves_yielder = FunctionTransformer(
        gbdt_apply, kw_args={"model": gradient_boosting}
    )

    gbdt_model = make_pipeline(
        gbdt_leaves_yielder,
        OneHotEncoder(handle_unknown="ignore"),
        LogisticRegression(max_iter=1000),
    )
    gbdt_model.fit(X_train_linear, y_train_linear)






.. code-block:: none

    Pipeline(steps=[('functiontransformer',
                     FunctionTransformer(func=<function gbdt_apply at 0x7f38d53a0e50>,
                                         kw_args={'model': GradientBoostingClassifier(n_estimators=10,
                                                                                      random_state=10)})),
                    ('onehotencoder', OneHotEncoder(handle_unknown='ignore')),
                    ('logisticregression', LogisticRegression(max_iter=1000))])

.. GENERATED FROM PYTHON SOURCE LINES 142-143

Finally, we plot the ROC curves of all the models on the test set.
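
For readers who want the numbers behind such a plot, the sketch below computes a ROC curve and its area by hand with ``roc_curve`` and ``auc``. It uses a plain logistic regression on a toy dataset as a stand-in for the models of this example; all names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Scores for the positive class are thresholded to trace the curve.
scores = clf.predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)

# Area under the curve, computed with the trapezoidal rule.
roc_auc = auc(fpr, tpr)
print(f"AUC = {roc_auc:.3f}")
```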

.. GENERATED FROM PYTHON SOURCE LINES 143-164

.. code-block:: default


    import matplotlib.pyplot as plt
    from sklearn.metrics import RocCurveDisplay

    fig, ax = plt.subplots()

    models = [
        ("RT embedding -> LR", rt_model),
        ("RF", random_forest),
        ("RF embedding -> LR", rf_model),
        ("GBDT", gradient_boosting),
        ("GBDT embedding -> LR", gbdt_model),
    ]

    model_displays = {}
    for name, pipeline in models:
        model_displays[name] = RocCurveDisplay.from_estimator(
            pipeline, X_test, y_test, ax=ax, name=name
        )
    _ = ax.set_title("ROC curve")




.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_001.png
   :alt: ROC curve
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_001.png
   :class: sphx-glr-single-img





.. GENERATED FROM PYTHON SOURCE LINES 165-172

.. code-block:: default

    fig, ax = plt.subplots()
    for name, pipeline in models:
        model_displays[name].plot(ax=ax)

    ax.set_xlim(0, 0.2)
    ax.set_ylim(0.8, 1)
    _ = ax.set_title("ROC curve (zoomed in at top left)")



.. image-sg:: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_002.png
   :alt: ROC curve (zoomed in at top left)
   :srcset: /auto_examples/ensemble/images/sphx_glr_plot_feature_transformation_002.png
   :class: sphx-glr-single-img






.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  2.980 seconds)


.. _sphx_glr_download_auto_examples_ensemble_plot_feature_transformation.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_feature_transformation.py <plot_feature_transformation.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_feature_transformation.ipynb <plot_feature_transformation.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
