Airbnb New User Booking Predictions

Introduction

The goal is to predict the country in which a new user will make their first booking. Details can be found in the Kaggle competition here.

Data Overview

There are 6 files provided. Two of these files provide background information (countries.csv and age_gender_bkts.csv), while sample_submission_NDF.csv provides an example of how the submission file containing our final predictions should be formatted. The three remaining files are the key ones:

    1. train_users_2.csv – This dataset contains data on Airbnb users, including the destination countries.

    2. test_users.csv – This dataset also contains data on Airbnb users, in the same format as train_users_2.csv, except without the destination country. These are the users for which we will have to make our final predictions.

    3. sessions.csv – This data is supplementary data that can be used to train the model and make the final predictions. It contains information about the actions (e.g. clicked on a listing, updated a wish list, ran a search etc.) taken by the users in both the testing and training datasets above.

A glimpse of the data

In [1]:
import pandas as pd

#import data
tr_datapath = "data/train_users_2.csv"
te_datapath = "data/test_users.csv"
df_train = pd.read_csv(tr_datapath, header = 0, index_col = None)
df_test = pd.read_csv(te_datapath, header = 0, index_col = None)
In [2]:
# size of training data
print(df_train.shape)
df_train.head()
(213451, 16)
Out[2]:
id date_account_created timestamp_first_active date_first_booking gender age signup_method signup_flow language affiliate_channel affiliate_provider first_affiliate_tracked signup_app first_device_type first_browser country_destination
0 gxn3p5htnn 2010-06-28 20090319043255 NaN -unknown- NaN facebook 0 en direct direct untracked Web Mac Desktop Chrome NDF
1 820tgsjxq7 2011-05-25 20090523174809 NaN MALE 38.0 facebook 0 en seo google untracked Web Mac Desktop Chrome NDF
2 4ft3gnwmtx 2010-09-28 20090609231247 2010-08-02 FEMALE 56.0 basic 3 en direct direct untracked Web Windows Desktop IE US
3 bjjt8pjhuk 2011-12-05 20091031060129 2012-09-08 FEMALE 42.0 facebook 0 en direct direct untracked Web Mac Desktop Firefox other
4 87mebub9p4 2010-09-14 20091208061105 2010-02-18 -unknown- 41.0 basic 0 en direct direct untracked Web Mac Desktop Chrome US
In [3]:
# size of test data; it lacks the country_destination column, which our model needs to predict
print(df_test.shape)
df_test.head()
(62096, 15)
Out[3]:
id date_account_created timestamp_first_active date_first_booking gender age signup_method signup_flow language affiliate_channel affiliate_provider first_affiliate_tracked signup_app first_device_type first_browser
0 5uwns89zht 2014-07-01 20140701000006 NaN FEMALE 35.0 facebook 0 en direct direct untracked Moweb iPhone Mobile Safari
1 jtl0dijy2j 2014-07-01 20140701000051 NaN -unknown- NaN basic 0 en direct direct untracked Moweb iPhone Mobile Safari
2 xx0ulgorjt 2014-07-01 20140701000148 NaN -unknown- NaN basic 0 en direct direct linked Web Windows Desktop Chrome
3 6c6puo6ix0 2014-07-01 20140701000215 NaN -unknown- NaN basic 0 en direct direct linked Web Windows Desktop IE
4 czqhjk3yfe 2014-07-01 20140701000305 NaN -unknown- NaN basic 0 en direct direct untracked Web Mac Desktop Safari

Data cleansing

From the above snapshot of the data in training and test files, a few key pieces of information about the integrity of this dataset can be identified.

  • First, at least two columns have missing values: the age column and the date_first_booking column.
  • Second, most of the columns contain categorical data. In fact, 11 of the 16 columns appear to be categorical.
  • Third, the timestamp_first_active column looks like a full timestamp stored as a number. For example, 20090609231247 looks like it should be 2009-06-09 23:12:47.
  • Fourth, some values are erroneous. For some columns, there are values that can be identified as obviously incorrect. This may be a gender column where someone has entered a number, or an age column where someone has entered a value well over 100. These values either need to be corrected (if the correct value can be determined) or treated as missing.
  • Last, some columns may need to be standardized. For example, when collecting data on country of birth, if users are not given a standardized list of countries, the data will inevitably contain multiple spellings of the same country (e.g. USA, United States, U.S. and so on). One of the main cleaning tasks often involves standardizing these values so that there is only one version of each value.
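
The first, third and fourth checks can be sketched on a few made-up rows. The `toy` DataFrame and its values are assumptions for illustration only; they mimic the column layout of train_users_2.csv but are not taken from it.

```python
import pandas as pd

# Hypothetical rows that mimic the layout of train_users_2.csv
toy = pd.DataFrame({
    "age": [38.0, None, 2014.0, 56.0],
    "gender": ["MALE", "-unknown-", "FEMALE", "FEMALE"],
    "timestamp_first_active": [20090523174809, 20090609231247,
                               20091031060129, 20091208061105],
})

# Count missing values per column
missing_counts = toy.isnull().sum()
print(missing_counts)

# Parse the numeric timestamp into a real datetime value
parsed = pd.to_datetime(toy["timestamp_first_active"].astype(str),
                        format="%Y%m%d%H%M%S")
print(parsed.iloc[1])

# Flag ages that are clearly implausible (NaN compares as False here)
implausible = toy["age"] > 100
print(implausible.sum())
```

Note that a placeholder string such as -unknown- is not counted by isnull(), so columns like gender can look complete even when many values are effectively missing.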

First, let's combine the training data and test data into one DataFrame so that we can do data cleansing at the same time.

In [4]:
# combine df_train and df_test into one DataFrame
df_all = pd.concat((df_train, df_test), axis = 0, ignore_index = True, sort = False)

Fix the format of the dates

Because we will use the date information, cleaning the date timestamps is necessary. If we want to do anything with those dates (e.g. subtract one date from another, extract the month of the year from each date etc.), it will be far easier if Python recognizes the values as dates.

In [5]:
# fixing the date_account_created column
df_all['date_account_created'] = pd.to_datetime(df_all['date_account_created'], format='%Y-%m-%d')
# fixing the timestamp_first_active column
df_all['timestamp_first_active'] = pd.to_datetime(df_all['timestamp_first_active'], format='%Y%m%d%H%M%S')
# use the timestamp_first_active column to fill any missing values in the date_account_created column
df_all['date_account_created'] = df_all['date_account_created'].fillna(df_all['timestamp_first_active'])

Drop inconsistent columns

There are three date fields, but we have only covered two above. The remaining date field, date_first_booking, we are going to drop from the data altogether. The reason is that this field is only populated for users who have made a booking. In train_users_2.csv, every user with a destination country has a value in date_first_booking, and for users who have not made a booking (country_destination = NDF) the value is missing. In test_users.csv, however, the date_first_booking column is empty for every record.

This means that this column is not going to be useful for predicting in which country a booking will be made. What is more, if we leave it in the training dataset when building the model, the model is likely to learn that a missing date means NDF, and because every test record has a missing date, it would tend to predict NDF for all of them.
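
A quick way to see this kind of leakage is to cross-tabulate "has a booking date" against "is NDF". The sketch below uses four invented rows (the `toy` DataFrame is an assumption, not the competition data) to show the pattern described above.

```python
import pandas as pd

# Hypothetical training rows: a date is present exactly when a country was booked
toy = pd.DataFrame({
    "date_first_booking": ["2010-08-02", None, "2012-09-08", None],
    "country_destination": ["US", "NDF", "other", "NDF"],
})

has_booking_date = toy["date_first_booking"].notnull()
is_ndf = toy["country_destination"] == "NDF"

# Every row falls on the off-diagonal: a date present means not NDF, and vice versa
leak_table = pd.crosstab(has_booking_date, is_ndf)
print(leak_table)

# True when the column determines the NDF label for every row
perfectly_aligned = bool((has_booking_date != is_ndf).all())
print(perfectly_aligned)
```

When a feature determines the label this exactly in training but is constant in the test set, dropping it is usually the safer choice.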

In [6]:
# Drop the date_first_booking column
df_all.drop('date_first_booking', axis = 1, inplace = True)

Correct the age column

As noted earlier, the age column can contain values that are clearly incorrect (unreasonably high or too low). In this step, we replace these incorrect values with NaN.

To do this, we create a simple function that takes a DataFrame (table), a column name, a minimum acceptable value (15) and a maximum acceptable value (90). The function replaces the values in the specified column that fall outside the acceptable range with NaN.

A significant portion of users did not provide an age value at all. After we have converted the incorrect age values to NaN, we change all the NaN values to -1. After testing other methods of filling the NaN values, including the average, the median, and the most frequent value, using the value -1 yielded the best prediction model.

In [7]:
import numpy as np
import warnings
warnings.filterwarnings('ignore')
# Fill missing ages first so the range comparison below never involves NaN
df_all['age'] = df_all['age'].fillna(-1)

# Replace values outside [min_val, max_val] in the given column with NaN
def remove_outliers(df, column, min_val, max_val):
    col_values = df[column].values
    df[column] = np.where(np.logical_or(col_values<min_val, col_values>max_val), np.nan, col_values)
    return df

# Fix the age column: out-of-range values (including the -1 placeholder) become NaN,
# then every NaN is set back to -1
df_all = remove_outliers(df=df_all, column='age', min_val=15, max_val=90)
df_all['age'] = df_all['age'].fillna(-1)

Fill the missing values in column first_affiliate_tracked

And then view the DataFrame.

In [8]:
# Fill missing values in first_affiliate_tracked
df_all['first_affiliate_tracked'].fillna(-1, inplace = True)
df_all.tail()
Out[8]:
id date_account_created timestamp_first_active gender age signup_method signup_flow language affiliate_channel affiliate_provider first_affiliate_tracked signup_app first_device_type first_browser country_destination
275542 cv0na2lf5a 2014-09-30 2014-09-30 23:52:32 -unknown- 31.0 basic 0 en direct direct untracked Web Windows Desktop IE NaN
275543 zp8xfonng8 2014-09-30 2014-09-30 23:53:06 -unknown- -1.0 basic 23 ko direct direct untracked Android Android Phone -unknown- NaN
275544 fa6260ziny 2014-09-30 2014-09-30 23:54:08 -unknown- -1.0 basic 0 de direct direct linked Web Windows Desktop Firefox NaN
275545 87k0fy4ugm 2014-09-30 2014-09-30 23:54:30 -unknown- -1.0 basic 0 en sem-brand google omg Web Mac Desktop Safari NaN
275546 9uqfg8txu3 2014-09-30 2014-09-30 23:59:01 FEMALE 49.0 basic 0 en other other tracked-other Web Windows Desktop Chrome NaN

Data transformation and feature extraction

Data transformation is undertaken with the intention of enhancing the ability of the classification algorithm to extract information from the data.

We then use feature extraction to create new features which will help improve the prediction accuracy of our model.

We will first focus on data transformation.

Transforming categorical data - one hot encoding

The first step we are going to undertake is One Hot Encoding: replacing each categorical field in the dataset with multiple 0/1 columns, one for each distinct value of that field.

In [9]:
# one hot encoding function
def convert_to_onehot(df, column_to_convert):
    categories = list(df[column_to_convert].drop_duplicates())
    
    for category in categories:
        cat_name = str(category).replace(" ", "_").replace(
            "(", "").replace(")", "").replace("/", "_").replace("-", "").lower()
        col_name = column_to_convert[:5] + '_' + cat_name[:10]
        df[col_name] = 0
        df.loc[(df[column_to_convert] == category), col_name] = 1
    return df

#One hot encoding, and drop the original column from df_all
columns_to_convert = ['gender', 'signup_method', 'signup_flow', 'language', 
                      'affiliate_channel', 'affiliate_provider', 
                      'first_affiliate_tracked', 'signup_app', 
                      'first_device_type', 'first_browser']

for column in columns_to_convert:
    df_all = convert_to_onehot(df_all, column)
    df_all.drop(column, axis = 1, inplace = True)

df_all.head()
Out[9]:
id date_account_created timestamp_first_active age country_destination gende_unknown gende_male gende_female gende_other signu_facebook ... first_theworld_b first_slimbrowse first_epic first_stainless first_googlebot first_outlook_20 first_icedragon first_ibrowse first_nintendo_b first_uc_browser
0 gxn3p5htnn 2010-06-28 2009-03-19 04:32:55 -1.0 NDF 1 0 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
1 820tgsjxq7 2011-05-25 2009-05-23 17:48:09 38.0 NDF 0 1 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
2 4ft3gnwmtx 2010-09-28 2009-06-09 23:12:47 56.0 US 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 bjjt8pjhuk 2011-12-05 2009-10-31 06:01:29 42.0 other 0 0 1 0 1 ... 0 0 0 0 0 0 0 0 0 0
4 87mebub9p4 2010-09-14 2009-12-08 06:11:05 41.0 US 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 157 columns
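
For comparison, pandas ships a built-in helper, pd.get_dummies, that performs the same kind of encoding. The sketch below is an alternative on made-up rows (the `toy` DataFrame is hypothetical), not the approach used in this notebook; note that it keeps the full column and category names rather than the shortened names produced by convert_to_onehot.

```python
import pandas as pd

# Hypothetical user rows with two categorical columns
toy = pd.DataFrame({
    "id": ["a", "b", "c"],
    "gender": ["MALE", "FEMALE", "-unknown-"],
    "signup_method": ["facebook", "basic", "basic"],
})

# One 0/1 column per distinct value; the original columns are dropped automatically
encoded = pd.get_dummies(toy, columns=["gender", "signup_method"], dtype=int)
print(encoded.columns.tolist())
print(encoded)
```

Each row ends up with exactly one 1 per original categorical column, which is the defining property of a one-hot encoding.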

Creating new features

Two fields that can be used to create some new features are the two date fields – date_account_created and timestamp_first_active. We want to extract all the information we can out of these two date fields that could potentially differentiate which country someone will make their first booking in.

In [10]:
# Add new datetime related fields
df_all['day_account_created'] = df_all['date_account_created'].dt.weekday
df_all['month_account_created'] = df_all['date_account_created'].dt.month
df_all['quarter_account_created'] = df_all['date_account_created'].dt.quarter
df_all['year_account_created'] = df_all['date_account_created'].dt.year
df_all['hour_first_active'] = df_all['timestamp_first_active'].dt.hour
df_all['day_first_active'] = df_all['timestamp_first_active'].dt.weekday
df_all['month_first_active'] = df_all['timestamp_first_active'].dt.month
df_all['quarter_first_active'] = df_all['timestamp_first_active'].dt.quarter
df_all['year_first_active'] = df_all['timestamp_first_active'].dt.year
df_all['created_less_active'] = (df_all['date_account_created'] - df_all['timestamp_first_active']).dt.days

# Drop unnecessary columns
columns_to_drop = ['date_account_created', 'timestamp_first_active', 'date_first_booking', 'country_destination']
for column in columns_to_drop:
    if column in df_all.columns:
        df_all.drop(column, axis=1, inplace=True)

print(df_all.shape)
(275547, 164)

Adding new data

Next, we will see what new data we can add from the sessions.csv file. The dataset contains records of user actions, with each row representing one action a user took. Every time a user reviewed search results, updated a wish list or updated their account information, a new row was created in this dataset. Although this data is likely to be very useful for our goal of predicting which country a user will make their first booking in, it also complicates the process of combining it with the data from train_users_2.csv, as it has to be aggregated so that there is one row per user.

Aside from details of the actions taken, there are a couple of interesting fields in this data. The first is device_type – this field contains the type of device used for the specified action. The second interesting field is the secs_elapsed field. This shows us how long (in seconds) was spent on a particular action.

Import sessions data

In [11]:
# read sessions.csv
session_path = 'data/sessions.csv'
sessions = pd.read_csv(session_path, header = 0, index_col = False)
sessions.head()
Out[11]:
user_id action action_type action_detail device_type secs_elapsed
0 d1mm9tcy42 lookup NaN NaN Windows Desktop 319.0
1 d1mm9tcy42 search_results click view_search_results Windows Desktop 67753.0
2 d1mm9tcy42 lookup NaN NaN Windows Desktop 301.0
3 d1mm9tcy42 search_results click view_search_results Windows Desktop 22141.0
4 d1mm9tcy42 lookup NaN NaN Windows Desktop 435.0

Extract the primary and secondary devices for each user

How do we determine what a user’s primary and secondary devices are? We look at how much time they spent on each device. One thing to note as we make these transformations is that by aggregating the data this way, we are also implicitly removing the missing values.

In [12]:
# Determine primary device
sessions_device = sessions.loc[:, ['user_id', 'device_type', 'secs_elapsed']]
aggregated_lvl1 = sessions_device.groupby(['user_id', 'device_type'],
                                          as_index = False, sort = False).aggregate(np.sum)
index = aggregated_lvl1.groupby(['user_id'], sort = False)[
    'secs_elapsed'].transform(max) == aggregated_lvl1['secs_elapsed']
df_primary = pd.DataFrame(aggregated_lvl1.loc[
    index, ['user_id', 'device_type', 'secs_elapsed']])
df_primary.rename(columns = {
    'device_type': 'primary_device', 
    'secs_elapsed': 'primary_secs'}, inplace = True)
df_primary = convert_to_onehot(df_primary, column_to_convert='primary_device')
df_primary.drop('primary_device', axis = 1, inplace = True)

# Determine secondary device
remaining = aggregated_lvl1.drop(aggregated_lvl1.index[index])
index = remaining.groupby(
    ['user_id'], sort = False)['secs_elapsed'].transform(max) == remaining['secs_elapsed']
df_secondary = pd.DataFrame(
    remaining.loc[index, ['user_id', 'device_type', 'secs_elapsed']])
df_secondary.rename(columns = {
    'device_type': 'secondary_device', 'secs_elapsed': 'secondary_secs'}, inplace = True)
df_secondary = convert_to_onehot(df_secondary, 'secondary_device')
df_secondary.drop('secondary_device', axis = 1, inplace = True)
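
The same idea can be traced by hand on a few made-up rows. The sketch below is a simplified variant that ranks devices within each user instead of filtering on the maximum twice; the `toy_sessions` data and the `rank` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical session rows; the last row has no device recorded
toy_sessions = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2"],
    "device_type": ["Mac Desktop", "iPhone", "Mac Desktop", "iPhone", None],
    "secs_elapsed": [100.0, 400.0, 50.0, 30.0, 999.0],
})

# Total time per (user, device); groupby drops rows whose device_type is missing
time_per_device = (toy_sessions
                   .groupby(["user_id", "device_type"], as_index=False)["secs_elapsed"]
                   .sum())

# Rank devices within each user by total time, largest first
time_per_device["rank"] = (time_per_device
                           .groupby("user_id")["secs_elapsed"]
                           .rank(method="first", ascending=False))

primary = time_per_device[time_per_device["rank"] == 1].set_index("user_id")
secondary = time_per_device[time_per_device["rank"] == 2].set_index("user_id")
print(primary)
print(secondary)
```

User u2 has no secondary device here because the only other row has a missing device_type and is dropped during aggregation, which is the implicit removal of missing values mentioned above.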

Determine action counts

We determine action counts for each of the three columns action, action_type and action_detail, which generates 3 separate tables. Then we join the three tables together on the basis of the user_id.

In [13]:
# function to count occurrences of value in a column
def convert_to_counts(df, id_col, column_to_convert):
    id_list = df[id_col].drop_duplicates()
    df_counts = df.loc[:,[id_col, column_to_convert]]
    df_counts['count'] = 1
    df_counts = df_counts.groupby(by = [id_col, column_to_convert], 
                                  as_index = False, sort = False).sum()
    
    new_df = df_counts.pivot(index = id_col, columns = column_to_convert, values = 'count')
    new_df = new_df.fillna(0)
    
    #rename columns
    categories = list(df[column_to_convert].drop_duplicates())
    for category in categories:
        cat_name = str(category).replace(
            " ", "_").replace("(", "").replace(")", "").replace(
            "/", "_").replace("-", "").lower()
        col_name = column_to_convert + '_' + cat_name
        new_df.rename(columns = {category: col_name}, inplace = True)
        
    return new_df

# Aggregate and combine actions taken columns
session_actions = sessions.loc[:, ['user_id', 'action', 'action_type', 'action_detail']]
columns_to_convert = ['action', 'action_type', 'action_detail']
session_actions = session_actions.fillna('not provided')

# flag indicating the first loop
first = True

for column in columns_to_convert:
    print("Converting " + column + " column...")
    current_data = convert_to_counts(df = session_actions, id_col = 'user_id', column_to_convert=column)
    if first:
        first = False
        actions_data = current_data
    else:
        actions_data = pd.concat([actions_data, current_data], axis = 1, join = 'inner')
Converting action column...
Converting action_type column...
Converting action_detail column...
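
For a single column, the per-user counting step is close to what pd.crosstab produces directly. The following is a minimal sketch on invented rows (`toy_actions` is hypothetical); unlike convert_to_counts, it does not clean up the category names.

```python
import pandas as pd

# Hypothetical action log: one row per action taken by a user
toy_actions = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2"],
    "action": ["lookup", "search_results", "lookup", "lookup"],
})

# One row per user, one column per distinct action, cells hold the number of occurrences
action_counts = (pd.crosstab(toy_actions["user_id"], toy_actions["action"])
                 .add_prefix("action_"))
print(action_counts)
```

Combinations that never occur are filled with 0, which matches the fillna(0) step in the function above.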

Combine data sets

The last steps are to combine the various datasets into one large dataset.

First we combine the two device dataframes (df_primary and df_secondary) to create a device dataframe. Then we combine the device dataframe with the actions dataframe to create a sessions dataframe with all the features we extracted from sessions.csv.

Finally, we combine the sessions dataframe with the user data dataframe.

The first join needs an outer join because not all users have a secondary device. The second join could use an outer join or an inner join, as both the device and actions datasets should contain all users. In this case we use an outer join just to ensure that if a user is missing from one of the datasets (for whatever reason), we will still capture them. For the third step we use an inner join for a key reason: we want our final training dataset to include only users that also have sessions data. Using an inner join here is an easy way to join the datasets and filter for the users with sessions data in one step.
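
The difference between the two join types can be seen on three invented users. In this sketch the frames `primary`, `secondary` and `users` and their values are assumptions; only u1 has a secondary device, and u3 has no sessions data at all.

```python
import pandas as pd

# Hypothetical per-user tables indexed by user id
primary = pd.DataFrame({"primary_secs": [400.0, 30.0]}, index=["u1", "u2"])
secondary = pd.DataFrame({"secondary_secs": [150.0]}, index=["u1"])
users = pd.DataFrame({"age": [62.0, -1.0, 38.0]}, index=["u1", "u2", "u3"])

# Outer join keeps u2 even though it has no secondary device; the gap is filled with 0
devices = pd.concat([primary, secondary], axis=1, join="outer").fillna(0)
print(devices)

# Inner join keeps only the users present in both tables, so u3 is filtered out
final = pd.concat([users, devices], axis=1, join="inner")
print(final)
```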

In [14]:
# Combine device datasets
df_primary.set_index('user_id', inplace = True)
df_secondary.set_index('user_id', inplace = True)
device_data = pd.concat([df_primary, df_secondary], axis = 1, join = 'outer', sort = False)

#Combine device and actions datasets
combined_results = pd.concat([device_data, actions_data], axis = 1, join = 'outer', sort = False)
df_sessions = combined_results.fillna(0)

#Combine user and sessions datasets
df_all.set_index('id', inplace = True)
df_all = pd.concat([df_all, df_sessions], axis = 1, join = 'inner', sort = False)
df_all.head()
Out[14]:
age gende_unknown gende_male gende_female gende_other signu_facebook signu_basic signu_google signu_weibo signu_0 ... action_detail_view_resolutions action_detail_view_search_results action_detail_view_security_checks action_detail_view_user_real_names action_detail_wishlist action_detail_wishlist_content_update action_detail_wishlist_note action_detail_your_listings action_detail_your_reservations action_detail_your_trips
d1mm9tcy42 62.0 0 1 0 0 0 1 0 0 1 ... 0.0 23.0 0.0 0.0 0.0 25.0 0.0 0.0 0.0 0.0
yo8nz8bqcq -1.0 1 0 0 0 0 1 0 0 1 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
4grx6yxeby -1.0 1 0 0 0 0 1 0 0 1 ... 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
ncf87guaf0 -1.0 1 0 0 0 0 1 0 0 1 ... 0.0 32.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 0.0
4rvqpxoh3h -1.0 1 0 0 0 0 1 0 0 0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 720 columns

Create a Model

At this point, the df_all dataset is ready to be used to train and test a model that predicts the first booking destination country for each new user.

The training algorithm we will use is the popular XGBoost, an implementation of gradient boosted decision trees. From my perspective, this method is superior to random forest. It builds a first tree, typically a shallower tree than if you use one single decision tree, and makes predictions using that tree. The algorithm then measures how far those predictions are from the true labels (formally, the gradient of the loss function) and builds a new tree that is fit to those errors, so the records the current model handles worst have the most influence on the next tree. This whole process is repeated as many times as specified by the user. Once the specified number of trees has been built, the outputs of all the trees are added together; for classification, the summed scores are converted into class probabilities and the most probable class is taken as the final prediction.
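
XGBoost belongs to the gradient boosting family, in which each new tree is fit to the errors of the trees built so far and the tree outputs are added together. The sketch below illustrates only that additive idea, using one-split "stumps" and squared error on a made-up regression problem; the helper names `fit_stump` and `predict_stump` are hypothetical, and XGBoost's actual implementation adds regularization, second-order gradients and many optimizations not shown here.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold on x whose two-sided means best fit the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the largest value so both sides are non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

# Made-up one-dimensional regression problem
x_toy = np.linspace(0.0, 6.0, 60)
y_toy = np.sin(x_toy)

learning_rate = 0.3
prediction = np.full_like(y_toy, y_toy.mean())  # start from a constant guess
initial_mse = np.mean((y_toy - prediction) ** 2)

stumps = []
for _ in range(50):
    residual = y_toy - prediction               # negative gradient of squared error
    stump = fit_stump(x_toy, residual)          # the new "tree" is fit to the current errors
    stumps.append(stump)
    prediction = prediction + learning_rate * predict_stump(stump, x_toy)

final_mse = np.mean((y_toy - prediction) ** 2)
print(initial_mse, final_mse)
```

The learning rate plays the same role as the learning_rate parameter tuned later in the grid search: smaller values make each tree contribute less, so more trees are needed.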

Cross validation

To guard against overfitting, I will use 3-fold cross validation (cv = 3 in the grid search). K-fold cross validation involves splitting the training data into k subsets (where k is greater than or equal to 2), training the model using k – 1 of those subsets, then running the model on the subset that was not used in the training process. This is repeated k times so that each subset is held out exactly once. Because all of the data used in the cross validation process is training data, the correct classification for each record is known and so the predicted category can be compared to the actual category. Once all folds have been completed, the average score across all folds is taken as an estimate of how the model will perform on other data.
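
The splitting step can be sketched with NumPy alone. The function `k_fold_indices` below is a hypothetical helper written for illustration, not part of scikit-learn; k = 3 and ten samples are used only to keep the output small.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (training_idx, validation_idx) pairs for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)          # k nearly equal-sized subsets
    for i in range(k):
        validation_idx = folds[i]                # the held-out subset
        training_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield training_idx, validation_idx

splits = list(k_fold_indices(n_samples=10, k=3))
for training_idx, validation_idx in splits:
    print(len(training_idx), "training rows,", len(validation_idx), "validation rows")
```

Every record lands in the validation subset exactly once, so the averaged score uses each record for evaluation without ever scoring a model on data it was trained on.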

In [15]:
# Import libraries
import xgboost as xgb
from sklearn import decomposition
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.preprocessing import LabelEncoder

Prepare training data

We previously combined the training and test data to simplify the cleaning and transforming process. To feed these into the model, we also need to split the training data into the three main components – the user IDs (we don’t want to use these for training as they are randomly generated), the features to use for training (X), and the categories we are trying to predict (y).

In [16]:
# Prepare training data for model training
df_train.set_index('id', inplace = True)
df_train = pd.concat([df_train['country_destination'], df_all], axis = 1, join = 'inner', sort = False)

index_train = df_train.index.values
labels = df_train['country_destination']
le = LabelEncoder()
y = le.fit_transform(labels) # training labels
x = df_train.drop('country_destination', axis = 1, inplace = False) # training data

Now that we have our training data ready, we can use GridSearchCV to run the algorithm with a range of parameters, then select the model that has the highest cross validated score based on the chosen measure of performance (in this case accuracy, but there are a range of metrics we could use based on our needs).

In [17]:
# Grid Search - used to find the best combination of parameters
XGB_model = xgb.XGBClassifier(objective = 'multi:softprob',
                              subsample = 0.8, colsample_bytree = 0.8, seed = 0)
param_grid = {'max_depth': [3,4,5], 
              'learning_rate': [0.1, 0.3], 'n_estimators': [25, 50]}
model = GridSearchCV(estimator = XGB_model,  param_grid = param_grid, 
                     scoring = 'accuracy', verbose = 10, n_jobs = 1,
                     refit = True, cv = 3)

# Model training
model.fit(x, y)
print("Best score: %0.3f" % model.best_score_)
print("Best parameters set:")
best_parameters = model.best_estimator_.get_params()
for param_name in sorted(param_grid.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))
Fitting 3 folds for each of 12 candidates, totalling 36 fits
[CV] learning_rate=0.1, max_depth=3, n_estimators=25 .................
C:\Users\hche958\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\preprocessing\label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
  if diff:
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.7min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=25, score=0.7030189752549673, total= 2.7min
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  5.1min remaining:    0.0s
[CV] learning_rate=0.1, max_depth=3, n_estimators=25 .................
[CV]  learning_rate=0.1, max_depth=3, n_estimators=25, score=0.6923795976427556, total= 2.4min
[CV] learning_rate=0.1, max_depth=3, n_estimators=25 .................
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  7.5min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=25, score=0.6961665108337737, total= 2.3min
[CV] learning_rate=0.1, max_depth=3, n_estimators=50 .................
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed: 12.1min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=50, score=0.7052943805615375, total= 4.5min
[CV] learning_rate=0.1, max_depth=3, n_estimators=50 .................
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed: 16.7min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=50, score=0.6867709815078236, total= 4.6min
[CV] learning_rate=0.1, max_depth=3, n_estimators=50 .................
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed: 21.4min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=50, score=0.7000691085003455, total= 4.6min
[CV] learning_rate=0.1, max_depth=4, n_estimators=25 .................
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed: 24.3min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=4, n_estimators=25, score=0.7052943805615375, total= 2.9min
[CV] learning_rate=0.1, max_depth=4, n_estimators=25 .................
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed: 27.3min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=4, n_estimators=25, score=0.6942491363543996, total= 2.9min
[CV] learning_rate=0.1, max_depth=4, n_estimators=25 .................
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed: 30.3min remaining:    0.0s
[CV]  learning_rate=0.1, max_depth=4, n_estimators=25, score=0.7007195414447742, total= 2.9min
[CV] learning_rate=0.1, max_depth=4, n_estimators=50 .................
[CV]  learning_rate=0.1, max_depth=4, n_estimators=50, score=0.7052943805615375, total= 5.8min
[CV] learning_rate=0.1, max_depth=4, n_estimators=50 .................
[CV]  learning_rate=0.1, max_depth=4, n_estimators=50, score=0.6862426336110546, total= 5.6min
[CV] learning_rate=0.1, max_depth=4, n_estimators=50 .................
[CV]  learning_rate=0.1, max_depth=4, n_estimators=50, score=0.7025082320419529, total= 5.7min
[CV] learning_rate=0.1, max_depth=5, n_estimators=25 .................
[CV]  learning_rate=0.1, max_depth=5, n_estimators=25, score=0.7049286904229816, total= 3.6min
[CV] learning_rate=0.1, max_depth=5, n_estimators=25 .................
[CV]  learning_rate=0.1, max_depth=5, n_estimators=25, score=0.6935175777281041, total= 3.7min
[CV] learning_rate=0.1, max_depth=5, n_estimators=25 .................
[CV]  learning_rate=0.1, max_depth=5, n_estimators=25, score=0.703849749989837, total= 3.7min
[CV] learning_rate=0.1, max_depth=5, n_estimators=50 .................
[CV]  learning_rate=0.1, max_depth=5, n_estimators=50, score=0.704847425947747, total= 7.3min
[CV] learning_rate=0.1, max_depth=5, n_estimators=50 .................
[CV]  learning_rate=0.1, max_depth=5, n_estimators=50, score=0.6824629140418614, total= 7.3min
[CV] learning_rate=0.1, max_depth=5, n_estimators=50 .................
[CV]  learning_rate=0.1, max_depth=5, n_estimators=50, score=0.7048660514655067, total= 7.2min
[CV] learning_rate=0.3, max_depth=3, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=3, n_estimators=25, score=0.7021250660273861, total= 2.3min
[CV] learning_rate=0.3, max_depth=3, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=3, n_estimators=25, score=0.6848201585043691, total= 2.3min
[CV] learning_rate=0.3, max_depth=3, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=3, n_estimators=25, score=0.7008008455628277, total= 2.3min
[CV] learning_rate=0.3, max_depth=3, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=3, n_estimators=50, score=0.701353053512657, total= 4.5min
[CV] learning_rate=0.3, max_depth=3, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=3, n_estimators=50, score=0.6821784190205242, total= 4.6min
[CV] learning_rate=0.3, max_depth=3, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=3, n_estimators=50, score=0.7010447579169885, total= 4.5min
[CV] learning_rate=0.3, max_depth=4, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=4, n_estimators=25, score=0.7028970785421154, total= 3.0min
[CV] learning_rate=0.3, max_depth=4, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=4, n_estimators=25, score=0.6659215606584028, total= 2.9min
[CV] learning_rate=0.3, max_depth=4, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=4, n_estimators=25, score=0.7034838814585959, total= 2.9min
[CV] learning_rate=0.3, max_depth=4, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=4, n_estimators=50, score=0.7018000081264475, total= 5.8min
[CV] learning_rate=0.3, max_depth=4, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=4, n_estimators=50, score=0.6597845966267019, total= 5.9min
[CV] learning_rate=0.3, max_depth=4, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=4, n_estimators=50, score=0.7045001829342656, total= 5.8min
[CV] learning_rate=0.3, max_depth=5, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=5, n_estimators=25, score=0.7033034009182886, total= 3.6min
[CV] learning_rate=0.3, max_depth=5, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=5, n_estimators=25, score=0.6759195285511075, total= 3.6min
[CV] learning_rate=0.3, max_depth=5, n_estimators=25 .................
[CV]  learning_rate=0.3, max_depth=5, n_estimators=25, score=0.7001097605593724, total= 3.6min
[CV] learning_rate=0.3, max_depth=5, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=5, n_estimators=50, score=0.700743569948397, total= 7.5min
[CV] learning_rate=0.3, max_depth=5, n_estimators=50 .................
[CV]  learning_rate=0.3, max_depth=5, n_estimators=50, score=0.6653932127616338, total= 7.6min
[CV] learning_rate=0.3, max_depth=5, n_estimators=50 .................
[Parallel(n_jobs=1)]: Done  36 out of  36 | elapsed: 161.6min finished
[CV]  learning_rate=0.3, max_depth=5, n_estimators=50, score=0.6992560673198097, total= 7.5min
Best score: 0.701
Best parameters set:
	learning_rate: 0.1
	max_depth: 5
	n_estimators: 25
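
As a quick sanity check, the reported best score can be reproduced approximately from the per-fold scores in the log above. The sketch below simply averages the three fold scores for a few of the parameter sets; the dictionary name `fold_scores` is only illustrative, and depending on the scikit-learn version the grid search may weight folds by their size, so the plain average is an approximation rather than the exact value the search reports.

```python
# Per-fold scores copied from the grid search log above,
# keyed by (learning_rate, max_depth, n_estimators).
fold_scores = {
    (0.1, 5, 25): [0.7049286904229816, 0.6935175777281041, 0.703849749989837],
    (0.1, 4, 50): [0.7052943805615375, 0.6862426336110546, 0.7025082320419529],
    (0.1, 5, 50): [0.704847425947747, 0.6824629140418614, 0.7048660514655067],
}

# Unweighted mean score across the three folds for each parameter set.
mean_scores = {params: sum(s) / len(s) for params, s in fold_scores.items()}

# Parameter set with the highest mean score among those listed here.
best = max(mean_scores, key=mean_scores.get)

print(best, round(mean_scores[best], 3))  # (0.1, 5, 25) 0.701
```

Note how close the candidates are: the gaps between mean scores are smaller than the spread between folds, so the choice of "best" parameters here is not a strong one.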

Make Predictions on test data

Now we can use the trained model, fitted with the best parameters from the grid search above, to predict destination-country probabilities for the users in the test data.

In [18]:
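# Prepare test data for prediction
df_test.set_index('id', inplace = True)
# Keep only date_first_booking from the test set and attach the
# feature columns from df_all, matching rows on the user id index
df_test = pd.merge(df_test.loc[:, ['date_first_booking']], 
                   df_all, how = 'left', left_index=True,
                   right_index=True, sort=False)
# date_first_booking is not a model feature, so drop it
x_test = df_test.drop('date_first_booking', axis = 1, inplace = False)
# Replace missing values with the placeholder -1
x_test = x_test.fillna(-1)
id_test = df_test.index.values

# Make predictions: one row per test user, one probability per destination class
y_pred = model.predict_proba(x_test)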
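
The probabilities still have to be turned into a submission file. The sketch below shows one possible way to do this, not necessarily the approach used in the original notebook. It assumes the submission lists up to five destination countries per user, most likely first, in `id` and `country` columns (compare with sample_submission_NDF.csv), and that `classes` holds the country labels in the same column order as the output of `predict_proba`. The function name `top_k_submission` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def top_k_submission(ids, proba, classes, k=5):
    """Return a long-format frame with the k most probable classes per id.

    Assumption: classes[j] is the label for column j of proba.
    """
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    k = min(k, proba.shape[1])
    # Sort each row by ascending probability, reverse it, keep the first k columns.
    top_idx = np.argsort(proba, axis=1)[:, ::-1][:, :k]
    return pd.DataFrame({
        "id": np.repeat(np.asarray(ids), k),    # each id repeated k times
        "country": classes[top_idx].ravel(),    # row-major: the k picks of user 1, then user 2, ...
    })

# Toy data standing in for id_test, y_pred and the class labels.
toy_ids = ["u1", "u2"]
toy_classes = ["FR", "NDF", "US"]
toy_proba = [[0.1, 0.7, 0.2],
             [0.2, 0.3, 0.5]]

sub = top_k_submission(toy_ids, toy_proba, toy_classes, k=2)
print(sub)
```

With the real data, the call would look something like `top_k_submission(id_test, y_pred, classes)` followed by `to_csv(..., index=False)`, where `classes` has to come from whatever encoded the target labels earlier in the notebook.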

