Frame the problem
Before you can solve a problem, you have to define exactly what it is.
Data Collection: The first and most important step is to collect data relevant to our problem statement.
Data Pre-Processing: The data gathered in the previous step is most likely not yet fit for use by a machine learning algorithm: it may be incomplete or inconsistent, and it is likely to contain errors and missing values, for example:
- Missing values, perhaps customers without an initial contact date
- Corrupted values, such as invalid entries
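As a minimal sketch of this kind of clean-up, the snippet below drops records with a missing initial contact date and records that fail a simple validity check. The record layout, the field names (customer_id, first_contact, age) and the validity rule are hypothetical, chosen only for illustration; real data would need rules that match its own schema.

```python
from datetime import date

# Hypothetical raw records; the field names are illustrative assumptions.
raw_records = [
    {"customer_id": 1, "first_contact": date(2020, 1, 15), "age": 34},
    {"customer_id": 2, "first_contact": None, "age": 51},  # missing value
    {"customer_id": 3, "first_contact": date(2021, 6, 2), "age": -7},  # corrupted value
    {"customer_id": 4, "first_contact": date(2019, 11, 30), "age": 28},
]


def is_clean(record):
    """Return True if the record has no missing or obviously invalid values."""
    if record["first_contact"] is None:
        return False
    # Assumed validity rule: ages must fall in a plausible range.
    return 0 <= record["age"] <= 120


# Keep only the records that pass both checks.
clean_records = [r for r in raw_records if is_clean(r)]
print([r["customer_id"] for r in clean_records])  # prints [1, 4]
```

Dropping rows is only one option; depending on the problem, missing values might instead be filled in with a default or an estimate.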
Creating Transformers
Before we can use the dataset to estimate a model, we need to do some transformation.
Spark machine learning algorithms work with two columns that, by default, must be named features and label. The features column must be a vector representation of the features we intend to use to estimate a model, while the label column holds the outcomes we want to predict.
Creating an Estimator
We can now create our estimator.
Creating a Pipeline
Now, create a Pipeline to pull the different transformations together:
Model Evaluation: After the model is trained, it is evaluated on the test dataset using a chosen evaluation metric.
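To make the evaluation step concrete, here is a minimal sketch that computes accuracy on a held-out test set. The choice of accuracy is an assumption, since the text does not name a specific metric, and the labels and predictions are made-up values standing in for real model output.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical test-set labels and the model's predictions for them.
test_labels = [1, 0, 1, 1, 0, 0, 1, 0]
test_predictions = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(test_labels, test_predictions))  # prints 0.75
```

Accuracy can be misleading when one class is much more common than the other, in which case a different metric may be a better fit.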
Performance Improvement: The model's performance on both the training and test datasets can then be improved further, for example by tuning its hyperparameters.