High Availability

OS & Virtualization

Monday, August 12, 2019

Typical Steps in Machine Learning



Frame the problem

Before you can solve a problem, you first have to define exactly what it is.

Data Collection: The next step, and arguably the most important one, is to collect data relevant to the problem statement.

Data Pre-Processing: The data gathered in the previous step is usually not yet fit for a machine learning algorithm, because it may be incomplete or inconsistent and is likely to contain errors and missing values, for example:
  • Missing values, such as customers without an initial contact date
  • Corrupted values, such as invalid entries
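The two kinds of problems above can be illustrated with a minimal sketch in plain Python (not Spark). The customer records and the field names `age` and `first_contact` are hypothetical, and dropping every bad row is only one possible cleaning policy.

```python
from datetime import date

# Hypothetical raw customer records; the field names are illustrative only.
RAW_RECORDS = [
    {"id": 1, "age": "34", "first_contact": "2019-03-02"},
    {"id": 2, "age": "41", "first_contact": ""},             # missing contact date
    {"id": 3, "age": "-5", "first_contact": "2019-05-17"},   # implausible age
    {"id": 4, "age": "abc", "first_contact": "2019-06-01"},  # unparseable age
    {"id": 5, "age": "29", "first_contact": "2019-13-40"},   # invalid date
]


def parse_record(record):
    """Return a cleaned record, or None if it has missing or corrupted values."""
    # Missing value: no initial contact date was recorded.
    if not record["first_contact"]:
        return None
    try:
        age = int(record["age"])
        contact = date.fromisoformat(record["first_contact"])
    except ValueError:
        # Corrupted value: the field cannot be parsed at all.
        return None
    # Corrupted value: it parses, but lies outside a plausible range.
    if not 0 <= age <= 120:
        return None
    return {"id": record["id"], "age": age, "first_contact": contact}


def clean(records):
    """Keep only the records that parse_record accepts, in their original order."""
    cleaned = []
    for record in records:
        parsed = parse_record(record)
        if parsed is not None:
            cleaned.append(parsed)
    return cleaned
```

Here `clean(RAW_RECORDS)` keeps only record 1. Dropping rows is the simplest policy; in Spark the same idea is usually expressed with `DataFrame.na.drop()`, or missing numeric values are filled in with `pyspark.ml.feature.Imputer` instead.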

Creating Transformers


Before we can use the dataset to estimate a model, we need to transform it.

By default, Spark machine learning algorithms expect two columns named features and label. The features column must hold a vector representation of the features we intend to use to estimate the model, while the label column holds the outcome for each row.
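As a rough illustration of what these transformations produce, the sketch below uses plain Python lists and dicts instead of Spark DataFrames. The column names `age`, `income` and `churned`, and the choice of `"yes"` as the positive outcome, are assumptions made for the example.

```python
# Hypothetical input rows; "age", "income" and "churned" are illustrative names.
ROWS = [
    {"age": 34, "income": 52000.0, "churned": "yes"},
    {"age": 41, "income": 61000.0, "churned": "no"},
    {"age": 29, "income": 38000.0, "churned": "yes"},
]


def assemble(rows, input_cols, outcome_col, positive="yes"):
    """Return rows reduced to a 'features' vector and a numeric 'label'.

    Mirrors the idea behind Spark's default column names: the chosen
    feature columns are packed into one vector, and the outcome column
    becomes a numeric 'label' (1.0 for the positive outcome, else 0.0).
    """
    assembled = []
    for row in rows:
        features = [float(row[col]) for col in input_cols]
        label = 1.0 if row[outcome_col] == positive else 0.0
        assembled.append({"features": features, "label": label})
    return assembled
```

In Spark itself this job is typically done by transformers such as `VectorAssembler` (which packs its `inputCols` into one vector column) and `StringIndexer` (which turns a string outcome into a numeric label). Unlike the fixed mapping of "yes" to 1.0 here, `StringIndexer` picks the index order itself, by label frequency by default.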

Creating an Estimator

We now create our estimator: an algorithm that is fit on the transformed data and produces a trained model.
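In Spark ML, an estimator is an object whose `fit()` method learns from a DataFrame and returns a model, and that model is itself a transformer that adds predictions. The sketch below mimics that contract in plain Python with a nearest-centroid classifier, a simple stand-in chosen for brevity; it is not the algorithm used in this post, and the class names are hypothetical.

```python
import math


class NearestCentroidModel:
    """A fitted model: a transformer that adds a 'prediction' to each row."""

    def __init__(self, centroids):
        self.centroids = centroids  # maps each label to its mean feature vector

    def transform(self, rows):
        out = []
        for row in rows:
            # Predict the label whose centroid is closest in Euclidean distance.
            prediction = min(
                self.centroids,
                key=lambda lbl: math.dist(row["features"], self.centroids[lbl]),
            )
            out.append({**row, "prediction": prediction})
        return out


class NearestCentroid:
    """An estimator: fit() learns from labelled rows and returns a model."""

    def fit(self, rows):
        sums, counts = {}, {}
        for row in rows:
            lbl = row["label"]
            if lbl not in sums:
                sums[lbl] = [0.0] * len(row["features"])
                counts[lbl] = 0
            # Accumulate per-label feature sums and row counts.
            sums[lbl] = [s + x for s, x in zip(sums[lbl], row["features"])]
            counts[lbl] += 1
        centroids = {lbl: [s / counts[lbl] for s in sums[lbl]] for lbl in sums}
        return NearestCentroidModel(centroids)
```

In PySpark the equivalent step would use a real estimator, for example `LogisticRegression(featuresCol="features", labelCol="label")`, whose `fit()` returns a `LogisticRegressionModel`.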

Creating a Pipeline

Next, we create a Pipeline that chains the transformers and the estimator together, so that all the stages run in order as a single unit.
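A pipeline runs its stages in order: transformers are applied to the data, and each estimator is fitted on the output of the earlier stages, with the resulting model taking its place. The plain-Python sketch below imitates the shape of `pyspark.ml.Pipeline` and `PipelineModel`; the stage classes `Assembler` and `ThresholdClassifier` are hypothetical toys, and labels are assumed to be 0.0 or 1.0.

```python
class Assembler:
    """Transformer: packs named numeric columns into a 'features' list."""

    def __init__(self, input_cols):
        self.input_cols = input_cols

    def transform(self, rows):
        return [{**r, "features": [float(r[c]) for c in self.input_cols]} for r in rows]


class ThresholdModel:
    """Fitted model: predicts 1.0 when the first feature exceeds the cut."""

    def __init__(self, cut):
        self.cut = cut

    def transform(self, rows):
        return [{**r, "prediction": 1.0 if r["features"][0] > self.cut else 0.0}
                for r in rows]


class ThresholdClassifier:
    """Toy estimator: the cut is the midpoint between the two class means."""

    def fit(self, rows):
        ones = [r["features"][0] for r in rows if r["label"] == 1.0]
        zeros = [r["features"][0] for r in rows if r["label"] == 0.0]
        cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
        return ThresholdModel(cut)


class PipelineModel:
    """A pipeline whose estimators have all been replaced by fitted models."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages: transformers are applied, estimators are fitted."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                # Estimator: fit on the data produced by the earlier stages.
                stage = stage.fit(rows)
            # Applying every stage during fit is a simplification;
            # Spark skips work that no later estimator needs.
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)
```

With PySpark the real objects are used the same way: `Pipeline(stages=[...]).fit(train_df)` returns a `PipelineModel` whose `transform()` can then be applied to new data.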


Model Evaluation: After the model is trained, it is evaluated on the test dataset using an evaluation metric suited to the problem, such as accuracy for classification or root mean squared error for regression.
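The evaluation step can be sketched as a random split followed by a metric computed on the held-out rows. The helpers below are plain-Python illustrations with hypothetical names; accuracy is used as the metric, which assumes a classification problem with `prediction` and `label` fields on each row.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(rows):
    """Fraction of rows whose 'prediction' equals their 'label'."""
    if not rows:
        raise ValueError("cannot compute accuracy of an empty dataset")
    correct = sum(1 for r in rows if r["prediction"] == r["label"])
    return correct / len(rows)
```

In PySpark the split is usually `df.randomSplit([0.8, 0.2], seed=42)`, whose proportions are approximate rather than exact, and accuracy can be computed with `MulticlassClassificationEvaluator(metricName="accuracy")`.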

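Performance Improvement: The model's performance can then be improved, for example by tuning hyperparameters or refining the features. Comparing performance on the training and test datasets shows whether the model is underfitting or overfitting; tuning choices are best made with a separate validation set or cross-validation, so that the test set remains an unbiased final check.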
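One common way to improve a model without touching the test set is a grid search scored by k-fold cross-validation. The sketch below tunes the number of neighbors `k` of a tiny one-dimensional k-nearest-neighbors classifier, a stand-in chosen only because it fits in a few lines. The data layout (`x` and `label` fields, labels 0.0/1.0) is assumed, and rows are assumed to be shuffled beforehand, because the folds are formed by interleaving indices.

```python
def knn_predict(train, x, k):
    """Majority label (0.0/1.0) among the k training rows nearest to x."""
    nearest = sorted(train, key=lambda r: abs(r["x"] - x))[:k]
    votes = sum(r["label"] for r in nearest)
    # Strict majority of the k requested neighbors; ties fall back to 0.0.
    return 1.0 if votes * 2 > k else 0.0


def k_fold_indices(n, folds):
    """Yield (train_idx, val_idx); fold i validates on every folds-th index from i."""
    for i in range(folds):
        val = list(range(i, n, folds))
        val_set = set(val)
        train = [j for j in range(n) if j not in val_set]
        yield train, val


def cv_accuracy(rows, k, folds=3):
    """Mean validation accuracy of k-NN across the folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(rows), folds):
        train = [rows[j] for j in train_idx]
        val = [rows[j] for j in val_idx]
        correct = sum(knn_predict(train, r["x"], k) == r["label"] for r in val)
        scores.append(correct / len(val))
    return sum(scores) / len(scores)


def grid_search(rows, k_grid, folds=3):
    """Return (best_k, scores_by_k), choosing k by mean cross-validated accuracy."""
    scores = {k: cv_accuracy(rows, k, folds) for k in k_grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Spark provides the same idea through `ParamGridBuilder` and `CrossValidator` in `pyspark.ml.tuning`, which fit an estimator or pipeline for every parameter combination and keep the best one according to an evaluator.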
