Importing, Inspecting, and Scoring With MOJO Models Inside H2O

Machine-learning models created with H2O can be exported in two basic ways:

  1. Binary format,
  2. Model Object, Optimized (MOJO).

An H2O model can be saved in a binary format, which is tied to the specific version of H2O it was created with. There are several reasons for this restriction. One of the important reasons is that model-building algorithms may evolve over time: the algorithm's hyperparameters, as well as the "behavior" of the algorithm itself, may change. To obtain more information about H2O models, please visit the official documentation.


The second option is a MOJO. Unlike binary models, MOJOs are meant for productionizing H2O models. They are self-contained models, deployable into a production environment. Typically, once a model performs well, a MOJO is exported and handed over to engineers to be deployed into production, bridging the gap between engineering and data science. An in-depth description of H2O MOJOs is provided in the Productionizing H2O documentation.
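For illustration, the two export routes can be sketched with the h2o Python client. This is a minimal sketch rather than code from this article: it assumes `h2o.save_model()` writes the version-bound binary model and the model's `download_mojo()` method writes the MOJO zip. The names `export_model_both_ways` and its `h2o_module` parameter are hypothetical, introduced so the helper can be exercised with a stand-in object instead of a live cluster.

```python
import os


def export_model_both_ways(model, directory, h2o_module=None):
    """Export a trained H2O model in both formats (hypothetical helper).

    Assumptions: `h2o_module` behaves like the h2o Python package, whose
    save_model() writes the version-bound binary model, and `model` offers
    download_mojo(), which writes the MOJO zip. Both return the written path.
    """
    if h2o_module is None:
        import h2o as h2o_module  # lazy import: a stand-in can be passed for testing
    os.makedirs(directory, exist_ok=True)
    # Binary format: tied to the H2O version that created it.
    binary_path = h2o_module.save_model(model, path=directory, force=True)
    # MOJO: self-contained zip intended for production scoring.
    mojo_path = model.download_mojo(path=directory)
    return {"binary": binary_path, "mojo": mojo_path}
```

Only the MOJO path from the returned dictionary is the artifact meant to be handed over for deployment; the binary copy stays usable only with the same H2O version.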

Since H2O release 3.26.0.8, it is possible to re-import MOJO models back into H2O and:

  1. Inspect the hyperparameters used to train the model
  2. See the scoring history
  3. Predict
  4. Display variable importances
  5. Use it exactly as a native H2O model, except for checkpointing
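A little of this metadata can even be peeked at without a cluster, because a MOJO is an ordinary zip archive. The sketch below assumes the archive contains a `model.ini` file at its root whose `[info]` section holds `key = value` lines (for example `algo` or `h2o_version`). That layout is our assumption about the MOJO format, not something this article documents, and `read_mojo_info` is a hypothetical helper. It does not replace re-importing the MOJO, which exposes the full parameters, scoring history and variable importances.

```python
import zipfile


def read_mojo_info(mojo_path):
    """Return the key/value pairs from the [info] section of a MOJO's model.ini.

    Assumption: the MOJO zip stores a text file named model.ini at its root,
    and its [info] section consists of "key = value" lines.
    """
    if not zipfile.is_zipfile(mojo_path):
        raise ValueError(f"{mojo_path} is not a zip archive, so it cannot be a MOJO")
    with zipfile.ZipFile(mojo_path) as archive:
        text = archive.read("model.ini").decode("utf-8")

    info = {}
    in_info = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # Track whether we are inside [info]; other sections are skipped.
            in_info = line == "[info]"
            continue
        if in_info and "=" in line:
            key, value = line.split("=", 1)
            info[key.strip()] = value.strip()
    return info
```

Under the stated assumption, `read_mojo_info("/some/path/to/mojo.zip").get("h2o_version")` would report which H2O release produced the MOJO.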

With the new MOJO import functionality, all the information about the model is available for the H2O user to inspect. Also, there is no need to use the GenModel for scoring a dataset if only the MOJO model is available: once the MOJO is imported back into H2O, predictions can be made with the imported model. And if the MOJO file gets lost while the H2O cluster still has it loaded, it can be re-exported.
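Re-exporting a MOJO that is still loaded in the cluster could look roughly like the sketch below. It assumes the imported model object supports the h2o Python client's `download_mojo(path=...)` method and that the method returns the path of the written zip; `reexport_mojo` is a hypothetical helper name.

```python
import os


def reexport_mojo(model, directory):
    """Write the MOJO held by an already-loaded H2O model back to disk.

    Assumption: the model object (for example an imported Generic model)
    supports download_mojo(path=...), returning the path of the written zip.
    """
    os.makedirs(directory, exist_ok=True)
    mojo_path = model.download_mojo(path=directory)
    # Fail loudly if nothing was written where the client said it would be.
    if not os.path.isfile(mojo_path):
        raise FileNotFoundError(f"Expected a MOJO at {mojo_path}")
    return mojo_path
```

For example, `reexport_mojo(h2o.get_model("some_model_id"), "/tmp/restored")`, where the model ID and the directory are placeholders.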

This functionality is available via all H2O interfaces: Flow, Python & R.

Note: Besides MOJO, a similar functionality named POJO used to exist in H2O as well. POJOs are now deprecated, and the functionality described in this article does not apply to POJOs.

Flow

There are two ways to import a MOJO using Flow:

  1. Use the MOJO import functionality directly
  2. Pre-upload the MOJO to the H2O cluster and then use the MOJO import functionality

To access MOJO import, select the "Model" option in Flow's top menu, then click on "Import MOJO Model" in the bottom part of that menu. A dialog appears, asking for:

  1. Model ID
  2. MOJO file key
  3. Path to the MOJO

The model ID is pre-generated by H2O, and modifying it is optional. The MOJO file key is an optional parameter, used when a MOJO was first uploaded from the H2O user's local filesystem to the cluster and then imported. If the MOJO zip file is stored out of reach of the H2O cluster, it can still be imported by first clicking on the "Data" option in Flow's top menu and uploading the MOJO with the "Upload file" dialog.

By default, the user fills in only the last input box, named path. It holds the path, on the H2O cluster's filesystem, to the MOJO zip file to import.

Clicking on the Import button imports the MOJO model and registers it inside H2O. From then on, it can be used like a regular H2O model, with the few restrictions listed above. Clicking on the View button displays the imported MOJO model's details.

Notice that the Predict button is active: users are able to make predictions with imported MOJO models.

Python

As in Flow and R, there are two ways to import a MOJO using Python:

  1. Use the MOJO import functionality directly
  2. Pre-upload the MOJO to the H2O cluster, then use the MOJO import functionality

The pre-upload functionality is useful when the MOJO model can't be imported directly because it is out of reach of the H2O cluster's filesystem, e.g. residing on the user's local filesystem. Simply uploading the MOJO using h2o.upload_file('/some/path/to/mojo.zip') and then using the import functionality solves this problem. However, uploading the file manually and then calling the H2OGenericEstimator's constructor introduces a lot of overhead. Therefore, we've introduced the h2o.upload_mojo('/path/to/some/mojo.zip') convenience function. For the simplest of use cases, there is a function named h2o.import_mojo('/some/path/to/mojo.zip'). This function takes a path accessible by the H2O cluster, imports the MOJO and creates an H2OGenericEstimator. Such an H2OGenericEstimator can hold any model, including all kinds of imported MOJO models, hence the name Generic.

Such a model can then be used to make predictions, just like any H2O model, with mojo_model.predict(airlines_data). In fact, the parameters, the scoring history, it's all there! The listing below shows a basic use case where a GBM model is created, saved into a temporary MOJO zip file and then loaded back into H2O. Once the model is imported, predictions are made with the imported MOJO model's predict method.

import tempfile
import h2o
from h2o.estimators import H2OGradientBoostingEstimator
h2o.init()
airlines_data = h2o.import_file("https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv")
## Create a GBM model, only to later export it as a MOJO
original_model = H2OGradientBoostingEstimator(ntrees=1)
original_model.train(x=["Origin", "Dest"], y="IsDepDelayed", training_frame=airlines_data)
# Save the previously created model into a temporary directory as a MOJO zip file
mojo_directory = tempfile.mkdtemp()
original_model_filename = original_model.download_mojo(mojo_directory)
# Load the model from the temporary file
mojo_model = h2o.import_mojo(original_model_filename)
predictions = mojo_model.predict(airlines_data)
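Beyond predictions, the imported model from the listing above can be inspected. The helper below is a sketch that gathers the hyperparameters, the scoring history and the variable importances in one place. It assumes the h2o Python model members `actual_params`, `scoring_history()` and `varimp()` are available on the imported Generic model; `summarize_imported_model` is a hypothetical name.

```python
def summarize_imported_model(model, top_n=5):
    """Collect what a re-imported MOJO exposes for inspection.

    Assumes an h2o Python model object (e.g. the H2OGenericEstimator
    returned by h2o.import_mojo) with the actual_params property and the
    scoring_history() and varimp() methods.
    """
    params = model.actual_params        # parameters reported by the model
    history = model.scoring_history()   # metrics recorded during training
    # varimp() yields (variable, relative, scaled, percentage) tuples,
    # or None when the model has no variable importances.
    importances = model.varimp() or []
    return {
        "params": params,
        "scoring_history": history,
        "top_variables": [row[0] for row in importances[:top_n]],
    }
```

Calling `summarize_imported_model(mojo_model)` after the listing would, under these assumptions, return the GBM's parameters alongside its scoring history and most important variables.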

As an alternative to the h2o.import_mojo('/some/path/to/mojo.zip') function, a generic model can also be created directly with the H2OGenericEstimator.from_file('/some/path/to/mojo.zip') function. The result is exactly the same as with the h2o.import_mojo function. See the runnable example below for comparison.

import tempfile
import h2o
from h2o.estimators import H2OGenericEstimator, H2OGradientBoostingEstimator
h2o.init()
airlines_data = h2o.import_file("https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv")
## Create a GBM model, only to later export it as a MOJO
original_model = H2OGradientBoostingEstimator(ntrees=1)
original_model.train(x=["Origin", "Dest"], y="IsDepDelayed", training_frame=airlines_data)
# Save the previously created model into a temporary directory as a MOJO zip file
mojo_directory = tempfile.mkdtemp()
original_model_filename = original_model.download_mojo(mojo_directory)
# Load the model from the temporary file using H2OGenericEstimator
mojo_model = H2OGenericEstimator.from_file(original_model_filename)
predictions = mojo_model.predict(airlines_data)

Upload a MOJO in Python

If the MOJO zip file isn't reachable by the H2O cluster, it needs to be uploaded first with h2o.upload_file('path/to/some/mojo.zip'), and the key of the uploaded file then has to be supplied to the H2OGenericEstimator's constructor. For uploading a MOJO not directly reachable by the H2O cluster, however, there is a convenience function, h2o.upload_mojo('/path/to/some/mojo.zip'). Internally, the MOJO zip file is uploaded into H2O and represented as a Frame of bytes. Afterwards, the key of this byte Frame is supplied to the H2OGenericEstimator, which creates a Generic model from the provided frame instead of trying to import a file from the cluster's filesystem.

A fully reproducible example can be found in the following listing.

import tempfile
import h2o
from h2o.estimators import H2OGradientBoostingEstimator
h2o.init()
airlines_data = h2o.import_file("https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv")
## Create a GBM model, only to later export it as a MOJO
original_model = H2OGradientBoostingEstimator(ntrees=1)
original_model.train(x=["Origin", "Dest"], y="IsDepDelayed", training_frame=airlines_data)
# Save the previously created model into a temporary directory as a MOJO zip file
mojo_directory = tempfile.mkdtemp()
original_model_filename = original_model.download_mojo(mojo_directory)
# Upload the MOJO model and create a Generic model out of it
mojo_model = h2o.upload_mojo(original_model_filename)
predictions = mojo_model.predict(airlines_data)
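Which of the two Python entry points to call depends only on where the zip file lives. The sketch below wraps that decision; `load_mojo`, `reachable_by_cluster` and `h2o_module` are hypothetical names, while `h2o.import_mojo` and `h2o.upload_mojo` are the functions described above. Because the client cannot reliably tell whether a path exists on the cluster's filesystem, the caller states it explicitly.

```python
def load_mojo(path, reachable_by_cluster, h2o_module=None):
    """Turn a MOJO zip into a Generic model using the appropriate h2o call.

    reachable_by_cluster: True when `path` exists on the H2O cluster's own
    filesystem; False when it only exists on the client machine.
    """
    if h2o_module is None:
        import h2o as h2o_module  # lazy import keeps the helper testable without a cluster
    if reachable_by_cluster:
        # The cluster reads the file itself; no bytes are sent from the client.
        return h2o_module.import_mojo(path)
    # The client sends the zip to the cluster first, then a Generic model is built.
    return h2o_module.upload_mojo(path)
```

With a local cluster started by h2o.init(), both branches work for a file on the local disk; the distinction matters once the cluster runs on other machines.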

R

As in Flow and Python, there are two ways to import a MOJO using R:

  1. Use the MOJO import functionality directly,
  2. Pre-upload the MOJO to the H2O cluster, then use the MOJO import functionality.

The pre-upload functionality is useful when the MOJO model can't be imported directly because it is out of reach of the H2O cluster's filesystem, e.g. residing on the user's local filesystem. Simply uploading the MOJO using h2o.upload_file('/some/path/to/mojo.zip') and then using the h2o.generic(model_key = 'some_model_key') functionality solves this problem, but it is a lot of work. Therefore, there is a convenience function named h2o.upload_mojo('/path/to/some/mojo.zip'), which does everything in a single call. MOJO upload in R has its dedicated section below, named "Upload a MOJO in R". For the simplest of use cases, there is a function named h2o.import_mojo('/some/path/to/mojo.zip'). This function takes a path accessible by the H2O cluster, imports the MOJO and creates a Generic model. Such a Generic model can hold any model, including all kinds of imported MOJO models, hence the name Generic.

Such a model can then be used to make predictions, just like any H2O model, with h2o.predict(mojo_model, airlines_data). In fact, the parameters, the scoring history, it's all there! The listing below shows a basic use case where a GBM model is created, saved into a temporary MOJO zip file and then loaded back into H2O. Once the model is imported, predictions are made with the imported MOJO model using the h2o.predict function.

library(h2o)
h2o.init()
airlines_data <- h2o.importFile("https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv")
## Create a GBM model, only to later export it as a MOJO
original_model <- h2o.gbm(x = c("Origin", "Dest"), y = "IsDepDelayed", training_frame = airlines_data, ntrees = 1)
# Save the previously created model into a temporary file
original_mojo_path <- h2o.download_mojo(model = original_model, path = tempdir())
original_mojo_path <- file.path(tempdir(), original_mojo_path)
# Load the model from the temporary file
mojo_model <- h2o.import_mojo(original_mojo_path)
predictions <- h2o.predict(mojo_model, airlines_data)

As an alternative to the h2o.import_mojo('/some/path/to/mojo.zip') function, a generic model can also be created by calling the h2o.genericModel('/some/path/to/mojo.zip') function. The result is exactly the same as with the h2o.import_mojo function. See the runnable example below for comparison.

library(h2o)
h2o.init()
airlines_data <- h2o.importFile("https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv")
## Create a GBM model, only to later export it as a MOJO
original_model <- h2o.gbm(x = c("Origin", "Dest"), y = "IsDepDelayed", training_frame = airlines_data, ntrees = 1)
# Save the previously created model into a temporary file
original_mojo_path <- h2o.download_mojo(model = original_model, path = tempdir())
original_mojo_path <- file.path(tempdir(), original_mojo_path)
# Load the model from the temporary file
mojo_model <- h2o.genericModel(original_mojo_path)
predictions <- h2o.predict(mojo_model, airlines_data)

Upload a MOJO in R

If the MOJO zip file isn't reachable by the H2O cluster, it needs to be uploaded first with h2o.upload_file('path/to/some/mojo.zip'), and the key of the uploaded file then has to be supplied to the h2o.generic function. For uploading a MOJO not directly reachable by the H2O cluster, however, there is a convenience function, h2o.upload_mojo('/path/to/some/mojo.zip'). Internally, the MOJO zip file is uploaded into H2O and represented as a Frame of bytes. Afterwards, the key of this byte Frame is supplied to h2o.generic(model_key = 'some_h2o_key'), which creates a Generic model from the provided frame instead of trying to import a file from the cluster's filesystem.

A fully reproducible example can be found in the following listing.

library(h2o)
h2o.init()
airlines_data <- h2o.importFile("https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv")
## Create a GBM model, only to later export it as a MOJO
original_model <- h2o.gbm(x = c("Origin", "Dest"), y = "IsDepDelayed", training_frame = airlines_data, ntrees = 1)
# Save the previously created model into a temporary file
original_mojo_path <- h2o.download_mojo(model = original_model, path = tempdir())
original_mojo_path <- file.path(tempdir(), original_mojo_path)
# Upload the MOJO and create a Generic model out of it
mojo_model <- h2o.upload_mojo(original_mojo_path)
predictions <- h2o.predict(mojo_model, airlines_data)

Documentation and Final Thoughts

The H2O MOJO import functionality evolves over time. To explore the complete functionality and possible limitations, please visit the official H2O MOJO Import documentation.

Remember, H2O.ai is open source and can be found on GitHub. Found a bug? Head to H2O JIRA and file an issue. Have questions? H2O offers a community Gitter and Slack.

