
ML.NET is an open-source, cross-platform machine learning framework for .NET developers that lets you integrate custom machine learning into .NET applications. We're happy to share the work we've done over the past few months.

AutoML update

Automated machine learning (AutoML) automates the process of applying machine learning to data, making it easier to find the best algorithm for your scenario and dataset. AutoML is the backend that powers the training experience in Model Builder and the ML.NET CLI. Last year, we announced updates to the AutoML implementation in Model Builder and the ML.NET CLI, based on the Neural Network Intelligence (NNI) and Fast and Lightweight AutoML (FLAML) technologies from Microsoft Research. These updates provided several benefits and improvements over the previous solution, including:

  • An increase in the number of models explored.
  • An improved timeout error rate.
  • Improved performance metrics (e.g., accuracy and R-squared).

Until recently, you could only take advantage of these AutoML improvements in our tools.

We're excited to announce that we've integrated AutoML's NNI/FLAML implementations into the ML.NET framework so you can use them from a code-first experience.

To get started with the AutoML API, install the latest pre-release versions of the Microsoft.ML and Microsoft.ML.AutoML NuGet packages using the ML.NET daily feed.

Experiment API

An experiment is a collection of training runs, or trials. Each trial produces information about itself, such as:

  • Evaluation Metrics: Metrics used to evaluate the predictive ability of the model.
  • Pipeline: Algorithms and hyperparameters for training the model.

The Experiment API provides AutoML with a set of defaults that make it easier for you to add it to your training pipeline.

    // Configure the AutoML pipeline
    var experimentPipeline =
        dataPrepPipeline
            .Append(mlContext.Auto().Regression(labelColumnName: "fare_amount"));

    // Configure the experiment
    var experiment = mlContext.Auto().CreateExperiment()
                     .SetPipeline(experimentPipeline)
                     .SetTrainingTimeInSeconds(50)
                     .SetDataset(trainTestSplit.TrainSet, validateTestSplit.TrainSet)
                     .SetEvaluateMetric(RegressionMetric.RSquared, "fare_amount", "Score");

    // Run the experiment
    var result = await experiment.Run();

In this code snippet, dataPrepPipeline is a series of transforms that put the data into a format suitable for training. The AutoML component that trains regression models is appended to this pipeline. The same concept applies to other supported scenarios, such as classification.

When you create an experiment with the defined training pipeline, the settings you can customize include the training time, the training and validation sets, and the evaluation metric to optimize.

After defining the pipeline and experiment, call the Run method to start training.

▌Search Spaces and Sweepable Estimators

If you need more control over the hyperparameter search space, you can define the search space and add it to the training pipeline using sweepable estimators.

// Configure the search space
var searchSpace = new SearchSpace<LgbmOption>();

// Initialize the estimator pipeline
var sweepingEstimatorPipeline =
    dataPrepPipeline
        .Append(mlContext.Auto().CreateSweepableEstimator((context, param) =>
                 {
                     // Map the sampled hyperparameters onto the LightGBM trainer options
                     var option = new LightGbmRegressionTrainer.Options()
                     {
                         NumberOfLeaves = param.NumberOfLeaves,
                         NumberOfIterations = param.NumberOfTrees,
                         MinimumExampleCountPerLeaf = param.MinimumExampleCountPerLeaf,
                         LearningRate = param.LearningRate,
                         LabelColumnName = "fare_amount",
                         FeatureColumnName = "Features",
                         HandleMissingValue = true
                     };

                     return context.Regression.Trainers.LightGbm(option);
                 }, searchSpace));

The search space defines the range of hyperparameters used for the search.

Sweepable estimators let you use a search space in an ML.NET pipeline like any other estimator.

To create and run the experiment, you follow the same process as before: call CreateExperiment and then the Run method.
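
To make the idea of a search space more concrete, here is a minimal sketch of a random sweep written in plain C#, with no ML.NET dependency. It is not how ML.NET's tuners work internally. ParamRange, RandomSweepSketch, and ToyLoss are hypothetical names introduced for illustration, and the toy loss function simply stands in for "train a model with these hyperparameters and measure its validation error".

```csharp
using System;

// Sweep two hyperparameters over their ranges and report the best trial found.
var best = RandomSweepSketch.Search(
    learningRate: new ParamRange(0.01, 0.5),
    numberOfLeaves: new ParamRange(4, 128),
    trials: 50,
    seed: 42);

Console.WriteLine(
    $"Best trial: learningRate={best.LearningRate:F3}, " +
    $"numberOfLeaves={best.NumberOfLeaves}, loss={best.Loss:F5}");

// Hypothetical, simplified stand-in for one dimension of a search space.
public record ParamRange(double Min, double Max);

public static class RandomSweepSketch
{
    // Toy objective (an assumption for this sketch): lowest at
    // learningRate = 0.1 and numberOfLeaves = 32.
    public static double ToyLoss(double learningRate, int numberOfLeaves) =>
        Math.Pow(learningRate - 0.1, 2) + Math.Pow((numberOfLeaves - 32) / 100.0, 2);

    // Each trial samples one point from the search space and evaluates it.
    // The trial with the lowest loss is kept.
    public static (double LearningRate, int NumberOfLeaves, double Loss) Search(
        ParamRange learningRate, ParamRange numberOfLeaves, int trials, int seed)
    {
        var rng = new Random(seed);
        var best = (LearningRate: 0.0, NumberOfLeaves: 0, Loss: double.MaxValue);

        for (int i = 0; i < trials; i++)
        {
            double lr = learningRate.Min
                + rng.NextDouble() * (learningRate.Max - learningRate.Min);
            int leaves = rng.Next((int)numberOfLeaves.Min, (int)numberOfLeaves.Max + 1);

            double loss = ToyLoss(lr, leaves);
            if (loss < best.Loss)
            {
                best = (lr, leaves, loss);
            }
        }

        return best;
    }
}
```

Random sampling is used here only because it is the simplest strategy to show. The AutoML implementation described in this post is based on NNI and FLAML, which search more intelligently.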

Model Builder and ML.NET CLI Updates

We've made several updates to Model Builder and the ML.NET CLI. Two that we would like to highlight are:

  • The time series forecasting scenario in Model Builder
  • A new version of the ML.NET CLI

▌Time Series Forecasting Scenario (Preview)

Time series forecasting is the process of identifying patterns in time-dependent observations and making forecasts for several periods into the future. Real-world use cases include:

  • Forecasting product demand
  • Forecasting energy consumption

In ML.NET, choosing a trainer for time series forecasting is not too difficult, since you only have one choice: ForecastBySsa. The hard part is finding parameters such as the time window to analyze and how far into the future to forecast. Finding the right parameters is an experimental process, which makes it a great job for AutoML. Updates to our AutoML implementation enable intelligent search through hyperparameters, simplifying the process of training time series forecasting models.
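
For readers who have not used ForecastBySsa directly, the following sketch shows where those parameters appear in code. It assumes the Microsoft.ML and Microsoft.ML.TimeSeries NuGet packages are installed. The data is synthetic, the DemandObservation and DemandForecast classes are hypothetical, and the windowSize, seriesLength, and horizon values are arbitrary picks rather than recommendations. They are exactly the kind of values AutoML would search over.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;

var mlContext = new MLContext(seed: 0);

// Synthetic daily demand with a weekly pattern (made-up data for illustration).
var history = Enumerable.Range(0, 56)
    .Select(day => new DemandObservation { Demand = 100f + 10f * (day % 7) })
    .ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(history);

// The hand-picked values below are the hard-to-choose parameters.
var pipeline = mlContext.Forecasting.ForecastBySsa(
    outputColumnName: nameof(DemandForecast.Forecast),
    inputColumnName: nameof(DemandObservation.Demand),
    windowSize: 7,               // time window to analyze
    seriesLength: 28,            // number of points kept in the model's buffer
    trainSize: history.Count,    // number of points used for training
    horizon: 7);                 // how many periods ahead to forecast

var model = pipeline.Fit(data);

// The time series engine produces a forecast for the next `horizon` periods.
var engine = model.CreateTimeSeriesEngine<DemandObservation, DemandForecast>(mlContext);
DemandForecast forecast = engine.Predict();

Console.WriteLine(string.Join(", ", forecast.Forecast.Select(v => v.ToString("F1"))));

// Hypothetical input and output types for this sketch.
public class DemandObservation
{
    public float Demand { get; set; }
}

public class DemandForecast
{
    public float[] Forecast { get; set; } = Array.Empty<float>();
}
```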

As a result of these efforts, we are happy to share that you can now train time series forecasting models in Model Builder.

Download or update to the latest version of Model Builder to start training your time series forecasting model.

▌New version of ML.NET CLI

The ML.NET CLI is our cross-platform .NET global tool that leverages AutoML to train machine learning models on x64 and ARM64 devices running Windows, macOS, and Linux. A few months ago, we released a new version of the ML.NET CLI, which brought:

  • .NET 6 support
  • Support for the ARM64 architecture
  • New scenarios: image classification (for x64 architectures), recommendation, and forecasting

Install the ML.NET CLI and start training the model from the command line.

Keyboard Shortcuts for Notebooks

Interactive Notebooks are widely used in data science and machine learning. They are useful for data exploration and preparation, experimentation, model interpretation, and education.

Last October, we released the Visual Studio Notebook Editor extension based on .NET Interactive. We've been improving performance and stability over the past few months.

In our latest release, we've made it easier for you to work without leaving your keyboard by enabling keyboard shortcuts. If you've used notebooks before, you should be familiar with many of the shortcuts.

[Image: table of keyboard shortcuts. One of the listed actions: execute/run the cell and move focus down.]

The keys in the table are capitalized, but capitalization is not required.

Install the latest version of Notebook Editor and start creating notebooks in Visual Studio.

What's next for ML.NET?

We are actively working towards the areas outlined in the roadmap.

▌Deep Learning

A few months ago, we shared our plans for deep learning. A large part of the plan revolves around improving the ONNX consumption experience and enabling new scenarios through TorchSharp, a .NET library that provides access to the library that powers PyTorch. Some of the progress we have made towards this plan includes:

  • Enabled global GPU flags for ONNX inference. Before this update, when you wanted to use the GPU to run inference with an ONNX model, the FallbackToCpu and GpuDeviceId flags in the ApplyOnnxModel transform were not saved as part of the pipeline. As a result, the pipeline had to be refit every time. We've made these flags accessible as part of the MLContext, so you can save them as part of your model.
  • TorchSharp targets .NET Standard. TorchSharp originally targeted .NET 5. As part of our work to integrate TorchSharp into ML.NET, we updated TorchSharp to target .NET Standard.

Over the next few weeks, we're excited to share our progress on TorchSharp's integration with ML.NET.
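
As a rough illustration of the global GPU flags for ONNX inference described above, the sketch below sets them once on the MLContext. It assumes the Microsoft.ML and Microsoft.ML.OnnxTransformer packages, plus an ONNX Runtime package for your hardware. The model path is hypothetical, and GPU device 0 is an assumption about your machine.

```csharp
using System;
using System.IO;
using Microsoft.ML;

// Set the GPU flags once on the context instead of on each ApplyOnnxModel call.
var mlContext = new MLContext
{
    GpuDeviceId = 0,       // assumption: use the first GPU
    FallbackToCpu = true   // run on the CPU if the GPU cannot be used
};

Console.WriteLine(
    $"GpuDeviceId={mlContext.GpuDeviceId}, FallbackToCpu={mlContext.FallbackToCpu}");

// Hypothetical model path, for illustration only.
const string modelPath = "model.onnx";

if (File.Exists(modelPath))
{
    // No GPU arguments are passed here; the context-level settings are intended to apply.
    var onnxEstimator = mlContext.Transforms.ApplyOnnxModel(modelFile: modelPath);
    Console.WriteLine("Created the ONNX scoring estimator.");
}
else
{
    Console.WriteLine($"'{modelPath}' not found; skipping the ONNX step.");
}
```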

▌.NET DataFrame

Clean and representative data helps improve model performance. Therefore, the process of understanding, cleaning, and preparing training data is a critical step in a machine learning workflow. We introduced the DataFrame type to .NET a few years ago as a preview in the Microsoft.Data.Analysis NuGet package. DataFrame is still in preview. We understand how important it is to have tools for data cleaning and processing tasks, and we have begun to organize and prioritize feedback so that we can address existing stability and developer experience pain points. This feedback is organized as GitHub issues.

We created this tracking issue to track and organize feedback. If you have any feedback you'd like to share with us, please vote on the individual issues listed in its description or comment directly on the tracking issue.
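
For context, here is a small sketch of the kind of cleaning task DataFrame is meant for. It assumes the Microsoft.Data.Analysis preview package. The column names and values are made up, and because the API is still in preview, details may change.

```csharp
using System;
using Microsoft.Data.Analysis;

// Two small columns with missing values (made-up data for illustration).
var fare = new PrimitiveDataFrameColumn<float>(
    "fare_amount", new float?[] { 8.5f, null, 12.0f, 9.5f });
var passengers = new PrimitiveDataFrameColumn<int>(
    "passenger_count", new int?[] { 1, 2, null, 1 });

var df = new DataFrame(fare, passengers);

// Inspect how many values are missing in each column.
Console.WriteLine($"Missing fare_amount values: {fare.NullCount}");
Console.WriteLine($"Missing passenger_count values: {passengers.NullCount}");

// Drop every row that contains at least one missing value.
DataFrame cleaned = df.DropNulls();

Console.WriteLine($"Rows before cleaning: {df.Rows.Count}");
Console.WriteLine($"Rows after cleaning: {cleaned.Rows.Count}");
```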

▌MLOps

Machine Learning Operations (MLOps) is like DevOps for the Machine Learning lifecycle. This includes model deployment and management as well as data tracking, which facilitates productization of machine learning models. We're always evaluating ways to improve this experience with ML.NET.

We recently published a blog post that walks you through setting up an Azure Machine Learning dataset, training an ML.NET model with the ML.NET CLI, and configuring a retraining pipeline with Azure DevOps. For more details, see the article "Training ML.NET Models in Azure ML".

Getting Started and Resources

Learn more about ML.NET, Model Builder, and the ML.NET CLI in the Microsoft documentation.

If you run into any issues, or have feature requests or feedback, please file an issue in the ML.NET repo on GitHub or in the ML.NET Tools (Model Builder & ML.NET CLI) repo.


