We continue our series on open source developer tools. Today we look at frameworks and libraries for ML: Transformers, Accord.NET, and MLflow.
Photo – Franck V. – Unsplash
Transformers is a library of natural language processing models for TensorFlow 2.0 and PyTorch. It contains more than 32 pre-trained models, including BERT, DistilBERT, XLM, GPT-2, and XLNet.
The authors of the library are engineers from Hugging Face, a company that develops NLP algorithms. They also introduced Hierarchical Multi-Task Learning (HMTL), a multi-task learning model that took another step toward solving the problem of "catastrophic forgetting". HMTL was presented at AAAI 2019, an international academic conference on artificial intelligence.
A key feature of Transformers is the ability to share trained models and convert them from one framework to the other, that is, between TensorFlow 2.0 and PyTorch. The developers note that their solution lets you set up model training in three lines of code.
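The framework conversion mentioned above can be sketched roughly as follows. This is a minimal illustration, not the developers' own example. It assumes a Transformers release that still ships TensorFlow support, and that `tf_dir` holds a model previously saved from a TensorFlow 2.0 model class with `save_pretrained`. The function name `convert_tf_to_pytorch` and its `model_cls` parameter are hypothetical helpers introduced here; `AutoModel`, `from_pretrained(..., from_tf=True)` and `save_pretrained` are the library calls.

```python
def convert_tf_to_pytorch(tf_dir, out_dir, model_cls=None):
    """Load TensorFlow 2.0 weights into a PyTorch model and save them to out_dir.

    model_cls is a hypothetical hook: any class exposing from_pretrained()
    can be passed in; by default the library's AutoModel is used.
    """
    if model_cls is None:
        # Imported lazily so this module can be read without transformers installed.
        from transformers import AutoModel
        model_cls = AutoModel

    # from_tf=True asks from_pretrained to read TensorFlow weights
    # instead of a PyTorch checkpoint.
    model = model_cls.from_pretrained(tf_dir, from_tf=True)

    # Write the weights and config back out in PyTorch format.
    model.save_pretrained(out_dir)
    return model
```

A call such as `convert_tf_to_pytorch("./tf_save", "./pt_save")` (both paths are placeholders) would leave a PyTorch copy of the model in the second directory. The opposite direction is analogous: the TensorFlow model classes accept `from_pt=True` in `from_pretrained`.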
A large community has formed around the library: it has almost 15 thousand stars on GitHub. You can try out Transformers yourself on the project website, where the developers have taught a neural network to complete your sentences.
Accord.NET is a framework for C# that provides basic tools for data analysis and machine learning, from statistical hypothesis testing to building computer vision and image processing models. It is one of the most popular ML solutions in the .NET ecosystem. It started as an extension of the AForge.NET library but later absorbed it.
The tool offers probability distributions, kernel functions, and tools for measuring model performance. Accord.NET is divided into libraries that are available as an executable installer, as compressed archives, or as NuGet packages. Among them are Math for working with matrices, Imaging for image processing, and Audio for sound processing. Also worth noting is Neuro, which includes Levenberg-Marquardt and deep learning algorithms.
Accord.NET has been used for research by engineers from universities in the UK, Egypt, China, and other countries. More generally, the framework is used by a fairly large number of developers: it has more than 3.5 thousand stars on GitHub.
One of its shortcomings is confusing documentation, which is hard on beginners. A quick start guide and detailed comments in the code make things somewhat easier. Further information on Accord.NET can also be found in books. The developers themselves recommend Machine Learning Projects for .NET Developers, F# for Machine Learning Essentials, and a couple of others.
MLflow is a platform for the full machine learning lifecycle that simplifies developing, deploying, and sharing models. It offers a set of APIs that work with any library (TensorFlow, PyTorch, XGBoost, etc.) and in any environment, including the cloud. MLflow is developed by programmers from Databricks, a company founded by the creators of Apache Spark.
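As a rough illustration of what "APIs that work with any library" looks like in practice, here is a minimal sketch of logging one training run. The helper name `log_run` and its `tracker` parameter are hypothetical and exist only for this example; `mlflow.start_run`, `mlflow.log_param` and `mlflow.log_metric` are the tracking calls it relies on. The parameter and metric values are assumed to come from your own training code, whatever library produced them.

```python
def log_run(params, metrics, tracker=None):
    """Record one run's parameters and metrics.

    tracker is a hypothetical hook: any object exposing start_run(),
    log_param() and log_metric() works; by default the mlflow module is used.
    """
    if tracker is None:
        # Imported lazily so this module can be read without mlflow installed.
        import mlflow
        tracker = mlflow

    # start_run() returns a context manager; the run is closed on exit.
    with tracker.start_run():
        for name, value in params.items():
            tracker.log_param(name, value)    # inputs, e.g. hyperparameters
        for name, value in metrics.items():
            tracker.log_metric(name, value)   # numeric results of the run
```

A call such as `log_run({"learning_rate": 0.01}, {"accuracy": 0.93})` (the values are made up) would record a single run. Because tracking is kept separate from training, the same function can wrap models built with TensorFlow, PyTorch, XGBoost, or anything else.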
MLflow has built-in integrations with Docker, TensorFlow, PyTorch, Kubernetes, Java, Spark, and other open source projects. It is used by organizations such as Microsoft, Accenture, SK Telecom, and even Washington University.
Among the disadvantages of MLflow is the lack of support for R and Java, despite their popularity in machine learning. This comes down to the relative youth of the project, and the developers promise to add the appropriate APIs in the future. The tool's youth shows in another way too: it still has bugs.
If you want to evaluate MLflow for yourself, the official documentation is a good place to start. If you have questions, a relatively small but active community on StackOverflow and Google Groups can help you answer them.