Late in July 2014, Microsoft released a “preview” version of a new machine learning application for their Azure cloud platform. Appropriately called Azure ML, this platform offers a data analyst or data scientist a visual workspace UI for designing data experiments end to end: loading and transforming data, training and evaluating models, and consuming the results in applications. Even though it is in its infancy, Azure ML could become the best data science tool that is both very accessible and scalable. This article, however, focuses on the motivation behind this product in order to understand its future. This is not Microsoft’s first venture into the machine learning space. So, what has the machine learning journey been for Microsoft, how has it changed and where is it going?
Mining Structures is a product that has shipped with Microsoft SQL Server (as part of its SSAS Business Intelligence service) for over a decade, including the latest release, SQL Server 2014. Mining Structures also gives analysts a fully featured UI for conducting data mining projects. Once a training dataset is defined, nine different mining algorithms (such as clustering, logistic regression, decision trees and neural networks) can be applied to it to gain insight into the underlying structures and patterns in the data. Mining Structures is a very attractive product because every database developer and BI analyst already has it freely available and because it is tightly integrated with the rest of the SQL Server products. However, Mining Structures in SQL Server is an entirely different product from Azure ML. Why release another product for similar solutions that does not leverage the previous products? Why continue to support both of these products at the same time? What exactly are the differences between these products, and does Mining Structures have any advantages over Azure ML?
The Cloud Bet
Microsoft has developed its own cloud platform, Azure, on which it is making available some previously on-premises software such as SQL Server. In the cloud, users benefit from cheap entry into high-end software, flexible billing and scalable performance. So what does this mean for on-premises software like Mining Structures? Its future is tied to that of SQL Server, and both look to be headed for the cloud. Microsoft has declared that the cloud is its priority and has recently appointed a “cloud guy” (Satya Nadella) as its CEO. Development and significant support for Mining Structures stopped with the 2008 version. SQL Server will undoubtedly continue on-premises, but Mining Structures doesn’t have nearly as big a user base and could suffer a blow in the next version or two of SQL Server.
Despite being GUI driven and fully featured, SSAS Mining Structures serves an unusual audience. It is packaged with the multidimensional BI product, presumably to be used by programmers and database or data warehouse developers. However, data mining is better suited to advanced analysts than to such technicians. This mismatch will partly be the cause of the product’s downfall, even though it is a good product, tightly integrated into the kind of data environment that data scientists value today but perhaps did not adopt early enough. Microsoft was ahead of the more recent data analytics wave, didn’t see the transition of database developers to analysts it had hoped for, and perhaps didn’t do enough to foster the community.
It is uncertain whether the cloud will be the only place where users in Microsoft shops do advanced analytics. They, and the rest of the analytics community, certainly don’t work only in the cloud today, and there are numerous reasons for that. Most popular analytics tools (SAS, R, SPSS, etc.) don’t live in the cloud, and most organizations haven’t moved their data to a cloud platform like Azure. A gulf will open: either the pull of products like Azure ML will be strong enough for companies to move to the cloud, or they will be forced to go with a different vendor, because Microsoft will soon be out of the on-premises advanced analytics game.
The Comparison Between Azure ML and SSAS Mining Structures
Mining Structures is packaged in a supportive ecosystem with tools like SQL Server, .NET and SSIS (SQL Server Integration Services). In that context it is a fully featured, end-to-end product. It is entirely GUI based for non-coders, yet fully programmable for developers who wish to automate things like model training, testing, prediction/segmentation and even model creation. The set of canned algorithms it sports (decision trees, neural networks, clustering, market basket analysis, logistic regression, linear regression, and Naïve Bayes) is diverse and suitable for almost any data mining task. However, the versions of these algorithms are fairly out of date and underpowered. Their relative simplicity does have some advantages: Mining Structures allows for analysis of the developed models themselves and gives you an ability to see what you are mining, which is why Microsoft emphasized “data mining” over “machine learning” in the Mining Structures product.
Azure ML, on the other hand, doesn’t have such model assessment capabilities but does come with the most current and popular machine learning algorithms. Despite being part of a growing ecosystem like Azure, Azure ML is not as tightly integrated. This is why it had to incorporate some aspects of other necessary tools, such as the ETL work that SSIS would otherwise handle. Azure ML has a “workflow” canvas that is similar to the one in SSIS, but not as feature rich. It abandoned the SSAS DMX language for a web API that is still underpowered. In Mining Structures, DMX and the .NET libraries let users do through code anything they could do visually; the Azure ML web API does not. The web API does, however, allow tighter integration with cross-platform applications, since anything that can make HTTP calls can use it.
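To make the HTTP point concrete, here is a minimal sketch of how a client application might package a scoring request using only the Python standard library. Everything specific in it is an assumption for illustration: the URL, the key, the bearer-style header and the payload shape are placeholders, not the documented Azure ML request format.

```python
import json
import urllib.request

# Hypothetical values: the URL, key, and payload shape are placeholders,
# not the documented Azure ML request format.
SCORING_URL = "https://example.invalid/score"
API_KEY = "replace-with-your-key"


def build_scoring_request(rows, url=SCORING_URL, api_key=API_KEY):
    """Package feature rows as a JSON POST request for a published model."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_scoring_request([{"age": 42, "income": 55000}])
# Sending is left commented out because the endpoint above is not real:
# with urllib.request.urlopen(request) as response:
#     prediction = json.loads(response.read())
```

Nothing here is tied to Windows or .NET, which is the integration advantage: any language with an HTTP client could build the same request.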
Azure ML also promises to integrate with R in a variety of ways and eventually to handle custom or third-party algorithm libraries.
Perhaps the greatest feature of Azure ML is that it promises to scale. While most current machine learning problems can be handled by on-premises solutions, some cannot, and likely will not be down the line. Azure is a cloud platform, and Azure ML should be able to scale at cloud levels.
Mining Structures is not very easy to start with because it shares a lot of UI components with multidimensional data warehouse design (through the Visual Studio interface, anyway). This confuses analysts. Once you’re up and running, however, it is pretty smooth sailing. Mining Structures can be accessed through Excel, which makes it more usable and accessible than any other analytical tool out there. Mining Structures also has very nice visuals to help interpret models, which, as mentioned, Azure ML does not have.
Azure ML is pretty. The workflow metaphor works well and is intuitive. Some aspects of the drag-and-drop UI do not need to be drag-and-drop and make the workspace messy. Customization of the available modules is currently both cryptic and unpolished. The polish will surely come, since this is just a Preview version, but the terminology Azure ML uses for concepts that also exist in Mining Structures is different and perhaps not as friendly to people who are not veteran data scientists. Azure ML, however, is a tool for just such people. It is most appealing to analysts who can’t use, or don’t know how to use, tools like SAS, Python or R, which is why this terminology is inappropriate.
If you purchased SQL Server, you get Mining Structures for free, which is great because SQL Server is a very popular product. If you haven’t, you can’t use it. Azure ML has a lower entry barrier because it uses a pay-as-you-go model, but watch out: those tiny processing charges can rack up quickly.
Support is a thorny issue for both products. One is likely on its way out and the other hasn’t fully arrived. Mining Structures has only one book completely devoted to it and a relatively small online community to ask questions of. Azure ML documentation is currently very sparse and the user base is non-existent. The first book is due out later this month. So one can expect support for Azure ML to get better, but to somebody currently evaluating what to use for mission-critical projects I would say: use something else, or wait and see.