Please add top Java Machine Learning frameworks
The DSVM currently lacks a proper environment for developing enterprise Machine Learning and Deep Learning applications in Java. Although the Python frameworks are the best tools for modeling and research, a C++, C#, or Java backend is in most cases better suited to production, where performance, multi-threading, and backward interoperability with enterprise systems (distributed or not) matter.
So I would suggest adding Java frameworks for Machine Learning and Deep Learning, such as:
- DeepLearning4J (see benchmarks here: https://github.com/deeplearning4j/dl4j-benchmark/blob/master/README.md)
- Weka: a powerful tool, well known and widely used in the research world, for clustering and predictive modeling;
- ELKI: another well-known application for data clustering that covers virtually all of the classic unsupervised clustering techniques, most of which are still used today for data pre-processing and visualization;
- other libraries, such as Apache Spark MLlib and H2O;
Java BLAS libraries like
I would also suggest providing GPU support for ND4J. A possible approach would be to use isolated Docker containers with nvidia-docker, which runs quite fast and can handle multiple and distributed GPUs as well.
To be clear, DL4J is the top open-source deep learning framework on the JVM. Weka uses DL4J for its deep learning functionality.
Spark MLlib could be much more computationally efficient by relying on a scientific computing library like ND4J to push matrix manipulations down to C++.
H2O's deep learning offering is mostly limited to wrapping TensorFlow. They wrote their own MLPs.
Deeplearning4j supports almost any neural network, integrates with Spark, Kafka, Hadoop and other tools on the JVM, and most importantly for MSFT, it runs on Windows. How many other DL frameworks do that?
Gopi Kumar (MSFT) commented
Thanks for the suggestion and the details of your scenario. Some of the tools in the list above are already available: Weka, Spark MLlib and H2O. Here is a more detailed list of salient tools available (still not 100% exhaustive!):
We also have Docker Engine and nvidia-docker built into the DSVM, so you can bring in containers to run on the DSVM and leverage the GPU hardware if needed.
We will evaluate what it would take to offer some of the other frameworks.
Also, with the momentum behind deep learning frameworks like TensorFlow and Torch, and the ability to export models in a standard way with ONNX to run inference natively on a variety of runtimes (like WinML for Windows), integration with backends including Java and C++ will hopefully become easier, judging from the developments we are seeing in the broader ML/AI industry.
Please do chime in with votes for specific tools to help us decide.