AI / ML / DL with OpenShift and IBM Power Systems
What is Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)?
Artificial Intelligence (AI)
Machine Learning (ML)
ML is the branch of AI concerned with the ability to learn from data, using different models, without being explicitly programmed to do so. It relies on algorithms and statistical methods, such as pattern recognition and inference, to let systems learn with little or no human intervention.
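To make "learning without being explicitly programmed" concrete, here is a minimal Python sketch. It is an illustration only, not a method the article prescribes. Instead of hard-coding the rule y = 2x + 1, the program estimates that rule from example pairs using ordinary least squares. The function names fit_line and predict are hypothetical.

```python
"""Learning a rule from examples instead of programming it directly.

The (hypothetical) helpers below estimate a straight line from sample
pairs with ordinary least squares, the simplest statistical model."""


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) that minimise squared error over the samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("x values must not all be equal")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Training examples produced by a rule the program is never told: y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    model = fit_line(xs, ys)
    print(model)                 # (2.0, 1.0)
    print(predict(model, 10.0))  # 21.0
```

Real ML models have many more parameters and are trained on noisy data, but the principle is the same: the behaviour comes from the data, not from hand-written rules.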
Deep Learning (DL)
What does OpenShift bring to DL and ML systems?
By using containers in our hybrid cloud to deploy Deep Learning and Machine Learning workloads, we can make much better use of our infrastructure investment in storage, servers and networking. Since version 4.7, OpenShift deployed on IBM Power Systems (specifically the AC922 and IC922 models) can run different ML and DL models that share the same GPUs. This is a major step forward for on-premises ML projects and can bring significant cost savings compared with cloud alternatives. Consider that, in addition to the cost of running each training session in the cloud, all the data has to be uploaded to and downloaded from the provider, and those transfers carry high costs of their own.
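The cost argument can be made concrete with a back-of-the-envelope calculation. The Python sketch below adds the compute cost of a series of cloud training runs to the cost of moving data in and out of the cloud. The class, the function, the parameters and every price in the example are hypothetical placeholders, not real provider rates.

```python
"""Rough cloud cost for a series of training runs: compute plus data transfer.

All prices and volumes used below are hypothetical placeholders; replace
them with your provider's price list and your own data sizes."""

from dataclasses import dataclass


@dataclass
class CloudPrices:
    gpu_hour: float        # price per GPU-hour of training
    ingress_per_gb: float  # price to upload one GB into the cloud
    egress_per_gb: float   # price to download one GB from the cloud


def cloud_training_cost(prices: CloudPrices, runs: int, gpu_hours_per_run: float,
                        upload_gb: float, download_gb: float) -> dict[str, float]:
    """Return compute, transfer and total cost for `runs` training sessions.

    upload_gb and download_gb are the total volumes moved across all runs."""
    compute = runs * gpu_hours_per_run * prices.gpu_hour
    transfer = upload_gb * prices.ingress_per_gb + download_gb * prices.egress_per_gb
    return {"compute": compute, "transfer": transfer, "total": compute + transfer}


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    placeholder = CloudPrices(gpu_hour=3.0, ingress_per_gb=0.0, egress_per_gb=0.09)
    cost = cloud_training_cost(placeholder, runs=10, gpu_hours_per_run=40,
                               upload_gb=500, download_gb=500)
    for item, value in cost.items():
        print(f"{item:>8}: {value:,.2f}")
```

Feeding in real prices and data volumes gives a first estimate to set against the cost of running the same training on your own hardware.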
OpenShift Container Storage and DevOps
With OpenShift Container Storage (OCS), each developer can manage different instances and versions of the same model on our own storage, following DevOps practices. When a model is ready, the user can start the configuration and deployment process at any time. Version control and advanced orchestration capabilities are available, including automated testing of new code. This is made possible by recent advances in GPU virtualization and by integration with the only hardware platform that has dedicated NVLink connections between the GPUs and the processor sockets. This avoids bottlenecks, offering several times the bandwidth of Intel processor-based architectures, where GPUs are attached over PCIe. We can also run different models simultaneously on the same GPU.
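As an illustration of deploying several versions of a model on shared infrastructure, the Python sketch below generates OpenShift Pod manifests as JSON, a format "oc apply -f" accepts. It assumes, hypothetically, that GPUs are exposed through the NVIDIA device plugin as the extended resource "nvidia.com/gpu" and that the model versions are stored on a persistent volume claim named "models-pvc". The image name, model paths and the MODEL_PATH variable are placeholders too. Whether both pods actually share one physical GPU depends on how GPU sharing is configured in the cluster, which this sketch does not cover.

```python
"""Sketch: Pod manifests for two versions of a model, each requesting one GPU
unit and reading its model files from a shared persistent volume claim.

Hypothetical assumptions: image name, PVC name, paths, the MODEL_PATH
variable, and GPUs exposed via the NVIDIA device plugin as "nvidia.com/gpu"."""

import json

GPU_RESOURCE = "nvidia.com/gpu"  # extended resource name advertised by the NVIDIA device plugin


def model_pod(name: str, image: str, pvc: str, model_path: str) -> dict:
    """Build a Pod manifest that mounts `pvc` at /models and requests one GPU unit."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "model-serving"}},
        "spec": {
            "containers": [
                {
                    "name": "server",
                    "image": image,
                    # The (hypothetical) serving image reads its model location from this variable.
                    "env": [{"name": "MODEL_PATH", "value": model_path}],
                    # Extended resources are requested through limits.
                    "resources": {"limits": {GPU_RESOURCE: "1"}},
                    "volumeMounts": [{"name": "models", "mountPath": "/models"}],
                }
            ],
            "volumes": [
                {"name": "models", "persistentVolumeClaim": {"claimName": pvc}}
            ],
        },
    }


if __name__ == "__main__":
    image = "registry.example.com/ml/server:latest"  # placeholder image
    pods = [
        model_pod("model-v1", image, "models-pvc", "/models/v1"),
        model_pod("model-v2", image, "models-pvc", "/models/v2"),
    ]
    # A v1 List lets both pods be applied with a single "oc apply -f".
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": pods}, indent=2))
```

If the two pods may be scheduled on different nodes, the claim needs a storage class that supports ReadWriteMany access.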
Collaboration between data scientists and developers
OpenShift is a unified platform where data scientists, software developers and system administrators can collaborate simply and reliably. Its self-service portal speeds up the deployment of applications of all kinds, including ML/AI applications, which can be deployed in minutes. Teams can quickly create, scale, reproduce, test and share the results of AI, ML and DL models in an agile way with everyone else involved in these projects, including project managers, mathematicians, programmers and customers.
Can I still use AWS, Azure, or Google Cloud for ML?
Of course. For certain models, workloads or projects, there are good reasons to use a cloud provider's services. For others, because of high costs or data protection requirements, we will choose to run them on our own infrastructure. OpenShift lets you manage both environments easily and transparently.