
Web-based multi-user concurrent job scheduling system on the shared computing resource objects

Current status: Accepted.

Abstract: We propose a user-friendly job and resource allocation manager for an ML server. The system introduces several distinctive features: protection of users’ sensitive data, automatic cleaning of unused data, protection of the host OS through environment virtualization (containers), and direct access to each container via SSH. The proposed web-based tool allows users to request and allocate resources on a server and to monitor the progress of their tasks. It is designed to simplify access to servers, particularly ML servers, and to allocate computational resources while addressing data security concerns. The tool also relieves system administrators from manually allocating resources to users and monitoring progress. It is user-friendly and transparent: both the system administrator and the users can view all jobs in progress and find the best allocation for their tasks.

Conclusion. This paper is devoted to the design and implementation of a multi-user, concurrent, web-based job scheduling system on shared computing resource objects. We show how the computing environment can be automated to execute deep learning computations in parallel.

A number of cloud services allow users to use cloud-based computational resources; however, no task manager with a user-friendly interface is available on deep learning servers. Such a task manager may therefore be very helpful in the future, when many users (particularly those with no computer science background or skills) use the system. The web-based job scheduling system on shared computing resource objects may be deployed and used at institutions that have purchased ML servers.

The designed job scheduling system is fault-tolerant and provides four key features: resource requests, job scheduling, access to the virtual environment, and automatic data cleaning. It is fully operational and can be accessed from within the internal UAEU network (http://dgx1-request.aa.uaeu.ac.ae).
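
To make the request, schedule, and release cycle concrete, the following is a minimal sketch of how such an allocator might behave. It is not the implementation described in the paper: the first-come-first-served policy, the class and field names (`Job`, `Scheduler`, `gpus_needed`, `hours`), the integer clock, and the pool of 8 GPUs are all assumptions made for illustration. Container creation, SSH access, and the cleaning of user data are outside its scope.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A hypothetical resource request: who asks, how many GPUs, for how long."""
    user: str
    gpus_needed: int
    hours: int
    gpus: list = field(default_factory=list)  # GPU ids held while running
    end_time: int = 0                         # set when the job starts


class Scheduler:
    """Toy first-come-first-served allocator over a fixed pool of GPUs."""

    def __init__(self, total_gpus):
        self.free = list(range(total_gpus))  # ids of unallocated GPUs
        self.queue = []                      # jobs waiting for resources
        self.running = []                    # jobs currently holding GPUs

    def request(self, job, now):
        """Queue a request and start it immediately if resources allow."""
        self.queue.append(job)
        self._dispatch(now)

    def tick(self, now):
        """Release the GPUs of expired jobs, then start waiting jobs."""
        finished = [j for j in self.running if j.end_time <= now]
        for job in finished:
            self.running.remove(job)
            self.free.extend(job.gpus)
            job.gpus = []
        self.free.sort()
        self._dispatch(now)
        return finished

    def _dispatch(self, now):
        # Strict FIFO (an assumed policy): stop at the first job that does not fit.
        while self.queue and self.queue[0].gpus_needed <= len(self.free):
            job = self.queue.pop(0)
            job.gpus = [self.free.pop(0) for _ in range(job.gpus_needed)]
            job.end_time = now + job.hours
            self.running.append(job)


sched = Scheduler(total_gpus=8)
alice = Job("alice", gpus_needed=6, hours=2)
bob = Job("bob", gpus_needed=4, hours=1)
sched.request(alice, now=0)  # starts at once on GPUs 0-5
sched.request(bob, now=0)    # waits: only 2 GPUs are free
sched.tick(now=2)            # alice's slot expires, bob starts on GPUs 0-3
```

In this sketch a waiting job blocks all jobs behind it, which keeps the order predictable and easy to show to users; a real deployment might instead let smaller jobs run ahead of a large one that does not yet fit.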
