Running Inference on a Distributed Heterogeneous Network
06 September 2023

Previous experience working with distributed networks in computing applications such as blockchain and AI has revealed both the opportunities and the challenges of these networks. The most crucial step in assessing their utility is understanding which criteria matter most for a given workload. The idea of leveraging multiple interconnected machines to perform computations has given rise to more advanced techniques in machine learning (ML) training, particularly as it pertains to the discussions around compute supply and demand.
Performance Variables
The complexity of this paradigm is underscored by the multitude of interdependent factors, each affecting the performance of the others, all of which play a significant role in the continuity of such a dynamic system. A distributed heterogeneous compute network means that nodes have varying levels of computational power, memory, and storage. Ensuring that the ML inference workload is appropriately allocated to match the capabilities of each node can be a challenge in itself. Resource allocation is a time-intensive process that requires periodic evaluation of multiple server parameters: CPU capacity, memory size, and network characteristics such as bandwidth and topology, among other variables. Moreover, the increasing adoption of novel hardware built on ASICs and GPUs makes comparisons between server parameters difficult.
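To make the allocation problem concrete, here is a minimal sketch of a greedy, capability-weighted scheduler. The `Node` fields, the scoring weights, and the abstract "load units" are all illustrative assumptions, not a production resource manager:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_cores: int         # available CPU cores
    memory_gb: float       # free memory in GB
    bandwidth_mbps: float  # measured link bandwidth

def score(node: Node, weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted capability score; the weights are illustrative, not tuned."""
    w_cpu, w_mem, w_bw = weights
    return (w_cpu * node.cpu_cores
            + w_mem * node.memory_gb
            + w_bw * node.bandwidth_mbps / 100)

def allocate(loads: list, nodes: list) -> dict:
    """Greedily assign inference workloads (in arbitrary 'load units')
    to whichever node currently has the most remaining capacity."""
    remaining = {n.name: score(n) for n in nodes}
    assignment = {n.name: [] for n in nodes}
    for load in sorted(loads, reverse=True):      # place largest loads first
        best = max(remaining, key=remaining.get)  # most headroom wins
        assignment[best].append(load)
        remaining[best] -= load
    return assignment

nodes = [Node("edge-a", 4, 8.0, 50.0), Node("gpu-b", 16, 64.0, 1000.0)]
print(allocate([3.0, 1.5, 0.5, 2.0], nodes))
```

In practice the scores would be refreshed periodically from live telemetry, since the whole point of the paragraph above is that these parameters drift over time.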
All of the components making up the network collectively exhibit an innate lack of uniformity, which can ultimately force a compromise on model accuracy depending on the network architecture, data distribution, and algorithms applied. Some proposed frameworks trade model accuracy for the ability to run simultaneously on-device and in the cloud, while others partition the model to run across edge devices without any cloud processing, thereby altering neither the model's architecture nor its accuracy [1]. This approach is called distributed deep learning, in which each node holds a partition of the model and trains on local data. Local models are then combined to update the global model, but this process requires continuous coordination across machines to ensure consistency and convergence of the global model. Inference can then be performed directly on local devices, reducing latency and conserving bandwidth by avoiding round trips to a centralized server. Distributed inference schemes have since been developed for models that cannot fit on a single device, leading to partitioning of the trained model [2]. Communication overhead is a persistent concern because network quality varies greatly across nodes. Practitioners must be aware of the networking constraints unique to a distributed network: latency can erase the benefits of distribution if the network cannot synchronize data and model parameters quickly, a necessity for ML inference, which constantly requires the most up-to-date parameters.
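As a simplified illustration of partitioned inference, the sketch below splits a small feed-forward network into sequential stages, each of which could live on a different node. Here the "nodes" are just Python objects and the layer sizes are arbitrary assumptions; in a real deployment, each activation handoff would be a network transfer between devices:

```python
import numpy as np

rng = np.random.default_rng(0)

class Stage:
    """One partition of the model, hosted on a single node."""
    def __init__(self, in_dim: int, out_dim: int):
        self.w = rng.standard_normal((in_dim, out_dim)) * 0.1
        self.b = np.zeros(out_dim)

    def forward(self, x: np.ndarray) -> np.ndarray:
        return np.maximum(x @ self.w + self.b, 0.0)  # ReLU layer

# A 3-stage partition of a 784 -> 256 -> 64 -> 10 network; only the
# (much smaller) activations cross the network between stages, not
# the raw input data or the full set of weights.
pipeline = [Stage(784, 256), Stage(256, 64), Stage(64, 10)]

x = rng.standard_normal((1, 784))  # one input sample
for stage in pipeline:
    x = stage.forward(x)           # activation handoff = one network send
print(x.shape)                     # (1, 10) outputs from the final stage
```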
IoT Environment
Edge computing has shown promising results for deep learning compared with cloud computing, especially in bandwidth efficiency, since data does not need to travel far from its local device to be processed. Savings on bandwidth overhead are especially notable when there is a slew of devices in IoT environments. The characteristic high accuracy, scalability, and reconfigurability of deep neural networks (DNNs) complement the burgeoning advances in hardware and lightweight frameworks that provide a foundation for deploying deep learning models at the IoT edge [3]. With IoT devices in play, the discussion of ML model deployment in resource-constrained communities and environments [4] has expedited widespread research in edge computing scenarios, bringing the privacy and security of local data to the fore.
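A back-of-the-envelope calculation shows where the bandwidth savings come from. The camera resolution, frame rate, and payload sizes below are assumptions chosen purely for illustration:

```python
# Hypothetical IoT camera: JPEG-compressed frames streamed at 5 fps
frame_kb = 200                           # assumed compressed frame size (KB)
fps = 5
cloud_kbps = frame_kb * fps * 8          # stream raw frames to the cloud

# Edge inference: send only a small result (label + confidence) per frame
result_bytes = 64                        # assumed payload per inference
edge_kbps = result_bytes * fps * 8 / 1000

print(f"cloud upload: {cloud_kbps:,.0f} kbps")   # 8,000 kbps
print(f"edge upload:  {edge_kbps:.2f} kbps")     # 2.56 kbps
print(f"reduction:    {cloud_kbps / edge_kbps:,.0f}x")  # ~3,125x
```

Multiply that per-device reduction by hundreds or thousands of sensors and the appeal of keeping inference at the edge becomes clear.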
Federated Learning
At one point I was able to connect with one of the authors of [5] to discuss a potential partnership. Upon further exchanges, both parties realized that there were geopolitical impediments that would be difficult to resolve, given the sensitive nature of data on local devices when nodes are located in different countries, each with its own privacy laws. Naturally, distributing ML models across a decentralized network introduces security risks when devices must communicate recurringly to share data and model updates. Federated learning (FL) is one technique that addresses this: a model is trained locally using each node's data, and instead of sharing raw data, only the model updates (gradients) are sent to cloud servers. "FL is a sub-area of distributed optimization where both data collection and model training is pushed to a large number of edge clients that have limited communication and computation capabilities [6]." In an FL model, the effects of device and update heterogeneity are exacerbated by non-uniform local dataset sizes, computing speeds, learning rates, and local optimizers. FL requires that the subset of clients selected in each communication round perform multiple local model updates before those updates are aggregated into the global model; unlike traditional distributed optimization, where consensus is performed after each local gradient computation, this introduces an objective inconsistency that traditional frameworks do not exhibit [6].
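Below is a minimal sketch of one FL communication round in the style of federated averaging. The quadratic local objectives, client weights, and step counts are toy assumptions; real clients would run SGD on their own private datasets:

```python
import numpy as np

def local_update(global_w: np.ndarray, data_mean: np.ndarray,
                 steps: int, lr: float = 0.1) -> np.ndarray:
    """Run several local gradient steps on a toy quadratic objective
    ||w - data_mean||^2 standing in for a client's local loss."""
    w = global_w.copy()
    for _ in range(steps):
        grad = 2.0 * (w - data_mean)
        w -= lr * grad
    return w

def fed_avg_round(global_w, clients):
    """One communication round: clients train locally, then the server
    aggregates their models weighted by local dataset size."""
    total = sum(n for _, n, _ in clients)
    new_w = np.zeros_like(global_w)
    for data_mean, n_samples, local_steps in clients:
        w_i = local_update(global_w, data_mean, local_steps)
        new_w += (n_samples / total) * w_i   # weighted aggregation
    return new_w

# Heterogeneous clients: different data, dataset sizes, and step counts
clients = [(np.array([1.0, 0.0]), 100, 5),
           (np.array([0.0, 1.0]), 300, 1),   # slow device: fewer local steps
           (np.array([2.0, 2.0]),  50, 10)]
w = np.zeros(2)
for _ in range(20):
    w = fed_avg_round(w, clients)
print(w)
```

Because the second client takes fewer local steps per round, the aggregated model drifts toward the faster clients' optima rather than the true sample-weighted objective; this drift is precisely the objective inconsistency analyzed in [6].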
Conclusion
Nevertheless, challenges surrounding communication overhead, data consistency, and resource heterogeneity remind the industry that seamless integration requires careful consideration and engineering. As the fields of distributed systems and machine learning continue to evolve, researchers and engineers are collaboratively addressing these challenges, devising innovative solutions that balance the benefits of distributed inference with the intricacies of network dynamics. By harnessing the strengths of both paradigms and acknowledging their limitations, such methodologies pave the way for more efficient, robust, and impactful applications of machine learning in the distributed computing landscape. However, edge devices in IoT environments have limited computing capability, memory, and communication bandwidth, so coordinating edge clusters to perform deep learning inference efficiently, while ensuring timely collection and processing of data with little or no loss of accuracy, remains challenging. How to effectively divide, allocate, and schedule deep learning inference across resource-constrained IoT edge clusters is therefore an urgent problem to be solved.
References
[1] Parthasarathy, A., Krishnamachari, B. DEFER: Distributed Edge Inference for Deep Neural Networks. COMSNETS, 749-753 (2022). https://doi.org/10.48550/arXiv.2201.06769
[2] Mahajan, K., Desai, R. Serving distributed inference deep learning models in serverless computing. IEEE Conference on Cloud Computing (CLOUD) (2022). https://research.facebook.com/publications/serving-distributed-inference-deep-learning-models-in-serverless-computing/
[3] Li, Q., Huang, L., Tong, Z., Du, T., Zhang, J., Wang, S. DISSEC: A distributed deep neural network inference scheduling strategy for edge clusters. Neurocomputing 500:449-460 (2022). https://doi.org/10.1016/j.neucom.2022.05.084
[4] Truong, HL., Truong-Huu, T., Cao, TD. Making distributed edge machine learning for resource-constrained communities and environments smarter: contexts and challenges. J Reliable Intell Environ 9, 119–134 (2023). https://doi.org/10.1007/s40860-022-00176-3
[5] Filho, C.P., Marques Jr., E., Chang, V., dos Santos, L., Bernardini, F., Pires, P.F., Ochi, L., Delicato, F.C. A Systematic Literature Review on Distributed Machine Learning in Edge Computing. Sensors 22(7):2665 (2022). https://doi.org/10.3390/s22072665
[6] Wang, J., Liu, Q., Liang, H., Joshi, G., Poor, H.V. Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization. (2020). https://doi.org/10.48550/arXiv.2007.07481