DCAART 2021 Abstracts

Short Papers
Paper Nr: 2

Two-dimensional Bin Packing Problem with Irregular Pieces Solving through Gamification and Machine Learning


Marcin Puchalski

Abstract: Recently, we have been observing advances in artificial intelligence that can be used to solve many old and new problems. This can result in new approaches to known problems that previously demanded a purely mathematical and algorithmic approach. A good example is the bin packing problem, which we propose to treat as a 2D game that can be solved using machine learning with visual observations enabled. This work includes a practical part, in which we implement this approach using the Unity ML-Agents Toolkit.
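The game framing above can be illustrated with a toy environment: the agent places pieces onto a grid, receives the occupancy grid as a pixel-like "visual observation", and is rewarded for legally covered area. This is a hypothetical sketch (class name `PackingGameEnv`, rectangular rather than irregular pieces, plain NumPy instead of Unity ML-Agents), not the paper's implementation.

```python
import numpy as np

class PackingGameEnv:
    """Toy 2D bin-packing 'game': place pieces on a grid, earn reward
    for covered area. Hypothetical sketch; the paper uses irregular
    pieces and the Unity ML-Agents Toolkit with visual observations."""

    def __init__(self, size=8):
        self.size = size
        self.reset()

    def reset(self):
        self.grid = np.zeros((self.size, self.size), dtype=np.int8)
        return self.observe()

    def observe(self):
        # Visual observation: the occupancy grid itself, as an "image".
        return self.grid.copy()

    def step(self, piece, x, y):
        """Try to place a binary (h, w) piece with its top-left at (x, y)."""
        h, w = piece.shape
        region = self.grid[x:x + h, y:y + w]
        if region.shape != piece.shape or np.any(region & piece):
            return self.observe(), -1.0, False        # illegal move: penalty
        self.grid[x:x + h, y:y + w] |= piece
        reward = float(piece.sum()) / self.grid.size  # fraction newly covered
        done = bool(self.grid.all())                  # bin completely filled
        return self.observe(), reward, done
```

A learning agent would then be trained on `observe()` frames to choose the piece and placement, exactly as it would learn any other 2D game.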

Paper Nr: 4

Assured Multi-Agent Reinforcement Learning using Quantitative Verification


Joshua Riley

Abstract: In multi-agent reinforcement learning, several agents converge together towards optimal policies that solve complex decision-making problems. This convergence process is inherently stochastic, meaning that its use in safety-critical domains can be problematic. To address this issue, we introduce a new approach that combines multi-agent reinforcement learning with a formal verification technique termed quantitative verification. Our assured multi-agent reinforcement learning approach constrains agent behaviours in ways that ensure the satisfaction of requirements associated with the safety, reliability, and other non-functional aspects of the decision-making problem being solved. The approach comprises three stages. First, it models the problem as an abstract Markov decision process, allowing quantitative verification to be applied. Next, this abstract model is used to synthesise a policy which satisfies safety, reliability, and performance constraints. Finally, the synthesised policy is used to constrain agent behaviour within the low-level problem with a greatly lowered risk of constraint violations.
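The three-stage pipeline can be sketched on a toy example: model the task as a small abstract Markov decision process, then use quantitative verification to retain only policies whose probability of reaching the goal while avoiding a hazard meets a threshold. The states, actions, probabilities, and the 0.85 bound below are all invented for illustration; the paper's approach would use a probabilistic model checker on an abstraction of the real multi-agent task.

```python
from itertools import product

# Hypothetical abstract MDP: states 0..3, state 3 = goal, state 2 = hazard.
# TRANS[s][a] = list of (next_state, probability).
TRANS = {
    0: {"safe": [(1, 0.9), (2, 0.1)], "fast": [(3, 0.6), (2, 0.4)]},
    1: {"safe": [(3, 0.95), (2, 0.05)], "fast": [(3, 0.8), (2, 0.2)]},
}
GOAL, HAZARD = 3, 2

def p_safe_success(policy, state=0):
    """Probability of reaching GOAL without entering HAZARD under a
    memoryless policy {state: action} (quantitative verification of a
    reach-avoid property on the abstract model)."""
    if state == GOAL:
        return 1.0
    if state == HAZARD:
        return 0.0
    return sum(p * p_safe_success(policy, nxt)
               for nxt, p in TRANS[state][policy[state]])

# Stage 2: synthesise policies meeting the requirement P(reach-avoid) >= 0.85.
candidates = [dict(zip(TRANS, acts))
              for acts in product(["safe", "fast"], repeat=len(TRANS))]
assured = [pi for pi in candidates if p_safe_success(pi) >= 0.85]
```

In stage 3, the surviving `assured` policies would be used to constrain the agents' exploration in the low-level learning problem.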

Paper Nr: 5

A Hybrid Approach for Reinforcement Learning using Virtual Policy Gradient for Balancing an Inverted Pendulum


Dylan Bates

Abstract: Using the policy gradient algorithm, we train a single-hidden-layer neural network to balance a physically accurate simulation of a single inverted pendulum. The trained weights and biases can then be transferred to a physical agent, where they are robust enough to balance a real inverted pendulum. This hybrid approach of training in a simulation allows thousands of trial runs to be completed orders of magnitude faster than would be possible in the real world, resulting in greatly reduced training time and more iterations, which produce a more robust model. Compared with existing reinforcement learning methods, the resulting control is smoother, is learned faster, and withstands forced disturbances.
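The core training loop described above, a single-hidden-layer policy network updated by the policy gradient, can be sketched in plain NumPy as a REINFORCE-style update. The layer sizes, learning rate, and two-action (push left/right) setup are assumptions for illustration; the paper trains against a physically accurate pendulum simulation before transferring the weights to hardware.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single hidden layer: 4 observation inputs -> 16 tanh units -> 2 action logits.
W1 = rng.normal(0, 0.1, (16, 4)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (2, 16)); b2 = np.zeros(2)

def policy(obs):
    """Action probabilities and hidden activations for one observation."""
    h = np.tanh(W1 @ obs + b1)
    logits = W2 @ h + b2
    p = np.exp(logits - logits.max())
    return p / p.sum(), h

def reinforce_step(trajectory, lr=0.01, gamma=0.99):
    """One policy-gradient update from a list of (obs, action, reward)."""
    global W1, b1, W2, b2
    G = 0.0
    for obs, a, r in reversed(trajectory):
        G = r + gamma * G                    # discounted return-to-go
        p, h = policy(obs)
        dlogits = -p
        dlogits[a] += 1.0                    # grad of log pi(a|obs) w.r.t. logits
        dh = (W2.T @ dlogits) * (1 - h**2)   # backprop through tanh (pre-update W2)
        W2 += lr * G * np.outer(dlogits, h)
        b2 += lr * G * dlogits
        W1 += lr * G * np.outer(dh, obs)
        b1 += lr * G * dh
```

After training in simulation, the learned `W1, b1, W2, b2` are exactly what would be copied onto the physical controller.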