Classification of Offline Reinforcement Learning Methods

Offline Reinforcement Learning

Conservative value-based approaches

  • [CQL]
    • Kumar, A., Zhou, A., Tucker, G., and Levine, S. Conservative Q-learning for offline reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
  • [COMBO]
    • Yu, T., Kumar, A., Rafailov, R., Rajeswaran, A., Levine, S., and Finn, C. COMBO: conservative offline model-based policy optimization. In Advances in Neural Information Processing Systems (NeurIPS), pp. 28954–28967, 2021.
  • [SAC-N]
    • An, G., Moon, S., Kim, J., and Song, H. O. Uncertainty-based offline reinforcement learning with diversified Q-ensemble. In Advances in Neural Information Processing Systems (NeurIPS), pp. 7436–7447, 2021.
  • [MCQ]
    • Lyu, J., Ma, X., Li, X., and Lu, Z. Mildly conservative Q-learning for offline reinforcement learning. arXiv preprint arXiv:2206.04745, 2022.
  • [RORL]
    • Yang, R., Bai, C., Ma, X., Wang, Z., Zhang, C., and Han, L. RORL: robust offline reinforcement learning via conservative smoothing. arXiv preprint arXiv:2206.02829, 2022.
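To make the "conservative value-based" idea concrete, here is a minimal NumPy sketch of the discrete-action CQL(H) critic objective from Kumar et al. (2020): a log-sum-exp penalty that pushes Q-values down on all actions while pushing them up on dataset actions, added to an ordinary squared TD error. The function names `logsumexp` and `cql_loss`, the argument layout, and the default `alpha` are assumptions for illustration; other methods in this group (for example, SAC-N's minimum over a Q-ensemble) obtain conservatism differently.

```python
import numpy as np


def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """Discrete-action CQL(H)-style critic loss (illustrative sketch).

    q_values:   (batch, num_actions) critic outputs Q(s, .)
    actions:    (batch,) integer actions stored in the offline dataset
    td_targets: (batch,) Bellman targets r + gamma * V(s'), treated as constants
    alpha:      weight of the conservative penalty (hypothetical default)
    """
    # Q-values of the actions that actually appear in the dataset.
    q_data = q_values[np.arange(len(actions)), actions]
    # Soft maximum over all actions minus the dataset-action value:
    # minimizing it lowers Q on unseen actions relative to seen ones.
    penalty = np.mean(logsumexp(q_values, axis=1) - q_data)
    # Ordinary squared TD error on dataset transitions.
    td_error = 0.5 * np.mean((q_data - td_targets) ** 2)
    return alpha * penalty + td_error, penalty


q = np.array([[1.0, 2.0], [0.5, 0.5]])
a = np.array([1, 0])
loss, penalty = cql_loss(q, a, td_targets=np.array([2.0, 0.5]), alpha=2.0)
print(loss, penalty)
```

Because log-sum-exp is never smaller than the largest Q-value in a row, the penalty is non-negative, and `alpha` sets how strongly out-of-distribution actions are pushed down. With continuous actions, CQL estimates the log-sum-exp term by sampling actions rather than enumerating them.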

Regularized policy-based approaches

  • [BCQ]
    • Fujimoto, S., Meger, D., and Precup, D. Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning (ICML), pp. 2052–2062, 2019.
  • [AWAC]
    • Nair, A., Dalal, M., Gupta, A., and Levine, S. Accelerating online reinforcement learning with offline datasets. arXiv preprint arXiv:2006.09359, 2020.
  • [Fisher-BRC]
    • Kostrikov, I., Fergus, R., Tompson, J., and Nachum, O. Offline reinforcement learning with Fisher divergence critic regularization. In International Conference on Machine Learning (ICML), pp. 5774–5783, 2021.
  • [TD3+BC]
    • Fujimoto, S. and Gu, S. S. A minimalist approach to offline reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), pp. 20132–20145, 2021.
  • [SPOT]
    • Wu, J., Wu, H., Qiu, Z., Wang, J., and Long, M. Supported policy optimization for offline reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS), pp. 31278–31291, 2022.
  • [PRDC]
    • Ran, Y., Li, Y.-C., Zhang, F., Zhang, Z., and Yu, Y. Policy regularization with dataset constraint for offline reinforcement learning. In International Conference on Machine Learning (ICML), 2023.
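For the "regularized policy-based" group, TD3+BC (Fujimoto and Gu, 2021) is the simplest instance: the actor maximizes a normalized Q-value plus a behavior-cloning penalty toward dataset actions. The NumPy sketch here computes that actor objective for one batch. The function name `td3_bc_actor_loss`, its argument layout, and the small epsilon in the normalizer are assumptions, and gradient computation (which a real implementation would get from an autodiff framework) is omitted.

```python
import numpy as np


def td3_bc_actor_loss(q_pi, policy_actions, data_actions, alpha=2.5):
    """TD3+BC-style actor loss (illustrative sketch, not the authors' code).

    q_pi:           (batch,) critic values Q(s, pi(s)) at the policy's actions
    policy_actions: (batch, action_dim) actions pi(s) proposed by the actor
    data_actions:   (batch, action_dim) actions stored in the offline dataset
    alpha:          RL-versus-BC trade-off; 2.5 is the value used by Fujimoto and Gu
    """
    # Scale the Q term by its mean magnitude so the trade-off does not depend
    # on the reward scale; the 1e-8 guard is an assumption of this sketch.
    lam = alpha / (np.mean(np.abs(q_pi)) + 1e-8)
    # Behavior-cloning term: mean squared distance to the dataset actions.
    bc_term = np.mean((policy_actions - data_actions) ** 2)
    # Minimizing this maximizes lam * Q while keeping pi(s) near the data.
    return -lam * np.mean(q_pi) + bc_term


q_pi = np.array([1.0, 3.0])
actions = np.array([[0.2, -0.1], [0.0, 0.5]])
print(td3_bc_actor_loss(q_pi, actions, actions))
```

Several other methods in this group replace the squared-error term with a different constraint (for example, SPOT's density-based support constraint or PRDC's dataset constraint), so in such a variant the main change would be to the behavior-cloning term.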