Offline RL Papers

| # | Date | Book Presentation | Paper Presentation | Notice |
|---|------|-------------------|--------------------|--------|
| 01 | Mar 3 (Mon) | | | Course introduction |
| 02 | Mar 10 (Mon) | Deep Q Network (DQN)<br>Double DQN (DDQN) | Playing Atari with Deep Reinforcement Learning<br>Deep Reinforcement Learning with Double Q-learning | |
| 03 | Mar 17 (Mon) | Deep Deterministic Policy Gradient (DDPG)<br>Twin Delayed Deep Deterministic Policy Gradient (TD3) | Continuous Control with Deep Reinforcement Learning<br>Addressing Function Approximation Error in Actor-Critic Methods | |
| 04 | Mar 24 (Mon) | Soft Actor-Critic (SAC) | Soft Actor-Critic Algorithms and Applications | |
| 05 | Mar 31 (Mon) | Imitation Learning (IL) | A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges | See note below |
| 06 | Apr 7 (Mon) | Batch Constrained Q-learning (BCQ) (ICML 2019, offline only)<br>Bootstrapping Error Accumulation Reduction (BEAR) (NeurIPS 2019, offline only) | Paper presentation 01: Off-Policy Deep Reinforcement Learning without Exploration<br>Paper presentation 02: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction | |
| 07 | Apr 14 (Mon) | Conservative Q-Learning (CQL) (NeurIPS 2020, offline and offline-to-online)<br>Policy in Latent Action Space (PLAS) (CoRL 2020, offline only) | Paper presentation 03: Conservative Q-Learning for Offline Reinforcement Learning<br>Paper presentation 04: PLAS: Latent Action Space for Offline Reinforcement Learning | |
| 08 | Apr 21 (Mon) | Critic Regularized Regression (CRR) (NeurIPS 2020, offline only)<br>Advantage Weighted Actor-Critic (AWAC) (rejected from ICLR 2021, offline and offline-to-online) | Paper presentation 05: Critic Regularized Regression<br>Paper presentation 06: AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | |
| 09 | Apr 28 (Mon) | TD3+BC (NeurIPS 2021, offline only)<br>Implicit Q-Learning (IQL) (ICLR 2022, offline and offline-to-online) | Paper presentation 07: A Minimalist Approach to Offline Reinforcement Learning<br>Paper presentation 08: Offline Reinforcement Learning with Implicit Q-Learning | |
| 10 | May 5 (Mon) | | | Public holiday (no class) |
| 11 | May 12 (Mon) | ReBRAC (NeurIPS 2023, offline and offline-to-online)<br>Policy Regularization with Dataset Constraint (PRDC) (ICML 2023) | Paper presentation 09: Revisiting the Minimalist Approach to Offline Reinforcement Learning<br>Paper presentation 10: Policy Regularization with Dataset Constraint for Offline Reinforcement Learning | |
| 12 | May 19 (Mon) | Supported Policy Optimization (SPOT) (NeurIPS 2022, offline-to-online only)<br>Calibrated Q-Learning (Cal-QL) (NeurIPS 2023, offline-to-online only) | Paper presentation 11: Supported Policy Optimization for Offline Reinforcement Learning<br>Paper presentation 12: Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | |
| 13 | May 26 (Mon) | SAC-N (NeurIPS 2021, offline only)<br>Ensemble-Diversified Actor Critic (EDAC) (NeurIPS 2021, offline only)<br>Large-Batch SAC (LB-SAC) (NeurIPS 2022, offline only) | Paper presentation 13: Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble<br>Paper presentation 14: Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size | |
| 14 | Jun 2 (Mon) | Decision Transformer (DT) (offline only) | Paper presentation 15: Decision Transformer: Reinforcement Learning via Sequence Modeling | |
| 15 | Jun 9 (Mon) | Gato | Paper presentation 16: A Generalist Agent | |
| 16 | Jun 16 (Mon) | | | Final exam |

Note (week 05) — Goals of Imitation Learning:
  1. Teach an agent a specific task or behavior through demonstrations.
  2. Demonstration data is used to learn a mapping between **observations** and **actions**.
- References
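The imitation-learning note above describes learning a mapping from observations to actions out of demonstration data. A minimal sketch of that idea is behavioral cloning; the data and the linear policy below are hypothetical, chosen only to keep the example self-contained (a real implementation would fit a neural network to actual expert trajectories):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: the "expert" action is a fixed
# linear function of the observation, plus a little noise.
true_W = np.array([[1.0, -2.0],
                   [0.5,  3.0]])                 # (obs_dim=2, act_dim=2)
obs = rng.normal(size=(500, 2))                   # demonstration observations
acts = obs @ true_W + 0.01 * rng.normal(size=(500, 2))  # demonstrated actions

# Behavioral cloning with a linear policy: minimize ||obs @ W - acts||^2.
# With a linear policy this has a closed-form least-squares solution;
# a neural network trained by gradient descent would replace this step.
W_hat, *_ = np.linalg.lstsq(obs, acts, rcond=None)

def policy(o):
    """Cloned policy: maps an observation to a predicted expert action."""
    return o @ W_hat

# The cloned policy approximately recovers the expert's mapping.
print(np.allclose(W_hat, true_W, atol=0.05))  # True
```

This is exactly the supervised view of imitation learning from the survey listed for week 05: no reward signal is used, only (observation, action) pairs.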