Offline Reinforcement Learning Papers
# | Date | Book Presentation | Paper Presentation | Notice |
---|---|---|---|---|
01 | Mar 3 (Mon) | - Course introduction | | |
02 | Mar 10 (Mon) | - Deep Q Network (DQN) - Double DQN (DDQN) | - Playing Atari with Deep Reinforcement Learning - Deep Reinforcement Learning with Double Q-learning | |
03 | Mar 17 (Mon) | - Deep Deterministic Policy Gradient (DDPG) - Twin Delayed Deep Deterministic Policy Gradient (TD3) | - Continuous Control with Deep Reinforcement Learning - Addressing Function Approximation Error in Actor-Critic Methods | |
04 | Mar 24 (Mon) | - Soft Actor-Critic (SAC) | - Soft Actor-Critic Algorithms and Applications | |
05 | Mar 31 (Mon) | - Imitation Learning (IL) | - A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges | Goals of imitation learning - reference material |
06 | Apr 7 (Mon) | - Batch-Constrained Q-learning (BCQ) ICML 2019 Offline only - Bootstrapping Error Accumulation Reduction (BEAR) NeurIPS 2019 Offline only | - Paper presentation 01: Off-Policy Deep Reinforcement Learning without Exploration - Paper presentation 02: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction | |
07 | Apr 14 (Mon) | - Conservative Q-Learning (CQL) NeurIPS 2020 Offline and Offline-to-Online - Policy in Latent Action Space (PLAS) CoRL 2020 Offline only | - Paper presentation 03: Conservative Q-Learning for Offline Reinforcement Learning - Paper presentation 04: PLAS: Latent Action Space for Offline Reinforcement Learning | |
08 | Apr 21 (Mon) | - Critic Regularized Regression (CRR) NeurIPS 2020 Offline only - Advantage Weighted Actor-Critic (AWAC) Rejected from ICLR 2021 Offline and Offline-to-Online | - Paper presentation 05: Critic Regularized Regression - Paper presentation 06: AWAC: Accelerating Online Reinforcement Learning with Offline Datasets | |
09 | Apr 28 (Mon) | - TD3+BC NeurIPS 2021 Offline only - Implicit Q-Learning (IQL) ICLR 2022 Offline and Offline-to-Online | - Paper presentation 07: A Minimalist Approach to Offline Reinforcement Learning - Paper presentation 08: Offline Reinforcement Learning with Implicit Q-Learning | |
10 | May 5 (Mon) | Public holiday (no class) | | |
11 | May 12 (Mon) | - ReBRAC NeurIPS 2023 Offline and Offline-to-Online - Policy Regularization with Dataset Constraint (PRDC) ICML 2023 | - Paper presentation 09: Revisiting the Minimalist Approach to Offline Reinforcement Learning - Paper presentation 10: Policy Regularization with Dataset Constraint for Offline Reinforcement Learning | |
12 | May 19 (Mon) | - Supported Policy OpTimization (SPOT) NeurIPS 2022 Offline-to-Online only - Calibrated Q-Learning (Cal-QL) NeurIPS 2023 Offline-to-Online only | - Paper presentation 11: Supported Policy Optimization for Offline Reinforcement Learning - Paper presentation 12: Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | |
13 | May 26 (Mon) | - SAC-N NeurIPS 2021 Offline only - Ensemble-Diversified Actor Critic (EDAC) NeurIPS 2021 Offline only - Large-Batch SAC (LB-SAC) NeurIPS 2022 Offline only | - Paper presentation 13: Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble - Paper presentation 14: Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size | |
14 | Jun 2 (Mon) | - Decision Transformer (DT) Offline only | - Paper presentation 15: Decision Transformer: Reinforcement Learning via Sequence Modeling | |
15 | Jun 9 (Mon) | - Gato | - Paper presentation 16: A Generalist Agent | |
16 | Jun 16 (Mon) | Final exam | |
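As a taste of the material, the week 09 paper "A Minimalist Approach to Offline Reinforcement Learning" (TD3+BC) modifies the TD3 actor loss by adding a behaviour-cloning term and normalising the Q-value scale. Below is a minimal NumPy sketch of that objective; the arrays stand in for critic outputs and policy/dataset actions and are purely illustrative, while `alpha = 2.5` follows the paper's reported default.

```python
import numpy as np

def td3_bc_actor_loss(q_pi, pi_a, data_a, alpha=2.5):
    """TD3+BC actor loss for one batch:
        -lambda * mean(Q(s, pi(s))) + mean((pi(s) - a)^2),
    where lambda = alpha / mean(|Q(s, pi(s))|) rescales the RL term
    so the behaviour-cloning penalty stays comparable in magnitude.

    q_pi   : Q(s, pi(s)) per state, shape (B,)
    pi_a   : policy actions pi(s),  shape (B, act_dim)
    data_a : dataset actions a,     shape (B, act_dim)
    """
    lam = alpha / (np.abs(q_pi).mean() + 1e-8)  # Q-scale normaliser
    rl_term = -lam * q_pi.mean()                # deterministic policy gradient term
    bc_term = ((pi_a - data_a) ** 2).mean()     # behaviour-cloning penalty
    return rl_term + bc_term

# Placeholder batch of size 2 with a 1-D action space.
q_pi = np.array([1.0, 3.0])
pi_a = np.array([[0.5], [0.0]])
data_a = np.array([[0.0], [0.0]])
loss = td3_bc_actor_loss(q_pi, pi_a, data_a)
```

In the actual algorithm `q_pi` and `pi_a` come from the critic and actor networks and the loss is minimised by gradient descent on the actor's parameters; everything else in TD3 is left unchanged, which is the point of the "minimalist" title.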