Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Optimizing Attention and Cognitive Control Costs Using Temporally-Layered Architectures

Devdhar Patel, Terrence Sejnowski, Hava Siegelmann

2023-05-30 · Reinforcement Learning · Continuous Control · OpenAI Gym · reinforcement-learning

Paper · PDF · Code (official)

Abstract

The current reinforcement learning framework focuses exclusively on performance, often at the expense of efficiency. In contrast, biological control achieves remarkable performance while also optimizing computational energy expenditure and decision frequency. We propose the Decision-Bounded Markov Decision Process (DB-MDP), which constrains the number of decisions and the computational energy available to agents in reinforcement learning environments. Our experiments demonstrate that existing reinforcement learning algorithms struggle in this framework, leading to failure or suboptimal performance. To address this, we introduce a biologically inspired Temporally Layered Architecture (TLA) that enables agents to manage computational costs through two layers with distinct time scales and energy requirements. TLA achieves optimal performance in decision-bounded environments; in continuous control environments, it matches state-of-the-art performance at a fraction of the compute cost. Compared to current reinforcement learning algorithms, which prioritize performance alone, our approach significantly lowers computational energy expenditure while maintaining performance. These findings establish a benchmark and pave the way for future research on energy- and time-aware control.
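The abstract's core idea, an environment that charges the agent per decision rather than per timestep, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: the `DecisionBoundedEnv` wrapper, its `decision_budget` parameter, the `repeat` argument, and the `ToyEnv` stand-in are all hypothetical names introduced here for illustration.

```python
# Hedged sketch of a decision-bounded environment: the episode ends once the
# agent has spent its decision budget, and repeating an action for several
# timesteps still costs only one decision (the temporal abstraction that TLA
# exploits). All names here are illustrative assumptions.

class DecisionBoundedEnv:
    """Wraps an env-like object and ends the episode when the decision
    budget is exhausted."""

    def __init__(self, env, decision_budget):
        self.env = env
        self.decision_budget = decision_budget
        self.decisions_used = 0

    def reset(self):
        self.decisions_used = 0
        return self.env.reset()

    def step(self, action, repeat=1):
        # One call = one decision, even if the action is held for
        # `repeat` timesteps.
        self.decisions_used += 1
        total_reward, done = 0.0, False
        for _ in range(repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        if self.decisions_used >= self.decision_budget:
            done = True  # budget exhausted
        return obs, total_reward, done


class ToyEnv:
    """Minimal stand-in environment: +1 reward per timestep, never ends."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False


env = DecisionBoundedEnv(ToyEnv(), decision_budget=3)
obs = env.reset()
done, total = False, 0.0
while not done:
    obs, r, done = env.step(0, repeat=2)  # hold each action for 2 timesteps
    total += r
print(env.decisions_used, total)  # 3 decisions buy 6 timesteps of reward
```

Holding actions lets the agent collect 6 timesteps of reward from only 3 decisions, which is why policies with high action repetition do well under a decision bound.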

Results

Task | Dataset | Metric | Value | Model
OpenAI Gym | HalfCheetah-v2 | Action Repetition | 0.1805 | TLA
OpenAI Gym | HalfCheetah-v2 | Average Decisions | 831.42 | TLA
OpenAI Gym | HalfCheetah-v2 | Mean Reward | 9571.99 | TLA
OpenAI Gym | InvertedDoublePendulum-v2 | Action Repetition | 0.7522 | TLA
OpenAI Gym | InvertedDoublePendulum-v2 | Average Decisions | 247.76 | TLA
OpenAI Gym | InvertedDoublePendulum-v2 | Mean Reward | 9356.67 | TLA
OpenAI Gym | Pendulum-v1 | Action Repetition | 0.7032 | TLA
OpenAI Gym | Pendulum-v1 | Average Decisions | 62.31 | TLA
OpenAI Gym | Pendulum-v1 | Mean Reward | -154.92 | TLA
OpenAI Gym | Ant-v2 | Action Repetition | 0.1268 | TLA
OpenAI Gym | Ant-v2 | Average Decisions | 860.21 | TLA
OpenAI Gym | Ant-v2 | Mean Reward | 5163.54 | TLA
OpenAI Gym | Walker2d-v2 | Action Repetition | 0.4745 | TLA
OpenAI Gym | Walker2d-v2 | Average Decisions | 513.12 | TLA
OpenAI Gym | Walker2d-v2 | Mean Reward | 3878.41 | TLA
OpenAI Gym | Hopper-v2 | Action Repetition | 0.5722 | TLA
OpenAI Gym | Hopper-v2 | Average Decisions | 423.91 | TLA
OpenAI Gym | Hopper-v2 | Mean Reward | 3458.22 | TLA
OpenAI Gym | MountainCarContinuous-v0 | Action Repetition | 0.914 | TLA
OpenAI Gym | MountainCarContinuous-v0 | Average Decisions | 10.6 | TLA
OpenAI Gym | MountainCarContinuous-v0 | Mean Reward | 93.88 | TLA
OpenAI Gym | InvertedPendulum-v2 | Action Repetition | 0.8882 | TLA
OpenAI Gym | InvertedPendulum-v2 | Average Decisions | 111.79 | TLA
OpenAI Gym | InvertedPendulum-v2 | Mean Reward | 1000 | TLA
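One plausible reading of the table's "Action Repetition" metric is the fraction of timesteps on which the action simply repeats the previous timestep's action; the paper's exact definition may differ, so the sketch below is an illustrative assumption, and `action_repetition` is a hypothetical helper introduced here.

```python
# Hedged sketch of an action-repetition metric: the fraction of consecutive
# timestep pairs whose actions are identical. Illustrative only; the paper's
# exact definition may differ.

def action_repetition(actions):
    """Fraction of consecutive pairs (a_t, a_{t+1}) with a_t == a_{t+1}."""
    if len(actions) < 2:
        return 0.0
    repeats = sum(a == b for a, b in zip(actions, actions[1:]))
    return repeats / (len(actions) - 1)

print(action_repetition([1, 1, 1, 2, 2]))  # 3 repeats over 4 pairs -> 0.75
```

Under this reading, high values such as 0.914 on MountainCarContinuous-v0 mean the policy holds each action for long stretches, which is consistent with that environment's very low Average Decisions (10.6).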

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback (2025-07-17)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Autonomous Resource Management in Microservice Systems via Reinforcement Learning (2025-07-17)