Q-Learning is an off-policy temporal difference control algorithm:
Q(S_t, A_t) ← Q(S_t, A_t) + α[R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
The learned action-value function Q directly approximates q_∗, the optimal action-value function, independent of the policy being followed.
Source: Sutton and Barto, Reinforcement Learning, 2nd Edition
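The update rule above can be sketched as tabular Q-learning on a toy problem. The environment here (a 5-state deterministic chain where moving right eventually reaches a rewarding terminal state) is an illustrative assumption, not from the source; the ε-greedy behaviour policy and the hyperparameter values are likewise illustrative. The key line is the TD target, which bootstraps from max_a Q(S_{t+1}, a) regardless of the action the behaviour policy actually takes next, which is what makes the algorithm off-policy.

```python
import random

# Hypothetical toy environment (assumption, not from the source):
# states 0..4 in a chain; action 1 moves right, action 0 moves left.
# Reaching state 4 gives reward 1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1  # illustrative hyperparameters

def step(s, a):
    """One environment transition: returns (next_state, reward, done)."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    reward = 1.0 if done else 0.0
    return s_next, reward, done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy behaviour policy (can differ from the greedy
            # target policy -- Q-learning is off-policy)
            if rng.random() < EPSILON:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
            s_next, r, done = step(s, a)
            # TD target uses max over next actions, not the action taken
            target = r + (0.0 if done else GAMMA * max(Q[s_next]))
            # Q(S_t,A_t) <- Q(S_t,A_t) + alpha * [target - Q(S_t,A_t)]
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
```

Because transitions are deterministic, Q converges to the optimal values for every state-action pair visited often enough, even with a fixed step size: the greedy action in every non-terminal state becomes "move right", and Q(0, right) approaches γ³ · 1 = 0.729.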