

If you haven’t yet read Part 1 or Part 2 of our reinforcement learning series, you can check them out here and here.

Part 1: You will learn the key concepts of reinforcement learning, which will help you better understand the rest of the series.

Part 2: It takes you through comparisons of reinforcement learning algorithms and the specific considerations that go into choosing one.

In this article, we are going to celebrate what we’ve learned about reinforcement learning! We will take a look into some of the cool things people have done with reinforcement learning, some of the major obstacles that remain open challenges in reinforcement learning, and discuss some resources you could get started with if you wanted to start working with reinforcement learning yourself! 

Who knows… Maybe you’ll create the next AI system to defeat humans at games, like Google’s DeepMind did with AlphaGo!

Reinforcement learning breakthroughs 

With all our reinforcement learning knowledge in hand, we now have a good basis for how reinforcement learning works and some of the factors developers must weigh when deciding how to build an RL application. Let’s go through some of the cool things reinforcement learning has achieved.

Reinforcement learning beats humans at Dota 2

OpenAI has developed a set of five neural networks that have learned to coordinate with one another, defeating real humans at the team-based game Dota 2. On August 5th, 2018, the team of five neural networks competed against real, competitive human players and won two out of three games. A huge accomplishment in game AI! 

OpenAI Five isn’t done yet, either. This first competition was just the start of what the team really hopes to do: compete at an international, professional level. They ended up doing just that in late August, where they won one round but then lost to two other professional human teams. A loss, but in a way, also still a win!

Reinforcement learning for hyperparameter tuning

Google has developed an approach to hyperparameter tuning using reinforcement learning that they call AutoML. They set up a search problem and evolve new neural networks based on potential network mutations (the actions), receiving feedback in the form of the new network’s performance (the reward). 

Tweaking hyperparameters to get the best performance out of machine learning models is really hard. Google’s AutoML service can do this for us, using what we know about reinforcement learning to come up with better parameters, faster.
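To make the mutate-and-evaluate loop concrete, here is a deliberately tiny sketch. Everything in it is an illustrative assumption, not Google’s actual system: we “train” by calling a fake scoring function that peaks at a learning rate of 0.01, the only action is scaling the hyperparameter, and the feedback signal is the candidate’s score.

```python
import random

# Toy stand-in for "train a child model and return validation accuracy".
# In AutoML-style search, the reward is the resulting model's performance;
# here we fake it with a function that peaks at learning_rate = 0.01.
def evaluate(learning_rate):
    return 1.0 - 1000 * (learning_rate - 0.01) ** 2

def mutate(learning_rate):
    # One "action": scale the hyperparameter up or down.
    return max(1e-4, learning_rate * random.choice([0.5, 0.8, 1.25, 2.0]))

random.seed(0)
best_lr = 0.1
best_score = evaluate(best_lr)
for _ in range(200):
    candidate = mutate(best_lr)
    score = evaluate(candidate)   # feedback: the new "network's" performance
    if score > best_score:        # keep mutations that improve the score
        best_lr, best_score = candidate, score

print(best_lr, best_score)
```

The real systems are far more sophisticated (Google’s neural architecture search work trains a controller network with policy gradients rather than doing greedy hill-climbing like this), but the shape of the loop — propose a mutation, measure performance, use it as feedback — is the same.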

Bonsai for industrial applications

Bonsai was a cool startup, recently acquired by Microsoft, that focused on using RL in industrial and enterprise domains. Its focus was on improving control systems and real-time decisions to increase automation and efficiency in realms such as robotics, manufacturing, energy, and more.

Bonsai’s idea was that we could train industrial-grade machinery in simulation using machine learning. By doing so, we mitigate the risk of breaking anything, which could cost a company a lot of money.

Using RL to understand trading strategies

Another cool group is leveraging RL to better reason about and understand trading algorithms. They have a lofty mission of using RL to help replace humans in investment management, helping to cut down costs. 

Using RL to come up with good trading strategies… Sounds a lot like our example with the stock market. Do you think they frame their tasks as episodic or continuous?

DeepMind reduces cooling costs

Using RL, Google’s DeepMind helped reduce the cost of cooling its data centers by 40%. 

Think about how much 40% is at a Google-level scale… Oh my!

Challenges of reinforcement learning

There’s no denying that reinforcement learning can do a lot of cool things. It provides a new way of thinking about machine learning; it’s a different way to approach a machine learning problem.

That doesn’t mean it is the best way to approach every problem. Reinforcement learning can sometimes be the hardest way to solve a problem. We can best understand this by looking at some of the obstacles that deter applications from being built around RL. 


Enormous data requirements

Data is critical for machine learning. Full stop. RL requires enormous amounts of data to be functional. Think of our agent playing through Mario. It must play the game over and over again to learn how to do even the most basic tasks. Without all that gameplay data, our agent would never learn to play the game, let alone play it well. This is an issue, particularly when data is hard to obtain. 

Data is a big issue for all machine learning, for sure. But where supervised tasks sometimes need only input–label pairs, RL tasks often require much more complex data — whole trajectories of states, actions, and rewards — to teach systems to do what we wish.


Goal definition

RL algorithms need to have goals. Since they are task-driven, they always need to strive toward that goal, whether that’s earning the most money trading or beating the level as fast as possible. In complex tasks, the question of “what is the goal?” quickly becomes harder and harder to answer. If the objective is not properly thought out, an agent may gravitate toward doing something you never intended it to do.

Think of a hypothetical algorithm placed in a robot that is tasked with keeping a human safe. Let’s say it runs a simulation and concludes that the best way to keep the human safe is to eradicate all other human life and sedate the human in question. That’s not at all what we wanted, but that’s what the algorithm calculated would best keep that person safe for as long as possible, based on the way its goal, policy, and value function were defined. Therefore, goal definition is critical.

Making sure our algorithms and agents do what we want and expect them to do is critical for deploying systems in the real world. These are issues that touch security, ethics, safety, and more. 
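The mis-specified-goal problem can be miniaturized in code. Below is a deliberately silly, made-up example — the actions and reward numbers are pure assumptions — of a cleaning robot whose naive objective, “dirt picked up per step,” makes a loophole (dumping dirt and re-picking it) look better than actually cleaning:

```python
# Toy illustration of reward mis-specification. All actions and reward
# values here are made up; a greedy agent just picks the highest-paying
# action, so it will happily exploit any loophole the reward allows.
ACTIONS = ["clean_new_tile", "dump_and_repick_dirt"]

def naive_reward(action):
    # Re-picking the same dirt is faster, so it scores more "dirt per step".
    return {"clean_new_tile": 1.0, "dump_and_repick_dirt": 1.5}[action]

def careful_reward(action):
    # Reward only *net new* cleanliness; the loophole now costs points.
    return {"clean_new_tile": 1.0, "dump_and_repick_dirt": -0.5}[action]

def greedy_policy(reward_fn):
    # A greedy agent simply picks whichever action pays the most.
    return max(ACTIONS, key=reward_fn)

print(greedy_policy(naive_reward))    # dump_and_repick_dirt
print(greedy_policy(careful_reward))  # clean_new_tile
```

The agent isn’t malicious in either case — it is maximizing exactly the reward it was given. The fix lived entirely in the reward definition, which is the whole point.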

Complex tasks in sparse environments  

This issue inherits from the worst-case scenarios of the last two. How do we take an agent that needs to learn to do something very complex in an environment where it rarely receives a reward signal? There are many approaches to solving this issue, such as creating a complex policy to handle complex tasks, or breaking complex tasks into smaller, more obvious tasks (see OpenAI’s work on Dota 2, where they formulate small rewards agents can receive that inherently lead to the large reward that is desired). This is still a huge area of research. 

Think about the task of trying to teach a robot how to physically play the piano. This is an incredibly complex task that doesn’t necessarily feature a lot of feedback that can be turned into a reward signal. This would require some major goal engineering, which ties back into our previous issue.
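One standard trick for sparse rewards is potential-based reward shaping: add a dense “progress” signal derived from a potential function, which provably leaves the optimal policy unchanged. Here is a minimal sketch on a made-up 1-D corridor — the goal position, discount factor, and distance-based potential are all illustrative assumptions, not something from a real system:

```python
# Potential-based reward shaping on a toy 1-D corridor. The sparse
# reward fires only at the goal; the shaped reward adds immediate
# feedback for progress toward it: r' = r + gamma * phi(s') - phi(s).
GOAL = 10
GAMMA = 0.99  # discount factor

def sparse_reward(next_state):
    return 1.0 if next_state == GOAL else 0.0

def potential(state):
    # Heuristic "how promising is this state": negative distance to goal.
    return -abs(GOAL - state)

def shaped_reward(state, next_state):
    return sparse_reward(next_state) + GAMMA * potential(next_state) - potential(state)

print(shaped_reward(3, 4))  # positive: stepping toward the goal pays off now
print(shaped_reward(4, 3))  # negative: stepping away is penalized now
```

Instead of waiting to stumble onto the goal once in thousands of steps, the agent gets a usable signal on every move — which is exactly the kind of goal engineering the piano example would demand.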

Large number of states and actions 

In Mario, the number of actions an agent can take is limited. In the real world, the number of actions an agent can take is infinite. So is the number of states of the environment that can be observed. How does an agent handle this? How can an algorithm mathematically represent this? These are huge questions, big areas of research, and critical things that need to be better understood to make complex agents that can interact in the real world.  

The second we try to deploy an agent into the real world, the stakes are higher, and the problem becomes exponentially more difficult. Even teaching a robot to walk using RL can become very hard.

By now, you may be thinking “Wow, there are so many cool things that reinforcement learning can do and so many cool problems still left to solve. How can I get started?”

Resources to start learning RL

With that in mind, I pulled some resources that I think would be a good start:

  • Reinforcement Learning: An Introduction – If you are up for some heavy reading, this is a good book to dive into to really break down the theoretical components behind reinforcement learning. It’s written by Richard Sutton and Andrew Barto (who have done a good deal of work in RL) and is really nice (I’m currently working through it myself).  
  • University College London’s Reinforcement Learning Course – This is a course (largely based on the previous book) that is good to work through. It features slides and video lectures too! 
  • UC Berkeley – CS 294 – These are the course videos from UC Berkeley’s course on reinforcement learning. 
  • Udacity’s Deep Reinforcement Learning Course – Feeling like you want to get more hands on? Do you learn better by doing? Then maybe trying out Udacity’s Deep Reinforcement Learning Course might be more your speed! 
  • Reinforcement Learning GitHub Repo – This repo has a collection of reinforcement learning algorithms implemented in Python. But more than that, it takes the book by Sutton and Barto as well as the UCL videos and combines them into a bit of a learning plan with some exercises to guide how you might approach using the two resources. If that sounds more like your speed, you should check it out! 


It’s my belief that reinforcement learning is going to be the technique that brings forth a new revolution in machine learning, creating truly intelligent applications that use techniques from supervised and unsupervised learning to observe the environment that the agent is acting in. If reinforcement learning is in the future, it is going to be a bright future! 

Ask your questions on SAP Answers or get started with SAP Conversational AI!
