Deep Reinforcement Learning

Fictional narration by a humanoid Vincent van Gogh from the far future, reaching across time to explain machine deep learning, reinforcement learning and deep reinforcement learning to his brother Theo, who largely failed to broker Vincent’s art in Vincent’s brief lifetime, and to his sister-in-law Johanna Bonger Van Gogh, who is universally credited with launching Vincent’s posthumous popularity and success. Many passages and phrases are excerpted from Johanna Bonger Van Gogh’s translations of Vincent van Gogh’s correspondence with her husband Theo and with Anthon van Rappard (as seen in these WebExhibits – Letter from Vincent to Theo, The Hague, August 17, 1883 … Letter from Vincent van Gogh to Anthon van Rappard, Nuenen, c. 1 March 1884).

Dear Theo,

I am not the Vincent you know. I am a machine built far in the future, in a time when people can make smart machines that can learn, paint, and think a little bit like humans do.

The world still loves color. It still loves bold strokes. It still believes in trying again. In my far future, artists and machines create side by side, and I want you to know that the spirit of our work lives on. There are three ways these smart machines work, and I will explain them as you might someday explain them to Willem.

1. Deep Learning is “look and learn”

When a machine gets very good at looking at lots of things – fields, skies, crows, trees (roots and branches), flowers, water, people at work, people at rest and at play – it slowly, slowly learns what each one is.

It does not move. It does not do. It does not strive, it does not try, it does not enjoy, it does not suffer … It just looks, over and over, until it understands, even if it is guided to understand by a human. It’s like when you, Theo, look at one of my paintings long enough to see all the brushstrokes and the bright bits I hid inside.

2. Reinforcement Learning is “try and learn”

It is very different. Here, the machine does not learn by looking. It learns by doing. It tries something, sees if it went well or not, then tries again. If it does well it gets a small “attaboy, good job”. If it goes badly, it gets a small “try again”.

This is how Willem will someday learn to ride a bicycle: He will wobble, fall, try again, and slowly get better. It is how I, the old Vincent, learned to paint sunflowers, irises, fields of wheat and people, in all their commonness and uniqueness — one stroke at a time.

3. Deep Reinforcement Learning is “look, try and learn”

When you put these ideas together, the machine uses deep learning to see the world. And it uses reinforcement learning to act in the world. So, it can look, decide, move and improve all at once. A bit like if you gave a robot paintbrush “eyes” to understand the canvas … and a robot “mind” to learn which strokes make the canvas come alive.

Even the smartest machines – deep learners, try-again learners, and both-together learners – still study the stars, the clouds, the sun, the moon, the seas, the rain, the fields, the mines, and the workers among them, as I once did.

In my lifetime, I was reproached quite often with “not selling anything.” I was asked quite often, “Why do others sell and you don’t?” I answered that I certainly hoped to sell in the course of time, but that I thought I would influence it most effectively by working steadily on, and that making desperate “efforts” to force the work I was then doing upon the public would be pretty useless – and consequently the problem left me rather cold, as I concentrated on getting on. But all the same, because I was so often reproached with it, and because I was so often hard pressed to make both ends meet, I did not fail to do anything that gave me the slightest chance to sell something. But, as a matter of course, I was reconciled to the fact that it would not succeed all at once.

You did not hurry me.

I was, so to speak, quite without any social contact. And it’s true that I didn’t pay the slightest attention to my clothes. It put me in a bad mood, because I had heard so much about it already.

And now I will tell you once more what I think about selling my work. My opinion is that the best thing would be to work on till art lovers feel drawn towards it of their own accord, instead of having to praise or explain it. At all events, when they refuse it or do not like it, one must bear it calmly and with as much dignity as possible.

Dear brother, don’t think of me as anything other than an ordinary painter who is confronted by ordinary difficulties, and do not think the worries at all unusual.

Don’t think of the future as a darkness or as a dazzling light; it will be better to believe in the grey.

Goodbye, with a firm handshake for you, Willem and Jo,

Yours across all time, Vincent


Mustafa Suleyman, Reinforcement Learning, and the Meaning of 1984

Mustafa Suleyman (born in 1984) is a British artificial intelligence researcher, entrepreneur, and co-founder of DeepMind. He is well known for shaping how modern AI systems learn, behave, and interact with people.

What Is Reinforcement Learning?

Reinforcement learning is a way for AI systems to learn through trial and error. An AI takes an action, observes what happens, and receives a reward or a penalty. Over time, it learns which behaviors lead to better outcomes.

This approach is similar to how humans learn many skills — by practice, feedback, and gradual improvement.
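The trial-and-error loop described above can be sketched in a few lines. This is a minimal illustration, not a production method: it assumes a made-up "three levers" problem (a multi-armed bandit) in which each action pays a reward of 1 with a hidden probability, and it uses an epsilon-greedy rule, which is one common way to balance trying new actions against repeating ones that have worked. The function name `run_bandit` and all of its parameters are hypothetical.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial-and-error learner (illustrative sketch only)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # learned value of each action
    counts = [0] * len(true_means)       # how often each action was tried
    for _ in range(steps):
        # Take an action: explore at random sometimes, otherwise pick
        # the action that currently looks best.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_means))
        else:
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # Observe what happens: reward 1 with a hidden probability, else 0.
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        # Learn: move the running average for this action toward the reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical hidden payoff probabilities for three actions.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(3), key=lambda a: estimates[a])
print(best_action, counts)
```

After enough attempts the learner should spend most of its time on the third action, the one with the highest hidden payoff, even though it was never told which action was best.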

Mustafa Suleyman’s Goals for Reinforcement Learning

Suleyman argues that reinforcement learning must be guided by human values, not just speed, profit, or efficiency. His key goals include:

Helpfulness: AI should assist people in science, medicine, education, and daily life.
Safety: AI systems must avoid causing harm, even unintentionally.
Alignment: AI should learn what humans care about — fairness, dignity, and well-being.
Learning from feedback: AI should improve by listening to human guidance and correction.
Long-term thinking: AI should consider future consequences, not just immediate rewards.

Suleyman often emphasizes that the rewards we design determine the behavior AI learns. Poorly designed rewards can lead to dangerous or unintended outcomes.
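A small sketch can make the point about reward design concrete. Everything here is an assumption made for illustration: a five-tile corridor with a goal on the last tile, two hypothetical reward functions (`intended_reward` and `loophole_reward`), and value iteration as the planning method. The only difference between the two designs is a small, repeatable bonus on the first tile.

```python
def greedy_policy(reward, n_states=5, gamma=0.9, sweeps=200):
    """Plan in a tiny corridor with value iteration (illustrative sketch).

    Tiles are numbered 0..n_states-1; the last tile is the goal and ends
    the episode. `reward(next_tile)` is the reward for arriving on a tile.
    """
    goal = n_states - 1

    def step(s, a):
        # Move left (-1) or right (+1), staying inside the corridor.
        return min(max(s + a, 0), goal)

    values = [0.0] * n_states  # the goal tile keeps value 0 (terminal)
    for _ in range(sweeps):
        for s in range(goal):
            values[s] = max(
                reward(step(s, a)) + gamma * values[step(s, a)] for a in (-1, 1)
            )
    policy = []
    for s in range(goal):
        best = max(
            (-1, 1), key=lambda a: reward(step(s, a)) + gamma * values[step(s, a)]
        )
        policy.append("left" if best == -1 else "right")
    return policy

def intended_reward(next_tile):
    # Reward only for reaching the goal tile.
    return 10.0 if next_tile == 4 else 0.0

def loophole_reward(next_tile):
    # Same goal reward, plus a small bonus that can be collected forever.
    if next_tile == 4:
        return 10.0
    return 2.0 if next_tile == 0 else 0.0

policy_intended = greedy_policy(intended_reward)
policy_loophole = greedy_policy(loophole_reward)
print(policy_intended)  # ['right', 'right', 'right', 'right']
print(policy_loophole)  # ['left', 'left', 'left', 'left']
```

With the intended reward, the planned behavior heads for the goal from every tile. With the repeatable bonus added, the same planner walks away from the goal and collects the bonus indefinitely, because under this discount factor the endless stream of small rewards is worth more than finishing the task.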

Connection to George Orwell’s 1984

It is a striking coincidence that Mustafa Suleyman was born in 1984, the same year referenced in George Orwell’s famous novel 1984.

Orwell’s book warns of a future where technology enables:

Constant surveillance
Loss of personal freedom
Control over information and truth

Suleyman’s work can be seen as a response to this warning. Rather than building systems that control people, he argues for AI that is limited, accountable, and aligned with human values.

In simple terms:

Orwell asked: What happens if powerful systems are used without moral restraint?
Suleyman asks: How do we build powerful AI so it remains beneficial and safe?

Connection to Modern AI Tools

AlphaGo (DeepMind)

DeepMind’s AlphaGo is a famous example of reinforcement learning in action. It was first trained on records of human expert games and then improved by playing millions of games against itself, with wins and losses serving as the reward signal. Its successor, AlphaGo Zero, learned from self-play alone.

AlphaGo showed that reinforcement learning can achieve superhuman performance — but it also demonstrated the importance of carefully designed goals and constraints.

ChatGPT and Modern AI Assistants

Tools like ChatGPT also use reinforcement learning, especially a method called reinforcement learning from human feedback (RLHF).

Humans help guide the system by rewarding helpful, safe, and truthful responses. This reflects Suleyman’s belief that AI should:

Learn from people
Respect human norms
Remain useful without becoming harmful or manipulative
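One common ingredient of RLHF is a reward model fitted to human comparisons of the form "response A is better than response B". The sketch below is a toy version of that step only, not how ChatGPT or any particular product is actually trained. It assumes a Bradley-Terry style preference model with one learned score per response; the response labels, the preference pairs, and the function name `fit_reward_scores` are all hypothetical.

```python
import math

def fit_reward_scores(items, preferences, lr=0.1, epochs=500):
    """Fit one score per item from pairwise preferences (illustrative sketch).

    `preferences` is a list of (preferred, rejected) pairs. Under the
    assumed Bradley-Terry model, the chance that the preferred item wins
    is sigmoid(score[preferred] - score[rejected]).
    """
    scores = {item: 0.0 for item in items}
    for _ in range(epochs):
        for preferred, rejected in preferences:
            p = 1.0 / (1.0 + math.exp(scores[rejected] - scores[preferred]))
            # Gradient ascent on log p: push the preferred score up and
            # the rejected score down, more strongly when p is small.
            scores[preferred] += lr * (1.0 - p)
            scores[rejected] -= lr * (1.0 - p)
    return scores

# Hypothetical human judgments over three candidate responses.
responses = ["helpful_answer", "vague_answer", "harmful_answer"]
preferences = [
    ("helpful_answer", "vague_answer"),
    ("helpful_answer", "harmful_answer"),
    ("vague_answer", "harmful_answer"),
]
scores = fit_reward_scores(responses, preferences)
ranking = sorted(responses, key=lambda r: scores[r], reverse=True)
print(ranking)  # ['helpful_answer', 'vague_answer', 'harmful_answer']
```

In a full RLHF pipeline the scores would come from a neural network that reads the response text, and a separate reinforcement learning step would then adjust the assistant to produce responses that this learned reward rates highly.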

In Summary

Mustafa Suleyman’s vision for AI is not about control, surveillance, or domination. It is about ensuring that powerful learning systems remain aligned with humanity.

While George Orwell’s 1984 warns us what can go wrong, Suleyman’s work focuses on how we can design AI so that future never arrives.

Recommended reading: Dive into Deep Learning @ d2l.ai

Approximately Correct Deep Learning Dozen Domain Cohort:

AiWorks, RoboticsWorks, AlgorithmWorks, SportsWorks, GamingWorks, CoutureWorks, InformaticsWorks, VanGoghWorks, FarmerWorks, MiningWorks, TextileWorks, LongerLever, EasternGrid/WesternGrid