In search of a problem to solve
Kickoff for my thesis
So the first thing I need to do is find a topic to research, build on, and write about. While it may sound odd, this means finding a problem to solve (and ideally one to fall in love with), because only then will I be motivated, and only then will the work be at least somewhat meaningful and useful.
Since I already worked at a lab that does a lot of RL research (at TU Darmstadt), I quickly decided that I want to stay in that domain. My other big interest lies in Continual Machine Learning (also often called online ML or lifelong ML). My bachelor's thesis was about online ML for time series data, and I also published a paper based on that thesis. The obvious intersection of these topics is Continual Reinforcement Learning (CRL), so that is where I am now looking for problems.
My initial thoughts
This subsection will be a rather entangled mess of ideas that popped into my head while just thinking about the domain of CRL, without doing much research on it.
Before starting to stress about my thesis, I had already read a lot of content comparing human intelligence (the brain) and artificial learning algorithms. Those were mainly RL-focused, and I have to say that I was just fascinated. One book I can particularly recommend is A Brief History of Intelligence by Max Bennett1.
When I finally started to read just a bit about CRL, it was immediately clear that the main problems in the field are catastrophic forgetting and the stability/plasticity dilemma. Agents perform fairly well on some base environment/task, but when trained further on a modified task they tend to forget skills they had before: they now perform well on the modified task, but underperform on the original base task.
Furthermore, there is a lack of standardized benchmarks for CRL and of the metrics that come along with them (e.g. retention). While a few different benchmarks exist, there is no clear collection of baselines, and comparisons between algorithms are blurry in a lot of papers. That's where I instantly saw a research (or tooling) gap that my thesis could fill: create a unified protocol with which different CRL approaches can be systematically compared. Basically: use mods of environments not for zero-shot generalization, but for few-shot adaptation (continual learning).
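To make the shape of such a protocol a bit more concrete, here is a toy Python mock-up of the train-adapt-measure loop I have in mind: train on a base task, adapt to a sequence of modified tasks, and re-evaluate the base task after each adaptation. Everything here is hypothetical — `train` and `evaluate` are placeholders (no real RL happens), and the "interference" model inside `train` is made up purely to give the sketch observable behavior:

```python
def train(agent, task, steps):
    # placeholder: a real implementation would run an RL training loop
    agent["skill"][task] = agent["skill"].get(task, 0.0) + steps * 0.001
    # made-up interference model: adapting to a new task erodes old skills
    for other in agent["skill"]:
        if other != task:
            agent["skill"][other] *= 0.9
    return agent

def evaluate(agent, task):
    # placeholder: a real implementation would roll out episodes
    return agent["skill"].get(task, 0.0)

def retention(agent_factory, base_task, mod_tasks, train_steps=1000):
    """Retention R_i = eval(base, after adapting to mods 1..i) / eval(base, before)."""
    agent = train(agent_factory(), base_task, train_steps)
    base_score = evaluate(agent, base_task)
    scores = []
    for mod in mod_tasks:
        agent = train(agent, mod, train_steps)
        scores.append(evaluate(agent, base_task) / base_score)
    return scores

agent_factory = lambda: {"skill": {}}
scores = retention(agent_factory, "pong", ["pong_flipped", "pong_fast"])
print(scores)  # retention shrinks with each adaptation in this toy model
```

The point is only the interface: if every CRL method were evaluated through one `retention`-style loop with fixed tasks, budgets, and seeds, the blurry cross-paper comparisons would become direct ones.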
I'm also interested in how the "difficulty" or "difference" of multiple environments/tasks affects the forgetting (retention) of an RL agent. One would assume that when the task sequence an agent gets tuned on becomes progressively harder, it will retain less and less knowledge about the original task (linear decay), or there will be one moment in time where retention takes one big dip. I don't know whether there already is an answer to this question, or whether a systematic study on it was ever conducted.
This question then raises another question: how would one measure this difficulty, or the difference between two tasks? I don't think we can directly compare two environments and mathematically determine a scalar score, but there are a few ideas around for heuristics:
- score/return relative to some expert agent
- training time/steps it takes to reach a certain return
- ...
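To make one of the heuristics above concrete, here is a toy sketch of the "steps to reach a certain return" idea. The learning curves are synthetic exponentials (a slower learning rate stands in for a harder task), and all names are my own invention; a real study would use actual training curves:

```python
import math

def steps_to_threshold(learning_curve, threshold):
    """Index of the first training step whose return reaches `threshold`,
    or None if the agent never gets there within the budget."""
    for step, ret in enumerate(learning_curve):
        if ret >= threshold:
            return step
    return None

def toy_curve(rate, steps=500, max_return=1.0):
    # synthetic exponential learning curve: slower rate ~ harder task
    return [max_return * (1 - math.exp(-rate * t)) for t in range(steps)]

easy, hard = toy_curve(rate=0.05), toy_curve(rate=0.01)
d_easy = steps_to_threshold(easy, threshold=0.8)
d_hard = steps_to_threshold(hard, threshold=0.8)
# the harder (slower-learning) task needs more steps to reach the same return
```

One nice property of this heuristic is that it yields a scalar per task from training data alone, so two modified environments can be ranked without any expert reference — though the score obviously depends on the agent and the chosen threshold.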
Maybe it's a good idea to conduct a user study on a preselected set of environments (and modifications) to obtain a kind of "human-perceived difficulty" score. We could then either use this directly, or search for the heuristic that best fits this score. Or maybe even train a predictor?
I also have to mention that I will focus on Atari environments for any of these ideas, since I am part of a team that builds JAX-based Atari environments which yield massive gains in training speed (by leveraging GPUs and parallelism).
A rather different direction I have also thought about is change detection in drifting environments/tasks.
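As a first intuition for what such a change detector could look like, here is a minimal sketch that only watches the agent's episode returns and flags a drift when a sliding-window mean drops well below a reference window. The window size and drop threshold are arbitrary choices of mine, not established values:

```python
def detect_change(returns, window=10, drop=0.5):
    """Return the first index at which the mean of the most recent `window`
    returns falls below `drop` * the mean of the window before it,
    or None if no change is detected."""
    if len(returns) < 2 * window:
        return None
    for i in range(2 * window, len(returns) + 1):
        ref = sum(returns[i - 2 * window : i - window]) / window
        recent = sum(returns[i - window : i]) / window
        if ref > 0 and recent < drop * ref:
            return i - 1
    return None

# returns are stable at ~1.0, then the environment drifts and they collapse
trace = [1.0] * 30 + [0.2] * 30
print(detect_change(trace))  # detection fires a few episodes after the drift
```

This is obviously crude (it reacts with a lag of several episodes and only catches drops in return), but it illustrates the loop: detect the drift first, then decide how the agent should adapt.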
Next To-Dos
The blog is not only for tracking my thoughts and the results of research/experiments, but also for capturing what I should do next. Right now I have two main objectives to pursue:
- Conduct thorough research on CRL to get a better understanding and to find the problem I want to work on.
- For the problems I have already identified (or just imagined?), run some initial experiments: basically sanity checks to see whether these leads are worth pursuing further. These should give me some guidance and traction.
Get in Touch
If you have any feedback you want to share with me, feel free to reach out at mail@sebastianwette.de. I would be more than happy to chat about it.
Not an affiliate link or paid in any way.↩