\documentclass[a4paper, 14pt]{scrartcl}

\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{parskip}
\usepackage{microtype}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}

\title{Stock Trading with Reinforcement Learning}
\author{Marcel Zinkel}

\begin{document}

\maketitle
\tableofcontents

\section{Introduction}

I want to build a reinforcement learning project about single-asset stock trading. First, I want to start with a simple environment that has just the actions buy and sell. For the reward function, I also want to keep it simple at first by just using the profit as the reward.

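As a first rough sketch, and independent of the concrete library chosen in Section 2, such an environment could look like the following. The class name, the window-based observation, and all other details are only illustrative assumptions at this point:

\begin{verbatim}
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class SimpleTradingEnv(gym.Env):
    """Two actions (0 = sell/flat, 1 = buy/long); the reward
    is the profit of the last step while holding the asset."""

    def __init__(self, prices, window_size=10):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.window_size = window_size
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(window_size,), dtype=np.float32)

    def _obs(self):
        return self.prices[self.t - self.window_size:self.t]

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = self.window_size
        self.position = 0           # 0 = flat, 1 = long
        return self._obs(), {}

    def step(self, action):
        # Reward = profit: the price change since the last
        # step, but only while the agent holds the asset.
        change = self.prices[self.t] - self.prices[self.t-1]
        reward = float(change) if self.position else 0.0
        self.position = int(action)
        self.t += 1
        terminated = self.t >= len(self.prices)
        return self._obs(), reward, terminated, False, {}
\end{verbatim}

Whether I build the environment myself like this or reuse an existing trading environment is decided in Section 2.
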
In contrast to the algorithms we have already covered in the lecture, I have to try out deep reinforcement learning algorithms because the price is a continuous variable. In theory, one could discretize the price at a specific resolution into many states. However, this very quickly becomes impractical for classic tabular reinforcement learning methods. In addition, deep reinforcement learning can recognize patterns and therefore act well in previously unseen states.

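To make the size of such a discretized state space concrete: with $B$ price bins and a window of the last $n$ prices, a tabular method would already face $B^{n}$ distinct states, for example
\[
B = 100, \quad n = 10 \quad\Longrightarrow\quad B^{n} = 100^{10} = 10^{20} \ \mathrm{states}.
\]
The bin count and window length here are only illustrative numbers.
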
I want to try out different reinforcement learning algorithms to see which works best for the trading environment. First, I want to try the Deep Q-Network (DQN) algorithm. It approximates the Q-function with a neural network and uses epsilon-greedy exploration. I plan to try out different formulas for the epsilon decay.

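Two decay schedules I could compare are, for example, exponential and linear decay:
\[
\varepsilon_t = \varepsilon_{\min} + (\varepsilon_0 - \varepsilon_{\min})\, e^{-\lambda t}
\qquad \mbox{and} \qquad
\varepsilon_t = \max\bigl(\varepsilon_{\min},\ \varepsilon_0 - k t\bigr),
\]
where $t$ is the training step, $\varepsilon_0$ and $\varepsilon_{\min}$ are the initial and final exploration rates, and $\lambda$ and $k$ control how fast the exploration decays.
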
Because DQN often overestimates the Q-values, I want to try out a variation of DQN called Double DQN. It uses two networks for the update: the online network selects the action with the highest Q-value, and the target network evaluates that action. This leads to more stable and better learning (the exact target is written out below). I will check whether Double DQN improves the results. Finally, I want to try out the Proximal Policy Optimization (PPO) algorithm.

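Written out, the Double DQN target for a transition $(s, a, r, s')$ is
\[
y = r + \gamma \, Q_{\mathrm{target}}\Bigl(s',\ \arg\max_{a'} Q_{\mathrm{online}}(s', a')\Bigr),
\]
whereas standard DQN uses $y = r + \gamma \max_{a'} Q_{\mathrm{target}}(s', a')$, where the same network both selects and evaluates the action, which is the source of the overestimation.
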
After implementing these different algorithms, I need to train them and compare the results.

I also find it very interesting whether providing the RL agent with additional information beyond just the price positively impacts the results. For example, I can add technical indicators, market volume, or an online news score about the company. The last one is probably a bit difficult because it requires an LLM that scores web-scraped articles by how good the news is for the company. After adding this information, I need to re-evaluate which algorithm is the best.

\section{Libraries and Tools}

The project will be implemented in Python using \texttt{gym-anytrading} to build the trading environment. For initial experiments, I will use the built-in datasets from \texttt{gym\_anytrading.datasets} such as \texttt{STOCKS\_GOOGL}, and later switch to real historical stock data via \texttt{yfinance}.

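A rough sketch of how I expect the environment setup to look; the window size, the frame bounds, and the date range are placeholder values, and the exact import (\texttt{gym} vs.\ \texttt{gymnasium}) may differ depending on the installed \texttt{gym-anytrading} version:

\begin{verbatim}
import gymnasium as gym
import gym_anytrading                 # registers 'stocks-v0'
from gym_anytrading.datasets import STOCKS_GOOGL
import yfinance as yf

# Built-in dataset for the first experiments.
env = gym.make("stocks-v0", df=STOCKS_GOOGL,
               window_size=10, frame_bound=(10, 300))

# Later: real historical data; the stocks environment
# mainly needs a 'Close' column in the dataframe.
df = yf.Ticker("GOOGL").history(start="2018-01-01",
                                end="2024-01-01")
env_real = gym.make("stocks-v0", df=df, window_size=10,
                    frame_bound=(10, len(df)))
\end{verbatim}
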
The reinforcement learning algorithms will be implemented using the \texttt{stable-baselines3} library. I will start with the standard DQN algorithm and experiment with different epsilon decay strategies. Since \texttt{stable-baselines3} does not directly support Double DQN, I plan to modify the DQN implementation myself. Specifically, I will adjust the target calculation so that the action is selected using the online network but evaluated using the target network, as required by Double DQN. This will allow me to better understand the internal workings of the algorithm and to directly control its behavior.

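The modification could be a small subclass of the \texttt{stable-baselines3} \texttt{DQN} class that overrides \texttt{train()}. The sketch below is based on my reading of the SB3 implementation, so attribute names and details may differ slightly between versions; the exploration parameters shown control SB3's built-in linear epsilon decay, and other decay formulas would need a custom schedule. \texttt{env} is the trading environment from above.

\begin{verbatim}
import numpy as np
import torch as th
import torch.nn.functional as F
from stable_baselines3 import DQN

class DoubleDQN(DQN):
    """DQN with the Double DQN target: the online network
    selects the greedy action, the target network evaluates
    it. Sketch based on SB3's DQN.train(); details may vary
    between stable-baselines3 versions."""

    def train(self, gradient_steps, batch_size=100):
        self.policy.set_training_mode(True)
        self._update_learning_rate(self.policy.optimizer)
        losses = []
        for _ in range(gradient_steps):
            data = self.replay_buffer.sample(
                batch_size, env=self._vec_normalize_env)
            with th.no_grad():
                next_obs = data.next_observations
                # Online net selects the greedy action for s'.
                best = self.q_net(next_obs).argmax(
                    dim=1, keepdim=True)
                # Target net evaluates that action.
                next_q = self.q_net_target(next_obs).gather(
                    dim=1, index=best)
                target_q = data.rewards + (
                    1 - data.dones) * self.gamma * next_q
            # Q-values of the actions actually taken.
            current_q = self.q_net(data.observations).gather(
                dim=1, index=data.actions.long())
            loss = F.smooth_l1_loss(current_q, target_q)
            losses.append(loss.item())
            self.policy.optimizer.zero_grad()
            loss.backward()
            th.nn.utils.clip_grad_norm_(
                self.policy.parameters(), self.max_grad_norm)
            self.policy.optimizer.step()
        self._n_updates += gradient_steps
        self.logger.record("train/loss", np.mean(losses))

model = DoubleDQN(
    "MlpPolicy", env,
    exploration_initial_eps=1.0,   # epsilon at the start
    exploration_final_eps=0.05,    # epsilon after the decay
    exploration_fraction=0.2,      # share of steps for decay
    verbose=1)
model.learn(total_timesteps=100_000)
\end{verbatim}
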
In addition to DQN and Double DQN, I will also train PPO using the standard implementation in \texttt{stable-baselines3}.

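For completeness, the PPO baseline itself should only take a few lines, with the hyperparameters left at their defaults for now:

\begin{verbatim}
from stable_baselines3 import PPO

# 'env' is the trading environment from above.
ppo_model = PPO("MlpPolicy", env, verbose=1)
ppo_model.learn(total_timesteps=100_000)
\end{verbatim}
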
After training, I will evaluate all models using backtesting and performance metrics like total profit, Sharpe ratio, and maximum drawdown. Later, I plan to extend the observation space with technical indicators, volume data, or sentiment features. For technical indicators, I will use the \texttt{pandas-ta} library since it is easy to install, well integrated with \texttt{pandas}, and provides a wide range of indicators sufficient for prototyping and research. Alternatively, \texttt{TA-Lib} is an option if higher performance is needed, but it has more complex installation requirements.

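As a sketch of this evaluation step, assuming \texttt{df} is the price dataframe from above and \texttt{equity} is the account value over the backtest; the helper functions and the indicator selection are my own placeholders:

\begin{verbatim}
import numpy as np
import pandas_ta  # registers the .ta accessor on DataFrames

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-step returns
    (risk-free rate assumed to be zero)."""
    returns = np.asarray(returns)
    mean, std = returns.mean(), returns.std()
    if std == 0:
        return 0.0
    return np.sqrt(periods_per_year) * mean / std

def max_drawdown(equity):
    """Largest peak-to-trough loss as a fraction of the peak."""
    equity = np.asarray(equity)
    peak = np.maximum.accumulate(equity)
    return ((peak - equity) / peak).max()

# Indicators with pandas-ta, appended as new columns to df.
df.ta.rsi(length=14, append=True)    # RSI_14
df.ta.sma(length=20, append=True)    # SMA_20
df.ta.macd(append=True)              # MACD_12_26_9, ...
\end{verbatim}
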
After adding these features, I will retrain the models and compare their performance again.
\section{Development plan}

Depending on when exactly my presentation is scheduled, I have about 9--10 weeks of time.

\subsection{Weeks 1--3}

I want to integrate the DQN algorithm as a first example and already train it with historical data.

\subsection{Weeks 4--6}

I plan to implement the other RL algorithms and their variations and evaluate which works best. I also plan to change the reward function.

\subsection{Week 7 to the presentation}

I will add the technical indicators and market volume to the environment. If I have time to spare, I can try the news analysis.

\section{Availability}

I am on vacation from 04.08. to 13.08. On the 15th I am at an event, but I have time on the 14th. From the 18th onwards I am available for the next couple of weeks. I look forward to the presentation, and thank you for giving me the additional time.

\end{document}