\documentclass[a4paper, 14pt]{scrartcl}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{parskip}
\usepackage{microtype}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}

\title{Stock Trading with Reinforcement Learning}
\author{Marcel Zinkel}

\begin{document}

\maketitle
\section{Introduction}
I want to build a reinforcement learning project about single-asset stock trading. First I want
to start with a simple environment that has just the two actions buy and sell. For the reward
function, I also want to keep it simple at first by just using the profit as the reward.
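
A minimal sketch of such an environment could look as follows. This is plain Python with NumPy
and a gym-style reset/step interface; the class name, the observation window, and the exact
reward bookkeeping are placeholder choices of mine, not final design decisions.

\begin{verbatim}
import numpy as np

class SimpleTradingEnv:
    """Single-asset environment with two actions:
    0 = sell (hold cash), 1 = buy (hold the asset)."""

    def __init__(self, prices, window=10):
        self.prices = np.asarray(prices, dtype=float)
        self.window = window   # past prices per observation
        self.t = None
        self.position = None   # 0 = cash, 1 = asset

    def reset(self):
        self.t = self.window
        self.position = 0
        return self._observation()

    def step(self, action):
        self.position = int(action)
        # Reward = profit over the next step: the price
        # change while holding the asset, otherwise zero.
        p = self.prices
        change = p[self.t + 1] - p[self.t]
        reward = change if self.position == 1 else 0.0
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._observation(), reward, done

    def _observation(self):
        # Last `window` prices plus the current position.
        past = self.prices[self.t - self.window:self.t]
        return np.append(past, self.position)
\end{verbatim}
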
In contrast to the algorithms we have already covered in the lecture, I have to try out deep
reinforcement learning algorithms, because the price is a continuous variable. In theory, the
price could be discretized at some fixed resolution into many states; for example, 100 price
levels over an observation window of only 10 time steps already give $100^{10}$ possible states.
This very quickly becomes impractical for classic tabular reinforcement learning methods. In
addition, deep reinforcement learning can recognize patterns and therefore act well in
previously unseen states.
I want to try out different reinforcement learning algorithms to see which works best for the
trading environment. First I want to try out the Deep Q-Network (DQN) algorithm. It approximates
the Q-function with a neural network and uses epsilon-greedy exploration. I plan to try out
different formulas for the epsilon decay, for example the two variants sketched below.
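
Two schedules I could compare are exponential and linear decay. The starting value, minimum
value, decay rate, and number of decay steps below are placeholders that I would still have to
tune.

\begin{verbatim}
import math

EPS_START, EPS_MIN = 1.0, 0.05   # placeholder values

def epsilon_exponential(step, decay_rate=1e-4):
    # eps_min + (eps_start - eps_min) * exp(-rate * step)
    return EPS_MIN + ((EPS_START - EPS_MIN)
                      * math.exp(-decay_rate * step))

def epsilon_linear(step, decay_steps=50_000):
    # From eps_start down to eps_min over decay_steps steps.
    frac = min(step / decay_steps, 1.0)
    return EPS_START + frac * (EPS_MIN - EPS_START)
\end{verbatim}
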
Because DQN often overestimates the Q-values, I will also test whether Double DQN improves the
results. Double DQN uses two networks for the update: the online network selects the action with
the highest Q-value, and the target network evaluates that action. This leads to more stable and
better learning. A sketch of the resulting update target is given below.
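
Assuming a PyTorch implementation with an online network and a target network (the function and
tensor names are placeholders of mine), the Double DQN target could be computed roughly like
this:

\begin{verbatim}
import torch

def double_dqn_targets(online_net, target_net, rewards,
                       next_states, dones, gamma=0.99):
    with torch.no_grad():
        # Online network picks the greedy next action ...
        a = online_net(next_states).argmax(1, keepdim=True)
        # ... and the target network evaluates that action.
        q = target_net(next_states).gather(1, a).squeeze(1)
        # TD target; no bootstrap for terminal states.
        return rewards + gamma * q * (1.0 - dones.float())
\end{verbatim}
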

After that, I want to try out the Proximal Policy Optimization (PPO) algorithm.

I also find it very interesting to investigate whether providing the RL agent with additional
information improves the results.
\end{document}