commit db19bca50e44e4209567da851a85b2a301b42d06
Author: MrGeorgen
Date:   Sun Jun 1 14:35:25 2025 +0200

    intro project_proposal

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..c554382
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,5 @@
+/*
+!/.gitignore
+!*.tex
+!*.sh
+!*.sty
diff --git a/project_proposal.tex b/project_proposal.tex
new file mode 100644
index 0000000..3e3f226
--- /dev/null
+++ b/project_proposal.tex
@@ -0,0 +1,61 @@
+\documentclass[a4paper, 14pt]{scrartcl}
+\usepackage[utf8]{inputenc}
+\usepackage[english]{babel}
+\usepackage{parskip}
+\usepackage{microtype}
+\usepackage[margin=1in]{geometry}
+\usepackage{hyperref}
+
+\title{Stock Trading with Reinforcement Learning}
+\author{Marcel Zinkel}
+
+\begin{document}
+\maketitle
+
+\section{Introduction}
+I want to build a reinforcement learning project about single-asset stock trading. First I
+want to start with a simple environment that offers just the two actions buy and sell. I also
+want to keep the reward function simple at first by using the profit as the reward; a possible
+formalization is sketched at the end of this section.
+
+In contrast to the algorithms we have already covered in the lecture, I have to try out deep
+reinforcement learning algorithms because the price is a continuous variable. In theory, one
+could discretize the price at a fixed resolution into many states, but this quickly becomes
+impractical for classical tabular reinforcement learning methods. Deep reinforcement learning
+can also recognize patterns and therefore act well in previously unseen states.
+
+I want to try out different reinforcement learning algorithms to see which works best for the
+trading environment. First I want to try the Deep Q-Network (DQN) algorithm. It approximates
+the Q-function with a neural network and uses epsilon-greedy exploration; I plan to experiment
+with different formulas for the epsilon decay. Because DQN often overestimates Q-values,
+Double DQN uses two networks for the update: the online network selects the action with the
+highest Q-value and the target network evaluates that action, which leads to more stable
+learning. I will test whether Double DQN improves the results. Then I want to try the
+Proximal Policy Optimization (PPO) algorithm. Example formulas for the epsilon decay and the
+Double DQN target are given at the end of this section.
+
+I would also find it very interesting to see whether providing the RL agent with additional
+information improves its performance.
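+
+As a first, simple formalization (a sketch, not a final design; the exact position model is
+still open), the per-step reward could be the realized profit
+\[
+  r_t = a_t \, (p_{t+1} - p_t), \qquad a_t \in \{-1, +1\},
+\]
+where $p_t$ is the price at step $t$ and $a_t$ encodes the actions sell ($-1$) and buy ($+1$).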
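+
+One candidate formula for the epsilon decay is exponential decay (the constants here are
+placeholders that I still have to tune),
+\[
+  \epsilon_t = \epsilon_{\min} + (\epsilon_{\max} - \epsilon_{\min}) \, e^{-t/\tau},
+\]
+where $t$ is the training step and $\tau$ controls how quickly exploration decreases; a linear
+schedule would be another candidate.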
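+
+The Double DQN update target described above can be written as
+\[
+  y_t = r_t + \gamma \, Q\bigl(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \theta^{-}\bigr),
+\]
+where $\theta$ are the weights of the online network and $\theta^{-}$ those of the target
+network.
+\end{document}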