commit db19bca50e44e4209567da851a85b2a301b42d06
Author: MrGeorgen
Date:   Sun Jun 1 14:35:25 2025 +0200

    intro project_proposal

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..c554382
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,5 @@
+/*
+!/.gitignore
+!*.tex
+!*.sh
+!*.sty
diff --git a/project_proposal.tex b/project_proposal.tex
new file mode 100644
index 0000000..3e3f226
--- /dev/null
+++ b/project_proposal.tex
@@ -0,0 +1,61 @@
+\documentclass[a4paper, 14pt]{scrartcl}
+\usepackage[utf8]{inputenc}
+\usepackage[english]{babel}
+\usepackage{parskip}
+\usepackage{microtype}
+\usepackage[margin=1in]{geometry}
+\usepackage{hyperref}
+
+\title{Stock Trading with Reinforcement Learning}
+\author{Marcel Zinkel}
+
+\begin{document}
+\maketitle
+
+\section{Introduction}
+I want to build a reinforcement learning project about single-asset stock trading. First I
+want to start with a simple environment that offers just the two actions buy and sell. I also
+want to keep the reward function simple at first by using the profit as the reward; a possible
+formalization is sketched at the end of this section.
+
+In contrast to the algorithms we have already covered in the lecture, I have to try out deep
+reinforcement learning algorithms because the price is a continuous variable. In theory, one
+could discretize the price at a fixed resolution into many states, but this quickly becomes
+impractical for classical tabular reinforcement learning methods. Deep reinforcement learning
+can also recognize patterns and therefore act well in previously unseen states.
+
+I want to try out different reinforcement learning algorithms to see which works best for the
+trading environment. First I want to try the Deep Q-Network (DQN) algorithm. It approximates
+the Q-function with a neural network and uses epsilon-greedy exploration; I plan to experiment
+with different formulas for the epsilon decay. Because DQN often overestimates Q-values,
+Double DQN uses two networks for the update: the online network selects the action with the
+highest Q-value and the target network evaluates that action, which leads to more stable
+learning. I will test whether Double DQN improves the results. Then I want to try the
+Proximal Policy Optimization (PPO) algorithm. Example formulas for the epsilon decay and the
+Double DQN target are given at the end of this section.
+
+I would also find it very interesting to see whether providing the RL agent with additional
+information improves its performance.
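+
+As a first, simple formalization (a sketch, not a final design; the exact position model is
+still open), the per-step reward could be the realized profit
+\[
+  r_t = a_t \, (p_{t+1} - p_t), \qquad a_t \in \{-1, +1\},
+\]
+where $p_t$ is the price at step $t$ and $a_t$ encodes the actions sell ($-1$) and buy ($+1$).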
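+
+One candidate formula for the epsilon decay is exponential decay (the constants here are
+placeholders that I still have to tune),
+\[
+  \epsilon_t = \epsilon_{\min} + (\epsilon_{\max} - \epsilon_{\min}) \, e^{-t/\tau},
+\]
+where $t$ is the training step and $\tau$ controls how quickly exploration decreases; a linear
+schedule would be another candidate.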
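+
+The Double DQN update target described above can be written as
+\[
+  y_t = r_t + \gamma \, Q\bigl(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \theta^{-}\bigr),
+\]
+where $\theta$ are the weights of the online network and $\theta^{-}$ those of the target
+network.
+\end{document}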