From a14a8631079e0250fe509ac5174399e6a93c8f2a Mon Sep 17 00:00:00 2001
From: MrGeorgen
Date: Tue, 10 Jun 2025 22:46:43 +0200
Subject: [PATCH] Introduction finished

---
 project_proposal.tex | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/project_proposal.tex b/project_proposal.tex
index 3e3f226..a890a0a 100644
--- a/project_proposal.tex
+++ b/project_proposal.tex
@@ -11,13 +11,14 @@
 \begin{document}
 \maketitle
+\tableofcontents
 
 \section{Introduction}
 I want to build a reinforcement learning project about single-asset stock trading. First I want to
 start with a simple environment with just the actions buy and sell. For the reward function I also
 want to keep it simple at first by just using the profit as the reward.
 
-In contrast to the algorithms we already heard in the lector I have to try out deep
+In contrast to the algorithms we already heard about in the lecture, I have to try out deep
 reinforcement learning algorithms because the price is a continuous variable. In theory, you
 could model the price at a specific resolution with many states. However, this can very quickly
 become impractical for classic reinforcement learning methods. Also, deep reinforcement learning can
@@ -25,11 +26,29 @@
 recognize patterns to act well in previously unseen states.
 I want to try out different reinforcement learning algorithms to see which works best for the
 trading environment. First I want to try the Deep Q-Network (DQN) algorithm. It predicts the Q-function and
-uses Elpsilon-Greedy Exploration. I plan to try out different formulas for the epsilon decay.
-Because DQN often overestimates the Q-values. It uses two networks for updating the policy. The
+uses epsilon-greedy exploration. I plan to try out different formulas for the epsilon decay.
+
+Because DQN often overestimates the Q-values, I want to try out a variation of DQN called Double
+DQN. It uses two networks for updating the policy. The
 online network selects the action with the highest Q-value and the target network evaluates that
 action. This leads to more stable and better learning. I will test whether Double DQN improves the
-results. Then I want to try out the Proximal Policy Optimization algorithm.
+results. Finally, I want to try out the Proximal Policy Optimization (PPO) algorithm.
 
-I find it also very interesting, if providing the RL agent with additional information.
+After implementing these different algorithms, I need to train them and compare the
+results.
+
+I also find it very interesting whether providing the RL agent with additional information beyond just the
+price positively impacts the results. For example, I can add technical indicators, market volume or
+an online news score about the company. The last one is probably a bit difficult because you need an
+LLM that gives web-scraped articles a score for how good the news is for the company. After adding this
+information, I need to reevaluate which algorithm is best.
+
+\section{Libraries and Tools}
+
+\section{Development plan}
+
+\section{Availability}
+I am on vacation from 04.08 to 13.08. On the 15th I am at an event, but I have time on the 14th.
+From the 18th onwards I am available for the next couple of weeks. I look forward to the
+presentation, and thank you for giving me the additional time.
 \end{document}
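
The buy/sell environment from the introduction is simple enough to sketch. Below is a minimal version; the gymnasium API and the 1-D price array are placeholder choices on my part, since the proposal does not name a library or data format:

# Minimal sketch of the proposed buy/sell environment. Assumptions not
# taken from the proposal: the gymnasium API, a 1-D numpy price array,
# and "profit as reward" meaning realized profit on each sell.
import gymnasium as gym
import numpy as np


class SingleAssetEnv(gym.Env):
    """One asset, two actions: 0 = sell / stay flat, 1 = buy / hold."""

    def __init__(self, prices):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        # Observation: current price and a flag for holding a position.
        self.observation_space = gym.spaces.Box(
            low=0.0, high=np.inf, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.entry_price = None  # None = no open position
        return self._obs(), {}

    def step(self, action):
        price = float(self.prices[self.t])
        reward = 0.0
        if action == 1 and self.entry_price is None:
            self.entry_price = price            # buy
        elif action == 0 and self.entry_price is not None:
            reward = price - self.entry_price   # realized profit as reward
            self.entry_price = None             # sell
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        holding = 0.0 if self.entry_price is None else 1.0
        return np.array([self.prices[self.t], holding], dtype=np.float32)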
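
For the different epsilon decay formulas mentioned for DQN, two common choices are linear and exponential decay; the constants below are placeholders to experiment with, not values from the proposal:

# Two common epsilon schedules for epsilon-greedy exploration; all
# constants are placeholders to tune, not proposal values.
import math

def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def exponential_epsilon(step, eps_start=1.0, eps_end=0.05, rate=1e-3):
    return eps_end + (eps_start - eps_end) * math.exp(-rate * step)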
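
The Double DQN update described in the introduction (online network selects the action, target network evaluates it) comes down to one changed line in the target computation. A PyTorch sketch, assuming both networks map a batch of states to per-action Q-values:

# Double DQN target: the online network picks the next action, the
# target network evaluates it. Plain DQN would instead use
# target_net(next_states).max(1).values, which tends to overestimate.
import torch

@torch.no_grad()
def double_dqn_targets(online_net, target_net, rewards, next_states,
                       dones, gamma=0.99):
    next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones.float()) * next_q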
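
For PPO (and a reference DQN), one option is stable-baselines3; this is only a suggestion, since the proposal's Libraries and Tools section is still empty:

# stable-baselines3 is my assumption; the proposal names no library.
# It ships both PPO and DQN, so all agents can share one environment.
from stable_baselines3 import PPO

env = SingleAssetEnv(prices)  # sketch above; `prices` is any 1-D price array
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)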