Introduction finished

2025-06-10 22:46:43 +02:00
parent db19bca50e
commit a14a863107


@@ -11,13 +11,14 @@
\begin{document}
\maketitle
\tableofcontents
\section{Introduction}
I want to build a reinforcement learning project about single-asset stock trading. First I
want to start with a simple environment with just the actions buy and sell. For the reward function I
also want to keep it simple at first by just using the profit as the reward.
In contrast to the algorithms we already heard in the lecture, I have to try out deep
reinforcement learning algorithms because the price is a continuous variable. In theory, you could
model the price at a specific resolution with many discrete states. However, this can very quickly become
impractical for classic reinforcement learning methods. Also, deep reinforcement learning can
@@ -25,11 +26,29 @@ recognize patterns to act well in previously unseen states.
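As a minimal sketch of what this first environment could look like (assuming the Gymnasium API here; the class name, observation and reward details are placeholders rather than a final design):
\begin{verbatim}
import numpy as np
import gymnasium as gym

class SingleAssetEnv(gym.Env):
    """Toy single-asset environment: action 0 = sell (stay flat), action 1 = buy (hold)."""

    def __init__(self, prices):
        self.prices = np.asarray(prices, dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        return self.prices[[self.t]], {}

    def step(self, action):
        # Reward is the one-step profit while invested, zero otherwise.
        price_change = self.prices[self.t + 1] - self.prices[self.t]
        reward = float(price_change) if action == 1 else 0.0
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self.prices[[self.t]], reward, terminated, False, {}
\end{verbatim}
Extending the observation or the reward later should not require changing this interface.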
I want to try out different reinforcement learning algorithms to see which works best for the trading
environment. First I want to try out the Deep Q-Network (DQN) algorithm. It approximates the Q-function and
uses epsilon-greedy exploration. I plan to try out different formulas for the epsilon decay.
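Two example schedules I could compare are an exponential and a linear decay,
\[
\epsilon_t = \epsilon_{\min} + (\epsilon_{\max} - \epsilon_{\min})\, e^{-t/\tau}
\qquad \mathrm{or} \qquad
\epsilon_t = \max(\epsilon_{\min},\, \epsilon_{\max} - c\, t),
\]
where $t$ is the training step and $\tau$ and $c$ control how quickly exploration is reduced.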
Because DQN often overestimates the Q-values, I want to try out a variation of DQN called Double
DQN. It uses two networks for updating the policy: the
online network selects the action with the highest Q-value and the target network evaluates that
action. This leads to more stable and better learning.
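Written as an update target, this split between selecting and evaluating the action is
\[
y_t = r_{t+1} + \gamma\, Q_{\theta^-}\!\Big(s_{t+1},\, \arg\max_{a} Q_{\theta}(s_{t+1}, a)\Big),
\]
where $Q_{\theta}$ is the online network and $Q_{\theta^-}$ is the target network.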
I will test whether Double DQN improves the results. Finally, I want to try out the Proximal Policy Optimization (PPO) algorithm.
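For reference, PPO optimizes a clipped surrogate objective of the form
\[
L^{\mathrm{CLIP}}(\theta) = \mathrm{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\;
\mathrm{clip}(r_t(\theta),\, 1-\varepsilon,\, 1+\varepsilon)\,\hat{A}_t\big)\Big],
\qquad
r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\]
where $\hat{A}_t$ is an advantage estimate; the clipping keeps each policy update close to the previous policy.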
After implementing these different algorithms, I need to train them and compare the results.
I also find it very interesting whether providing the RL agent with additional information beyond just the
price positively impacts the results. For example, I can add technical indicators, market volume or
an online news score about the company. The last one is probably a bit difficult because you need an
LLM that scores web-scraped articles by how good the news is for the company. After adding this
information, I need to reevaluate which algorithm is the best.
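As a rough idea of how the observation could be extended with such features (indicator choice, window size and function name are only placeholders):
\begin{verbatim}
import numpy as np

def make_observation(prices, volumes, t, window=10):
    # Stack the current price, a simple moving average and the traded
    # volume into a single observation vector for the agent.
    sma = np.mean(prices[max(0, t - window + 1):t + 1])
    return np.array([prices[t], sma, volumes[t]], dtype=np.float32)
\end{verbatim}
A news score produced by an LLM would then simply be one more entry in this vector.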
\section{Libraries and Tools}
\section{Development plan}
\section{Availability}
I am on vacation from 04.08 to 13.08. On the 15th I am at an event, but I have time on the 14th.
From the 18th onwards I am available for the next couple of weeks. I look forward to the
presentation, and thank you for giving me the additional time.
\end{document}