Introduction finished

\begin{document}
\maketitle
\tableofcontents

\section{Introduction}
I want to build a reinforcement learning project about single-asset stock trading. First, I want to
start with a simple environment with just the actions buy and sell. For the reward function, I also
want to keep it simple at first by just using the profit as the reward.
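
To make this concrete, the following is a minimal sketch of how such an environment could look
using the Gymnasium interface; the class name, the observation layout, and the price array are my
own placeholder choices rather than a fixed design:

\begin{verbatim}
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class SingleAssetTradingEnv(gym.Env):
    """Minimal single-asset environment: the actions are sell (0) and buy (1),
    and the reward is simply the realized profit of a completed round trip."""

    def __init__(self, prices):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0 = sell, 1 = buy
        # Observation: current price and a flag whether we hold the asset.
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0]), high=np.array([np.inf, 1.0]), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.holding = False
        self.entry_price = 0.0
        return self._obs(), {}

    def step(self, action):
        price = self.prices[self.t]
        reward = 0.0
        if action == 1 and not self.holding:      # buy and remember entry price
            self.holding = True
            self.entry_price = price
        elif action == 0 and self.holding:        # sell and realize the profit
            reward = price - self.entry_price
            self.holding = False
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), float(reward), terminated, False, {}

    def _obs(self):
        return np.array([self.prices[self.t], float(self.holding)], dtype=np.float32)
\end{verbatim}

With this reward design the profit only arrives at the sell step, which matches the plan of using
plain profit as the reward signal at first.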

In contrast to the algorithms we have already covered in the lecture, I have to try out deep
reinforcement learning algorithms because the price is a continuous variable. In theory, you could
model the price at a specific resolution using many discrete states. However, this can very quickly
become impractical for classic reinforcement learning methods. Also, deep reinforcement learning
can recognize patterns and therefore act well in previously unseen states.

I want to try out different reinforcement learning algorithms to see which works best for the
trading environment. First, I want to try out the Deep Q-Network (DQN) algorithm. It approximates
the Q-function with a neural network and uses epsilon-greedy exploration. I plan to try out
different formulas for the epsilon decay.
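One candidate I could start with is an exponential decay schedule, where the symbols
$\epsilon_{\mathrm{start}}$, $\epsilon_{\mathrm{end}}$ and $\tau$ are my own placeholder names:
\begin{equation}
  \epsilon_t = \epsilon_{\mathrm{end}} + (\epsilon_{\mathrm{start}} - \epsilon_{\mathrm{end}})\, e^{-t/\tau},
\end{equation}
with $t$ the number of environment steps; a simple linear decay down to $\epsilon_{\mathrm{end}}$
would be a natural alternative to compare against.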
Because DQN often overestimates the Q-values, I want to try out a variation of DQN called Double
DQN. It uses two networks for updating the policy: the online network selects the action with the
highest Q-value and the target network evaluates that action. This leads to more stable learning
and better results. I will test whether Double DQN improves the results.
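Concretely, with online parameters $\theta$ and target parameters $\theta^{-}$ (my notation), the
Double DQN target for a transition $(s, a, r, s')$ would be
\begin{equation}
  y = r + \gamma\, Q_{\theta^{-}}\bigl(s',\, \arg\max_{a'} Q_{\theta}(s', a')\bigr),
\end{equation}
whereas plain DQN uses $y = r + \gamma \max_{a'} Q_{\theta^{-}}(s', a')$, where the same target
network both selects and evaluates the action, which is the source of the overestimation.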

Finally, I want to try out the Proximal Policy Optimization (PPO) algorithm.

After implementing these different algorithms, I need to train them and compare the results.

I also find it very interesting whether providing the RL agent with additional information beyond
just the price positively impacts the results. For example, I can add technical indicators, market
volume, or an online news score about the company. The last one is probably a bit difficult because
you need an LLM that scores web-scraped articles by how good the news is for the company. After
adding this information, I need to reevaluate which algorithm works best.
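
As a first idea of what these extra observation features could look like, here is a small pandas
sketch; the column names and window sizes are my own placeholders, and a news score would simply
become one more column:

\begin{verbatim}
import pandas as pd

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Adds simple indicator columns to a price DataFrame.
    Assumes the columns 'close' and 'volume' exist (placeholder naming)."""
    out = df.copy()
    out["return_1d"] = out["close"].pct_change()             # daily return
    out["sma_10"] = out["close"].rolling(window=10).mean()   # short moving average
    out["sma_50"] = out["close"].rolling(window=50).mean()   # long moving average
    out["volatility_10"] = out["return_1d"].rolling(window=10).std()
    # Volume normalized against its own recent history.
    vol_mean = out["volume"].rolling(window=20).mean()
    vol_std = out["volume"].rolling(window=20).std()
    out["volume_z"] = (out["volume"] - vol_mean) / vol_std
    return out.dropna()
\end{verbatim}

Each of these columns would then be appended to the observation vector of the environment before
retraining and re-comparing the algorithms.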

\section{Libraries and Tools}

\section{Development plan}

\section{Availability}
I am on vacation from 04.08 to 13.08. On the 15th I am at an event, but I have time on the 14th.
From the 18th onwards, I am available for the next couple of weeks. I look forward to the
presentation, and thank you for giving me the additional time.

\end{document}