Function

Conventional recommendation engines, especially those based on shopping basket analysis and collaborative filtering, rest on the principle that the items to recommend are those which users are most likely to choose, as determined by analyses of user behaviour. This is not the best approach, even in cases where the principle has been empirically confirmed. The task of real-time analytics, by contrast, is to recommend content that matches the parameters being optimised as closely as possible. A real-time analytics system learns from the interplay between analysis and action and therefore requires a different quality of learning.

Learning in real-time analytics is handled by reinforcement learning (RL), a method based on dynamic programming, a branch of mathematics used for optimal control. RL is used to control autonomous systems such as robots, and also for self-learning game programs for backgammon or, more recently, chess.

RL is based primarily on the interplay between proven, good actions and new, unproven actions. Using proven actions is known as "exploit", while trying new, unproven actions is known as "explore". RL provides the right balance between "exploit" and "explore". Markov formulations are then used to select not just the action that is strongest on its own but the one that maximises the expected chain of subsequent actions. RL methods thus take a long-term view and keep learning continuously.

Like classic data mining, RL can learn offline from ordinary historical transactions. Most RL methods, however, learn online, i.e. in close interaction with the user. Thanks to its uniform theoretical framework, RL can combine both types of learning: the initial model is created from historical transaction data in offline mode and is then continuously improved in online mode.

Advantages
An important add-on to the prudsys system is hierarchical reinforcement learning, in which learning takes place simultaneously on multiple levels of a hierarchy. This increases learning speed and improves the interpretability of the models.

Integration

Reinforcement learning forms the central framework of prudsys RDE, which contains a large number of offline, online and batch online variants of RL methods, together with hierarchical RL add-ons. (The batch online mode is a user-modified online procedure applied to historical data.) RL is used not only to prepare recommendations but also in a wide range of functions for dynamic price optimization (see Algorithms for dynamic planning and price optimization). All RL learning methods are included in the XELOPES library and can also be incorporated into a wide range of other applications.
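The interplay of "exploit" and "explore", the long-term valuation of action chains, and the combination of offline and online learning can be illustrated with a small sketch. It uses tabular Q-learning with an epsilon-greedy policy, a standard textbook RL technique chosen here purely for illustration; it is not the prudsys RDE implementation or the XELOPES API. The class `QRecommender`, its methods, the product names and the reward values are all hypothetical.

```python
import random
from collections import defaultdict


class QRecommender:
    """Illustrative tabular Q-learning recommender (hypothetical, not a prudsys API)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)   # items that can be recommended
        self.alpha = alpha             # learning rate
        self.gamma = gamma             # discount for rewards later in the session
        self.epsilon = epsilon         # probability of exploring
        self.rng = random.Random(seed)
        self.q = defaultdict(float)    # Q-values keyed by (state, action)

    def best_action(self, state):
        """Exploit: the action with the highest learned long-term value."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def recommend(self, state):
        """Epsilon-greedy choice between explore and exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore: try any action
        return self.best_action(state)            # exploit: use the proven action

    def update(self, state, action, reward, next_state):
        """One Q-learning step; next_state=None marks the end of a session."""
        future = 0.0
        if next_state is not None:
            future = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def fit_offline(self, transitions, sweeps=50):
        """Offline mode: replay historical (state, action, reward, next_state) tuples."""
        for _ in range(sweeps):
            for s, a, r, s2 in transitions:
                self.update(s, a, r, s2)


# Invented history: recommending "socks" on the "shoes" page earns nothing at once
# but leads to a page where recommending "boots" pays off.
history = [
    ("shoes", "socks", 0.0, "socks"),
    ("socks", "boots", 5.0, None),
    ("shoes", "laces", 1.0, None),
]

model = QRecommender(actions=["socks", "laces", "boots"])
model.fit_offline(history)                 # initial model from historical data
choice = model.recommend("shoes")          # online: explore or exploit
model.update("shoes", choice, 0.0, None)   # online: learn from the observed outcome
```

After the offline pass, `model.best_action("shoes")` is `"socks"` even though `"laces"` has the higher immediate reward, because the discounted value of the follow-up action outweighs it. The same `update` method serves both modes, which mirrors the idea of a single framework for offline and online learning.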