Agent-based reinforcement learning with decision-making

The decision-making process of the negotiation agent has been developed based on reinforcement learning.

The process of negotiation can be modelled as a Markov decision process (MDP) with a stochastic policy on a discrete domain. Therefore, reinforcement learning (RL) is used to drive the decision-making process of the negotiation agent. Reinforcement learning is the learning of a mapping from states to actions in order to maximise a scalar reward or reinforcement signal. CodeGen's negotiation agent uses RL to maximise the reward it receives for a particular state transition. The previous state, the current state, and the action the agent selected for the transition are all taken into consideration when setting the reward.
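To make the transition-based reward concrete, here is a minimal sketch of a reward function over (previous state, action, current state). The state names, actions, and numeric values are illustrative assumptions, not CodeGen's actual reward table.

```python
def transition_reward(previous_state: str, action: str, current_state: str) -> float:
    """Reward the agent for moving the negotiation towards agreement."""
    if current_state == "deal_closed":
        return 10.0                      # overall reward at the end of a chat
    if previous_state == "offer_made" and action == "counter_offer" \
            and current_state == "offer_accepted":
        return 5.0                       # immediate reward for a good transition
    if current_state == "buyer_left":
        return -10.0                     # penalise losing the buyer
    return -0.1                          # small step cost to discourage stalling
```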

Negotiation agents are implemented using Jason, and some decisions are executed directly from the strategies built into the Jason agent. When a more complex decision is needed, it is directed to the RL decision-making process, which selects a suitable action (see the sketch below).
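The hand-off described above could look roughly like the following: simple cases are resolved by rules built into the Jason agent's plans, while everything else is delegated to the RL decision-maker. The function and parameter names here are illustrative assumptions, not part of the Jason or CodeGen APIs.

```python
def decide(state, simple_rules, rl_policy):
    """Route a decision to a built-in strategy or to the RL model."""
    if state in simple_rules:               # a Jason strategy covers this case
        return simple_rules[state]
    return rl_policy.select_action(state)   # fall back to RL for complex cases
```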

As in most forms of machine learning, in RL the learner does not know beforehand which action to take. Instead, it must discover the actions that yield the highest reward by trying them. Since RL is a trial-and-error approach, the agent should be able to explore all the possible actions in a given situation; for this reason, the negotiation agent is initialised so that it can select any action from the action pool.

Action selection is done using the Upper Confidence Bound (UCB) policy, which ensures that every action gets a fair chance of being executed. Once the agent reaches a new state as a result of the action selected in the previous state, a reward level is chosen and assigned a numeric value. Our negotiation agent receives a reward after each state transition (immediate reward) and at the end of a chat sequence (overall reward).
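A minimal sketch of UCB action selection (the classic UCB1 variant) is shown below, assuming per-action visit counts and running average rewards are tracked. The exploration constant `c` is an assumption, not a value from the article.

```python
import math

def ucb_select(actions, counts, avg_rewards, total_steps, c=2.0):
    """Pick the action with the highest upper confidence bound."""
    for a in actions:                    # ensure every action is tried once
        if counts[a] == 0:
            return a

    def ucb(a):
        bonus = c * math.sqrt(math.log(total_steps) / counts[a])
        return avg_rewards[a] + bonus    # exploitation + exploration bonus

    return max(actions, key=ucb)
```

The exploration bonus shrinks as an action is tried more often, which is what guarantees that rarely executed actions still get a fair chance.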

In RL, designing the agent is as important as setting its rewards. The RL agent uses off-policy learning: it retrieves episodic data recorded from actual conversations and updates its model. The model used here is a Q network, updated with the SARSA(λ) learning algorithm and eligibility traces. A separate RL model is maintained for each user type recognised during buyer behavioural-pattern clustering. A learning scheduler fetches new data from the Elasticsearch database and updates the model that matches a particular user type; the data fed to each model drive the negotiation agent's future decision-making.
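For illustration, here is a minimal tabular sketch of a SARSA(λ) update replayed over one recorded episode. The production system described above uses a Q network per user type rather than a table, and the hyperparameter values below are assumptions.

```python
from collections import defaultdict

ALPHA, GAMMA, LAMBDA = 0.1, 0.9, 0.8     # assumed learning hyperparameters

def sarsa_lambda_update(Q, episode):
    """Replay one recorded episode of (state, action, reward) steps."""
    E = defaultdict(float)                       # eligibility traces
    steps = list(episode)
    for i, (s, a, r) in enumerate(steps):
        if i + 1 < len(steps):
            s2, a2, _ = steps[i + 1]
            target = r + GAMMA * Q[(s2, a2)]     # bootstrapped SARSA target
        else:
            target = r                           # terminal step
        delta = target - Q[(s, a)]
        E[(s, a)] += 1.0                         # accumulate trace for (s, a)
        for key in list(E):                      # propagate TD error backwards
            Q[key] += ALPHA * delta * E[key]
            E[key] *= GAMMA * LAMBDA             # decay all traces
    return Q
```

The eligibility traces let a reward received late in a chat sequence update all the state-action pairs that led to it, which suits the episodic conversation logs fetched by the scheduler.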

CodeGen's agent systems are based on a JADE centralised main agent container and Jason technologies. Endpoint seller agents are created using Jason, and several other agents are registered to the system for different purposes.

Agents communicate with each other to make accurate decisions in critical situations. With the help of a revenue management agent, the negotiation agent can make pricing decisions informed by records such as current demand and demand fluctuations. The system is also integrated with a contract net for proper and efficient communication between agents.
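The contract-net interaction can be sketched as follows: an initiator (for example, the negotiation agent) calls for proposals, participants such as the revenue management agent bid, and the best bid is accepted. This is a simplified, hedged illustration of the FIPA contract-net flow, not the JADE API; the class and function names are assumptions.

```python
class Participant:
    """A bidder that answers a call for proposals with a quote."""
    def __init__(self, name, quote):
        self.name, self.quote = name, quote

    def propose(self, task):
        return (self.name, self.quote(task))     # respond to the cfp with a bid

def contract_net(task, participants):
    """Initiator side: collect bids and accept the best (cheapest) proposal."""
    bids = [p.propose(task) for p in participants]   # cfp -> propose
    winner = min(bids, key=lambda bid: bid[1])       # evaluate proposals
    return winner                                     # accept-proposal to winner
```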
