(WO2019063079) SYSTEM, DEVICE AND METHOD FOR ENERGY AND COMFORT OPTIMIZATION IN A BUILDING AUTOMATION ENVIRONMENT

System, Device and Method for Energy and Comfort Optimization in a Building Automation Environment

Building automation systems include heating, ventilation and air conditioning (HVAC) systems, security systems, fire systems, or other systems. Building automation systems are currently used to improve energy efficiency, reduce operating costs, and improve data access and analysis. Conventionally, building automation systems are controlled by rule-based engines, which are considered easier to handle. However, this advantage is negated as the system evolves. When multiple rules are stacked upon each other, the system cannot scale with dynamic changes to the environment. For example, the system may be less responsive in adapting to new types of data, such as data sourced from an upgraded sensor, or from a new sensor providing previously unutilized data. Rule-based systems can also fail to adapt to a changing domain, e.g., a new furniture layout, new lighting sources, changes to occupants' profiles and seasonality factors.

To address the challenges of rule-based engines, machine learning techniques are used to define effective rules. For example, building automation systems use supervised learning algorithms to perform automated energy and comfort optimization.

Such building automation systems are disclosed in the publication US 2016/0223218 A1. The document provides a method of automating thermostats by utilizing parallel learning on a communication network. User comfort level is provided through Quality of Experience (QoE) assessments by constantly interacting with the building automation system via a smartphone application, a web application or through direct interaction with the system. However, the building automation system is limited to thermostats and HVAC systems. Further, the method to perform QoE assessment in that document is only capable of supporting automation of thermostats in similar environments.

Publication US 9078299 B2 provides a predictive daylight harvesting system to maximize energy savings while maintaining the occupants' requirements for the environmental conditions. The document proposes a fuzzy-logic inference engine for learning the patterns from various input sources such as weather, occupancy, etc. The fuzzy-logic inference engine uses rules and fuzzy classes defined by a domain expert to identify patterns. Therefore, the fuzzy-logic inference engine requires substantial human intervention.

In light of the above, there exists a need to accurately optimize energy and comfort in a building automation environment without manual supervision. Therefore, it is an object of the present invention to provide a method, a device and a system to effectively optimize energy and comfort based on user behaviour in a non-invasive manner.

The method, device and system according to the present invention achieve the aforementioned object by generating a building model for the building automation environment. The building model is represented by a set of states comprising energy profiles and comfort profiles. Further, reward vectors are determined for the set of states with the energy profiles and the set of states with the comfort profiles. The reward vectors are determined based on probabilities of transition from a current state to remaining states of the set of states. The reward vectors are used to determine an optimization policy for energy and comfort. Based on the optimization policy, an action is performed whereby the building automation environment transitions to a new state in the set of states. Accordingly, optimization of the energy and comfort is derived by iteratively determining the reward vectors and the optimization policy for the new state.

As used herein, "building automation environment" refers to an environment created by the combination of one or more buildings, their mechanical, structural and electrical components, and their occupants, along with a computer-based system installed in the buildings. The system in the present invention controls and monitors the buildings' mechanical, structural and electrical equipment such as ventilation, lighting, power systems, fire systems, and security systems, and monitors occupant behaviour and activity to optimize energy and comfort.

The term "occupant" is not limited to living entities such as humans and animals and can include non-living entities such as motors and robots.

The term "environment data" includes in its meaning sensor data, energy data, occupant data and ambient data associated with the building automation environment. For example, room temperature, humidity, air quality, electricity consumption, available renewable energy, possible electricity outage, occupancy sensor data, blind sensor data, etc.

The term "state" refers to a status of the building automation environment such as the presence of occupants in the rooms, occupant profile, the energy consumed from the energy sources, the plug load etc. For example, one occupant in a room with HVAC systems turned on is the status of the room, which is used to define the state. Further, the states have either an energy profile or a comfort profile or a combination of both. The environment data can be used to generate a building model for the building automation environment. The building model is a virtual representation of the building automation environment using the sensor data, energy data and ambient data. In another embodiment, the building model for the building automation environment can be generated based on historical environment data and predicted environment data. In other words, the building model is a simulation model that is pre-trained using the historical environment data and the predicted environment data. The historical environment data refers to the sensor data, the energy data and the ambient data of the building automation environment that is recorded prior to implementation of the present method. The predicted environment data is generated based on probability models applied on the historical environment data.

A current state from the set of states is determined based on the environment data. The current state is represented by an energy profile, a comfort profile or a combination of both. The above example of one occupant in a bedroom with HVAC systems turned on can be considered as the current state when the status is viewed in real-time. The current state also includes the environment data at the given time instant. Accordingly, the current state includes sensor data, energy data and ambient data captured for that time instant.

Further, the current state also includes the emotion and behaviour of the occupant. The emotion and behaviour pattern is captured in the form of audio data, video data and image data by means of capturing devices in the building automation environment. Alternatively, the emotion and behaviour pattern can be input by means of an application in a user device operated by the occupants. Once the emotion and behaviour are captured, the audio data, video data and image data are auto-correlated in a chronological sequence using one or more neural networks. In an embodiment, the neural networks include a combination of a recurrent neural network and a convolutional neural network.

Determining the current state also includes monitoring energy data including parameters such as plug load and energy sources associated with the building automation environment. Also, occupant data including occupant metabolic rate, occupant energy consumption and occupant clothing is monitored. Further, ambient data such as weather, air quality, air temperature, radiant temperature, air velocity, relative humidity and time of day is determined.

In an embodiment of the present invention, a state matrix is generated with the probabilities of transition from the current state to remaining states. The probabilities of transition include probabilities of achieving an effective goal associated with optimizing energy and comfort in the building automation environment.
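The state matrix described above can be sketched as a row-stochastic matrix in which entry (i, j) holds the probability of transitioning from state i to state j. The states and probability values below are purely illustrative, not taken from the disclosure.

```python
import numpy as np

# Hypothetical states, each combining an occupancy status with an HVAC status.
states = ["empty_hvac_off", "occupied_hvac_off", "occupied_hvac_on"]

# Row i holds the probabilities of transitioning from state i to every state.
state_matrix = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.1, 0.2, 0.7],
])

# A valid state matrix is row-stochastic: each row sums to 1.
assert np.allclose(state_matrix.sum(axis=1), 1.0)

current = states.index("occupied_hvac_off")
# Probabilities of transition from the current state to the remaining states.
transition_probs = state_matrix[current]
print(transition_probs)  # [0.1 0.3 0.6]
```

A row of this matrix is exactly the "probabilities of transition from the current state to remaining states" used later to determine the reward vectors.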

According to another embodiment of the present invention, the reward vectors are determined in parallel for the set of states with the energy profiles and the set of states with the comfort profiles. The reward vectors are determined using the state matrix and are therefore based on probabilities of transition from the current state to remaining states of the set of states. In an embodiment, the reward vector is predicted for the set of states based on a static learning method and a dynamic learning method. In an embodiment, the reward vector is predicted based on a deep reinforcement learning method.

As used herein, "deep reinforcement learning" includes a multi-layered neural network. Deep reinforcement learning methods are obtained when multi-layered neural networks are used to approximate components of reinforcement learning owing to a large or continuous set of states and/or actions. Further, the deep reinforcement learning includes multiple objectives such as reduced energy utilization and improved comfort in the building automation environment.
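Because the reward is a vector over two objectives (energy and comfort) rather than a scalar, one common way to compare candidate transitions is a weighted scalarization. The disclosure does not prescribe this scheme; the weights and reward values below are illustrative assumptions.

```python
import numpy as np

# Hypothetical reward vectors for transitioning into each candidate state:
# column 0 = energy reward, column 1 = comfort reward.
reward_vectors = np.array([
    [0.9, 0.1],   # saves energy, poor comfort
    [0.4, 0.8],   # balanced trade-off
    [0.1, 0.9],   # comfortable, energy-hungry
])

# Scalarize the two objectives with illustrative, equal weights.
weights = np.array([0.5, 0.5])
scalarized = reward_vectors @ weights   # [0.5, 0.6, 0.5]

# The state with the highest weighted reward wins under these weights.
best_state = int(np.argmax(scalarized))
print(best_state)  # 1
```

Shifting the weights toward energy or toward comfort would change which transition the policy prefers, which is the trade-off the optimization policy resolves.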

According to yet another embodiment of the present invention, the optimization policy is determined based on the reward vectors to ensure both energy and comfort are optimized. The optimization policy is generated by determining the highest effective reward if the current state is transitioned. The action is chosen based on the optimization policy. The action can be performed directly without any intervention, whereby the building automation environment transitions to a new state in the set of states. However, if intervention by the occupants is required, the determination of intervention is performed.

As used herein, "action" refers to a change in operation parameters of the energy sources and the plug load, including fans, air conditioners, heaters, radiators, humidifiers, vents, window blinds, lighting systems and consumer electronics, associated with the building automation environment.

According to an embodiment of the present invention, the action is displayed as a recommended action to the occupant by means of the application in the user device. The recommended action includes a change in operation parameters of the energy sources and the plug load, including HVAC systems, windows, blinds, lighting systems and consumer electronics, associated with the building automation environment. The recommended action is displayed along with a recommended state that will be entered if the recommended action is performed. The term "recommended action" is used to indicate that a probability of transition to the recommended state is high. Therefore, the probability of achieving an effective goal associated with optimizing energy and comfort in the building automation environment is high.

Apart from displaying the recommended action, feedback from the occupant can also be received. The feedback from the application on the user device can be taken when the occupant is not in the building automation environment. The occupant feedback on the new state of the building automation environment is received after analyzing the emotion and behaviour of the occupant. The building model is updated at predetermined intervals with the occupant feedback. In a preferred embodiment, the building model is updated when there is least dynamicity in the building automation environment.

The occupant feedback is used to determine whether a transition to the new state or the recommended state is to be made. For example, the occupant affirms the recommended state by selecting the recommended action via the application on the user device. If the occupant does not select the recommended action, the method is repeated.

The energy and comfort are optimized by transitioning to the new state. Further, the process of optimizing the energy and comfort is continued by iteratively determining the reward vectors and the optimization policy for the new state. In an embodiment, the building automation environment includes an office space with multiple floors and multiple occupants. The optimization of the energy and the comfort in the office space is performed by clustering the occupants into groups based on similarity in occupant profile. As used herein, "occupant profile" refers to the emotion and behaviour patterns of the occupant and occupant data such as gender, metabolic rate, occupant energy consumption, and occupant clothing. Further, the energy and comfort are optimized by optimizing energy and comfort at multiple sections based on a section optimization policy for each of the sections. Accordingly, considering the example of the office space with multiple floors, optimization is performed at each floor and this is used to optimize the energy and comfort for the office space.
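The clustering of occupants by profile similarity can be sketched with a minimal k-means; the disclosure does not name a clustering algorithm, so k-means and the two-feature profiles (normalized metabolic rate, normalized energy use) are assumptions for illustration.

```python
import numpy as np

# Hypothetical occupant profiles: (normalized metabolic rate, normalized energy use).
profiles = np.array([
    [0.10, 0.20], [0.15, 0.25], [0.12, 0.22],   # low-activity occupants
    [0.80, 0.90], [0.85, 0.95], [0.90, 0.85],   # high-activity occupants
])

# Minimal k-means (k = 2): assign each occupant to the nearest centroid,
# then move each centroid to the mean of its assigned occupants.
centroids = profiles[[0, 3]].copy()
for _ in range(10):
    dists = np.linalg.norm(profiles[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    for k in range(2):
        centroids[k] = profiles[labels == k].mean(axis=0)

print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

Each resulting group could then be served by its own section optimization policy, as the paragraph above describes for floors of the office space.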

According to the present invention, also disclosed are a computing device and a system for energy and comfort optimization in a building automation environment. The computing device includes one or more processors, a memory coupled to the processors, and a communication unit. The memory is configured to store computer program instructions defined by modules, for example, a model generator module, a learning module, a policy module etc. The model generator module, when executed, generates a building model for the building automation environment. The learning module is executed to determine reward vectors for the set of states with the energy profiles and the set of states with the comfort profiles. The reward vectors are determined based on probabilities of transition from a current state to remaining states of the set of states. The policy module is executed to determine an optimization policy for energy and comfort based on the reward vectors. The computing device performs an action based on the optimization policy whereby the building automation environment transitions to a new state in the set of states. Accordingly, energy and comfort are optimized by iteratively determining the reward vectors and the optimization policy for the new state.

The computing device can be placed in any building automation environment to optimize energy consumption. The computing device is capable of being retrofit to already existing building automation environments with minimal prior understanding of the building automation environments.

The present invention also provides a system for managing energy and comfort optimization in a building automation environment. The system includes a server, a network communicatively coupled to the server and one or more computing devices communicatively coupled to the server via the network. The computing devices are associated with a respective section of the building automation environment. For example, the building automation environment can include multiple sections comprising one or more buildings, floors or rooms. Accordingly, computing devices can be used to optimize energy and comfort in every room. Optimization is then performed for multiple floors by associated computing devices based on the optimized energy and comfort in the rooms. Further optimization can be performed for the building based on the optimization on all the floors.

The system for optimization of energy and comfort in a building automation environment includes a building automation environment, an office automation environment, a factory automation environment etc. The building automation environment includes multiple floors, rooms and occupants.

The building automation environment also includes energy sources such as grid supply, photo-voltaic supply, wind power supply, solar power and battery supply. Further, the building automation environment is monitored by means of sensors including a carbon dioxide sensor, an air temperature sensor, a humidity sensor and a light sensor. The sensors are associated with HVAC systems, lighting devices, blinds, and plug loads for appliances or electric vehicles in the building automation environment.

As used herein, "HVAC systems" refer to fans, air conditioners, heaters, radiators, humidifiers and vents in the building automation environment. Additionally, the building automation environment is observed by means of cameras, microphones and voice recorders, and the recorded information is referred to as observation data. The information associated with the energy sources, sensor data, observation data, weather, time, etc. is collectively referred to as "environment data" of the building automation environment.

According to an embodiment of the present invention, the system includes one or more servers, network interfaces, databases and computing devices. The server includes a controller and a memory. The server is communicatively coupled to the network. The memory is configured to store computer program instructions defined by modules, for example, a building model generator. In an embodiment, the building model generator can also be implemented in a cloud computing environment.

The building model generator generates a building model of the building automation environment. The building model is a virtual replica of the building automation environment that is trained based on historical environment data or predicted environment data or a combination of both. As used herein, "predicted environment data" refers to a simulation of the building automation environment and "historical environment data" refers to the environment data that is received in a training phase prior to the deployment of the system. The predicted environment data and historical environment data are stored in the database.

According to an embodiment of the present invention, the computing device includes an energy optimizer, a comfort optimizer and a policy module. The computing device includes other components such as a processor, a memory and a communication unit, as described hereinbefore. The energy optimizer receives sensor data from the carbon dioxide sensor, the air temperature sensor, the humidity sensor and the light sensor. Accordingly, information regarding operation and performance of the HVAC systems, the lighting devices, the blinds, and the plug loads for appliances or electric vehicles in the building automation environment is received. Also, the energy optimizer receives information from the energy sources regarding available energy, consumed energy, energy pricing and scheduled supply cuts.

The energy optimizer comprises a state analyzer, an energy learning agent and an energy action selector. The sensor data from the sensors is received by the state analyzer and is used to determine a current state of the building automation environment. The state analyzer receives state information of all the possible states associated with the building automation environment from the memory of the computing device. In an embodiment, the database includes the state information associated with the building automation environment based on the historical environment data or predicted environment data or both. Accordingly, the state analyzer receives the state information from the database.

As used herein, "energy learning agent" refers to a learning agent that optimizes energy utilization in the building automation environment. In a preferred embodiment, the learning agent to optimize energy utilization uses a deep reinforcement learning method.

The energy learning agent receives the current state of the building automation environment from the state analyzer and determines energy reward vectors for the set of states with the energy profiles. Based on the energy reward vectors, a new energy state is determined. The new energy state exhibits efficient optimization of energy in the building automation environment. Based on the new energy state, a new energy action is selected by the energy action selector to facilitate transition to the new energy state. The term "new energy state" refers to a new state that will achieve energy optimization in the building automation environment. Accordingly, "new energy action" refers to an action that will enable the building automation environment to enter into the new energy state.

According to an embodiment of the present invention, the energy learning agent is configured to learn the performance of the building automation environment in terms of plug load, available energy sources and consumed energy sources. The computing device is capable of computing the performance within the building automation environment without needing historical environment data.

The parameters that are monitored in the building automation environment to optimize energy consumption are:

• Plug load (electrical devices/appliances connected to the electrical network)

• HVAC systems

• Lighting systems

• Environmental conditions such as temperature, humidity, pressure, natural lighting, etc

• Alternative energy sources (wind power, solar, storage batteries etc)

• Occupants' energy usage

Data from the above-mentioned systems is cumulatively referred to as energy data. The energy data is provided as input to the energy learning agent. The energy learning agent is capable of online adaptive learning of optimal energy states to be achieved without hand-engineered rules. In an embodiment, the energy learning agent is a deep neural network for reinforcement learning. Accordingly, the energy learning agent observes the current state. An action is chosen according to an optimization policy. A reward vector is associated with the action; the reward vector is used to indicate whether the action can achieve reduced energy consumption. Once the action is chosen, the energy action selector performs the action such that the building automation environment transitions to a new state.
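The observe-select-act loop described above can be sketched as tabular Q-learning with an epsilon-greedy policy; a deployed system would replace the table with the deep network the text describes, and the states, actions and reward function here are illustrative assumptions.

```python
import random

random.seed(42)

states = ["hvac_on", "hvac_off"]
actions = ["turn_on", "turn_off"]

# Hypothetical energy reward: switching HVAC off saves energy.
def reward(state, action):
    return 1.0 if action == "turn_off" else -0.5

def step(state, action):
    return "hvac_on" if action == "turn_on" else "hvac_off"

# Q-table initialised to zero; learning rate, discount and exploration rate.
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, eps = 0.1, 0.9, 0.1

state = "hvac_on"
for _ in range(500):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < eps:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    r = reward(state, action)
    nxt = step(state, action)
    # Standard Q-learning update toward reward plus discounted future value.
    Q[(state, action)] += alpha * (
        r + gamma * max(Q[(nxt, a)] for a in actions) - Q[(state, action)]
    )
    state = nxt

best = max(actions, key=lambda a: Q[("hvac_on", a)])
print(best)  # turn_off
```

Under this toy reward the agent converges on the energy-saving action without any hand-engineered rule, which is the behaviour the paragraph attributes to the energy learning agent.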

The energy learning agent has an objective to maximize an expectation over the discounted return. In an embodiment, the energy learning agent is a deep neural network for reinforcement learning. Accordingly, a multi-layered neural network for optimizing the energy and comfort profiles of the building automation environment is built.
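The discounted return referred to above is conventionally written as follows (standard reinforcement-learning notation, not reproduced from the disclosure):

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \qquad 0 \le \gamma < 1
```

where r denotes the per-step reward and the discount factor gamma weights near-term rewards more heavily; the agent's objective is to maximize the expectation E[G_t].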

According to another embodiment of the present invention, the energy learning agent can be trained using experience replay. To perform experience replay, state transitions are stored for a pre-determined time period in a replay memory of the computing device. As used herein, "state transitions" includes the current state, the new state and all further new states that are transitioned to for the pre-determined period. The state transitions are sampled uniformly at random from the replay memory to compute the updates of the network. Out of all the actions output by the energy learning agent, the action that results in the maximum cumulative reward is used to update the building model.
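The experience replay described above can be sketched as a fixed-capacity buffer of transitions sampled uniformly at random; the (state, action, reward, next_state) tuple layout is the usual convention, assumed here for illustration.

```python
import random
from collections import deque

random.seed(0)

# Replay memory holding (state, action, reward, next_state) transitions
# up to a pre-determined capacity; the oldest transitions are evicted first.
replay_memory = deque(maxlen=1000)

def store(transition):
    replay_memory.append(transition)

def sample(batch_size):
    # Uniform random sampling breaks the temporal correlation between
    # consecutive transitions, stabilising the network updates.
    return random.sample(replay_memory, batch_size)

# Illustrative transitions recorded over one period.
for t in range(100):
    store((f"s{t}", "noop", 0.0, f"s{t+1}"))

batch = sample(32)
print(len(batch))  # 32
```

Each sampled mini-batch would feed one gradient update of the network, as the paragraph above describes.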

In an embodiment, the energy learning agent is initially trained with the predicted environment data for the building automation environment. In another embodiment, the energy learning agent can be trained on a nightly basis to learn situations it has not encountered previously or where sub-optimal actions were taken. The training is based on the experience gained throughout the day, i.e. from the time the building model was previously trained.

The comfort optimizer of the computing device includes an emotion and behaviour analyzer, a comfort learning agent and a comfort action selector. Similar to the operation of the energy optimizer, the comfort optimizer also receives the environment data. The observation data of the environment data is analysed to determine an emotion and behaviour pattern of occupants in the building automation environment.

As used herein, "comfort learning agent" refers to a learning agent that optimizes comfort of occupants in the building automation environment. In a preferred embodiment, the comfort learning agent optimizes comfort using a deep reinforcement learning method.

The emotion and behaviour analyzer is capable of analyzing the emotion of the occupants using one or more neural networks. In an embodiment, a combination of a convolutional neural network and a recurrent neural network is used to determine the emotion and behaviour of the occupants. The comfort learning agent receives the current state of the building automation environment and the emotion of the occupant from the emotion and behaviour analyzer. The comfort learning agent determines comfort reward vectors for the set of states with the comfort profiles. Based on the comfort reward vectors, a new comfort state is determined. The new comfort state exhibits efficient optimization of comfort in the building automation environment. Based on the new comfort state, a new comfort action is selected by the comfort action selector to facilitate transition to the new comfort state. The term "new comfort state" refers to a new state that will achieve comfort optimization in the building automation environment. Accordingly, "new comfort action" refers to an action that will enable the building automation environment to enter into the new comfort state.

According to a preferred embodiment of the present invention, the convolutional neural networks are capable of extracting relevant information from the pixels of the video data and image data, which is fed into a fully-connected neural network. Accordingly, the environment data is provided as a set of parameters to one or more nodes of an input layer of the convolutional neural network. The set of parameters includes identification parameters to identify occupants. The behaviour or activity, such as sleeping, eating, cooking, walking, position of the occupant, etc., is recognized by determining a softmax output to profile the occupants.
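The softmax output mentioned above converts the final layer's raw scores into a probability distribution over activity classes; the class names and logit values below are illustrative.

```python
import math

activities = ["sleeping", "eating", "cooking", "walking"]

# Hypothetical raw scores (logits) from the fully-connected layer.
logits = [0.5, 2.1, 0.3, 1.0]

# Softmax: exponentiate (shifted by the max for numerical stability)
# and normalise so the outputs sum to 1.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The recognized activity is the class with the highest probability.
predicted = activities[probs.index(max(probs))]
print(predicted)  # eating
```

The winning class is then used to profile the occupant, as the paragraph describes.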

The features are transformed through the convolutional feature extraction layers according to the equation

h_k^l = f(W_k^l * h^(l-1) + b_k^l),

where l denotes the layer index, k denotes the feature map index, h^0 corresponds to the image pixel array, W_k^l and b_k^l are the filters and biases, which correspond to the l-th layer and k-th feature map, learned from training examples, and f is an element-wise function such as tanh(x) or max(0, x).

In another embodiment, pooling layers may be used subsequent to some or all of the convolutional layers, which aggregate spatially local regions according to some aggregation function. After one or several convolutional layers with optional pooling layers, the resulting 3-dimensional array is either flattened to a vector of length (number of feature maps × height × width), or, alternatively, the features are "globally" pooled along the spatial dimensions using some aggregation function, yielding a vector of length equal to the number of feature maps.

The output vector of the CNN is then taken as an input in order to determine the behaviour pattern and activity pattern of the occupants. The sequences of behaviour and activity are determined, using which the behaviour pattern is identified. In an embodiment, the sequence identification is performed using a recurrent neural network. In a recurrent neural network, the output (zt) and hidden state (ht) are updated at every time step t and depend on the input (xt) from the convolutional neural network and the previous hidden state (ht-1). Thus, the behaviour pattern and activity pattern are learnt in chronological order, i.e. in the order h1 = fW(x1, h0) = fW(x1, 0), then h2 = fW(x2, h1), etc., up to hT, to understand the sequence of events taking place. The output of the recurrent neural network is the behaviour pattern and activity pattern of the occupants in the building automation environment.
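The recurrence h1 = fW(x1, h0), h2 = fW(x2, h1), ..., hT can be sketched as a minimal Elman-style RNN in which the same weights fW are applied at every time step; the weight matrices and input feature vectors below are random placeholders, not values from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hid = 4, 3
# Fixed weights shared across all time steps (the fW of the text).
W_xh = rng.standard_normal((d_hid, d_in)) * 0.1
W_hh = rng.standard_normal((d_hid, d_hid)) * 0.1

def f_W(x_t, h_prev):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1}): the update applied at every step.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

# A chronological sequence of CNN feature vectors x1..xT.
xs = rng.standard_normal((5, d_in))
h = np.zeros(d_hid)        # h0 = 0
for x_t in xs:
    h = f_W(x_t, h)        # h1 = fW(x1, h0), h2 = fW(x2, h1), ...

print(h.shape)  # final hidden state hT summarises the whole sequence
```

The final hidden state hT carries the chronological context from which the behaviour and activity pattern would be read out.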

The policy module receives the new comfort state and the new energy state from the comfort optimizer and the energy optimizer. The policy module determines an optimization policy based on the energy reward vectors and comfort reward vectors associated with all the states. The recommended state is compared with the new energy state and the new comfort state to determine the trade-off.

The computing device includes a recommendation module configured to select the recommended state when an occupant selection is required. Accordingly, the occupant can either accept or reject the recommended state and the system automatically adapts to feedback from the occupant in order to provide maximum energy savings while not compromising on the occupant comfort levels. Alternatively, the recommended action to achieve the recommended state is performed.

The recommendation module interacts with a user device via an application. Typical recommendations shared by the recommendation module in a home automation environment include but are not limited to changes in

- Temperature settings of thermostats in the HVAC systems in one or more rooms of the building automation environment depending on weather conditions, time of the day and date.

- Lighting settings to adjust blinds and turning on/off of lights in one or more rooms depending on weather conditions, time of the day and date.

- Water heating/cooling based on the availability of alternate sources of electricity, considering the complete cycle time for heating/cooling

- Optimal time for using consumer goods such as washing machine, dish washer, charging of electric vehicles etc.

In other building automation environments, the recommendations include but are not limited to

- Charging of electric vehicles in a commercial building such as office space

- Operating elevators & escalators as not all elevators/escalators need to be operated at all times in an office complex or shopping malls

In an embodiment, the application includes a dashboard of the possible energy savings that could be achieved by adapting to the recommended state. If the occupant is not comfortable with the recommended state, the application indicates possible loss in savings if the recommended state is overridden.

The feedback can be provided either via the application installed on the user device or directly through interaction with the system via voice commands. Further, the comfort learning agent in combination with the emotion and behaviour analyzer understands discomfort in the voice patterns of the occupant to determine the comfort reward vectors that automatically generate a new comfort action.

Accordingly, the system optimizes the energy source from one or more options, such as grid, photo-voltaic (PV or solar), wind and stored power, based on the occupant's activity, energy pricing and weather conditions. The choice of energy supply into the building environment is performed automatically.

Further, the system is non-intrusive as the optimization based on observations of the occupant activities is performed on the computing device deployed within the building environment. The observations are used to optimize energy utilization and comfort without sending any data outside the building environment.

According to an embodiment of the present invention, optimization of energy and comfort is performed across multiple buildings in a building automation environment. The building automation environment includes additional buildings that can be added to the building automation environment dynamically.

Each building includes a learning agent capable of optimizing energy and comfort for each of the buildings individually. Each floor of each of the buildings is provided with a local agent to optimize energy and comfort. In an embodiment, optimization at the local agents is treated with equal priority and, in such an embodiment, optimization of energy and comfort is achieved using multi-objective deep reinforcement learning. In this embodiment, energy and comfort profiles are determined by the global agents from environment data associated with the buildings. The energy and comfort profiles contain all potential optimal solutions of combinations of effective goals of each of the local agents.

In an embodiment, deep multi-objective reinforcement learning is used by the global agents to learn the energy and comfort profiles for policies of the associated local agents. Accordingly, each energy and comfort profile is represented by a neural network. To represent the energy and comfort profile as a neural network, the effective goals are represented as a sequence of scalarized single-objective goals, with energy and comfort optimized independently using Deep Q-Networks. The reinforcement learning algorithms that can be used include, but are not limited to, Neural-Fitted Q-Iteration, Deep Q-Learning, Double Deep Q-Learning, Asynchronous Advantage Actor-Critic (A3C) and model-based reinforcement learning methods such as the Recurrent Control Neural Network.
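The scalarization described above can be sketched in tabular form: energy and comfort rewards are combined into a single scalar objective per weight setting, and one Q-function (a tabular stand-in for a Deep Q-Network) is learnt per setting. The toy states, actions, transitions and reward values below are illustrative assumptions, not values from the disclosure.

```python
import random

# Toy environment: 3 states, 2 actions; each (state, action) pair yields
# an (energy reward, comfort reward) pair. All values are illustrative.
N_STATES, N_ACTIONS = 3, 2
REWARDS = {
    (0, 0): (1.0, 0.2), (0, 1): (0.2, 1.0),
    (1, 0): (0.8, 0.4), (1, 1): (0.3, 0.9),
    (2, 0): (0.5, 0.5), (2, 1): (0.6, 0.4),
}
NEXT = {(s, a): (s + a + 1) % N_STATES
        for s in range(N_STATES) for a in range(N_ACTIONS)}

def q_learning(w_energy, w_comfort, steps=5000, alpha=0.1, gamma=0.9):
    """Tabular Q-learning on the scalarized reward w_e * r_e + w_c * r_c."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    rng = random.Random(0)
    s = 0
    for _ in range(steps):
        if rng.random() < 0.3:                       # epsilon-greedy exploration
            a = rng.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: q[s][x])
        r_e, r_c = REWARDS[(s, a)]
        r = w_energy * r_e + w_comfort * r_c         # scalarized single objective
        s2 = NEXT[(s, a)]
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2
    return q

# An "energy and comfort profile": one learnt policy per weight setting.
profile = {w: q_learning(w, 1.0 - w) for w in (0.0, 0.5, 1.0)}
```

With the energy-only weighting the greedy action in state 0 is the high-energy-reward action, and with the comfort-only weighting it is the high-comfort-reward action, illustrating how one profile contains a policy per goal trade-off.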

The global learning agents use the energy and comfort profiles to generate a global optimization policy, which is divided into sub-policies for each floor of the buildings. The execution of the sub-policies is performed by the local agents. The distributed execution of the sub-policies on the local agents ensures efficient utilization of the available computing resources. In an embodiment, a default policy is defined if the sub-policies executed by the local agents violate pre-defined constraints, for example, if the execution of the sub-policies results in frequent turning on and off of HVAC systems or lighting systems in the buildings.
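The fallback to a default policy on constraint violation can be sketched as below; the action names, the toggle-count constraint and the threshold are illustrative assumptions, as the disclosure only gives frequent HVAC toggling as an example of a violated constraint.

```python
# If a sub-policy proposes toggling equipment more often than a
# pre-defined constraint allows, the local agent falls back to a
# default policy. Threshold and actions are illustrative.
MAX_TOGGLES = 4

def choose_policy(sub_policy_actions, default_action="hold"):
    """Count on/off toggles in the proposed action sequence."""
    toggles = sum(1 for a, b in zip(sub_policy_actions, sub_policy_actions[1:])
                  if a != b)
    if toggles > MAX_TOGGLES:
        return [default_action] * len(sub_policy_actions)
    return list(sub_policy_actions)

kept = choose_policy(["on", "on", "off", "off"])                  # 1 toggle
replaced = choose_policy(["on", "off", "on", "off", "on", "off"])  # 5 toggles
```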

The above-mentioned and other features of the invention will now be addressed with reference to the accompanying drawings of the present invention. The illustrated embodiments are intended to illustrate, but not to limit, the invention.

The present invention is further described hereinafter with reference to illustrated embodiments shown in the accompanying drawings, in which:

FIG 1 illustrates a method of optimization of energy and comfort in a building automation environment;

FIG 2 is a block diagram of a computing device for optimization of energy and comfort in a building automation environment according to the present invention;

FIG 3 is a block diagram of a system for optimization of energy and comfort in a building automation environment according to the present invention;

FIG 4 illustrates steps performed to update a building model associated with the building automation environment in FIG 3;

FIG 5 illustrates steps performed to automatically determine emotion, activity and behaviour pattern of an occupant in the building automation environment in FIG 3;

FIG 6 is a system for optimizing comfort for the building automation environment in FIG 3;

FIG 7 is a system for optimization of energy in the building automation environment in FIG 3;

FIG 8 is a system for comfort in a building automation environment having a plurality of occupants according to the present invention;

FIG 9 illustrates optimization of energy and comfort across multiple buildings in a building automation environment according to the present invention; and

FIG 10 illustrates steps performed to optimize energy and comfort across the multiple buildings in FIG 9.

Various embodiments are described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. Further, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments of the present invention. These examples must not be considered to limit the application of the invention to the configurations disclosed in the figures. It may be evident that such embodiments may be practiced without these specific details.

FIG 1 illustrates a method 100 of optimization of energy and comfort in a building automation environment. As used herein, "building automation environment" refers to an environment created by the combination of one or more buildings and their mechanical, structural and electrical components and occupants, along with a computer-based system installed in the buildings. The buildings are not limited to homes or domestic living spaces and include work spaces such as offices, factories and manufacturing units. The term "occupant" is not limited to living entities such as human or animal occupants and can include non-living entities such as motors and robots.

The method 100 begins at step 105 by receiving environment data associated with the building automation environment. As used herein, "environment data" refers to sensor data, energy data, occupant data and ambient data associated with the building automation environment, for example, room temperature, humidity, air quality, electricity consumption, available renewable energy, possible electricity outage, occupancy sensor data, blind sensor data, etc.

The environment data can be used to generate a building model for the building automation environment at step 110. The building model is a virtual representation of the building automation environment using the sensor data, energy data and ambient data. In another embodiment, the building model for the building automation environment can be generated based on historical environment data and predicted environment data. In other words, the building model is a simulation model that is pre-trained using the historical environment data and the predicted environment data. The historical environment data refers to the sensor data, the energy data and the ambient data of the building automation environment that is recorded prior to implementation of the present method. The predicted environment data is generated based on probability models applied on the historical environment data.
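Deriving predicted environment data from historical environment data can be sketched as below; the disclosure does not specify the probability model, so the exponential smoothing model and the temperature readings are illustrative assumptions.

```python
# One-step-ahead forecast of a sensor reading from its history,
# using simple exponential smoothing as a stand-in for the
# probability models applied to the historical environment data.
def forecast(history, alpha=0.5):
    """Exponentially smoothed level after consuming the history."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

historical_temps = [21.0, 22.0, 21.0, 22.0]   # illustrative readings (deg C)
predicted_temp = forecast(historical_temps)
```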

In the present invention, the building model can be represented by a set of states. As used herein, "state" refers to the status of the building automation environment, such as the presence of occupants in the rooms, occupant profile, the energy consumed from the energy sources, the plug load, etc. For example, one occupant in a room with HVAC systems turned on is the condition of the room and is used to define the state. Further, the state has either an energy profile or a comfort profile or a combination of both. Considering the previous example, the energy profile includes the electricity consumed by the HVAC systems that are turned on and the currently available energy source. The comfort profile includes the behaviour and emotion of the occupant in the room.
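A state with its energy profile and comfort profile can be sketched as a simple record type; the field names and example values are illustrative assumptions drawn from the examples in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnergyProfile:
    hvac_on: bool
    plug_load_kw: float
    energy_source: str      # e.g. "grid", "pv", "battery"

@dataclass(frozen=True)
class ComfortProfile:
    emotion: str            # e.g. "neutral", "discomfort"
    behaviour: str          # e.g. "resting", "working"

@dataclass(frozen=True)
class State:
    room: str
    occupants: int
    energy: EnergyProfile
    comfort: ComfortProfile

# The worked example from the text: one occupant, HVAC on.
bedroom_state = State(
    room="bedroom",
    occupants=1,
    energy=EnergyProfile(hvac_on=True, plug_load_kw=1.2, energy_source="grid"),
    comfort=ComfortProfile(emotion="neutral", behaviour="resting"),
)
```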

At step 115, a current state from the set of states is determined based on the environment data. The current state is represented by an energy profile, a comfort profile or a combination of both. The above example of one occupant in a bedroom with HVAC systems turned on can be considered as the current state when the status is viewed in real-time. The current state also includes the environment data at the given time instant. Accordingly, the current state includes the sensor data, energy data and ambient data captured for that time instant.

Further, the current state also includes the emotion and behaviour of the occupant. Therefore, at step 115, the emotion and behaviour pattern of occupants in the building automation environment is identified. The emotion and behaviour pattern is captured in the form of audio data, video data and image data by means of capture devices in the building automation environment. Alternatively, the emotion and behaviour pattern can be input by means of an application in a user device operated by the occupants. Once the emotion and behaviour are captured, the audio data, video data and image data are auto-correlated in a chronological sequence using one or more neural networks. In an embodiment, the neural networks include a combination of a recurrent neural network and a convolutional neural network. The step of identifying the emotion and behaviour pattern is detailed in FIG 5.

Step 115 of determining the current state also includes monitoring energy data including parameters such as plug load and energy sources associated with the building automation environment. Also, occupant data including occupant metabolic rate, occupant energy consumption and occupant clothing is monitored. Further, ambient data such as weather, air quality, air temperature, radiant temperature, air velocity, relative humidity and time of day is determined. This is further elaborated in FIG 6.

At step 120, a state matrix is generated with the probabilities of transition from the current state to remaining states. The probabilities of transition include probabilities of achieving an effective goal associated with optimizing energy and comfort in the building automation environment.
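Step 120 can be sketched as follows, assuming the transition probabilities are obtained by normalizing per-state goal-achievement scores; the scores themselves are illustrative placeholders, as the disclosure does not specify how the probabilities are produced.

```python
# Entry [i][j] of the state matrix is the probability of transitioning
# from state i to state j while achieving the effective goal.
def build_state_matrix(scores):
    """Normalize each row of raw goal-achievement scores into probabilities."""
    matrix = []
    for row in scores:
        total = sum(row)
        matrix.append([s / total for s in row])
    return matrix

raw_scores = [          # rows: from-state; columns: 3 candidate states
    [0.0, 3.0, 1.0],
    [2.0, 0.0, 2.0],
    [1.0, 1.0, 0.0],
]
state_matrix = build_state_matrix(raw_scores)
```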

At step 125, reward vectors are determined. The reward vectors are determined in parallel for the set of states with the energy profiles and the set of states with the comfort profiles. The reward vectors are determined using the state matrix and are therefore based on the probabilities of transition from the current state to the remaining states of the set of states. In an embodiment, the reward vector is predicted for the set of states based on a static learning method and a dynamic learning method. In an embodiment, the reward vector is predicted based on a deep reinforcement learning method.
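The parallel determination of the two reward vectors can be sketched as below; the transition probabilities and per-state rewards are illustrative assumptions, and the thread pool merely stands in for whatever parallel execution the system uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Probabilities of transition from the current state to 3 candidate
# states, and illustrative per-state energy and comfort rewards.
TRANSITIONS = [0.2, 0.5, 0.3]
ENERGY_REWARD = [1.0, -0.5, 0.3]
COMFORT_REWARD = [-0.2, 0.8, 0.4]

def expected_rewards(per_state_rewards):
    """Weight each candidate state's reward by its transition probability."""
    return [p * r for p, r in zip(TRANSITIONS, per_state_rewards)]

# Energy and comfort reward vectors are computed in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    energy_vec, comfort_vec = pool.map(expected_rewards,
                                       [ENERGY_REWARD, COMFORT_REWARD])
```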

At step 130, an optimization policy is determined based on the reward vectors to ensure that both energy and comfort are optimized. The optimization policy is generated by determining the highest effective reward if the current state is transitioned.
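One way to realize step 130 is to combine the two reward vectors into an effective reward and select the transition with the highest value; the weights and the example vectors are illustrative assumptions, as the disclosure leaves the combination unspecified.

```python
# Combine energy and comfort reward vectors into one effective reward
# per candidate state, then pick the best transition.
def optimization_policy(energy_vec, comfort_vec, w_energy=0.5, w_comfort=0.5):
    effective = [w_energy * e + w_comfort * c
                 for e, c in zip(energy_vec, comfort_vec)]
    return max(range(len(effective)), key=effective.__getitem__)

best_state = optimization_policy([0.2, -0.25, 0.09], [-0.04, 0.4, 0.12])
```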

At step 135, an action is chosen based on the optimization policy. The action can be performed directly without any intervention at step 160, whereby the building automation environment transitions to a new state in the set of states. However, if intervention is required by the occupants, the determination of intervention is performed at step 140. The term "action" refers to a change in operation parameters of the energy sources and the plug load, including HVAC systems, windows, blinds, and consumer electronics, associated with the building automation environment.

At step 145, the action is displayed as a recommended action to the occupant by means of the application in the user device. The recommended action includes a change in operation parameters of the energy sources and the plug load, including HVAC systems, windows, blinds, lighting systems and consumer electronics, associated with the building automation environment. The recommended action is displayed along with a recommended state that will be entered if the recommended action is performed. The term "recommended action" is used to indicate that the probability of transition to the recommended state is high. Therefore, the probability of achieving an effective goal associated with optimizing energy and comfort in the building automation environment is high.

Apart from displaying the recommended action, feedback from the occupant can also be received at step 150. The feedback from the application on the user device can be taken when the occupant is not in the building automation environment. The occupant feedback on the new state of the building automation environment is received after analyzing the emotion and behaviour of the occupant. At step 155, the building environment model is updated at predetermined intervals with the occupant feedback. In a preferred embodiment, the building model is updated when there is least dynamicity in the building automation environment.

At step 160, the occupant feedback is used to determine whether transition to the new state or the recommended state is to be made, for example, if the occupant affirms the recommended state by selecting the recommended action via the application on the user device. If the occupant does not select the recommended action, the method 100 is repeated from step 105, as indicated by arrow 165.

At step 170, the energy and comfort is optimized by transitioning to the new state. Further, the process of optimizing the energy and comfort is continued by iteratively determining the reward vectors and the optimization policy for the new state, as indicated by arrow 175. In an embodiment, the building automation environment includes an office space with multiple floors and multiple occupants. The optimization of the energy and the comfort in the office space is performed by clustering the occupants into groups based on similarity in occupant profile. As used herein, "occupant profile" refers to the emotion and behaviour patterns of the occupant and occupant data such as gender, metabolic rate, occupant energy consumption and occupant clothing. Further, the energy and comfort is optimized at multiple sections based on a section optimization policy for each of the sections. Accordingly, considering the example of the office space with multiple floors, optimization is performed at each floor and this is used to optimize the energy and comfort for the office space.
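The clustering of occupants by similarity in occupant profile can be sketched as a nearest-centroid grouping over numeric profile features; the occupant names, the features (metabolic rate, clothing insulation) and the centroids are illustrative assumptions.

```python
import math

# Occupant profiles as (metabolic rate, clothing insulation) pairs and
# two assumed group centroids; all values are illustrative.
OCCUPANTS = {
    "anna":  (1.0, 0.5),
    "ben":   (1.1, 0.6),
    "chris": (1.8, 1.0),
    "dana":  (1.7, 1.1),
}
CENTROIDS = [(1.0, 0.5), (1.8, 1.0)]

def nearest(profile, centroids):
    """Index of the centroid closest to the profile vector."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(profile, centroids[i]))

groups = {}
for name, profile in OCCUPANTS.items():
    groups.setdefault(nearest(profile, CENTROIDS), []).append(name)
```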

FIG 2 is a block diagram of a computing device 200 for optimization of energy and comfort in a building automation environment according to the present invention. The computing device 200 is any device in the building automation environment capable of performing the method 100 in FIG 1. In an embodiment, the computing device can include multiple processing units to execute the method 100. Alternatively, processing of the method 100 is distributed across several devices capable of communicating with each other.

In an embodiment, the computing device 200 is an edge computing device. As used herein "edge computing" refers to computing that is capable of being performed by an edge device, which is a compact computing device that has a small form factor and resource constraints in terms of computing power.

As shown in FIG 2, the computing device 200 includes a communication unit 202, a processor 204, a capturing device 206, a display 208 and a memory 210 communicatively coupled to each other. The communication unit 202 includes a transmitter, a receiver and a Gigabit Ethernet port. The memory 210 may include a 2-gigabyte Package-on-Package (PoP) stacked Random Access Memory (RAM) and flash storage. The memory 210 is provided with modules stored in the form of computer readable instructions, for example, 212, 214, 216, 218 and 220. The processor 204 is configured to execute the defined computer program instructions in the modules. The execution of the modules can also be performed using co-processors such as a Graphical Processing Unit (GPU), a Field Programmable Gate Array (FPGA) or Neural Processing/Compute Engines. Further, the processor 204 is configured to execute the instructions in the memory 210 simultaneously. The display 208 includes a High-Definition Multimedia Interface (HDMI) display and a cooling fan (not shown in the figure).

According to an embodiment of the present invention, the computing device 200 can be accessed by a user device 260, for example, a personal computing device, a workstation, a client device, a network enabled computing device, any other suitable computing equipment, or a combination of multiple pieces of computing equipment. Additionally, an occupant in the building automation environment using the user device 260 can access the computing device 200 via a graphical user interface (GUI) 262. The GUI 262 is, for example, an online web interface, a web based downloadable application interface, etc.

The computing device 200 disclosed herein is in operable communication with a database 250 over a communication network 205. In another embodiment according to the present invention, the computing device 200 is configured as a web based platform, for example, a website hosted on a server or a network of servers. The computing device 200 is implemented in a cloud computing environment and is developed, for example, using the Google App Engine cloud infrastructure of Google Inc. or Amazon Web Services® of Amazon Technologies, Inc.

In an embodiment, the computing device 200 is configured as a cloud computing based platform implemented as a service for analyzing data.

The modules executed by the processor 204 include a model generator module 212, a learning module 214, a policy module 216, a recommendation module 218, a render module 220, a model updater module 222 and a clustering module 224.

The model generator module 212 generates a building model for the building automation environment. The building model is a virtual representation of the building automation environment using the sensor data, energy data and ambient data. The building model is represented by a set of states comprising energy profiles and comfort profiles.

The learning module 214 determines reward vectors for the set of states with the energy profiles and the set of states with the comfort profiles. The reward vectors are determined based on probabilities of transition from a current state to remaining states of the set of states. The learning module 214 includes a state module 232, a reward module 234, a comfort optimizer module 236 and an energy optimizer module 238.

The state module 232 uses the emotion and behaviour pattern of the occupant and the energy data to determine a current state of the building automation environment. The comfort optimizer module 236 identifies the emotion and behaviour pattern of the occupant in the building automation environment. The energy optimizer module 238 includes an energy analyzer 242, a consumption analyzer 244, and an ambient analyzer 246. The energy analyzer 242 monitors energy data comprising plug load and energy sources associated with the building automation environment. The consumption analyzer 244 monitors the occupant data that includes occupant metabolic rate, occupant energy consumption, occupant clothing, etc. The occupant data is used to build the occupant profile of the occupant in the building automation environment. Further, the ambient analyzer 246 determines ambient data associated with the building automation environment. The ambient data includes weather, air quality, air temperature, radiant temperature, air velocity, relative humidity and time of day.

Further, the state module 232 generates a state matrix with the probabilities of transition from the current state to the remaining states. The probabilities of transition are the probabilities of achieving an effective goal associated with optimizing energy and comfort in the building automation environment. For example, the effective goal is to reduce energy consumption by 5% without changing the comfort in the building automation environment. In another example, the effective goal can be reducing energy consumption by 30% with a 5% change in comfort. Accordingly, the state matrix will include the probability of achieving the effective goal if the current state transitions to the remaining states.

The reward module 234 predicts a reward vector for the remaining states based on a static learning method and a dynamic learning method. The reward vector refers to a positive or negative value that is used to indicate whether the transition from the current state is a good decision or a bad decision. The reward vector acts like an incentive or a deterrent to transition from the current state. The reward vector can be pre-determined and set by an operator of the building automation environment.

In an embodiment, the reward vector can be dynamically learnt by means of reward shaping. In another embodiment, the reward vector is dynamically learnt by automatically determining the set of states to which the current state is capable of being transitioned. A generative neural network can be used to determine the reward vector based on the set of states and the effective goal.
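Reward shaping, as mentioned above, can be sketched with a potential-based shaping term; the potential function (distance of energy consumption from an assumed 5 kW goal) is an illustrative choice, not one taken from the disclosure.

```python
# Potential-based reward shaping: the shaped reward adds
# gamma * phi(s') - phi(s) to the base reward, rewarding moves
# toward the effective goal.
GAMMA = 0.9

def potential(state_energy_kw, goal_kw=5.0):
    """Higher potential the closer consumption is to the goal."""
    return -abs(state_energy_kw - goal_kw)

def shaped_reward(base_reward, s_kw, s_next_kw):
    return base_reward + GAMMA * potential(s_next_kw) - potential(s_kw)

# Moving from 8 kW toward the 5 kW goal earns a positive shaping bonus.
bonus = shaped_reward(0.0, 8.0, 6.0)
```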

The policy module 216 uses the reward vectors to determine an optimization policy for energy and comfort. Since the optimization policy is based on the reward vectors, it is capable of being generated automatically from the environment data of the building automation environment. Further, the optimization policy can be tailored for different building automation environments based on the environment data. The processor 204 of the computing device 200 performs an action based on the optimization policy, whereby the building automation environment transitions to a new state in the set of states. The term "action" refers to a change in operation parameters of the energy sources and the plug load, including HVAC systems, windows, blinds, and consumer electronics, associated with the building automation environment.

In another embodiment, the action is converted to a recommended action by the recommendation module 218 such that it is suitably displayed on an application. Further, the render module 220 displays the recommended action to an occupant by means of the application in the user device. The probability of achieving the effective goal associated with optimizing energy and comfort in the building automation environment is high if the recommended action is chosen by the occupant.

Apart from displaying the recommended action, feedback from the occupant via the application is used to update the building model using the model updater module 222. In a preferred embodiment, the building model is updated when there is least dynamicity in the building automation environment.

In an embodiment where there are multiple occupants, the clustering module 224 clusters the occupants into groups based on similarity in occupant profile. The occupant profile includes emotion and behaviour patterns of the occupant, occupant data such as gender, metabolic rate, occupant energy consumption, and occupant clothing. Once the new state is transitioned to, the processor 204 executes the modules once again to determine the optimized energy and comfort for the building automation environment. This is iteratively performed when there is a change in the state.

FIG 3 is a block diagram of a system 300 for optimization of energy and comfort in a building automation environment according to the present invention. As shown, the building automation environment is a home automation environment 302 with multiple rooms 304 and occupants 305. For the purpose of FIG 3, the building automation environment will be referred to as the home automation environment 302.

The home automation environment 302 also includes energy sources 306 such as grid supply, photo-voltaic supply, wind power supply, solar power and battery supply. Further, the home automation environment 302 is monitored by means of sensors 308 including a carbon dioxide sensor, an air temperature sensor, a humidity sensor and a light sensor. The sensors 308 are associated with HVAC systems, lighting devices, blinds, and plug loads for appliances or electric vehicles in the home automation environment 302. As used herein, "HVAC systems" refer to fans, air conditioners, heaters, radiators, humidifiers and vents in the home automation environment 302. Additionally, the home automation environment 302 is observed by means of cameras 310, microphones and voice recorders, and the recorded information is referred to as observation data. The information associated with the energy sources 306, sensor data, observation data, weather, time, etc. is collectively referred to as "environment data" of the home automation environment 302.

The system 300 includes a server 320, a network 350, a database 330 and a computing device 340. The system 300 also includes further computing devices 340 that are associated with different floors in the home automation environment 302.

As shown in FIG 3, the server 320 is communicatively coupled to the database 330. The database 330 is, for example, a structured query language (SQL) data store or a not only SQL (NoSQL) data store. In an embodiment, the database 330 may be a location on a file system directly accessible by the server 320. In another embodiment, the database 330 may be configured as a cloud based database 330 implemented in a cloud computing environment, where computing resources are delivered as a service over the network 350. As used herein, "cloud computing environment" refers to a processing environment comprising configurable computing physical and logical resources, for example, networks, servers, storage, applications, services, etc., and data distributed over the network 350, for example, the internet. The cloud computing environment provides on-demand network access to a shared pool of the configurable computing physical and logical resources. The network 350 is, for example, a wired network, a wireless network, a communication network, or a network formed from any combination of these networks.

The server 320 includes a controller 322 and a memory 324. The server 320 is communicatively coupled to the network 350. The memory 324 is configured to store computer program instructions defined by modules, for example, a building model generator 326. In the present embodiment, the building model generator 326 can also be implemented on the cloud computing environment as indicated in FIG 3.

The building model generator 326 generates a home model of the home automation environment 302. The home model is a virtual replica of the home automation environment 302 that is trained based on historical environment data or predicted environment data or a combination of both. As used herein "predicted environment data" refers to a simulation of the home automation environment 302 and "historical environment data" refers to the environment data that is received in a training phase prior to the deployment of the system 300. The predicted environment data and historical environment data are stored in the database 330.

Further, FIG 3 shows that the computing device 340 includes an energy optimizer 370, a comfort optimizer 380 and a policy module 355. The computing device 340 includes other components such as a processor, a memory and a communication unit, as described in detail in FIG 2. The energy optimizer 370 receives sensor data from the carbon dioxide sensor, the air temperature sensor, the humidity sensor and the light sensor. Accordingly, information regarding the operation and performance of the HVAC systems, the lighting devices, the blinds, and the plug loads for appliances or electric vehicles in the home automation environment 302 is received. Also, the energy optimizer 370 receives information from the energy sources 306 regarding available energy, consumed energy, energy pricing and scheduled supply cuts.

The energy optimizer 370 comprises a state analyzer 372, an energy learning agent 374 and an energy action selector 376. The sensor data from the sensors 308 is received by the state analyzer 372 and is used to determine a current state of the home automation environment 302. The state analyzer 372 receives state information of all the possible states associated with the home automation environment 302 from the memory 324 of the computing device 340. In an embodiment, the database 330 includes the state information associated with the home automation environment 302 based on the historical environment data or predicted environment data or both. Accordingly, the state analyzer 372 receives the state information from the database 330.

As used herein, "state" refers to the status of the building (home) automation environment, such as the presence of occupants 305 in the rooms, occupant profile, the energy consumed from the energy sources 306, the plug load, etc. For example, one occupant in the bedroom 304f with HVAC systems turned on is the status of the room 304f and is used to define the state. Further, the states have either an energy profile or a comfort profile or a combination of both. In the previous example, the energy profile is the plug load from the HVAC systems and the comfort profile will be the behaviour pattern and activity of the occupant in the room 304f.

The energy learning agent 374 receives the current state of the home automation environment 302 from the state analyzer 372 and determines energy reward vectors for the set of states with the energy profiles. Based on the energy reward vectors, a new energy state is determined. The new energy state exhibits efficient optimization of energy in the home automation environment 302. Based on the new energy state, a new energy action is selected by the energy action selector 376 to facilitate transition to the new energy state. The term "new energy state" refers to a new state that will achieve energy optimization in the home automation environment 302. Accordingly, "new energy action" refers to an action that will enable the home automation environment 302 to enter into the new energy state.

The comfort optimizer 380 of the computing device 340 includes an emotion and behaviour analyzer 382, a comfort learning agent 384 and a comfort action selector 386. Similar to the operation of the energy optimizer 370, the comfort optimizer 380 also receives the environment data. The observation data of the environment data is analysed to determine an emotion and behaviour pattern of occupants 305 in the home automation environment 302. The occupants 305 are not limited to human beings and can include other living beings such as pets in the home automation environment 302. The understanding of occupants 305 can be extended to non-living entities, whose behaviour pattern is accordingly determined through condition monitoring.

The emotion and behaviour analyzer 382 is capable of analyzing the emotion of the occupants 305 using one or more neural networks. In an embodiment, a combination of convolutional neural networks and recurrent neural networks is used to determine the emotion and behaviour of the occupants 305, as described in FIG 5.

The comfort learning agent 384 receives the current state of the home automation environment 302 and the emotion of the occupant from the emotion and behaviour analyzer. The comfort learning agent 384 determines comfort reward vectors for the set of states with the comfort profiles. Based on the comfort reward vectors a new comfort state is determined. The new comfort state exhibits efficient optimization of comfort in the home automation environment 302. Based on the new comfort state a new comfort action is selected by the comfort action selector 386 to facilitate transition to the new comfort state. The term "new comfort state" refers to a new state that will achieve comfort optimization in the home automation environment 302. Accordingly, "new comfort action" refers to an action that will enable the home automation environment 302 to enter into the new comfort state.

The policy module 355 receives the new comfort state and the new energy state from the comfort optimizer 380 and the energy optimizer 370. The policy module 355 determines an optimization policy based on the energy reward vectors and the comfort reward vectors associated with all the states. The recommended state is compared with the new energy state and the new comfort state to determine the trade-off.

The computing device 340 includes a recommendation module 365 configured to select the recommended state when an occupant selection is required. Accordingly, the occupant can either accept or reject the recommended state and the system 300 automatically adapts to feedback from the occupant in order to provide maximum energy savings while not compromising on the occupant comfort levels. Alternatively, recommended action to achieve the recommended state is performed.

The recommendation module 365 interacts with a user device 375 via an application. Typical recommendations shared by the recommendation module 365 include but are not limited to changes in

- Temperature settings of thermostats in the HVAC systems in one or more rooms 304a-f of the home automation environment 302 depending on weather conditions, time of the day and date.

- Lighting settings to adjust blinds and turning on/off of lights in one or more rooms 304a-f depending on weather conditions, time of the day and date.

- Water heating/cooling based on the availability of alternate sources of electricity, considering the complete cycle time for heating/cooling

- Optimal time for using consumer goods such as washing machine, dish washer, charging of electric vehicles etc.

In other building automation environments, such as in FIG 8, the recommendations include:

- Charging of electric vehicles in a commercial building such as an office space.

- Operating elevators and escalators, as not all elevators/escalators need to be operated at all times in an office complex or shopping mall.

In an embodiment, the application includes a dashboard of the possible energy savings that could be achieved by adapting to the recommended state. If the occupant is not comfortable with the recommended state, the application indicates possible loss in savings if the recommended state is overridden.

The feedback can be provided either via the application installed on the user device 375 or directly through interaction with the system 300 via voice commands. Further, the comfort learning agent 384 in combination with the emotion and behaviour analyzer 382 understands discomfort in voice patterns of the occupant to determine the comfort reward vectors that automatically generate a new comfort action.

Accordingly, the system 300 optimizes the energy source from one or more options, such as grid, photo-voltaic (PV or solar), wind and stored power, based on the occupant's activity, energy pricing and weather conditions. The choice of energy supply into the home environment is automatically performed.

Further, the system 300 is non-intrusive as the optimization is based on observations of the occupant activities using the computing device 340 deployed within the home environment. The observations are used to optimize energy utilization and comfort without sending any data outside the home environment.

FIG 4 illustrates steps performed to update a building model associated with the building automation environment in FIG 3. Accordingly, the building automation environment for the purpose of FIG 4 is the home automation environment.

At step 410, the computing device 340 learns the state of the home automation environment 302 and the executed action. Parameters that alter the state of the home automation environment 302 and the effective goal of reducing energy consumption with improved comfort are stored in a database on the computing device. Further, at step 420, the computing device 340 re-trains the home model at pre-determined time intervals. The re-training is based on the experience gained throughout the pre-determined interval, i.e. from the time the home model was previously trained. For example, the computing device 340 trains the home model at night when there is little dynamicity in the home automation environment. The re-trained home model is then used to repeat the step 410, whereby the energy and comfort of the home automation environment 302 are optimized.

FIG 5 illustrates steps performed to automatically determine emotion, activity and behaviour pattern of an occupant in the home automation environment 302. At step 510, the environment data 512-518 from the home automation environment is captured. The camera in the home automation environment captures video data and image data related to the occupants. Further, microphones in the home automation environment capture audio data, which is used to understand the emotion, activity and behaviour pattern of the occupant.

Steps 520 to 540 are performed on the computing device to ensure that the environment data is restricted within the home automation environment and is not accessible via a public network. At step 520, the emotion and behaviour analyzer of the computing device performs feature transformation. In an embodiment, the emotion and behaviour analyzer is configured to auto-correlate the environment data 512-518 using one or more neural networks.

In a preferred embodiment, the neural network used for feature transformation is a convolutional neural network. The convolutional neural networks are capable of extracting relevant information from the pixels of the video data and image data, which is fed into a fully-connected neural network. Accordingly, the environment data 512-518 is provided as a set of parameters to one or more nodes of an input layer of the convolutional neural network. The set of parameters includes identification parameters to identify occupants. The behaviour or activity, such as sleeping, eating, walking, position of the occupant and occupant profile, is recognized by determining a softmax output to identify occupants.

In an embodiment, first the feature transformation of each frame is done by applying a convolutional neural network 522-528 on the video data or image data to produce a fixed-length vector. More specifically, the image data is represented as a 3-dimensional array of pixel intensities. In an embodiment, one dimension of the 3-dimensional array corresponds to the 3 colour channels. The other array dimensions are the image height and the image width.

The feature is transformed through convolutional feature extraction layers according to the equation

h^l_k,ij = f((W^l_k * h^(l-1))_ij + b^l_k)

where l denotes the layer index, k denotes the feature map index, h^0 corresponds to the image pixel array, W^l_k and b^l_k are the filters and biases, which correspond to the l-th layer and k-th feature map, learned from training examples, and f is an element-wise function such as tanh(x) or max(0, x).

In another embodiment, pooling layers may be used subsequent to some or all of the convolutional layers, which aggregate spatially local regions according to some aggregation function. After one or several convolutional layers with optional pooling layers, the resulting 3-dimensional array is either flattened to a vector of length (number of feature maps × height × width), or the features are "globally" pooled along the spatial dimensions using some aggregation function, yielding a vector of length equal to the number of feature maps.
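The convolutional feature transformation and global pooling described above can be sketched in plain numpy. This is a minimal illustration, not the disclosed implementation; the layer sizes, random filters and function names are assumptions made for the example.

```python
import numpy as np

def conv2d_relu(h_prev, filters, biases):
    """One layer: h^l_k = f((W^l_k * h^(l-1)) + b^l_k), with f(x) = max(0, x).

    h_prev:  (channels, height, width) input array (h^0 is the pixel array)
    filters: (num_maps, channels, kh, kw) learned filters W^l_k
    biases:  (num_maps,) learned biases b^l_k
    """
    num_maps, _, kh, kw = filters.shape
    _, H, W = h_prev.shape
    out = np.zeros((num_maps, H - kh + 1, W - kw + 1))
    for k in range(num_maps):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = h_prev[:, i:i + kh, j:j + kw]
                out[k, i, j] = np.sum(patch * filters[k]) + biases[k]
    return np.maximum(out, 0.0)  # element-wise f(x) = max(0, x)

def global_pool(feature_maps):
    """'Global' pooling along the spatial dimensions -> vector of length num_maps."""
    return feature_maps.mean(axis=(1, 2))

# toy image: 3 colour channels, 8x8 pixels; 4 feature maps with 3x3 kernels
image = np.random.rand(3, 8, 8)
filters = np.random.randn(4, 3, 3, 3) * 0.1
biases = np.zeros(4)
features = global_pool(conv2d_relu(image, filters, biases))
print(features.shape)  # (4,)
```

Global mean pooling is used here as the aggregation function; flattening the feature maps to a vector of length (number of feature maps × height × width) would be the alternative mentioned above.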

The output vector of the CNN 522-528 is then taken as input at step 530. Step 530 is performed in order to determine the behaviour pattern and activity pattern of the occupants. At step 530, sequences of behaviour and activity are determined, from which the behaviour pattern is identified. In an embodiment, the sequence identification is performed using a recurrent neural network 532-538. In the recurrent neural network 532-538, the output (zt) and hidden state (ht) are updated at every time step t and depend on the input xt from the convolutional neural network 522-528 and the previous hidden state (ht-1). Thus, the behaviour pattern and activity pattern are learnt in chronological order, i.e. in the order h1 = fW(x1, h0) = fW(x1, 0), then h2 = fW(x2, h1), etc., up to hT, to understand the sequence of events taking place. The output 540 of the recurrent neural network 532-538 is the behaviour pattern and activity pattern of the occupants in the home automation environment 302.
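The recurrent update h1 = fW(x1, 0), h2 = fW(x2, h1), ..., hT can be sketched as a simple loop. The tanh non-linearity, weight names and dimensions below are illustrative assumptions; the source does not fix them.

```python
import numpy as np

def rnn_sequence(xs, Wxh, Whh, Why):
    """Recurrent update: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1}), z_t = Why @ h_t.

    xs: CNN feature vectors x_1..x_T in chronological order.
    Returns the outputs z_1..z_T and the final hidden state h_T.
    """
    h = np.zeros(Whh.shape[0])           # h_0 = 0
    outputs = []
    for x in xs:                         # h1 = fW(x1, 0), then h2 = fW(x2, h1), ...
        h = np.tanh(Wxh @ x + Whh @ h)   # depends on x_t and previous h_{t-1}
        outputs.append(Why @ h)          # output z_t at time step t
    return outputs, h

# toy dimensions: 4-dim CNN features, 6-dim hidden state, 3 behaviour classes
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((6, 4))
Whh = rng.standard_normal((6, 6)) * 0.1
Why = rng.standard_normal((3, 6))
xs = [rng.standard_normal(4) for _ in range(5)]
zs, hT = rnn_sequence(xs, Wxh, Whh, Why)
print(len(zs), hT.shape)  # 5 (6,)
```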

FIG 6 is a system 600 for optimizing comfort for the home automation environment 302. As shown in FIG 6, the system 600 includes a computing device 640, similar to the computing device 200 and 340 disclosed in FIGs 2 and 3.

In order to adapt to the occupants, the system 600 is capable of identifying occupant behaviour and analyzing the same from the observation data 512-518. In particular, the system 600 uses the computing device 640 to create an occupant model of occupant profiles and their behaviour profiles. The occupant model enables the system 600 to reason about the effects of actions taken by the computing device 640.

The computing device 640 comprises an emotion and behaviour analyzer 682, a comfort learning agent 684 and a comfort action selector 686. The emotion and behaviour analyzer 682 is used to observe the occupants along with the capture devices in the home automation environment 302. The comfort learning agent 684 analyzes the observation data 512-518 and the comfort action selector takes decisions on optimizing comfort such that there is a positive feedback from the occupants in the home automation environment 302.

The emotion and behaviour analyzer 682 receives the observation data 512-518. The occupants' quality of living in the home automation environment 302 is mainly determined by three basic factors: thermal comfort, visual comfort, and indoor air quality comfort. These factors are used by the emotion and behaviour analyzer 682 to analyze the observation data 512-518. Further, for thermal comfort, the following six factors are taken into consideration:

• Metabolic rate (met): The energy generated from the occupant's body

• Clothing insulation (clo) : The amount of thermal insulation the person is wearing

• Air temperature: Temperature of the air surrounding the occupant

• Radiant temperature: The weighted average of all the temperatures from surfaces surrounding the occupant

• Air velocity: Rate of air movement, given distance over time

• Relative humidity: Percentage of water vapour in the air

The metabolic rate of the occupants can be determined from the activity being performed. For example, activity levels are low when the occupants are resting on the couch (e.g. reading, watching television) and high when the occupants are performing housework.

The emotion and behaviour analyzer 682 also identifies and classifies clothing on the occupants and thus, the thermal insulation can be determined. Similarly, other factors can be obtained using sensors placed within the building. Based on the six factors, optimal operative temperature is determined by the comfort learning agent 684. The comfort action selector 686 determines the HVAC system settings to achieve the optimal operative temperature.

Similarly, indoor carbon dioxide (CO2) concentration is used as an index to measure the air quality in the home automation environment 302. The vents of the HVAC system are utilized to provide fresh air in the home automation environment 302. Further, the indoor illumination level, measured in lux, is used to indicate the visual comfort in the home automation environment 302. The lighting systems serve as actuators to control the indoor illumination level.

The comfort learning agent 684 is a deep neural network reinforcement learning agent. In an embodiment, the comfort learning agent 684 evaluates occupants' indoor comfort using the equation below:

Comfort = wT[1 - (eT/Tset)²] + wA[1 - (eA/Aset)²] + wL[1 - (eL/Lset)²]

The comfort learning agent 684 determines the comfort as the occupants' comfort level inside the current indoor environment, in the range of [0, 1]. Accordingly, wT, wL, and wA are weights that indicate the importance of the three comfort factors and vary in the range of [0, 1], with wT + wA + wL = 1. The above equation for comfort is optimized by the comfort learning agent 684 for individual occupants based on the occupant profile.

Tset, Lset, and Aset represent set points of the temperature, the illumination, and the indoor air quality, respectively. eT, eL, and eA are the differences between the measured values and set points of the temperature, the illumination, and the indoor air quality, respectively, i.e.,

eT = Tactual - Tset

eL = Lactual - Lset

eA = Aactual - Aset

Tactual, Lactual, and Aactual are the measured actual values of the indoor temperature, the indoor illumination level, and the indoor air quality, respectively.
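The comfort equation above can be evaluated directly, as in this sketch. The weight values chosen here are illustrative; the source only requires that wT + wA + wL = 1 and that each weight lies in [0, 1].

```python
def comfort(t_actual, l_actual, a_actual,
            t_set, l_set, a_set,
            w_t=0.5, w_l=0.25, w_a=0.25):
    """Comfort = wT[1-(eT/Tset)^2] + wA[1-(eA/Aset)^2] + wL[1-(eL/Lset)^2]."""
    assert abs(w_t + w_l + w_a - 1.0) < 1e-9  # weights must sum to 1
    e_t = t_actual - t_set   # temperature deviation from set point
    e_l = l_actual - l_set   # illumination deviation from set point
    e_a = a_actual - a_set   # air-quality (CO2) deviation from set point
    return (w_t * (1 - (e_t / t_set) ** 2)
            + w_a * (1 - (e_a / a_set) ** 2)
            + w_l * (1 - (e_l / l_set) ** 2))

# at the set points all deviations are zero, so comfort is maximal (1.0)
print(comfort(22.0, 500.0, 600.0, t_set=22.0, l_set=500.0, a_set=600.0))  # 1.0
```

As the measured values drift away from the set points, the squared relative deviations grow and the comfort value drops below 1.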

In multi-storey buildings such as offices, hospitals or schools, it becomes difficult to define comfort for individuals due to the number of occupants in a building. Accordingly, optimization for comfort is performed for clusters of people.

The clustering may be done based on similarity of activities.

For example, the occupants are in a meeting on the first floor of a building. Alternatively, the clustering can be done based on personalized occupant profiles. A method and a system to perform such an optimization are provided in FIG 8.

FIG 7 is a system 700 for energy optimization in the home automation environment 302. The system 700 includes a computing device 740 similar to the computing devices 200 and 340 in FIGs 2 and 3. The computing device 740 comprises an energy learning agent 774 and an energy action selector 778. The energy learning agent 774 is configured to learn the performance of the home automation environment 302 in terms of plug load, available energy sources and consumed energy sources. The computing device 740 is capable of computing the performance within the home automation environment 302 without needing historical environment data.

The parameters, which are to be monitored in the home automation environment 302 to optimize energy consumption, are:

• Plug load (electrical devices/appliances connected to the electrical network)

• HVAC systems

• Lighting systems

• Environmental conditions such as temperature, humidity, pressure, natural lighting, etc

• Alternative energy sources (wind power, solar, storage batteries etc)

• Occupants' energy usage

Data from the above-mentioned systems is referred to cumulatively as energy data. The energy data is data that is observed from the above systems and therefore can also be generalized as observation data. The energy data is provided as input to the energy learning agent 774.

The energy learning agent 774 is capable of online adaptive learning of optimal energy states to be achieved without hand-engineered rules. In an embodiment, the energy learning agent is a deep neural network for reinforcement learning. Accordingly, the energy learning agent 774 observes the current state st, which belongs to the set of states (i.e. st ∈ S), at each discrete time step t. An action ut ∈ U, out of a list of actions U, is chosen according to an optimization policy π. A reward vector rt is associated with the action ut; the reward vector is used to indicate whether the action ut can achieve reduced energy consumption. Once the action is chosen, the energy action selector 778 performs the action ut such that the home automation environment 302 transitions to a new state st+1.

The energy learning agent 774 has an objective to maximize an expectation over the discounted return, Rt = rt + βrt+1 + β²rt+2 + ..., where rt is the reward received at time t and β ∈ [0, 1] is a discount factor. Since the energy learning agent is a deep neural network for reinforcement learning, a multi-layered neural network is built for a given state s of the building. The multi-layered neural network outputs a vector of action values Q(s, θ), where θ are the parameters of the network. The action selector selectively picks ut from Q using an ε-greedy policy.
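The ε-greedy selection over the network's action values can be sketched as follows; the Q-values and ε shown are illustrative examples, not values from the disclosure.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick the greedy action from Q(s, ·; θ) with probability 1 - ε,
    otherwise explore with a uniformly random action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))      # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

q = [0.2, 1.5, -0.3, 0.9]   # action values output by the network for state s
action = epsilon_greedy(q, epsilon=0.0)  # ε = 0: always greedy
print(action)  # 1
```

A small positive ε keeps the energy action selector occasionally trying non-greedy actions, which lets the agent keep learning online as the environment changes.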

Further, the energy learning agent 774 can be trained using a target network with experience replay. The target network is similar to the online network except that its parameters are copied from the online network every τ steps and kept fixed on all other steps. The target network trains the energy learning agent 774 with the update rule given by:

Q(state, action) = R(state, action) + γ · max[Q(next state, all actions)]

To perform experience replay, state transitions are stored for a pre-determined time period in a replay memory 720 of the computing device 740. As used herein, "state transitions" includes the current state, the new state and all further new states that are transitioned to during the pre-determined period. The state transitions are sampled uniformly at random from the replay memory 720 to compute the updates of the network. Out of all the actions outputted by the energy learning agent 774, the action that results in the maximum cumulative reward is used to update the home model. In an embodiment, the energy learning agent 774 is initially trained with the predicted environment data for the home automation environment 302. In another embodiment, the energy learning agent 774 can be trained on a nightly basis to learn situations it has not encountered previously or where sub-optimal actions were taken.
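The target-network update with experience replay can be sketched in tabular form. The disclosed agent uses a deep network; the tabular Q below only illustrates the update rule, the frozen target copy, and the uniform sampling from the replay memory, with illustrative state/action counts.

```python
import random
from collections import deque

class ReplayQ:
    """Tabular sketch of Q(s,a) <- R(s,a) + gamma * max_a' Q_target(s', a')."""

    def __init__(self, n_states, n_actions, gamma=0.9, capacity=1000):
        self.q = [[0.0] * n_actions for _ in range(n_states)]   # online values
        self.q_target = [row[:] for row in self.q]              # frozen copy
        self.memory = deque(maxlen=capacity)                    # replay memory 720
        self.gamma = gamma

    def store(self, s, a, r, s_next):
        """Record one state transition for later replay."""
        self.memory.append((s, a, r, s_next))

    def replay(self, batch_size=4):
        """Sample transitions uniformly at random and apply the update rule."""
        batch = random.sample(list(self.memory), min(batch_size, len(self.memory)))
        for s, a, r, s_next in batch:
            self.q[s][a] = r + self.gamma * max(self.q_target[s_next])

    def sync_target(self):
        """Copy online parameters to the target every tau steps."""
        self.q_target = [row[:] for row in self.q]

agent = ReplayQ(n_states=3, n_actions=2)
agent.store(0, 1, 5.0, 2)
agent.replay()
print(agent.q[0][1])  # 5.0 (target Q-values start at zero)
```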

The computing device 740 can be placed in any building automation environment to optimize energy consumption. The computing device 740 is capable of being retrofit to already existing building automation environments with minimal prior understanding of the building automation environment.

FIG 8 illustrates optimization of comfort in a building automation environment 802 having multiple occupants (not shown in figure). The building automation environment 802 includes a building with multiple floors 802a-c and sensors 808. The sensors 808 include a carbon dioxide sensor, an air temperature sensor, a humidity sensor and a light sensor. The sensors 808 are associated with HVAC systems, lighting devices, blinds, and plug loads for appliances or electric vehicles in the building automation environment 802.

The building can be a home or an office space. In the present embodiment, the building is an office space with meeting rooms on floor 802a, cubicles on floor 802b and recreation space on floor 802c.

The computing device 840 includes an emotion and behaviour analyzer 882, a comfort learning agent 884 and a comfort action selector 886. The emotion and behaviour analyzer 882 is used to observe and cluster the occupants along with the capturing devices in the building automation environment 802. As shown in FIG 8, the clustering is done based on the activity performed by the occupants on each floor. For example, cluster 805 is associated with the occupants in the meeting rooms on floor 802a. Cluster 810 is associated with the occupants in the cubicles on floor 802b and cluster 815 is associated with the occupants in the recreation space on floor 802c. Accordingly, the clustering is performed based on the location and on the similarity of activities. The emotion and behaviour analyzer 882 is capable of recognizing clusters of people based on video data and image data, and the same is input to the comfort learning agent 884. The comfort learning agent 884 analyzes the clusters 805, 810 and 815 separately using a deep reinforcement learning method to output an action for each cluster. The comfort action selector 886 performs the actions such that there is a positive feedback from the occupants in the building automation environment 802.
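The clustering by location and similarity of activities can be sketched as a simple grouping. The occupant names, floor labels and activity labels below are hypothetical illustrations, not values from the disclosure.

```python
from collections import defaultdict

def cluster_occupants(observations):
    """Group occupants by (floor, activity), mirroring clustering based on
    location and on the similarity of activities."""
    clusters = defaultdict(list)
    for occupant, floor, activity in observations:
        clusters[(floor, activity)].append(occupant)
    return dict(clusters)

# hypothetical observations extracted from video/image data
obs = [("o1", "802a", "meeting"), ("o2", "802a", "meeting"),
       ("o3", "802b", "desk work"), ("o4", "802c", "recreation")]
print(cluster_occupants(obs)[("802a", "meeting")])  # ['o1', 'o2']
```

Each resulting cluster would then be analyzed separately by the comfort learning agent to output one action per cluster.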

FIG 9 illustrates optimization of energy and comfort across multiple buildings 912, 922 and 932 of a building automation environment. The building automation environment includes the buildings 912, 922 and 932, energy sources 915, 925 and 935, and sensors associated with the buildings 912, 922 and 932. Additional buildings can be added to the building automation environment dynamically.

As shown in FIG 9, building 912 includes floors 914, 916 and 918. Building 922 includes floors 924-928 and building 932 includes floors 934-938. Further, the energy sources 915, 925 and 935 include renewable and non-renewable sources such as wind power supply, solar power, grid supply, photo-voltaic supply and battery supply. In an embodiment, the battery supply includes the battery supply of electronic automobiles stationed in the building automation environment.

Each building 912, 922 and 932 includes a learning agent 910, 920 and 930. The learning agent 910, 920 and 930 is capable of optimizing energy and comfort for each of the buildings 912, 922 and 932, respectively. For example, the learning agent 910 globally optimizes energy and comfort for floors 914, 916 and 918. Similarly, learning agents 920 and 930 globally optimize energy and comfort for floors 924-928 and 934-938, respec-tively.

Each floor of each of the buildings is provided with a local agent to optimize energy and comfort. For example, for the floor 914, a local agent 956 optimizes energy and comfort based on inputs from an energy optimizer 952 and a comfort optimizer 954. For the floor 916, a local agent 966 optimizes energy and comfort based on inputs from an energy optimizer 962 and a comfort optimizer 964. For the floor 918, a local agent 976 optimizes energy and comfort based on inputs from an energy optimizer 972 and a comfort optimizer 974. This is similarly seen in the floors 924-928 and 934-938, which have local agents 986 and 996, respectively.

In an embodiment, the optimizations at the local agents 956, 966, 976, 986 and 996 are treated with equal priority, and in such an embodiment, energy and comfort optimization is achieved using multi-objective deep reinforcement learning. Accordingly, energy and comfort profiles are computed by the global agents 910, 920 and 930 from environment data associated with the buildings 912, 922 and 932. The energy and comfort profiles contain all potential optimal solutions of combinations of the effective goals of each of the local agents 956, 966, 976, 986 and 996.

In an embodiment, deep multi-objective reinforcement learning is used by the global agents 910, 920 and 930, to learn the energy and comfort profiles for policies of the associated local agents 956, 966, 976, 986 and 996. Accordingly, each energy and comfort profile is represented by a neural network.

To represent each energy and comfort profile as a neural network, the effective goals are represented as a sequence of scalarized single-objective goals, with energy and comfort optimized independently using Deep Q-Networks. The reinforcement learning algorithms that can be used include, but are not limited to, Neural-Fitted Q-Iteration, Deep Q-Learning, Double Deep Q-Learning, Asynchronous Advantage Actor-Critic (A3C) and model-based reinforcement learning methods such as the Recurrent Control Neural Network.
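One common way to turn the two objectives into a sequence of scalarized single-objective goals is a weighted sum, sketched below. The linear scalarization, weights and reward values are illustrative assumptions; the source does not specify the scalarization function.

```python
def scalarize(energy_reward, comfort_reward, weight):
    """Scalarized single-objective reward for one point on the
    energy/comfort profile: r = w * r_energy + (1 - w) * r_comfort."""
    return weight * energy_reward + (1.0 - weight) * comfort_reward

# a sweep of weights: each weight yields one scalarized goal, and each policy
# trained on such a goal corresponds to one energy-and-comfort profile
profile_rewards = [scalarize(0.8, 0.4, w) for w in (0.0, 0.5, 1.0)]
print([round(r, 2) for r in profile_rewards])  # [0.4, 0.6, 0.8]
```

Training one Deep Q-Network per weight setting then gives the family of networks that represent the energy and comfort profiles.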

The global learning agents 910, 920 and 930 use the energy and comfort profiles to generate a global optimization policy, which is divided into sub-policies for each floor of the buildings 912, 922 and 932. The execution of the sub-policies is done by the local agents 956, 966, 976, 986, and 996. The distributed execution of the sub-policies on the local agents 956, 966, 976, 986 and 996 ensures efficient utilization of the available computing resources. In an embodiment, a default policy is defined if the sub-policies executed for the local agents 956, 966, 976, 986 and 996 violate pre-defined constraints. For example, if the execution of the sub-policies results in frequent turning on and off of HVAC systems or lighting systems in buildings 912, 922 or 932.

FIG 10 illustrates exemplary steps performed to optimize energy and comfort across the buildings 912, 922 and 932 in a building automation environment 1002. As shown in the FIG, the buildings 912, 922 and 932 include the floors 914, 916, 918 and 924-928 and 934-938, respectively. Further, the global agents 910, 920 and 930 are associated with the buildings 912, 922 and 932.

The global agents 910, 920 and 930 include a learning agent and an action selector. Referring to FIG 10, the global agent 910 includes a learning agent 1012 and an action selector 1014. Similarly, the global agent 920 includes a learning agent 1022 and an action selector 1024 and so forth for the global agent 930.

In operation, the learning agent 1012 receives real-time energy utilization 1015 at time t for the building 912. The learning agent also receives environment data 1016, associated with state T of the building 912, from capturing devices in the building 912. The learning agent 1012 employs the method as disclosed in FIG 1 to determine an optimization policy for the building 912. The action selector 1014 determines an action 1018 based on the optimization policy. The action 1018 is implemented in the building 912 to transition to state T+1. The state T+1 1020 is input to the learning agent 1022, which receives real-time energy utilization and environment data 1026 at time t+1 for the building 922. The learning agent 1022 determines an optimization policy and the action selector 1024 determines an action 1028 based on the optimization policy. The action 1028 is performed to enable the building 922 to transition to state T+2 1030. The method is iteratively repeated for each building in the building automation environment 1002. Accordingly, the present invention can be applied to buildings in a smart grid.
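The chained hand-off of state between buildings (state T feeding the next building's agent, which produces state T+1, and so on) can be sketched as a loop. The toy transition and the lambda policies are illustrative stand-ins for the learning agents and action selectors.

```python
def optimize_chain(buildings, initial_state):
    """Sequentially optimize buildings: each agent consumes the state
    produced by the previous building's action (T -> T+1 -> T+2 ...)."""
    state = initial_state
    trace = []
    for name, policy in buildings:
        action = policy(state)   # action selector applies the optimization policy
        state = state + 1        # toy transition: advance to the next state index
        trace.append((name, action, state))
    return trace

# hypothetical per-building policies standing in for agents 1012 and 1022
buildings = [("912", lambda s: f"act@{s}"), ("922", lambda s: f"act@{s}")]
print(optimize_chain(buildings, 0))  # [('912', 'act@0', 1), ('922', 'act@1', 2)]
```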

The above disclosed method, device and system may be achieved via implementations with differing or entirely different components, beyond the specific components and/or circuitry set forth above. With regard to such other components (e.g., circuitry, computing/processing components, etc.) and/or computer-readable media associated with or embodying the present invention, for example, aspects of the invention herein may be implemented consistent with numerous general purpose or special purpose computing systems or configurations. Various exemplary computing systems, environments, and/or configurations that may be suitable for use with the disclosed subject matter may include, but are not limited to, various clock-related circuitry, such as that within personal computers, servers or server computing devices such as routing/connectivity components, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, smart phones, consumer electronic devices, network PCs, other existing computer platforms, distributed computing environments that include one or more of the above systems or devices, etc.

In some instances, aspects of the invention herein may be achieved via logic and/or logic instructions including program modules, executed in association with the circuitry, for example. In general, program modules may include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular control, delay or instructions. The inventions may also be practiced in the context of distributed circuit settings where circuitry is connected via communication buses, circuitry or links. In distributed settings, control/instructions may occur from both local and remote computer storage media including memory storage devices.

The system and computing device along with their components herein may also include and/or utilize one or more type of computer readable media. Computer readable media can be any available media that is resident on, associable with, or can be accessed by such circuits and/or computing components. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.

Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and can be accessed by a computing component. Communication media may comprise computer readable instructions, data structures, program modules or other data embodying the functionality herein. Further, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, 4G and 5G cellular networks and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.

In the present description, the terms component, module, device, etc. may refer to any type of logical or functional circuits, blocks and/or processes that may be implemented in a variety of ways. For example, the functions of various circuits and/or blocks can be combined with one another into any other number of modules. Each module may even be implemented as a software program stored on a tangible memory (e.g., random access memory, read only memory, CD-ROM memory, hard disk drive) to be read by a central processing unit to implement the functions of the invention herein. Or, the modules can comprise programming instructions transmitted to a general purpose computer or to processing/graphics hardware via a transmission carrier wave. Also, the modules can be implemented as hardware logic circuitry implementing the functions encompassed by the invention herein. Finally, the modules can be implemented using special purpose instructions (SIMD instructions), field programmable logic arrays or any mix thereof which provides the desired level of performance and cost.

As disclosed herein, implementations and features consistent with the present inventions may be implemented through computer-hardware, software and/or firmware. For example, the systems and methods disclosed herein may be embodied in various forms including, for example, a data processor, such as a computer that also includes a database, digital electronic circuitry, firmware, software, or in combinations of them. Further, while some of the disclosed implementations describe components such as software, systems and methods consistent with the invention herein may be implemented with any combination of hardware, software and/or firmware. Moreover, the above-noted features and other aspects and principles of the invention herein may be implemented in various environments. Such environments and related applications may be specially constructed for performing the various processes and operations according to the present invention or they may include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer, network, architecture, environment, or other apparatus, and may be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines may be used with programs written in accordance with teachings of the invention herein, or it may be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.

Aspects of the method and system described herein, such as the logic, may be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices ("PLDs"), such as field programmable gate arrays ("FPGAs"), programmable array logic ("PAL") devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits. Some other possibilities for implementing aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy (neural) logic, quantum devices, and hybrids of any of the above device types. The underlying device technologies may be provided in a variety of component types, e.g., metal-oxide semiconductor field-effect transistor ("MOSFET") technologies like complementary metal-oxide semiconductor ("CMOS"), bipolar technologies like emitter-coupled logic ("ECL"), polymer technologies (e.g., silicon-conjugated polymer and metal-conjugated polymer-metal structures), mixed analog and digital, and so on.

It should also be noted that the various logic and/or functions disclosed herein may be enabled using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioural, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in various forms (e.g., optical, magnetic or semiconductor storage media) and carrier waves that may be used to transfer such formatted data and/or instructions through wireless, optical, or wired signalling media or any combination thereof. Examples of transfers of such formatted data and/or instructions by carrier waves include, but are not limited to, transfers (uploads, downloads, e-mail, etc.) over the Internet and/or other computer networks via one or more data transfer protocols (e.g., Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Simple Mail Transfer Protocol (SMTP), and so on).

Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application.

Although certain presently preferred implementations of the present invention have been specifically described herein, it will be apparent to those skilled in the art to which the inventions pertain that variations and modifications of the various implementations shown and described herein may be made without departing from the scope of the inventions herein. Accordingly, it is intended that the inventions be limited only to the extent required by the appended claims and the applicable rules of law.

Reference list

computing device 200
communication unit 202
processor 204
capturing device 206
display 208
memory 210
communication network 205
model generator module 212
learning module 214
policy module 216
recommendation module 218
render module 220
model updater module 222
clustering module 224
state module 232
reward module 234
comfort optimizer module 236
energy optimizer module 238
energy analyzer 242
consumption analyzer 244
ambient analyzer 246
database 250
user device 260
GUI (graphic user interface) 262
system 300
home automation environment 302
rooms 304a-f
occupants 305
energy sources 306
sensors 308
server 320
network 350
database 330
computing device 340
controller 322
memory 324
building model generator 326
computing device 340
energy optimizer 370
state analyzer 372
energy learning agent 374
energy action selector 376
comfort optimizer 380
emotion and behaviour analyzer 382
comfort learning agent 384
comfort action selector 386
occupants 305
policy module 355
recommendation module 365
user device 375
system 600
computing device 640
emotion and behaviour analyzer 682
comfort learning agent 684
comfort action selector 686
system 700
computing device 740
energy learning agent 774
energy action selector 778
replay memory 720
building automation environment 802
floors 802a-c
sensors 808
computing device 840
emotion and behaviour analyzer 882
comfort learning agent 884
comfort action selector 886
cluster 805, 810 and 815
buildings 912, 922 and 932
energy sources 915, 925 and 935
floors 914, 916 and 918, 924-928 and 934-938
local agent 956, 966, 976, 986, 996
energy optimizer 952, 962, 972
comfort optimizer 954, 964, 974
global agents 910, 920 and 930
building automation environment 1002
learning agent 1012 and 1022
action selector 1014 and 1024
real-time energy utilization 1015, 1025
environment data 1016, 1026
action 1018, 1028
state T+1 1020
state T+2 1030