The embodiments herein discloses a system and method for designing SoC by synchronizing a hierarchy of SMDPs. Reinforcement Learning is done either hierarchically in several steps or in a single-step comprising environment, tasks, agents and experiments, to have access to SoC (System on a Chip) related information. The ΑI agent is configured to learn from tire interaction and plan the implementation of a SoC circuit design. Q values generated for each domain and sub domain are stored in a hierarchical SMDP structure in a form of SMDP Q table in a big date database. An optimal chip architecture corresponding to a maximum Q value of a top level in the SMDP Q table is acquired and stored in a database for learning and inference. Desired SoC configuration is optimized and generated based on the optimal chip architecture and the generated chip specific graph library.