Bank bailouts are controversial government decisions that put taxpayers' money at risk. This paper proposes a dynamic financial network framework that combines a Markov Decision Process (MDP) with artificial intelligence to learn the bailout actions that minimize taxpayer losses. Applying the framework to European global systemically important institutions, the study finds that a bailout is optimal only when taxpayer stakes exceed a critical level determined by the characteristics of the network. The optimal degree of intervention increases with network distress, taxpayer stakes, credit exposures, and crisis duration. The model also suggests that banks rescued once should continue to be bailed out, which highlights the potential for moral hazard.
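To make the idea of a critical taxpayer stake concrete, the following is a minimal sketch of a bailout decision cast as an MDP and solved by plain value iteration. It is an illustration only, not the paper's model: it uses a single bank rather than a network, and the three states, the transition probabilities, the cost structure, and the names `P` and `optimal_policy` are all assumptions introduced here.

```python
import numpy as np

# Hypothetical toy state space: 0 = healthy, 1 = distressed, 2 = defaulted (absorbing).
# P[a, s, s'] is the probability of moving from state s to s' under action a.
# All numbers are made up for illustration.
P = np.array([
    # action 0: no bailout
    [[0.90, 0.10, 0.00],
     [0.20, 0.40, 0.40],
     [0.00, 0.00, 1.00]],
    # action 1: bailout (in this toy model it only helps a distressed bank)
    [[0.90, 0.10, 0.00],
     [0.70, 0.25, 0.05],
     [0.00, 0.00, 1.00]],
])


def optimal_policy(stake, bailout_cost=1.0, gamma=0.95, tol=1e-10):
    """Value iteration that minimizes expected discounted taxpayer loss.

    Returns (values, policy), where policy[s] is 1 if a bailout is optimal
    in state s and 0 otherwise.
    """
    # Expected one-step loss: the taxpayer stake is lost when a live bank
    # defaults, and a bailout adds its own fixed cost.
    default_prob = P[:, :, 2].copy()   # shape (actions, states)
    default_prob[:, 2] = 0.0           # an already defaulted bank causes no new loss
    cost = stake * default_prob + bailout_cost * np.array([[0.0], [1.0]])

    values = np.zeros(3)
    while True:
        # Bellman backup: immediate cost plus discounted expected future loss.
        q = cost + gamma * (P @ values)      # shape (actions, states)
        new_values = q.min(axis=0)
        if np.max(np.abs(new_values - values)) < tol:
            return new_values, q.argmin(axis=0)
        values = new_values


if __name__ == "__main__":
    for stake in (0.0, 1.0, 5.0, 100.0):
        _, policy = optimal_policy(stake)
        print(f"stake={stake:6.1f}  bailout when distressed: {bool(policy[1])}")
```

In this sketch a bailout of the distressed bank is not worthwhile when taxpayers have nothing at stake and becomes optimal once the stake is large relative to the bailout cost, which mirrors the threshold behaviour described above in a far simpler setting.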