This study introduces VideoMol, a molecular video-based foundation model pre-trained on 120 million frames rendered from 2 million drug-like and bioactive molecules. VideoMol represents each molecule as a 60-frame video and uses three self-supervised learning strategies to learn molecular representations from these videos. It achieves high accuracy in predicting molecular targets and properties, particularly in identifying antiviral molecules against EBV and EPD, and shows promise in improving binding affinity predictions compared with existing molecular docking methods. The model is also interpretable: visualizations highlight the key molecular substructures that drive its predictions.
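To make the "molecule as a 60-frame video" idea concrete, the following is a minimal sketch, not the authors' rendering pipeline. The summary does not say how frames are produced, so the sketch assumes each frame is a view of the same 3D geometry rotated by an equal angular step about a single axis. The helpers `rotate_about_y` and `molecule_to_frames` and the toy coordinates are hypothetical and used for illustration only. The last line checks that 2 million molecules at 60 frames each give the stated 120 million frames.

```python
import math

N_FRAMES = 60            # frames per molecular video, as stated in the text
N_MOLECULES = 2_000_000  # molecules in the pre-training set, as stated in the text


def rotate_about_y(coords, angle):
    """Rotate a list of (x, y, z) points about the y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in coords]


def molecule_to_frames(coords, n_frames=N_FRAMES):
    """Hypothetical helper: return one rotated copy of `coords` per frame.

    Assumption: frames are evenly spaced rotations about one axis. The
    actual rendering scheme is not described in the text.
    """
    step = 2 * math.pi / n_frames
    return [rotate_about_y(coords, i * step) for i in range(n_frames)]


# Toy three-atom geometry with made-up coordinates (illustration only).
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
frames = molecule_to_frames(toy_coords)

# Dataset size implied by the text: molecules x frames per molecule.
total_frames = N_MOLECULES * N_FRAMES
print(len(frames), total_frames)
```

In a real pipeline each rotated geometry would be rendered to an image before being fed to a video encoder. That step is omitted here because it depends on tooling the text does not name.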