Linked Presentation: Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism