Your team is experimenting with developing smaller, distilled LLMs for a specific domain. You have performed batch inference on a dataset by using several variations of your distilled LLMs and stored the batch inference outputs in Cloud Storage. You need to create an evaluation workflow that integrates with your existing Vertex AI pipeline to assess the performance of the LLM versions while also tracking artifacts. What should you do?