Audio diarization and Whisper transcription pipeline that packages speaker segments as a labeled dataset.
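
The packaging step could look roughly like the sketch below. It is a minimal illustration, not this pipeline's actual code. It assumes the diarizer yields speaker turns as (start, end, speaker) and that transcription yields timed text segments as (start, end, text). Each text segment is assigned the speaker whose turn overlaps it most, and the rows are serialized as JSON Lines. The names `Turn`, `Segment`, `label_segments`, and `to_jsonl` are hypothetical, as is the "unknown" fallback label.

```python
import json
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization turn (assumed shape: start/end in seconds, speaker label)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """One transcribed segment (assumed shape: start/end in seconds, text)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(turns, segments, fallback="unknown"):
    """Attach to each segment the speaker whose turn overlaps it the most."""
    rows = []
    for seg in segments:
        best_speaker, best_overlap = fallback, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        rows.append({
            "speaker": best_speaker,
            "start": seg.start,
            "end": seg.end,
            "text": seg.text.strip(),
        })
    return rows


def to_jsonl(rows):
    """Serialize labeled rows as JSON Lines, one record per line."""
    return "\n".join(json.dumps(row) for row in rows)
```

Maximum-overlap assignment is only one possible rule. A segment that spans a speaker change is labeled with the dominant speaker rather than being split, which may or may not suit a given dataset.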