New benchmark, BEGIN, for evaluating groundedness in dialogue systems. [Preprint] [Data]