Blog

Factcheck-GPT: End-to-End Fine-Grained Document-Level Fact-Checking and Correction of LLM Output

Nov 2023

We present a holistic end-to-end solution for annotating the factuality of LLM-generated responses, encompassing a multi-stage annotation scheme designed to yield detailed labels on the verifiability of, and factual inconsistencies in, LLM outputs. [More]

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

Sept 2023

We collect an open-source dataset to evaluate safeguards in LLMs, and deploy safer open-source LLMs at low cost. We have since extended the dataset to Chinese and tripled its size. [More]

CMMLU: Measuring massive multitask language understanding in Chinese

Jun 2023

We propose a comprehensive Chinese assessment suite specifically designed to evaluate the advanced knowledge and reasoning abilities of LLMs in a Chinese linguistic and cultural context. It has been widely adopted by the Chinese LLM community. [More]

M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

May 2023

We develop automatic systems to identify machine-generated text and to detect its potential misuse. We then introduce M4, a large-scale multi-generator, multi-domain, and multilingual benchmark corpus for machine-generated text detection. [More]

Balancing out Bias: Achieving Fairness Through Balanced Training

2022

We introduce a simple, but highly effective, objective for countering bias using balanced training. [More]
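The core idea of balanced training is to counter bias by making every combination of class label and protected attribute contribute equally to the training objective. As a minimal illustrative sketch (not the paper's exact objective), one common instantiation reweights each instance inversely to the frequency of its (label, group) cell:

```python
from collections import Counter

def balanced_weights(labels, groups):
    """Per-instance weights so each (label, protected-group) cell
    contributes equally to a weighted training loss.

    Hypothetical helper for illustration; `labels` are class labels,
    `groups` are protected-attribute values.
    """
    counts = Counter(zip(labels, groups))
    n_cells = len(counts)
    n = len(labels)
    # Weight each instance inversely to its cell's frequency,
    # normalised so the weights sum to the dataset size.
    return [n / (n_cells * counts[(y, g)]) for y, g in zip(labels, groups)]

# Toy example: class 0 is dominated by group "b".
labels = [1, 1, 1, 0, 0, 0, 0, 0]
groups = ["a", "a", "b", "a", "b", "b", "b", "b"]
weights = balanced_weights(labels, groups)
# Every (label, group) cell now carries equal total weight (2.0 here),
# regardless of how many instances it contains.
```

Equivalent effects can be achieved by balanced resampling instead of reweighting; the choice mainly affects variance and implementation convenience.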

FairLib: A unified framework for assessing and improving fairness

2022

We present an open-source Python library for assessing and improving model fairness. It provides a systematic framework for quickly accessing benchmark datasets, reproducing existing debiasing baseline models, developing new methods, evaluating models with different metrics, and visualizing their results. [More]

Everybody needs good neighbours: An unsupervised locality-based method for bias mitigation

2022

We propose a new meta-algorithm for debiasing representation learning models that combines data locality with accuracy of model fit, so that a supervised debiasing method can optimise fairness between neighbourhoods of poorly vs. well modelled instances as identified by our method. [More]