Advancing AI Safety through Iterative Problem Solving
The rapid evolution of AI presents continuous challenges. In this dynamic landscape, we adapt, we innovate, and we stay ahead.
March, 2023
The GPT-4 technical report was released, using translated data for evaluation, which revealed a lack of adequate test data for assessing the capabilities of LLMs. How can we effectively evaluate LLMs in non-English languages?
CMMLU: One of the most authoritative benchmarks for Chinese LLM capabilities

June, 2023
Objective questions (multiple-choice) have become mainstream in model testing, but how such testing can be scaled for LLM evaluation remains to be explored. How do we scale such testing?
LM-Evaluation-Harness: One of the most impactful open-source benchmarks for LLMs

July, 2023
The development of LLMs posed new challenges. Many capabilities go beyond the scope of objective questions, one of the most important being the safety of generated content. How do we assess LLM safety?
Do-Not-Answer: Impactful research featured in the Stanford AI Index Report

August, 2023
With a clearer landscape of LLM testing emerging (using LLMs to test LLMs has become mainstream), the extremely high cost remains a significant problem. How can we design affordable test solutions for LLMs?
Alternative Solution: Expert small models that perform on par with LLM evaluators at 200 times lower cost

October, 2023
Most existing evaluations rely heavily on human annotation, but some tasks are challenging even for humans. A crucial aspect of content safety is factuality. How can we verify factuality when even human annotation falls short?
Loki: Open-source agent solution for automating factuality verification

December, 2023
The concept of agents has emerged, posing new challenges for AI safety. These extend beyond content safety, as agentic systems interact with more extensive and sensitive information and actions. How can we develop a new schema for evaluating agents and RAG systems?
Specialized Evaluator: Testing intermediate results with 20% higher precision at 80% lower cost

March, 2024
How can we address data contamination and conduct dynamic testing?
Break down the scope of testing and define knowledge points.
Change is constant.
With us, you are always ahead of the curve.
The LibrAI Team
LibrAI is a team of AI PhDs and engineers with extensive research backgrounds. We bring deep expertise, hands-on experience, and strong collaboration in cutting-edge AI research, supported by the world-class resources of MBZUAI.
Dr. Xudong Han (CEO)
Dr. Emad Alghamdi (Regional CEO)
Yilin Geng (COO)
Dr. Haonan Li (CTO)
Hao Wang (Head of Engineering)
Rong Zheng (Senior Engineer)
Prof. Timothy Baldwin (Advisor)
Dr. Zenan Zhai (AI Researcher)
Dr. Yuxia Wang (AI Researcher)
Keep up to date with LibrAI
Get the latest updates, research insights, and industry trends in AI safety and benchmarking delivered to your inbox. Stay informed with LibrAI.
© LibrAI Ltd. 2023. All rights reserved. Privacy Policy.