ProtoCode: Leveraging Large Language Models for Automated Generation of Machine-Readable Protocols from Scientific Publications

11 Dec 2023 · Shuo Jiang, Daniel Evans-Yamamoto, Dennis Bersenev, Sucheendra K. Palaniappan, Ayako Yachie-Kinoshita

Protocol standardization and sharing are crucial for reproducibility in the life sciences. Despite numerous efforts to standardize protocol description, adherence to these standards in the literature remains largely inconsistent. Curation of protocols is especially challenging because the process is labor-intensive and requires expert domain knowledge of each experimental procedure. Recent advancements in Large Language Models (LLMs) offer a promising way to interpret and curate knowledge from complex scientific literature. In this work, we develop ProtoCode, a tool that leverages fine-tuned LLMs to curate protocols into a form interpretable by both humans and machines. Our proof-of-concept, focused on polymerase chain reaction (PCR) protocols, retrieves information from PCR protocols with an accuracy ranging from 69% to 100%, depending on the information content. For all tested protocols, we demonstrate that ProtoCode successfully converts literature-based protocols into correct operational files for multiple thermal cycler systems. In conclusion, ProtoCode can alleviate the labor-intensive curation and standardization of life science protocols and enhance research reproducibility by providing a reliable, automated means to process and standardize them. ProtoCode is freely available as a web server at https://curation.taxila.io/ProtoCode/.
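To make the idea of a machine-readable PCR protocol concrete, the following is a minimal sketch of how the thermal-cycling parameters extracted from a publication might be represented and serialized. It is not ProtoCode's actual schema or output format: the class names `Step` and `PcrProtocol`, the field names, and the temperatures and durations are all hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Step:
    # One thermal-cycler hold: a label, a target temperature, and a hold time.
    name: str
    temperature_c: float
    duration_s: int


@dataclass
class PcrProtocol:
    # Hypothetical structure; not the schema used by ProtoCode.
    initial_denaturation: Step
    cycle_steps: List[Step]
    cycles: int
    final_extension: Step

    def expand(self) -> List[Step]:
        # Flatten the protocol into the ordered list of holds a cycler would run.
        return (
            [self.initial_denaturation]
            + self.cycle_steps * self.cycles
            + [self.final_extension]
        )

    def total_seconds(self) -> int:
        # Sum of hold times only; ramp times between temperatures are ignored.
        return sum(step.duration_s for step in self.expand())

    def to_json(self) -> str:
        # Serialize to JSON so that other tools can parse the protocol.
        return json.dumps(asdict(self), indent=2)


# Illustrative values only; they are not taken from the paper.
example = PcrProtocol(
    initial_denaturation=Step("initial denaturation", 95.0, 120),
    cycle_steps=[
        Step("denaturation", 95.0, 30),
        Step("annealing", 55.0, 30),
        Step("extension", 72.0, 60),
    ],
    cycles=30,
    final_extension=Step("final extension", 72.0, 300),
)

if __name__ == "__main__":
    print(example.to_json())
    print("hold time (s):", example.total_seconds())
```

A structured record of this kind could in principle be translated into a vendor-specific operational file by a separate exporter per thermal cycler system; the file formats themselves are not described in the abstract and are not sketched here.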
