Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of Thought

19 Feb 2024  ·  Yuying Du, Xueyan Tang ·

Smart contracts, as a key component of blockchain technology, play a crucial role in ensuring the automation of transactions and adherence to protocol rules. However, smart contracts are susceptible to security vulnerabilities, which, if exploited, can lead to significant asset losses. This study explores the potential of enhancing smart contract security audits using the GPT-4 model. We utilized a dataset of 35 smart contracts from the SolidiFI-benchmark vulnerability library, containing 732 vulnerabilities, and compared it with five other vulnerability detection tools to evaluate GPT-4's ability to identify seven common types of vulnerabilities. Moreover, we assessed GPT-4's performance in code parsing and vulnerability capture by simulating a professional auditor's auditing process using CoT(Chain of Thought) prompts based on the audit reports of eight groups of smart contracts. We also evaluated GPT-4's ability to write Solidity Proof of Concepts (PoCs). Through experimentation, we found that GPT-4 performed poorly in detecting smart contract vulnerabilities, with a high Precision of 96.6%, but a low Recall of 37.8%, and an F1-score of 41.1%, indicating a tendency to miss vulnerabilities during detection. Meanwhile, it demonstrated good contract code parsing capabilities, with an average comprehensive score of 6.5, capable of identifying the background information and functional relationships of smart contracts; in 60% of the cases, it could write usable PoCs, suggesting GPT-4 has significant potential application in PoC writing. These experimental results indicate that GPT-4 lacks the ability to detect smart contract vulnerabilities effectively, but its performance in contract code parsing and PoC writing demonstrates its significant potential as an auxiliary tool in enhancing the efficiency and effectiveness of smart contract security audits.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods