Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset

NeurIPS 2023 · Saeid Naeini, Raeid Saqur, Mozhgan Saeidi, John Giorgi, Babak Taati

The quest for human-imitative AI has been an enduring topic in AI research since its inception. The technical evolution and emerging capabilities of the latest cohort of large language models (LLMs) have reinvigorated the subject, carrying it beyond academia into the cultural zeitgeist. While recent NLP evaluation benchmark tasks test some aspects of human-imitative behaviour (e.g., BIG-bench's 'human-like behavior' tasks), few, if any, examine creative problem-solving abilities. Creative problem solving in humans is a well-studied topic in cognitive neuroscience, with standardized tests that predominantly use the ability to associate (heterogeneous) connections among clue words as a metric for creativity. Exposure to misleading stimuli (distractors dubbed red herrings) impedes human performance in such tasks via the fixation effect and Einstellung paradigm. In cognitive neuroscience studies, such fixations are experimentally induced by pre-exposing participants to incorrect words that are orthographically similar to subsequent word fragments or clues. The Connecting Wall segment of the popular British quiz show Only Connect essentially mimics Mednick's Remote Associates Test (RAT) formulation with built-in, deliberate red herrings, which makes it an ideal proxy dataset for studying the fixation effect and Einstellung paradigm from cognitive neuroscience in LLMs. In this paper we present the novel Only Connect Wall (OCW) dataset and report results from our evaluation of selected pre-trained language models and LLMs on creative problem-solving tasks such as grouping clue words by heterogeneous connections and identifying the correct open-knowledge-domain connections in the respective groups. We synthetically generate two additional datasets, OCW-Randomized and OCW-WordNet, to further analyze our red-herrings hypothesis in language models. The code and a link to the dataset are available at https://github.com/TaatiTeam/OCW.

Code

taatiteam/ocw official

Tasks

Only Connect Walls Dataset Task 1 (Grouping)

Only Connect Walls Dataset Task 2 (Connections)

Datasets

Introduced in the Paper:

OCW

Used in the Paper:

HumanEval

BIG-bench

Results from the Paper

Ranked #1 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (# Correct Groups metric, using extra training data)

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Uses Extra Training Data
Only Connect Walls Dataset Task 1 (Grouping)	OCW	Human Performance	# Correct Groups	1405	#1	
Only Connect Walls Dataset Task 1 (Grouping)	OCW	Human Performance	# Solved Walls	285	#1	
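
The two Task 1 metrics in the table can be read as counts over a set of walls. The following is a minimal sketch of how such counts might be computed, not the authors' evaluation code. It assumes that a group counts as correct only when its clue words exactly match a gold group, and that a wall counts as solved only when all of its groups are correct. The function names and the toy clue words are hypothetical and are not taken from the OCW dataset.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: a wall is a list of groups,
# and each group is a collection of clue words.
Wall = List[Iterable[str]]


def normalize(groups: Wall) -> Set[FrozenSet[str]]:
    """Lower-case and strip clue words so comparison ignores case and spacing."""
    return {frozenset(word.strip().lower() for word in group) for group in groups}


def correct_groups(predicted: Wall, gold: Wall) -> int:
    """Count predicted groups that exactly match a gold group (assumed criterion)."""
    return len(normalize(predicted) & normalize(gold))


def score_walls(predictions: List[Wall], golds: List[Wall]) -> Tuple[int, int]:
    """Return (# correct groups, # solved walls) summed over all walls."""
    total_correct = 0
    solved_walls = 0
    for predicted, gold in zip(predictions, golds):
        n_correct = correct_groups(predicted, gold)
        total_correct += n_correct
        # Assumption: a wall is solved only if every gold group was recovered.
        if n_correct == len(gold):
            solved_walls += 1
    return total_correct, solved_walls


# Toy walls with made-up clue words, kept small for readability.
golds = [
    [{"red", "blue"}, {"cat", "dog"}],
    [{"oak", "elm"}, {"one", "two"}],
    [{"a", "b"}, {"c", "d"}, {"e", "f"}],
]
predictions = [
    [{"Blue", "red"}, {"dog", "cat"}],       # both groups right: wall solved
    [{"oak", "one"}, {"elm", "two"}],        # no group right
    [{"a", "b"}, {"c", "e"}, {"d", "f"}],    # one group right
]

print(score_walls(predictions, golds))  # (3, 1)
```

Comparing groups as sets makes the score independent of the order of words within a group and of the order of groups within a wall, which seems appropriate for a grouping task where only membership matters.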

Methods

No methods listed for this paper.
