Large language models (LLMs) have been shown to be capable of impressive
few-shot generalisation to new tasks. However, they still tend to perform
poorly on multi-step logical reasoning problems. Here we carry out a
comprehensive evaluation of LLMs on 50 tasks that probe different aspects of
logical reasoning. We show that language models tend to perform fairly well at
single-step inference or entailment tasks, but struggle to chain together
multiple reasoning steps to solve more complex problems. In light of this, we
propose a Selection-Inference (SI) framework that exploits pre-trained LLMs as
general processing modules, and alternates between selection and inference to