New Frontiers in Visual Language Reasoning:
Compositionality, Prompts and Causality

Sunday, June 18, 2023 (morning), CVPR 2023

Recent years have seen the stunning capabilities of Visual Language Pre-training (VLP) models. Although VLPs have revolutionized some fundamental principles of visual language reasoning (VLR), several remaining problems prevent them from “thinking” like a human being: how to reason about the world by breaking it into parts (compositionality), how to generalize to novel concepts given only a glimpse of in-context demonstrations (prompts), and how to debias visual language reasoning by imagining what would have happened in counterfactual scenarios (causality).

The workshop provides an opportunity for researchers from different fields to review the technology trends along these three lines and to better endow VLPs with these reasoning abilities. Our workshop also hosts two multi-modal reasoning challenges built around cross-modal math-word problems that involve calculation and proving. The challenges are practical and closely tied to the issues above, and should therefore shed more light on the new frontiers of visual language reasoning.


Call for Papers

Workshop Event Date
Submission Deadline April 10, 2023 (11:59PM EDT)
Notification of Acceptance April 12, 2023 (11:59PM EDT)
Camera-Ready Submission Deadline April 14, 2023 (11:59PM EDT)

This workshop focuses on the limitations of existing notions of compositionality, prompts, and causality. In the context of visual language reasoning, compositionality describes how humans represent and organize visual events in order to reason better; prompt-based methods build a cross-modal bridge that leverages large language models to reason about new visual concepts; and causality protects the chain of reasoning from spurious features and other confounding factors. Although all three are important, they have seldom been investigated together or worked out in VLPs. Topics of interest include, but are not limited to:

  • Large-scale visual-language pre-training (VLP) models, visual-language representation learning, and visual-language reasoning (VLR), especially how VLP and VLR connect with compositionality, prompts, and causality;
  • Causal inference theory, algorithms, and models applied to vision, language, and vision-language areas;
  • Prompt engineering, prompt tuning, and other prompting methods for vision, language, and vision-language areas;
  • Compositional representation learning for vision, language, and vision-language areas;
  • The relationships and interplay among compositionality, prompts, and causality;
  • New benchmarks that evaluate vision-language reasoning in terms of compositionality and causality.

Geometric-MPS Challenge

Geometry math-word problem solving (MPS) is a classic cross-modal reasoning problem whose input consists of text and a diagram, making it a challenging multi-modal numerical reasoning task. Our workshop challenge includes two competition tracks on solving plane-geometry MPS problems: one based on calculation and one based on proving. It encourages the community to develop reasoning algorithms for geometry MPS problems. To this end, the challenge discourages the use of gigantic pre-trained language models and, in particular, forbids the use of Wolfram and other comparable software. To evaluate the reasoning abilities of all participants fairly, we release only the train and val datasets for the competitions; the test dataset will be kept under wraps on the online evaluation server until the challenge ends. Code and pre-trained candidate models must be submitted to, and evaluated by, our online evaluation server.

Our competition website, together with the online evaluation service and further details, will be available next week.

Track-1: Geometric-MPS Calculation (Upcoming)


This track is derived from the GeoQA benchmark, which contains 4,998 geometry problems from real middle-school math exams, each annotated with an executable program sequence to assist neural-symbolic prediction. In addition, we collected more than 9,000 extra geometry problems from the original source without program annotations and divided them into train and test parts. The train part will be combined with GeoQA to form the Track-1 training set, while the test part will be kept on our server for online model evaluation. The task is to answer a series of single-choice questions in the test bank.

Track-2: Geometric-MPS Proving (Upcoming)


This track is derived from the UniGeo benchmark, which contains 4,998 calculation MPS instances consistent with GeoQA and an additional 9,543 proving MPS instances. The task is to generate a consistent proof sequence for each MPS proving problem. Beyond the UniGeo data, we additionally crawled more than 60,000 proving instances, which contain more diverse and more complicated MPS proof chains. As in Track-1, we divide them into train and test parts: the train part will be combined with UniGeo to form the training set, while the test part will be kept on our server for online model evaluation.

Challenge Event Date
Competition site online March 28, 2023 (11:59PM EDT)
Release of train data March 28, 2023 (11:59PM EDT)
Validation server online March 31, 2023 (11:59PM EDT)
Validation set update May 6, 2023 (11:59PM EDT)
Validation set final update May 30, 2023 (11:59PM EDT)
Test-phase submission deadline; validation server closes June 10, 2023 (11:59PM EDT)
Release of preliminary test results to participants June 12, 2023 (11:59PM EDT)

Vicente Ordonez

Invited speakers