
The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

2021-08-26 · EMNLP 2021 · Systematic Generalization
Paper · PDF · Code (official) · Code

Abstract

Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically. We publicly release the code to reproduce our results.
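The abstract argues that low-level configuration choices such as embedding scaling can make or break systematic generalization. The sketch below is a hypothetical illustration of that kind of toggle, not the authors' released code: a token embedding layer with optional sqrt(d_model) scaling. The `TokenEmbedding` class and its parameters are assumptions introduced here for illustration.

```python
# Illustrative sketch (assumption, not the paper's official implementation):
# a token embedding with an explicit switch for sqrt(d_model) scaling,
# one of the "simple tricks" the abstract highlights.
import math
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Token embedding with optional sqrt(d_model) scaling."""

    def __init__(self, vocab_size: int, d_model: int, scale: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # With scale=True the embedding output is multiplied by sqrt(d_model),
        # the convention from the original Transformer; scale=False leaves it raw.
        self.factor = math.sqrt(d_model) if scale else 1.0

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.embed(tokens) * self.factor

if __name__ == "__main__":
    emb = TokenEmbedding(vocab_size=1000, d_model=512, scale=True)
    x = torch.randint(0, 1000, (2, 7))   # (batch, sequence) of token ids
    print(emb(x).shape)                   # torch.Size([2, 7, 512])
```

Comparing runs with `scale=True` and `scale=False` is the kind of ablation the paper reports; per the abstract, such differences are often invisible on IID splits and only show up on generalization splits.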

Related Papers

Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey (2025-06-04)
Systematic Generalization in Language Models Scales with Information Entropy (2025-05-19)
Enabling Systematic Generalization in Abstract Spatial Reasoning through Meta-Learning for Compositionality (2025-04-02)
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models (2025-03-12)
Enhancing NLP Robustness and Generalization through LLM-Generated Contrast Sets: A Scalable Framework for Systematic Evaluation and Adversarial Training (2025-03-09)
Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization (2025-02-07)
Towards Conscious Service Robots (2025-01-25)
Inductive Biases for Zero-shot Systematic Generalization in Language-informed Reinforcement Learning (2025-01-25)