Shellcode_IA32: A Dataset for Automatic Shellcode Generation

Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

2021-04-27 | ACL (NLP4Prog) 2021 | Machine Translation, NMT, Translation, Code Generation

Abstract

We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Code Generation | Shellcode_IA32 | BLEU-4 | 62.97 | LSTM-based Sequence to Sequence |
| Code Generation | Shellcode_IA32 | Exact Match Accuracy | 51.55 | LSTM-based Sequence to Sequence |
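
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how exact match accuracy and a corpus-level BLEU-4 score could be computed for generated assembly lines. It rests on assumptions: whitespace tokenization, a single reference per sample, uniform n-gram weights, and no smoothing. The paper's own tokenization and BLEU variant are not stated on this page, so the function names and choices here are illustrative and the numbers it produces are not expected to reproduce the reported values.

```python
from collections import Counter
import math


def exact_match_accuracy(predictions, references):
    """Percentage of predictions equal to their reference.

    Assumption: runs of whitespace are collapsed before comparison.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        " ".join(p.split()) == " ".join(r.split())
        for p, r in zip(predictions, references)
    )
    return 100.0 * hits / len(predictions)


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(predictions, references):
    """Corpus-level BLEU-4 on a 0-100 scale.

    Assumptions: one reference per prediction, whitespace tokens,
    uniform weights over 1- to 4-grams, no smoothing.
    """
    matched = [0] * 4
    total = [0] * 4
    pred_len = ref_len = 0
    for p, r in zip(predictions, references):
        p_tok, r_tok = p.split(), r.split()
        pred_len += len(p_tok)
        ref_len += len(r_tok)
        for n in range(1, 5):
            p_counts, r_counts = ngrams(p_tok, n), ngrams(r_tok, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matched[n - 1] += sum((p_counts & r_counts).values())
            total[n - 1] += sum(p_counts.values())
    if pred_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any zero n-gram precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity = 1.0 if pred_len > ref_len else math.exp(1 - ref_len / pred_len)
    return 100.0 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical model outputs and references, made up for illustration only.
    preds = ["mov eax , 1", "xor ebx , ebx", "push ecx"]
    refs = ["mov eax , 1", "xor ebx , ebx", "push edx"]
    print(exact_match_accuracy(preds, refs))
    print(corpus_bleu4(preds, refs))
```

Exact match is the stricter metric: a single wrong operand makes the whole line count as a miss, while BLEU-4 still credits the overlapping n-grams. That is consistent with the exact match value in the table being lower than the BLEU-4 value.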

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs (2025-07-15)
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding (2025-07-14)