Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh
We take the first step toward automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, from natural language comments. We assemble and release a novel dataset (Shellcode_IA32) consisting of challenging but common assembly instructions paired with their natural language descriptions. We experiment with standard neural machine translation (NMT) methods to establish baseline performance levels on this task.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Code Generation | Shellcode_IA32 | BLEU-4 | 62.97 | LSTM-based Sequence to Sequence |
| Code Generation | Shellcode_IA32 | Exact Match Accuracy | 51.55 | LSTM-based Sequence to Sequence |
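To make the two metrics in the table concrete, the following is a minimal sketch of how exact match accuracy and a corpus-level BLEU-4 score could be computed for generated assembly snippets. It is not the authors' evaluation script. The function names, the lowercase whitespace tokenization, the single-reference setup, and the absence of smoothing are all assumptions made for illustration, so scores from this sketch are not expected to reproduce the reported values.

```python
import math
from collections import Counter


def tokenize(line):
    # Assumption: lowercase + whitespace tokenization; the paper's preprocessing may differ.
    return line.lower().split()


def exact_match_accuracy(predictions, references):
    """Percentage of predictions whose token sequence equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(tokenize(p) == tokenize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


def ngram_counts(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(predictions, references):
    """Unsmoothed corpus-level BLEU-4, one reference per prediction, scaled to 0-100."""
    matched = [0] * 4
    total = [0] * 4
    pred_len = 0
    ref_len = 0
    for p, r in zip(predictions, references):
        p_tok, r_tok = tokenize(p), tokenize(r)
        pred_len += len(p_tok)
        ref_len += len(r_tok)
        for n in range(1, 5):
            p_counts, r_counts = ngram_counts(p_tok, n), ngram_counts(r_tok, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in p_counts.items())
            total[n - 1] += sum(p_counts.values())
    if pred_len == 0 or min(matched) == 0:
        # Without smoothing, any n-gram order with zero matches gives a score of zero.
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity_penalty = 1.0 if pred_len >= ref_len else math.exp(1 - ref_len / pred_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical model outputs and references, for illustration only.
    preds = ["xor eax, eax", "push 0x68732f2f", "mov ebx, esp"]
    refs = ["xor eax, eax", "push 0x68732f2f", "mov ecx, esp"]
    print(exact_match_accuracy(preds, refs))
    print(corpus_bleu4(preds, refs))
```

Because this sketch applies no smoothing, a corpus made only of snippets shorter than four tokens contains no 4-grams and scores zero BLEU-4 even when every prediction is correct. Exact match accuracy does not have this limitation, which is one reason to report the two metrics together.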