NL2GQL Dataset
NL2GQL dataset developed for R3-NL2GQL
A bilingual dataset (English and Chinese natural language queries) in which each NL query is annotated with its corresponding GQL query, written in nGQL.

Each sample in the train data contains 4 fields: prompt is the natural language query, content is the standard nGQL, reason is the inference part that the reranker needs to output, and schema is the code structure schema corresponding to the query.

Each sample in the test data contains 6 fields: prompt is the natural language query, content is the gold nGQL, text_schema is the schema used for the vanilla experiment, schema is the code structure schema corresponding to the query, class indicates which graph database space the query belongs to, and result holds the results obtained by running the gold nGQL.
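The field layout described above can be checked with a short script. The following is a minimal sketch, assuming each sample has already been loaded as a Python dict keyed by the field names listed above; the on-disk file format is not stated here, so loading is left out. The helper name `missing_fields` and the placeholder sample are hypothetical and only illustrate the check.

```python
# Field names taken from the dataset description.
TRAIN_FIELDS = ("prompt", "content", "reason", "schema")
TEST_FIELDS = ("prompt", "content", "text_schema", "schema", "class", "result")


def missing_fields(sample, split):
    """Return the expected field names that are absent from one sample dict.

    `split` must be "train" or "test"; this helper is illustrative only.
    """
    if split == "train":
        expected = TRAIN_FIELDS
    elif split == "test":
        expected = TEST_FIELDS
    else:
        raise ValueError(f"unknown split: {split!r}")
    return [name for name in expected if name not in sample]


# Placeholder sample, not real dataset content.
example_train = {
    "prompt": "<natural language query>",
    "content": "<standard nGQL>",
    "reason": "<reranker inference text>",
    "schema": "<code structure schema>",
}

print(missing_fields(example_train, "train"))  # []
print(missing_fields(example_train, "test"))   # ['text_schema', 'class', 'result']
```

Running a check like this over every sample before training or evaluation catches truncated or mislabelled records early, since a train sample passed as test data would lack text_schema, class, and result.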