Extracting the meticulous alpha matte of the specific object from the image that can best match the given natural language description, e.g., a keyword or a expression.