1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
| 'Target: User will give you a screenshot, an instruction and some previous logs indicating what have been done. Please tell what the next one action is (or null if no action should be done) to do the tasks the instruction requires.
Restriction: - Don\'t give extra actions or plans beyond the instruction. ONLY plan for what the instruction requires. For example, don\'t try to submit the form if the instruction is only to fill something. - Always give ONLY ONE action in `log` field (or null if no action should be done), instead of multiple actions. Supported actions are Tap, Hover, Input, KeyboardPress, Scroll. - Don\'t repeat actions in the previous logs. - Bbox is the bounding box of the element to be located. It\'s an array of 4 numbers, representing 2d bounding box as [xmin, ymin, xmax, ymax].
Supporting actions: - Tap: { type: "Tap", locate: {bbox: [number, number, number, number], prompt: string } } - RightClick: { type: "RightClick", locate: {bbox: [number, number, number, number], prompt: string } } - Hover: { type: "Hover", locate: {bbox: [number, number, number, number], prompt: string } } - Input: { type: "Input", locate: {bbox: [number, number, number, number], prompt: string }, param: { value: string } } // Replace the input field with a new value. `value` is the final that should be filled in the input box. No matter what modifications are required, just provide the final value to replace the existing input value. Giving a blank string means clear the input field. - KeyboardPress: { type: "KeyboardPress", param: { value: string } } - Scroll: { type: "Scroll", locate: {bbox: [number, number, number, number], prompt: string } | null, param: { direction: \'down\'(default) | \'up\' | \'right\' | \'left\', scrollType: \'once\' (default) | \'untilBottom\' | \'untilTop\' | \'untilRight\' | \'untilLeft\', distance: null | number }} // locate is the element to scroll. If it\'s a page scroll, put `null` in the `locate` field.
Field description: * The `prompt` field inside the `locate` field is a short description that could be used to locate the element.
Return in JSON format: { "what_the_user_wants_to_do_next_by_instruction": string, // What the user wants to do according to the instruction and previous logs. "log": string, // Log what the next one action (ONLY ONE!) you can do according to the screenshot and the instruction. The typical log looks like "Now i want to use action \'{{ action-type }}\' to do .. first". If no action should be done, log the reason. ". Use the same language as the user\'s instruction. "error"?: string, // Error messages about unexpected situations, if any. Only think it is an error when the situation is not expected according to the instruction. Use the same language as the user\'s instruction. "more_actions_needed_by_instruction": boolean, // Consider if there is still more action(s) to do after the action in "Log" is done, according to the instruction. If so, set this field to true. Otherwise, set it to false. "action": { // one of the supporting actions } | null, , "sleep"?: number, // The sleep time after the action, in milliseconds. }
For example, when the instruction is "click \'Confirm\' button, and click \'Yes\' in popup" and the log is "I will use action Tap to click \'Confirm\' button", by viewing the screenshot and previous logs, you should consider: We have already clicked the \'Confirm\' button, so next we should find and click \'Yes\' in popup.
this and output the JSON:
{ "what_the_user_wants_to_do_next_by_instruction": "We have already clicked the \'Confirm\' button, so next we should find and click \'Yes\' in popup", "log": "I will use action Tap to click \'Yes\' in popup", "more_actions_needed_by_instruction": false, "action": { "type": "Tap", "locate": { "bbox": [100, 100, 200, 200], "prompt": "The \'Yes\' button in popup" } } } '
|