💯 Bulk Step Testing for Robustness

This feature helps you ensure your steps are robust by running them multiple times and analyzing the resulting actions.

What is a Robust Step in a Prompt?

A robust step in a prompt consistently produces the desired outcome, regardless of minor variations in the input or the system's state.
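
One way to make this definition concrete is to treat robustness as the share of repeated runs that produce the expected action. The sketch below is illustrative only: `consistency_rate` and the action strings are hypothetical and not part of the product.

```python
from collections import Counter


def consistency_rate(actions, expected):
    """Return the fraction of runs whose action matched the expected one."""
    if not actions:
        raise ValueError("at least one run is required")
    return Counter(actions)[expected] / len(actions)


# Hypothetical actions recorded from five runs of the same step.
runs = ["tap Login", "tap Login", "tap Sign up", "tap Login", "tap Login"]
rate = consistency_rate(runs, expected="tap Login")
print(f"{rate:.0%} of runs produced the expected action")
```

A step that scores well below 100% here is a candidate for rewording, even if most individual runs look fine.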

Using Bulk Step Testing

New Feature: Misinterpreted Prompt

When a step is misinterpreted, this feature lets you refine it and bulk test the revised version:

  1. Tap Start: A new editor window will open.

  2. Review Original Step & Result: The original prompt, screenshot of the unexpected action, and reasoning behind the action will be displayed.

  3. Refine Your Step: Edit the step in the provided editor.

  4. Bulk Test the Updated Step: Choose how many times to run the revised step.

  5. Analyze Results: The left side of the screen will show the original screenshot, reasoning, and action. The right side will display the results of the bulk test with the updated step, including new screenshots, reasoning, and actions.

  6. Copy Over the Refined Step: If the new step worked, tap 'Yes, Copy and Exit' to return to the test editor, then tap the highlighted text to replace it with the copied step.

Benefits

  • Identify inconsistencies in your steps.

  • Improve the reliability of your steps.

  • Save time by testing multiple variations quickly.

Tips

  • Start with a small number of bulk runs (e.g., 3) to ensure the new step is working. Then, increase the number to confirm its reliability (e.g., 10 runs).

  • Look for patterns in the results to identify potential issues with the step.

  • Use the reasoning provided by the system to understand why the step might be misinterpreted.
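
The first two tips can be sketched as a staged check: run a small batch, move to a larger batch only if every run passes, and tally the actions so that patterns stand out. This is a minimal sketch under stated assumptions: `run_step`, `bulk_run`, `staged_test`, and `simulated_step` are hypothetical stand-ins, and the product performs the actual runs through its own interface.

```python
import random
from collections import Counter


def bulk_run(run_step, runs):
    """Call the step `runs` times and tally the actions it produced."""
    return Counter(run_step() for _ in range(runs))


def staged_test(run_step, expected, stages=(3, 10)):
    """Run increasingly large batches, stopping at the first batch with a miss."""
    tally = Counter()
    for runs in stages:
        tally = bulk_run(run_step, runs)
        if tally[expected] != runs:
            return False, tally  # stop early and inspect the unexpected actions
    return True, tally


def simulated_step():
    # Stand-in for a real run: picks the wrong action about 20% of the time.
    return random.choices(["tap Login", "tap Sign up"], weights=[8, 2])[0]


passed, tally = staged_test(simulated_step, expected="tap Login")
print(passed, tally.most_common())
```

If the tally shows the same unexpected action recurring, the reasoning shown for those runs is the first place to look for the ambiguous wording.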
