Quick Start
Development Approach
MaaFramework provides three integration solutions to meet different development scenarios:
Approach 1: Pure JSON Low-Code Programming (General UI)
Applicable Scenarios: Quick start, simple logic implementation
Features:
- No coding experience required
- Automated processes configured through JSON
- Comes with a 🎞️ video tutorial and a ⭐ project template
{
    "Click Start Button": {
        "recognition": "OCR",           // Text recognition engine
        "expected": "Start",            // Target text
        "action": "Click",              // Execute click action
        "next": ["Click Confirm Icon"]  // Subsequent task chain
    },
    "Click Confirm Icon": {
        "recognition": "TemplateMatch", // Image template matching
        "template": "confirm.png",      // Matching asset path
        "action": "Click"
    }
}
Approach 2: JSON + Custom Logic Extension (Recommended)
💡 Core feature of release v4.x
Features:
- Retains the low-code advantage of JSON
- Registers custom task modules through AgentServer
- Seamlessly integrates with the ⭐ boilerplate.
{
    "Click Confirm Icon": {
        "next": ["Custom Processing Module"]
    },
    "Custom Processing Module": {
        "recognition": "Custom",
        "custom_recognition": "MyReco", // Custom recognizer ID
        "action": "Custom",
        "custom_action": "MyAct"        // Custom action ID
    }
}
💡 The General UI will automatically connect to your AgentServer child process and call the corresponding recognition/action when executing MyReco/MyAct.
# Python pseudo-code example
from maa.agent.agent_server import AgentServer

# Register a custom recognizer
@AgentServer.custom_recognition("MyReco")
class CustomReco:
    def analyze(self, ctx):
        return (10, 10, 100, 100)  # Return your own processed recognition result

# Register a custom action
@AgentServer.custom_action("MyAct")
class CustomAction:
    def run(self, ctx):
        ctx.controller.post_click(100, 10).wait()  # Execute the click
        ctx.override_next(["TaskA", "TaskB"])      # Dynamically adjust the task flow

# Start the Agent service
AgentServer.start_up(sock_id)
For a complete example, refer to the template commit.
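In practice, the agent script is launched by the General UI as a child process and only needs to register its modules and start the server. The sketch below is illustrative rather than the exact template code; it assumes the socket id is passed as the last command-line argument and that the Python binding exposes AgentServer.join()/shut_down(), so verify the names against the template.

# Illustrative agent entry point (verify names against the project template)
import sys

from maa.agent.agent_server import AgentServer
from maa.toolkit import Toolkit

def main():
    Toolkit.init_option("./")      # Optional: enable logging / debug options
    sock_id = sys.argv[-1]         # Assumption: the UI passes the socket id as the last argument
    AgentServer.start_up(sock_id)  # Serve the registered custom recognitions/actions
    AgentServer.join()             # Block until the UI side disconnects
    AgentServer.shut_down()

if __name__ == "__main__":
    main()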
Approach 3: Full-Code Development
Applicable Scenarios:
- Deep customization requirements
- Implementation of complex business logic
- Need for flexible control over the execution flow
# Python pseudo-code example
def main():
    # Execute the predefined JSON task
    result = tasker.post_task("Click Start Button").wait().get()
    if result.completed:
        # Execute code-level operations
        tasker.controller.post_click(100, 100)
    else:
        # Get the current screenshot
        image = tasker.controller.cached_image

    # Register a custom action
    tasker.resource.register_custom_action("MyAction", MyAction())

    # Execute a mixed task chain
    tasker.post_task("Click Confirm Icon").wait()
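The pseudo-code above assumes a tasker that is already bound to a resource and a controller. Below is a rough setup sketch using the Python binding; it assumes an ADB-controlled Android device and method names such as post_bundle, find_adb_devices, and bind, so double-check them against the binding version you install.

# Rough setup sketch for the Python binding (verify the exact API against the integration docs)
from maa.controller import AdbController
from maa.resource import Resource
from maa.tasker import Tasker
from maa.toolkit import Toolkit

Toolkit.init_option("./")

# Load the resource bundle described in "Resource Preparation" below
resource = Resource()
resource.post_bundle("./my_resource").wait()

# Connect to the first ADB device found (assumption: an Android target)
device = Toolkit.find_adb_devices()[0]
controller = AdbController(adb_path=device.adb_path, address=device.address)
controller.post_connection().wait()

# Bind everything to a tasker, then run tasks as in the pseudo-code above
tasker = Tasker()
tasker.bind(resource, controller)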
Resource Preparation
File Structure Specification
⭐ If you use the boilerplate, just modify it directly in the folder.
my_resource/
├── image/                # Image asset library
│   ├── my_button_ok.png
│   └── my_icon_close.png
├── model/
│   └── ocr/              # Text recognition models
│       ├── det.onnx
│       ├── keys.txt
│       └── rec.onnx
└── pipeline/             # Task pipelines
    ├── my_main.json
    └── my_subflow.json
You can modify the names of files and folders starting with "my_", but the others have fixed file names and should not be changed. Here's a breakdown:
Pipeline JSON Files
The files in my_resource/pipeline contain the main script execution logic; MaaFramework recursively reads every JSON file in this directory. Refer to the Task Pipeline Protocol for how to write them, and see the simple demo for reference.
Tools:
- JSON Schema
- VSCode Extension
  - Config resources based on interface.json
  - Support going to task definitions, finding task references, renaming tasks, completing task names, and clicking to launch a task
  - Support launching as MaaPiCli
  - Support screencap and crop image after connected
Image Files
The files in my_resource/image are primarily the template matching images, feature detection images, and other images required by the pipeline. They are read according to the template and other fields specified in the pipeline.
Please note that the images used must be cropped from a lossless original screenshot and scaled to 720p. Unless you know exactly how MaaFramework processes images, do use the cropping tools below to obtain them.
Text Recognition Model Files
⭐ If you use the boilerplate, just follow its documentation and run configure.py to automatically deploy the model files.
The files in my_resource/model/ocr are ONNX models obtained from PaddleOCR after conversion.
You can use our pre-converted files: MaaCommonAssets. Choose the languages you need and store them following the directory structure shown above in File Structure Specification.
If needed, you can also fine-tune the official pre-trained models of PaddleOCR yourself (please refer to the official PaddleOCR documentation) and convert them to ONNX files for use. You can find conversion commands here.
Debug
Use Development Tool.
Some tools will generate a config/maa_option.json file in the same directory, including:
- logging: Save logs, generating debug/maa.log. Default true.
- recording: Save recordings; all screenshots and operation data during a run are saved, and you can use DbgController for reproducible debugging. Default false.
- save_draw: Save image recognition visualization results; every visualization drawing produced during the run is saved. Default false.
- show_hit_draw: Show a pop-up window on node hits; each time recognition succeeds, a pop-up displays the result. Default false.
- stdout_level: Console log level. The default is 2 (Error); set it to 0 to turn off all console logs, or 7 to enable all console logs.
If you integrate MaaFramework yourself, you can enable these debugging options through the Toolkit.init_option / MaaToolkitConfigInitOption interface. The generated JSON file is the same as above.
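For example, in a Python integration you can initialize the option directory at startup. The snippet below is a minimal sketch, assuming the maa Python binding and "./" as the user path; the exact signature may differ between releases.

# Minimal sketch: initialize the option directory so config/maa_option.json is generated
from maa.toolkit import Toolkit

# "./" is the user path; MaaFramework creates config/, debug/, etc. under it
if not Toolkit.init_option("./"):
    raise RuntimeError("Failed to initialize MaaFramework options")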
Run
You can run your project using a generic UI (MaaPiCli, MFA, MFW, etc.) or by writing integration code yourself.
Using MaaPiCli
⭐ If you use the boilerplate, follow its documentation directly and run install.py to automatically package the relevant files.
Use MaaPiCli in the bin folder of the Release package, write an interface.json, and place it in the same directory to use it.
MaaPiCli covers the basic functionality, and more features are continuously being added. Detailed documentation is still being improved; for now, you can refer to the Sample when writing interface.json.
Writing Integration Code Yourself
Please refer to the Integration Documentation.