OpenVLA
- class grid.model.perception.vla.openvla.OpenVLA(*args, **kwargs)
OpenVLA: Vision-Language-Action Model
This class implements a wrapper for the OpenVLA model, which predicts robot actions from an image and a natural-language instruction.
- Credits:
- License:
This code is licensed under the MIT License.
- __init__()
Initializes the OpenVLA model and processor.
Loads the model and processor from the Hugging Face Hub, configured for efficient memory usage with 4-bit quantization.
- Return type:
None
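The loading code itself is not reproduced in this reference. As a rough sketch, assuming the wrapper follows the standard Hugging Face pattern for OpenVLA with 4-bit quantization (the openvla/openvla-7b checkpoint id and the exact quantization settings are assumptions, not necessarily what this class uses), initialization might look like:

import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

MODEL_ID = "openvla/openvla-7b"  # assumed checkpoint; the wrapper may use a different one

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit to reduce GPU memory
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bfloat16
)

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)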
- run(image, query)
Given an image and a task instruction describing what the robot should do, return a predicted action.
The action is represented as a 7-DoF vector that needs to be un-normalized using the BridgeData V2 statistics.
- Parameters:
image (np.ndarray) -- The input image to predict an action from.
query (str) -- Task instruction.
- Returns:
Predicted action based on the query and image, represented as a 7-DoF vector.
- Return type:
List[float]
Example
>>> openvla = OpenVLA()
>>> outputs = openvla.run(img, "What action should the robot take to close the laptop?")
>>> print(outputs)
Action: [-0.00826106, 0.01349755, -0.01063425, -0.03462297, 0.04744966, 0.0756878, 0.99607843]
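Internally, a call like the one above might be implemented roughly as follows, assuming the wrapper uses the standard OpenVLA inference pattern from the Hugging Face model card (the prompt template and the "bridge_orig" un-normalization key are assumptions):

import numpy as np
import torch
from PIL import Image

def run_sketch(model, processor, image: np.ndarray, query: str):
    # Wrap the task instruction in a prompt; the exact template this wrapper
    # uses is an assumption.
    prompt = f"In: {query}\nOut:"
    inputs = processor(prompt, Image.fromarray(image)).to(model.device, dtype=torch.bfloat16)

    # predict_action() is provided by the OpenVLA remote code on the Hub; unnorm_key
    # selects the dataset statistics used to un-normalize the 7-DoF action
    # (BridgeData V2 here).
    action = model.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
    return list(action)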