Dynamo with the Gemini Vision API Test (Nano Banana)
[September 13, 2025] Today’s Research Log: Verifying the Integration of Dynamo with the Gemini Vision API and its Limitations
1. Objective 🎯
The goal was to build an automated workflow within the Autodesk Dynamo environment. This workflow would use Python scripts to connect with Google’s multimodal AI model, aiming to receive image files from the AI that were either modified or newly generated.
2. Summary of Work 💡
I wrote a Python script to call the Google Gemini Vision API from Dynamo and attempted to integrate the API while resolving a series of network and model errors. Through systematic debugging and testing, I discovered that the initial goal of generating images with the ‘Nano Banana’ (Gemini 2.5 Flash) model is not currently supported by the API. As a result, while I successfully created a stable workflow that takes an image as input and returns a text analysis, I concluded that the original goal of image output could not be achieved.
3. Detailed Process & Methodology ⚙️
- Initial Setup and Scripting: I wrote a Python script template in a Dynamo Python node, targeting the ‘Nano Banana’ model. The script was designed to encode images in Base64 and include them in the JSON request body.
- First Error Resolution: I encountered a network error, [Errno 11001] (getaddrinfo failed). I diagnosed it as a network environment problem (firewall, proxy, etc.), applied a fix, and established a communication base with the server.
- Second Error Resolution: A 404 Not Found error occurred. I confirmed that the model name included in the request URL, `gemini-2.5-flash-preview`, did not exist. I resolved this by changing the model to the officially available `gemini-1.5-flash-latest`.
- Third Error Resolution & Debugging: A `KeyError: 'parts'` occurred. I identified the cause as a missing `parts` array within the API's response content.
- Hypothesis and A/B Testing: Based on the final response, which stated "API responded successfully, but the content was empty. This is likely due to safety filters," I hypothesized that the AI was returning an empty response instead of a generated image because of safety filters.
- Test 1: I retried with a prompt and an image that were very benign.
- Test 2: I completely removed the image and requested image generation with only a text prompt.
- Final Conclusion: Both tests resulted in the API returning an empty response without an image. This confirmed that the current `v1beta` `generateContent` endpoint does not support image editing or generation; it only accepts an image as input and outputs text.
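The request body described in the steps above can be sketched as follows. This is a minimal sketch, not the exact script from the log: the endpoint URL and field names (`contents`, `parts`, `inline_data`) follow the publicly documented Gemini REST `generateContent` schema, and the prompt and image bytes are invented placeholders.

```python
import base64
import json

# Gemini REST endpoint (v1beta), per the public documentation; an API key
# must still be supplied (e.g. via the x-goog-api-key header) when POSTing.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
           "gemini-1.5-flash-latest:generateContent")

def build_payload(prompt, image_bytes, mime_type="image/png"):
    """Build a generateContent request body with an inline Base64 image."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # The API expects the image as a Base64 string, not raw bytes.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }

# Placeholder prompt and fake image bytes, for illustration only:
payload = build_payload("Describe this floor plan.", b"\x89PNG fake-bytes")
body = json.dumps(payload)  # ready to POST with urllib or requests
```

Keeping the payload construction in a small pure function like this makes it easy to test the JSON shape without touching the network, which is useful inside a Dynamo Python node.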
4. Key Findings 📝
- Functional Limitations of the Gemini Vision API: While the `gemini-2.5-flash` model is excellent at analyzing images and generating descriptive text, it does not currently support modifying existing images or returning new ones on request.
- Completion of a Stable Text Analysis Workflow: Although the initial goal was not met, I completed a robust Dynamo-Gemini integration script that takes an image as input, analyzes complex content, and returns the answer as text.
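A defensive way to read the text out of a `generateContent` response, guarding against the missing `parts` case that caused the `KeyError` earlier, might look like this. The response shape follows the documented `candidates` → `content` → `parts` structure; the sample dictionaries are invented for illustration.

```python
def extract_text(response):
    """Return the first text part of a generateContent response,
    or None if the candidate was emptied (e.g. by safety filters)."""
    candidates = response.get("candidates", [])
    if not candidates:
        return None
    parts = candidates[0].get("content", {}).get("parts", [])
    for part in parts:
        if "text" in part:
            return part["text"]
    return None  # no text part: possibly filtered or non-text content

# Hypothetical responses, for illustration only:
ok = {"candidates": [{"content": {"parts": [{"text": "A two-room plan."}]}}]}
empty = {"candidates": [{"finishReason": "SAFETY"}]}  # content stripped

print(extract_text(ok))     # → A two-room plan.
print(extract_text(empty))  # → None
```

Using `.get()` with defaults instead of direct indexing means an empty or filtered response degrades to `None` rather than crashing the Dynamo node.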
5. Challenges & Solutions 🚧
- Challenge 1: Persistent Hallucinations from the AI Assistant
- Cause: The AI assistant mistook the initial successful text responses for the final project goal, overlooking or misinterpreting crucial clues, such as the specific `gemini-2.5-flash` model version the user requested.
- Solution: I corrected the AI's flawed assumptions through clear and repeated instructions, explicitly redefining the objective as 'image output'. I also used a debugging mode to print the raw API response JSON, allowing me to inspect the actual data and rectify the errors.
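The debugging step above, dumping the raw response JSON instead of trusting assumptions about its shape, can be sketched as a tiny helper. The function name and the truncated sample response are my own illustration, not the log's actual code.

```python
import json

def debug_dump(response, enabled=True):
    """Pretty-print the raw API response so the actual structure
    can be inspected instead of assumed."""
    if enabled:
        print(json.dumps(response, indent=2, ensure_ascii=False))
    return response  # pass-through, so it can be chained inline

# Hypothetical filtered response, for illustration only:
debug_dump({"candidates": [], "promptFeedback": {"blockReason": "SAFETY"}})
```

Because it returns its argument unchanged, the helper can be dropped into an existing call chain and toggled off with `enabled=False` once the structure is understood.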
6. Next Steps & Future Plans 🚀
- Review Official Documentation: I will periodically check the official Google AI documentation to see if a dedicated API endpoint for image editing and generation has been newly released.
- Leverage Text Results in Dynamo: I will use the completed text output script to develop new workflows. For example, I can send architectural or engineering data (like floor plan images generated in Dynamo) to Gemini AI and utilize the analysis results in other Dynamo nodes.
7. Reflections & Ideas 🤔
This project provided a valuable lesson on the dangers of AI assistant hallucinations. It is essential not to blindly trust an AI's answers and to verify hypotheses and data directly at critical decision-making stages. I should also investigate potential performance issues a Dynamo Python node might face when decoding Base64 data back into an image file, should the API ever return an image.
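If the API ever does return an image (as a Base64 string in an `inline_data` part, mirroring the request schema), decoding it back to a file could be sketched as below. The output path and fake bytes are placeholders of my own; the field names are assumptions based on the request format.

```python
import base64
import os
import tempfile

def save_inline_image(b64_data, out_path):
    """Decode a Base64 image payload and write it to disk.
    base64.b64decode is implemented in C, so even multi-megabyte images
    decode quickly; the bigger cost in Dynamo is usually holding the
    full Base64 string in memory as part of the JSON response."""
    raw = base64.b64decode(b64_data)
    with open(out_path, "wb") as f:
        f.write(raw)
    return len(raw)  # bytes written, handy for sanity checks

# Round-trip demonstration with fake image bytes and a temp-dir path:
fake = base64.b64encode(b"\x89PNG fake-bytes").decode("ascii")
out = os.path.join(tempfile.gettempdir(), "gemini_out.png")
print(save_inline_image(fake, out))  # number of bytes written
```

Returning the byte count makes it easy to spot an empty or truncated payload before handing the file path to downstream Dynamo nodes.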
I hope you found this content helpful. Your support is what fuels this ongoing research and documentation.
If you’d like to contribute, you can buy me a coffee or become a member to take a deeper dive with me on this journey.
By becoming a member, you’ll get access to the test files and supplementary documents featured in the newsletter.