[AI Reality Check] Gemini 2.5 Pro vs. Fine-Tuning: A Survival Guide for Engineers

Why training your own AI is 90% Engineering and 10% Magic (Log of 22 Hurdles)


https://www.linkedin.com/pulse/image-geometry-should-we-use-gemini-25-pro-fine-tune-our-wonho-cho-1y8wc


Introduction: The Dream of Custom AI

What if AI could go beyond simply reading design drawings and start creating new geometry on its own? This is the ultimate vision—one that could fundamentally shift the paradigm of design automation. To test this, we defined a concrete process: an AI recognizes geometry in an image, interprets it into a JSON structure, and then reconstructs the shape using Dynamo.
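To make the pipeline concrete, here is a minimal sketch of the kind of JSON payload such a process might pass from the vision model to Dynamo. The field names ("shapes", "type", "origin", "size") are illustrative assumptions for this post, not the actual schema from the project:

```python
import json

# Illustrative sketch only: the field names below are assumptions,
# not the schema used in the actual project.
geometry = {
    "shapes": [
        {
            "type": "cube",
            "origin": [0.0, 0.0, 0.0],  # x, y, z in model units
            "size": 10.0,
        }
    ]
}

# Serialize for hand-off; a Python node inside Dynamo could parse this
# payload and rebuild the solid (e.g. with Cuboid.ByLengths).
payload = json.dumps(geometry)
print(payload)
```

The point of a schema like this is that it is trivially machine-checkable: Dynamo either receives well-formed JSON it can rebuild, or the run fails loudly instead of silently.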

To test the feasibility of this process, we explored two contrasting paths simultaneously: leveraging the powerful Gemini 2.5 Pro API (The Expert) versus fine-tuning our own open-source model (The Student) using Hugging Face and Google Colab.

The results were starkly divergent, and this post is a candid record of the "Wall of Reality" we hit.

1. The Showdown: Expert API vs. Custom Student

  • The Expert (Gemini 2.5 Pro): The results were extremely positive. It interpreted complex geometric images into JSON with surprising accuracy, proving our automation vision is practically achievable.

  • The Student (Fine-Tuned Gemma 3B/PaliGemma): Despite our resolving every technical issue, the output was a failure. Instead of structured JSON, the model produced only a generic caption: "a diagram of a cube."
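The gap between the two outputs is easy to detect programmatically. A minimal sketch (the helper name `extract_json` is ours, not from any library) of how a pipeline could tell a usable JSON reply from a prose caption:

```python
import json

def extract_json(raw: str):
    """Strip optional Markdown code fences from a model reply and parse JSON.

    Returns the parsed object, or None if the reply is plain prose --
    the failure mode we saw from the fine-tuned model.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (possibly "```json") and the closing fence.
        lines = text.splitlines()
        text = "\n".join(lines[1:-1]).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# "The Expert": a fenced JSON reply parses cleanly.
expert_reply = '```json\n{"shapes": [{"type": "cube"}]}\n```'
# "The Student": plain prose, so there is no structure to recover.
student_reply = "a diagram of a cube"

print(extract_json(expert_reply))
print(extract_json(student_reply))
```

A guard like this is worth having even with a strong model: large language models sometimes wrap JSON in fences or prepend commentary, and a downstream Dynamo graph should fail at the parse step, not deep inside geometry reconstruction.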

2. The 22 Hurdles: Fine-Tuning is "Engineering," Not "AI"

Before the model could even reach the starting line, we had to confront 22 technical hurdles. We learned that 90% of success in fine-tuning depends not on AI theory, but on invisible, foundational engineering.

  • Data Traps: A single invisible "special whitespace character" or a misunderstanding of the .jsonl format halted the entire process.

  • Infrastructure Hell: From Colab GPU incompatibility (NVIDIA vs. T4) to session restarts wiping out libraries, the environment was a constant battle.

  • Library Conflicts: We faced deadlocks where the automation library (SFTTrainer) conflicted with our data processing, forcing us to switch to the manual Trainer method.
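The "invisible whitespace" trap in particular is cheap to guard against. A sketch of a pre-flight check for .jsonl training data (the function name `check_jsonl_line` is hypothetical, written for this post):

```python
import json
import unicodedata

def check_jsonl_line(line: str, lineno: int):
    """Flag the two traps that silently broke our runs: invisible
    Unicode whitespace, and lines that are not a single JSON object."""
    problems = []
    # Catch non-ASCII separator/format characters such as U+00A0
    # (no-break space) or U+200B (zero-width space).
    for ch in line:
        if ord(ch) > 127 and unicodedata.category(ch) in ("Zs", "Cf"):
            problems.append(f"line {lineno}: hidden char U+{ord(ch):04X}")
    # Each .jsonl line must parse on its own as one JSON object.
    try:
        obj = json.loads(line)
        if not isinstance(obj, dict):
            problems.append(f"line {lineno}: not a JSON object")
    except json.JSONDecodeError as e:
        problems.append(f"line {lineno}: invalid JSON ({e.msg})")
    return problems

good = '{"prompt": "describe", "completion": "cube"}'
bad = '{"prompt": "describe",\u00a0"completion": "cube"}'  # hidden U+00A0
print(check_jsonl_line(good, 1))
print(check_jsonl_line(bad, 2))
```

Running a check like this over every line before training starts costs seconds; debugging the same problem through a failed Colab run cost us hours.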

3. Key Lesson: Pre-training is a "Habit"

Why did the fine-tuning fail to produce JSON? We realized that for a model pre-trained on billions of data points, describing an image in English is more than knowledge; it's a powerful "habit" or "instinct." Our small dataset was nowhere near sufficient to break this deeply ingrained habit.

Conclusion: A Survival Guide, Not a Success Story

We didn't gain a high-performance model, but we gained cost-based strategic insight. We now know that the true cost of fine-tuning includes the immense time required to build high-quality datasets and overcome engineering hurdles.

Unless you have massive data and resources, Gemini 2.5 Pro is the "Lighthouse" showing us the way. For those brave enough to try fine-tuning, use our trial-and-error log as a map to avoid the traps we fell into.
