Building an Automated Pipeline for Generating Geometric Datasets for AI Training Using Dynamo and Python: Research Note (2025-09-12)


A Methodology for Automated, Large-Scale Generation of Datasets for AI Fine-tuning Using Dynamo and Generative Design


1. Objective

To develop a fully automated pipeline within Autodesk Dynamo that programmatically generates diverse geometric shapes according to parametric rules and extracts rich metadata for each shape into a JSON format optimized for AI fine-tuning.


2. Summary of Work

The project began with a theoretical exploration of the number of possible triangles within a grid, which then led to the development of a Dynamo Python script to generate the actual geometries. Following this, I built an integrated pipeline that automatically analyzes each generated geometry, extracts its properties, and transforms them into a “prompt-completion” JSON structure ideal for AI training. Throughout this process, several API and data type errors were resolved to deliver a final, stable, and fully automated script.

3. Detailed Process & Methodology

  • Initial Exploration (Theoretical Approach): I began by calculating the number of possible triangles within 100x100 and 3x3 grids to understand the complexity of the problem. This phase also involved exploring criteria for classifying shapes with specific properties (e.g., right-angled, isosceles).
  • Base Geometry Generation (Dynamo Python): Using a Dynamo Python script, I implemented a foundational pipeline that generates all possible 3-point combinations within a grid and filters out collinear combinations to create a batch of valid Triangle Polygon objects.
  • Data Structure Design (Optimized for AI Datasets): To move beyond simple coordinate values and enhance AI training efficiency, I designed a hierarchical JSON structure. This structure includes a natural language description (prompt), geometry type, properties, and core data, a decision made with future Text-to-Geometry model training in mind.
  • Automated Analysis and Information Extraction: I added logic within the Python script to automatically calculate the area, perimeter, side lengths, and type (e.g., right-angled, isosceles) for each batch-generated triangle Polygon. This was a critical step in achieving full automation without manual intervention.
  • Integrated Pipeline Construction and Debugging: The entire process — from geometry creation and analysis to JSON object construction and final JSON string conversion — was unified within a single Python script. I encountered and resolved several errors during this integration, ensuring the script’s stability.
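
The record layout described in the steps above can be sketched in plain Python, outside Dynamo. This is a minimal illustration under stated assumptions, not the production script: build_record is a hypothetical helper name, the shoelace formula stands in for the Dynamo surface area call, math.dist stands in for Point.DistanceTo, and the triangle type label is left out for brevity.

```python
import json
import math


def build_record(index, vertices):
    """Build one prompt-completion record from three (x, y) vertex tuples."""
    (x1, y1), (x2, y2), (x3, y3) = vertices
    # Shoelace formula: an assumed stand-in for the Dynamo surface area call.
    area = abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0
    edges = [(vertices[0], vertices[1]),
             (vertices[1], vertices[2]),
             (vertices[2], vertices[0])]
    sides = sorted(math.dist(a, b) for a, b in edges)
    perimeter = sum(sides)
    return {
        "geometry_id": "triangle_{}".format(index),
        "prompt": "A triangle with an area of {:.2f}.".format(area),
        "completion": {
            "type": "Polygon",
            "properties": {
                "vertex_count": 3,
                "area": round(area, 2),
                "perimeter": round(perimeter, 2),
                "side_lengths": [round(s, 2) for s in sides],
            },
            "data": {
                "vertices": [{"x": float(x), "y": float(y), "z": 0.0}
                             for x, y in vertices]
            },
        },
    }


record = build_record(0, [(0, 0), (10, 0), (0, 10)])
print(json.dumps(record, indent=2))
```

Keeping the natural language prompt separate from the structured completion means the same record can serve either direction of training: text to geometry, or geometry to property prediction.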




4. Key Findings & Results

  • Final Deliverable: The result is a unified Python script that, with a single click in Dynamo, simultaneously outputs: ① a list of geometries and ② a corresponding list of JSON strings containing detailed metadata for each geometry.
  • Key Insight: This project confirmed that Dynamo can be used not only for the mass generation of complex shapes but also for systematically building high-quality, “ground-truth” datasets ready for immediate use in AI training by analyzing geometric properties in real-time. This reveals Dynamo’s potential to function as a powerful data preprocessing and generation platform, extending far beyond its role as a simple modeling tool.

5. Challenges & Solutions

  • Problem 1: AttributeError: 'Polygon' object has no attribute 'Area' / 'Perimeter'
  • Cause Analysis: In the Dynamo API, a Polygon object is treated as a closed Curve and therefore does not directly possess the .Area property, which belongs to a Surface. Furthermore, its perimeter is defined by the standard Curve property .Length, not .Perimeter.
  • Solution: To calculate the area, I first converted the Polygon into a Surface object using Surface.ByPatch() and then called the .Area property. The perimeter was correctly retrieved using the .Length property.
  • Problem 2: TypeError: Object of type [...] is not JSON serializable
  • Cause Analysis: This issue was diagnosed in two stages.
  • Initial Diagnosis: The numerical values returned by the Dynamo API are of the .NET Double type, not the standard Python float. Consequently, Python's native json library could not serialize them.
  • Root Cause: The error persisted even after explicitly converting the numbers using float(). The fundamental issue was that when data (like a dictionary) is passed from one Dynamo node to another, the Dynamo engine "wraps" the object in its own proprietary format for compatibility, which breaks interoperability with standard Python libraries.
  • Solution: I abandoned the approach of using a separate “JSON converter” node. I redesigned the pipeline to handle geometry creation, analysis, dictionary construction, and final JSON string conversion within a single Python script. This approach completely prevents type mismatch issues caused by data transfer between nodes, resolving the problem at its source.
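
The principle behind the fix, building the dictionary from plain Python types right before calling json.dumps, can be illustrated outside Dynamo. The sketch below does not reproduce Dynamo's wrapping behaviour: WrappedNumber is a hypothetical stand-in for any host-side numeric type that the json module rejects, and to_plain is an assumed helper name.

```python
import json
from collections.abc import Mapping


class WrappedNumber:
    """Hypothetical stand-in for a host-side numeric wrapper (not a Dynamo type)."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        return float(self._value)


def to_plain(obj):
    """Recursively rebuild obj from built-in Python types only."""
    if obj is None or isinstance(obj, (str, bool)):
        return obj
    if isinstance(obj, int):
        return int(obj)
    if isinstance(obj, Mapping):
        return {str(key): to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    # Anything else is assumed to be number-like and convertible via float().
    return float(obj)


raw = {"area": WrappedNumber(50),
       "side_lengths": (WrappedNumber(10), WrappedNumber(10))}

try:
    json.dumps(raw)
    serializable_before = True
except TypeError:
    serializable_before = False

json_string = json.dumps(to_plain(raw), indent=2, ensure_ascii=False)
print(serializable_before)
print(json_string)
```

Doing this conversion and the json.dumps call in the same script leaves no node boundary at which the values could be re-wrapped.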

6. Next Steps & Future Plans

  • Expand Geometry Types: The current generation logic, limited to triangles, needs to be expanded to include more diverse 2D shapes (squares, circles, arcs) and 3D solid geometries.
  • Integrate Image Data Generation: Add a workflow to automatically capture images of the generated geometries and store them alongside the JSON data to build a complete image-text dataset.
  • Apply to Model Training: Utilize the generated dataset to fine-tune an actual AI model and validate the performance of a Text-to-Geometry or Geometry-Property-Prediction model.

7. Reflections & Ideas

This project has proven that the combination of Dynamo and Python is highly effective not just for automating tasks, but for creating the high-quality datasets that serve as the “fuel” for data-driven technologies like AI. Significantly, the error-resolution process itself provided a deeper understanding of Dynamo’s internal data handling mechanisms.

If I had initially hypothesized that data types could be transformed when passed between nodes, I could have avoided significant trial and error. In the future, when using external libraries with Dynamo objects, I will prioritize strategies for maintaining data “purity” throughout the workflow.

A model fine-tuned on this dataset would likely benefit from richer prompts, so refining the prompt generation logic is a worthwhile next step. For instance, the script could generate more complex natural language prompts based on vertex positions (“a triangle with a vertex near the origin…”) or orientation (“a triangle that is elongated to the right…”).
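
One possible shape for such prompt logic is sketched below. Everything in it is an assumption made for illustration: the describe_triangle name, the near_origin_radius and elongation_ratio thresholds, and the choice to read "elongated" as a bounding-box aspect ratio.

```python
import math


def describe_triangle(vertices, near_origin_radius=1.0, elongation_ratio=1.5):
    """Return a prompt string built from simple positional cues."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    phrases = []
    # Positional cue: is any vertex within the given radius of (0, 0)?
    if min(math.hypot(x, y) for x, y in vertices) <= near_origin_radius:
        phrases.append("with a vertex near the origin")
    # Orientation cue: compare the bounding-box width and height.
    if width >= elongation_ratio * height:
        phrases.append("elongated horizontally")
    elif height >= elongation_ratio * width:
        phrases.append("elongated vertically")

    suffix = " " + ", ".join(phrases) if phrases else ""
    return "A triangle" + suffix + "."


print(describe_triangle([(0, 0), (20, 0), (10, 10)]))
print(describe_triangle([(10, 10), (20, 10), (10, 20)]))
```

The thresholds would need tuning against the grid spacing actually used, since "near" and "elongated" only have meaning relative to the overall size of the grid.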

[Appendix] Analysis Report on Scaling Down the Grid Problem for AI Training Dataset Generation

1. Overview

  • Purpose of Document: To document the strategic decision-making process of reducing the problem’s scope from a 100x100 grid to a 3x3 grid in order to manage complexity and establish a concrete implementation logic during the initial phase of an AI geometric dataset generation project.
  • Change Summary: The analysis scope was shifted from a theoretical 100x100 grid to a practical, verifiable 3x3 grid.
  • Executive Summary:
  • The 100x100 grid problem was prohibitively large for initial analysis, with over 166 billion possible triangles and complex logic for identifying collinear points.
  • By scaling down to a 3x3 grid, we established a manageable environment with only 76 total triangles, allowing for direct manual calculation and verification of different triangle types (e.g., right-angled, isosceles).
  • This simplification process was a critical step in establishing a ground truth for verifying the accuracy of the subsequent automation script.

2. Initial Problem Definition (100x100 Grid)

2.1. Objective

  • To theoretically calculate the total number of possible triangles that can be formed within a 100x100 grid, which contains 10,000 points.

2.2. Analysis and Limitations

  • Methodology: The approach was to calculate the total combinations of choosing 3 points from 10,000, C(10000,3) = 166,616,670,000, and then subtract the combinations of three points that lie on a straight line (collinear points).
  • Result: A total of 166,534,567,888 unique triangles.
  • Challenges Encountered (Scale and Complexity):
  • Scale: The sheer number of combinations (in the hundreds of billions) made the result difficult to conceptualize intuitively.
  • Logical Complexity: Unlike horizontal and vertical lines, calculating the number of collinear point sets on diagonal lines with various slopes was highly complex and relied on intensive formulas. This complexity posed a significant challenge for initial implementation and verification.
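
The subtraction described above can be sketched without enumerating every combination. The note does not record which formula was used for the diagonal lines, so the gcd-based count below is a substituted technique, and the function names are hypothetical. It relies on the fact that a segment between two grid points with offset (dx, dy) passes through gcd(dx, dy) - 1 other grid points, and each of those points forms exactly one collinear triple with the two endpoints.

```python
import math


def collinear_triples(n):
    """Count sets of three collinear points in an n x n grid of points."""
    total = 0
    for dx in range(n):
        for dy in range(n):
            if dx == 0 and dy == 0:
                continue
            # Number of point pairs separated by the offset (dx, dy).
            pairs = (n - dx) * (n - dy)
            if dx > 0 and dy > 0:
                pairs *= 2  # the mirrored diagonal direction (dx, -dy)
            # Each grid point strictly between the pair forms one triple.
            total += pairs * (math.gcd(dx, dy) - 1)
    return total


def count_triangles(n):
    """All 3-point combinations minus the collinear ones."""
    return math.comb(n * n, 3) - collinear_triples(n)


print(count_triangles(3))
print(count_triangles(100))
```

The double loop visits only n squared offsets, so the 100x100 case runs almost instantly even though the number of triangles itself is far too large to enumerate one by one.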

3. Problem Scaling and Specification (3x3 Grid)

3.1. Objective

  • To overcome the complexity of the 100x100 grid by reducing the problem to a minimal unit where all possible cases could be manually enumerated and verified.
  • To move beyond merely counting the total number of triangles to classifying them by type, laying the groundwork for data labeling and filtering logic.

3.2. Analysis and Results

  • Methodology: Visually identifying and classifying all possible triangles within a 3x3 grid (9 points) based on their geometric properties.

Results

  • Total Triangles: 76
  • Equilateral Triangles: 0 (not possible on an integer grid)
  • Right-Angled Triangles: 44
  • Isosceles Triangles: 36 (28 of these are also right-angled)

Key Outcomes:

  • Clarity: By determining the exact counts for each triangle type, we established clear target values that could be used to verify the accuracy of the automation script to be developed.
  • Logic Specification: This process prompted the formulation of concrete logic for how to programmatically identify different triangle types (e.g., using the Pythagorean theorem for right angles, comparing side lengths for isosceles triangles).
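
The identification logic named above can be checked by brute force on a small grid. The sketch below is plain Python rather than the Dynamo script, with hypothetical function names. It works on squared side lengths so that every comparison is exact integer arithmetic and no tolerance is needed.

```python
from itertools import combinations


def squared_sides(p, q, r):
    """Return the three squared side lengths, sorted ascending."""
    def d2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return sorted([d2(p, q), d2(q, r), d2(r, p)])


def count_triangle_types(grid_size):
    """Enumerate all triangles on a grid and count them by type."""
    points = [(x, y) for x in range(grid_size) for y in range(grid_size)]
    counts = {"total": 0, "right_angled": 0, "isosceles": 0, "equilateral": 0}
    for p, q, r in combinations(points, 3):
        twice_area = (p[0] * (q[1] - r[1]) + q[0] * (r[1] - p[1])
                      + r[0] * (p[1] - q[1]))
        if twice_area == 0:
            continue  # collinear points do not form a triangle
        a, b, c = squared_sides(p, q, r)
        counts["total"] += 1
        if a + b == c:  # Pythagorean theorem on squared lengths
            counts["right_angled"] += 1
        if a == b or b == c:  # at least two equal sides
            counts["isosceles"] += 1
        if a == c:  # all three sides equal
            counts["equilateral"] += 1
    return counts


print(count_triangle_types(3))
```

Running it with grid_size = 3 gives the reference counts to compare against the manual enumeration, and the same function can be reused for any grid small enough to enumerate.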

4. Comparative Analysis of Scaling



5. Conclusion & Insights

  • Effectiveness of a Bottom-Up Approach: Shifting from a large, abstract problem (100x100) to a small, concrete one (3x3) was a highly effective strategy. This is the classic problem-solving tactic of solving a small, verifiable instance first and then generalizing.
  • Establishing a Verifiable Testbed: The analysis of the 3x3 grid served as a compact test case for validating the logic of the Python script. For example, if the script, when run on a 3x3 grid, did not reproduce the manually verified count of right-angled triangles, we would know there was a logical error in the code.
  • Establishing a Clear Development Path: By simplifying the problem, the objective shifted from the vague question of “How do we make all possible triangles?” to the specific, actionable goal of “How do we identify a right-angled triangle and structure its properties into a JSON format?”.

[Appendix] Test Files

1–1 GD GEOMETRY TEST v3 250912 (eng).dyn


python

# Boilerplate code to load Dynamo's geometry libraries
import sys
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *

# For creating combinations of points
import itertools

# Get inputs from the Dynamo node
grid_size = IN[0] if IN[0] else 3  # Default to 3 if no input
spacing = IN[1] if IN[1] else 10  # Default to 10 if no input

# --- 1. Generate Points ---
points = []
for x in range(grid_size):
    for y in range(grid_size):
        # Create a point at (x*spacing, y*spacing, 0)
        points.append(Point.ByCoordinates(x * spacing, y * spacing, 0))

# --- 2. Create Combinations ---
# Get all unique combinations of 3 points from the list
point_combinations = itertools.combinations(points, 3)

# --- 3. Filter and Create Triangles ---
triangles = []
for combo in point_combinations:
    p1, p2, p3 = combo[0], combo[1], combo[2]

    # Check for collinearity by calculating the area of the triangle.
    # If the area is zero, the points are in a straight line.
    # Formula: 0.5 * |x1(y2-y3) + x2(y3-y1) + x3(y1-y2)|
    # (the constant factor 0.5 is omitted because only a zero test is needed)
    area_check = p1.X * (p2.Y - p3.Y) + p2.X * (p3.Y - p1.Y) + p3.X * (p1.Y - p2.Y)

    # If the area is not zero (using a small tolerance for safety)
    if abs(area_check) > 0.001:
        # --- 4. Create Geometry ---
        # Create a polygon (triangle) from the 3 points
        triangles.append(Polygon.ByPoints([p1, p2, p3]))

# Output the list of triangle polygons to Dynamo
OUT = triangles

1–2 GD GEOMETRY TEST v5 triangle 250912 (eng).dyn


python

# Load necessary libraries
import sys
import clr

clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
import itertools
import json  # Re-added for JSON conversion
import math

# --- Get inputs ---
grid_size = IN[0] if IN[0] else 3
spacing = IN[1] if IN[1] else 10

# --- Initialize lists to store the results ---
geometries = []
json_string_list = []  # Final list of JSON strings to be output
triangle_id_counter = 0

# --- Generate points and their combinations ---
points = [Point.ByCoordinates(x * spacing, y * spacing, 0) for x in range(grid_size) for y in range(grid_size)]
point_combinations = itertools.combinations(points, 3)

# --- Main loop: Iterate through all combinations to generate, analyze, and convert to JSON ---
for combo in point_combinations:
    p1, p2, p3 = combo[0], combo[1], combo[2]

    # Filter for collinear points
    area_check = p1.X * (p2.Y - p3.Y) + p2.X * (p3.Y - p1.Y) + p3.X * (p1.Y - p2.Y)

    if abs(area_check) > 0.001:
        # 1. Generate geometry
        triangle_geo = Polygon.ByPoints([p1, p2, p3])
        geometries.append(triangle_geo)

        # 2. Automated analysis
        # A Polygon is a closed curve: patch it into a Surface to read .Area,
        # and use the curve's .Length as the perimeter.
        surface = Surface.ByPatch(triangle_geo)
        area = surface.Area
        perimeter = triangle_geo.Length
        l1, l2, l3 = p1.DistanceTo(p2), p2.DistanceTo(p3), p3.DistanceTo(p1)
        sides = sorted([l1, l2, l3])

        # Automated triangle type detection
        triangle_type = "Scalene"
        # … (Type detection logic) …
        is_right_angled = abs(sides[0]**2 + sides[1]**2 - sides[2]**2) < 0.001
        if is_right_angled:
            triangle_type = "Right-Angled " + triangle_type

        # 3. Aggregate information (create a Python dictionary)
        prompt_text = "A {} triangle with an area of {:.2f}.".format(triangle_type, area)
        ai_data_object = {
            "geometry_id": "triangle_{}".format(triangle_id_counter),
            "prompt": prompt_text,
            "completion": {
                "type": "Polygon",
                "properties": {
                    "name": triangle_type,
                    "vertex_count": 3,
                    "area": float(round(area, 2)),
                    "perimeter": float(round(perimeter, 2)),
                    "side_lengths": [float(round(s, 2)) for s in sides]
                },
                "data": {
                    "vertices": [
                        {"x": float(p1.X), "y": float(p1.Y), "z": float(p1.Z)},
                        {"x": float(p2.X), "y": float(p2.Y), "z": float(p2.Z)},
                        {"x": float(p3.X), "y": float(p3.Y), "z": float(p3.Z)}
                    ]
                }
            }
        }

        # 4. Immediately convert the created dictionary to a JSON string and add it to the list
        json_string = json.dumps(ai_data_object, indent=2, ensure_ascii=False)
        json_string_list.append(json_string)
        triangle_id_counter += 1

# --- Final output ---
# OUT[0]: List of geometries
# OUT[1]: List of JSON strings
OUT = [geometries, json_string_list]
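
The type detection step is elided in the listing above. One way it could look is sketched below in plain Python. This is an assumption, not the logic from the original .dyn file: classify_triangle is a hypothetical name, and the tolerance simply mirrors the 0.001 used for the right-angle check in the script.

```python
import math


def classify_triangle(side_lengths, tol=1e-3):
    """Return a label such as 'Right-Angled Isosceles' from three side lengths."""
    a, b, c = sorted(side_lengths)
    if abs(a - b) < tol and abs(b - c) < tol:
        base = "Equilateral"
    elif abs(a - b) < tol or abs(b - c) < tol:
        base = "Isosceles"
    else:
        base = "Scalene"
    # Pythagorean check on the sorted sides, with c as the longest side.
    if abs(a ** 2 + b ** 2 - c ** 2) < tol:
        return "Right-Angled " + base
    return base


print(classify_triangle([10, 10, math.sqrt(200)]))
print(classify_triangle([10, 20, math.sqrt(500)]))
print(classify_triangle([10, math.sqrt(200), math.sqrt(500)]))
```

Because the side lengths are floating-point distances, the comparisons use a tolerance instead of exact equality. An absolute tolerance is adequate at a spacing of 10, but it would need rescaling for very large or very small grids.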


