Machine Learning for CAD/CAM Data

Taylor Hale Robert
4 min read · Oct 22, 2021

Employing machine learning with Accumark CSVTools reports

Photo by Phillip Goldsberry on Unsplash

Accumark has a variety of built-in reports that Gerber users can leverage to understand many quantitative features of their CAD/CAM work.

But what if we could make this data do more? It’s tidy and accessible. It contains troves of information about how companies build their products. Could we turn this historical data into something more useful in the future? Say, customized formulas for product development?

I’ve been deeply curious. Below is a bit about what I learned.

About the Data

I was kindly gifted a dataset pulled directly from the Accumark set of tools. After I left my job in the furniture industry #thegreatresignation, I reached out to a competing company, explained what I thought I could do, and provided a proof of concept for the beginning stages of the idea. The company was excited about my work so far and offered a dataset for me to work with.

The CSVTOOL reports contain information like perimeter, area, number of internal lines, and more. These days Accumark has a fleshed-out GUI for these reports, but the company I worked for was well behind on software versions, so my experience is entirely from the command line, which makes the process easy to script. Using Python 3’s os module, you can loop through all the reports and storage areas you need expediently.
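To give a sense of the scripting, here is a minimal sketch: it walks a folder of storage areas, calls a csvtool-style export for each one, and stacks the results with pandas. The paths, executable name, and arguments below are placeholders, not real Accumark syntax.

```python
import os
import subprocess
import pandas as pd

# Placeholder layout -- the real Accumark executable name, flags, and
# storage-area structure will differ on your installation.
STORAGE_ROOT = r"C:\userroot\storage"
OUTPUT_DIR = r"C:\reports"
os.makedirs(OUTPUT_DIR, exist_ok=True)

frames = []
for area in os.listdir(STORAGE_ROOT):            # one folder per storage area
    area_path = os.path.join(STORAGE_ROOT, area)
    if not os.path.isdir(area_path):
        continue

    out_csv = os.path.join(OUTPUT_DIR, f"{area}_dmppiece.csv")

    # Hypothetical export call -- substitute your actual csvtool invocation.
    subprocess.run(["csvtool", "dmppiece", area_path, out_csv], check=True)

    df = pd.read_csv(out_csv)
    df["storage_area"] = area                    # remember where each row came from
    frames.append(df)

# One tidy table covering every storage area
all_pieces = pd.concat(frames, ignore_index=True)
print(all_pieces.shape)
```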

Patterns Across Assemblies

I was searching for a way to model how companies build what they build, using only the data that already exists.

Before I tried much feature engineering or heavily transformed these datasets to draw out the characteristics I thought might matter most, I wanted to see how well these objects clustered with no manipulation. Using the available features, I applied a few clustering algorithms to see how they performed. At a high level, I was interested in modeling the patterns across subassemblies and assemblies. Using PCA to reduce the dimensionality of the dataset, you can see a dmppiece report represented in the compressed feature space.

dmppiece Accumark report reduced to 3 dimensions using PCA

As you can see above, pattern pieces form clusters based on their characteristics. I was excited to see this, and moved on to explore how I could most accurately represent the relationships in the data above.
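A minimal sketch of producing that kind of projection, assuming the dmppiece report is loaded into pandas and contains numeric columns along the lines of perimeter, area, and num_internal_lines (the column names here are stand-ins for the real report fields):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in column names; a real dmppiece export has more fields.
numeric_cols = ["perimeter", "area", "num_internal_lines"]

pieces = pd.read_csv("dmppiece.csv")
X = StandardScaler().fit_transform(pieces[numeric_cols])

# Compress the feature space to 3 dimensions for plotting
coords = PCA(n_components=3).fit_transform(X)

# A quick unsupervised look at how the pieces group together
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

projected = pd.DataFrame(coords, columns=["pc1", "pc2", "pc3"])
projected["cluster"] = clusters
print(projected.groupby("cluster").size())
```

Scatter-plotting pc1, pc2, and pc3 colored by cluster gives a view like the one above.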

Use Cases

The benefits of being able to represent a company’s specific style of development quickly and accurately are many. Using machine learning to build formulas and automate product development, in a customized way, has the potential to decrease overhead for software customers significantly. Automation in the CAD room can increase development speed and maximize the number of products a company is able to introduce per season. Additionally, offloading some of that institutional development knowledge to an algorithm can relieve some of the increasing staffing pressure: experienced furniture pattern makers are rapidly aging out of the industry, and their millennial replacements prefer digital tools and have accrued fewer years of experience.

Model Performance

This was a multi-level classification task. Output from the first model, which classified the parts, was fed into the second model, which classified the object.
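As a rough illustration of the hand-off between levels, the sketch below trains a piece-level classifier, rolls its predictions up into per-object counts, and trains an object-level classifier on those counts. The file names, column names, and the choice of RandomForestClassifier are assumptions for the example, not necessarily the models I settled on.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed inputs: a piece-level table with an object_id and a true
# piece_category, plus an object-level table with each object's label.
pieces = pd.read_csv("pieces.csv")
objects = pd.read_csv("objects.csv").set_index("object_id")

piece_features = ["perimeter", "area", "num_internal_lines"]  # stand-in names

# Level 1: classify the individual parts.
piece_model = RandomForestClassifier(random_state=0)
piece_model.fit(pieces[piece_features], pieces["piece_category"])
# (In practice, out-of-fold predictions are safer here to avoid leakage.)
pieces["predicted_category"] = piece_model.predict(pieces[piece_features])

# Feed level-1 output into level 2: count each predicted piece category
# per object, so every assembly becomes a single feature row.
object_X = pd.crosstab(pieces["object_id"], pieces["predicted_category"])
object_y = objects.loc[object_X.index, "object_label"]

# Level 2: classify the object from the rolled-up piece predictions.
object_model = RandomForestClassifier(random_state=0)
object_model.fit(object_X, object_y)
```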

Initially, performance for the first level of modeling hovered around 75% accuracy when attempting to classify pieces into one of 37 categories. After spending some time with this data, I realized that an enormous amount of information was contained in text fields that let Accumark users enter their own sort of “code” used internally. I created an NLP-inspired way to encode this information and transform it into boolean features. This feature engineering improved performance to 99% correct classification on the test set, a result that held across cross-validation folds and different datasets.
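My actual encoding was tailored to the vocabulary in these reports, but the general shape resembles a binary bag-of-words: split each user-entered code into tokens and flag whether each token is present. A sketch, with an assumed column name:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

pieces = pd.read_csv("dmppiece.csv")
text_col = "piece_description"   # assumed name for the user-entered "code" field

# Tokenize each description and record presence/absence only, turning the
# custom vocabulary into boolean features.
vectorizer = CountVectorizer(binary=True, token_pattern=r"[A-Za-z0-9]+")
flags = vectorizer.fit_transform(pieces[text_col].fillna(""))

bool_features = pd.DataFrame(
    flags.toarray().astype(bool),
    columns=vectorizer.get_feature_names_out(),
    index=pieces.index,
)

# Join the vocabulary flags onto the numeric report features
model_input = pieces.join(bool_features)
```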

I performed the second level of modeling on a dataset created from the Accumark reports, but this one was composed entirely of aggregate data drawn from many types of reports. The first run of 8 baseline classifiers performed surprisingly well, correctly classifying objects into one of 7 labels 82% of the time. I continued to refine my features and incrementally improved performance.
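The eight baselines aren’t listed here, but the spot-check looked roughly like the loop below, with object_X and object_y standing in for the aggregate object-level features and labels from the previous sketch:

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Eight common out-of-the-box classifiers (illustrative, not the exact set I ran)
baselines = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(),
    "forest": RandomForestClassifier(),
    "gboost": GradientBoostingClassifier(),
    "knn": KNeighborsClassifier(),
    "nb": GaussianNB(),
    "svm": SVC(),
    "lda": LinearDiscriminantAnalysis(),
}

for name, model in baselines.items():
    scores = cross_val_score(model, object_X, object_y, cv=5)  # accuracy by default
    print(f"{name:>8}: {scores.mean():.3f}")
```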

In the end, successful modeling at both levels meant leveraging the most valuable data — custom vocabulary. Incorporating some text-based feature engineering pushed the performance of this algorithm to 98–99% accuracy for both levels of modeling. From here I’ll work to transform this modeled data into functional templates for production. If you have questions about this project, please feel free to reach out.

P.S. there is little to be found about Accumark CSVTOOLS floating around the internet. It’s a particular interest of mine, so if you want to talk about them, PLEASE email me.


Taylor Hale Robert

data science + workflow automation, and the health + habits that support the work