This week, fooling around as always in some science channels, I stumbled upon a very interesting video from NightHawkInLight, “Using Factory Optimization to Peel Eggs & Improve Everything (Multivariate Experimental Design)”. It was kind of funny and I couldn’t help myself.

Experimenting and measuring how much a variable can alter a result is not an easy task. Imagine you have 3 variables, each with 3 possible values: to make sure you have tested all the combinations, you will have to run 27 experiments. If you are a machine learning engineer, each of those runs under a 5-fold cross-validation protocol; do the math and you will end up running 135 training jobs.

Now, back to the video: seeing him use a near-deterministic (heavy air quotes) way of identifying which variable changed the ease of peeling an egg was not only fun but very enlightening.

We have a lot of methods to reduce the toll of training exhaustive combinations of parameters: HalvingGridSearchCV, Bayesian optimization with Gaussian processes, and others. But nothing like what's in the video, where you can run a simple set of sums and know (or at least have a clue) which parameter change improved the training results.

The result was surprisingly good, and with some little tweaks here and there we can substitute those 27 experiments and 135 training jobs with 9 + 1 experiments and 50 jobs, at almost the same score. That +1 I’ll explain later in the article.

As a father of a toddler, with 3 dogs, a cat, and a cockatiel, who got woken up at 6:35 AM on a Sunday: why not test it out? This is not a scientific article, but it could lead to some good insights.

Then, the saga begins:

The **Taguchi Method**, developed by Genichi Taguchi, is a statistical method sometimes referred to as a robust design method. It was initially created to improve the quality of manufactured goods but has since been applied to various fields such as engineering, biotechnology, marketing, and advertising. The method aims to reduce variation in a process through robust design of experiments. The overall objective is to produce high-quality products at a low cost to the manufacturer.

One of the key components of the Taguchi Method is the use of **orthogonal arrays** in experimental design. These arrays are used to organize the parameters affecting the process and the levels at which they should be varied. Instead of having to test all possible combinations like in factorial design, the Taguchi method tests pairs of combinations. This approach allows for the collection of necessary data to determine which factors most affect product quality with a minimum amount of experimentation, thus saving time and resources.

The Taguchi method is best used when there is an intermediate number of variables (3 to 50), few interactions between variables, and when only a few variables contribute significantly. The Taguchi arrays can be derived or looked up. Small arrays can be drawn out manually; large arrays can be derived from deterministic algorithms. Analysis of variance on the collected data from the Taguchi design of experiments can be used to select new parameter values to optimize the performance characteristic.
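As a quick back-of-the-envelope check of those sizes, the column-count formula used later in the array construction, (Q^J − 1)/(Q − 1), already tells us how many runs a design needs. The helper below is my own sketch, not something from the Taguchi literature, assuming that formula:

```python
# Hypothetical helper: size of the smallest standard orthogonal array
# that can host N factors at (up to) Q levels each.
def taguchi_size(n_factors: int, q_levels: int) -> tuple[int, int]:
    """Return (J, M): smallest J with (Q**J - 1)/(Q - 1) >= N, and M = Q**J runs."""
    j = 1
    while (q_levels ** j - 1) // (q_levels - 1) < n_factors:
        j += 1
    return j, q_levels ** j

print(taguchi_size(3, 3))  # (2, 9) -> 9 runs instead of the exhaustive 27
```

For the intro's example (3 factors, 3 levels), this gives an L9 array: 9 experiments instead of 27.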

A good example of this is a combination of 3 variables with 2 possible values each, say A, B, and C with values L and H. In an exhaustive search we must run 8 experiments, but with Taguchi orthogonal arrays 4 experiments are enough to measure the impact of each value of each variable.
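For that 3-variable, 2-level case the standard L4(2^3) array can simply be written down; a small sketch (with 0 = L and 1 = H) checks the pairwise-balance property that makes it orthogonal:

```python
import numpy as np
from itertools import combinations

# L4(2^3): 4 runs covering 3 two-level factors A, B, C (0 = L, 1 = H).
L4 = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])

# Orthogonality: every pair of columns contains each of the 4 level
# combinations (0,0), (0,1), (1,0), (1,1) exactly once.
for a, b in combinations(range(3), 2):
    assert len(set(zip(L4[:, a].tolist(), L4[:, b].tolist()))) == 4
```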

Then we sum up the scores for each variable value, getting something like:

```
A = L; Score 99
A = H; Score 98
B = L; Score 89
B = H; Score 79
C = L; Score 88
C = H; Score 85
```

The idea is that with the above calculations we can understand how impactful a change in a parameter value was. Since each value appears the same number of times, the contributions of the other parameters cancel out within each group.
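The bookkeeping behind those sums can be sketched for the same 3-variable, 2-level design. The per-run scores below are made up for illustration; only the procedure matters:

```python
import numpy as np

# L4(2^3) design: rows are runs, columns are factors A, B, C (0 = L, 1 = H).
L4 = np.array([[0, 0, 0],
               [0, 1, 1],
               [1, 0, 1],
               [1, 1, 0]])
scores = np.array([52.0, 47.0, 50.0, 48.0])  # hypothetical score of each run

# Each level of each factor appears in exactly 2 runs, so the sums are
# directly comparable: the other factors contribute equally to both sides.
for f, name in enumerate("ABC"):
    for level, label in ((0, "L"), (1, "H")):
        print(f"{name} = {label}; Score {scores[L4[:, f] == level].sum()}")
```

The first line printed, for instance, is `A = L; Score 99.0`: the sum of the two runs where A is at level L.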

For that example, the reduction was 2-fold, from 8 experiments to 4. But, as stated in the intro, we can have scenarios where it is 3 times or more.

Doing a fast literature review, I ended up with the article “An orthogonal genetic algorithm with quantization for global numerical optimization”, which, instead of looking up a set of fixed matrices and trying to find which one fits the problem, shows how to build an algorithm that, given a set of parameters and values, extrapolates the orthogonal array.

For a proof of concept, nothing fancy: some code based on the formulas and algorithms stated in the article mentioned above.

```python
import numpy as np


class Variable:
    def __init__(self, name: str, values: list):
        self._name = name
        self._values = values

    def __str__(self):
        return f"Variable(name = '{self._name}', values = {self._values})"

    @property
    def values(self) -> list:
        return self._values

    @property
    def name(self) -> str:
        return self._name

    @property
    def n_values(self) -> int:
        return len(self._values)


class TaguchiOpt:
    @classmethod
    def from_dict(cls, variables: dict):
        return cls([Variable(k, v) for k, v in variables.items()])

    def __init__(self, variables: list[Variable]):
        self._variables = variables
        self._N = len(variables)   # number of factors (columns actually used)
        self._Q = 0                # number of levels of the largest factor
        self._J = 0                # exponent: the array has M = Q ** J rows
        self._M = 0
        self._OA = None            # orthogonal array of level indices
        self._OA_values = None     # same array mapped to parameter values
        self.build_OA()

    def build_OA(self):
        # Q is the largest number of levels among the variables
        m_v = None
        for v in self._variables:
            if m_v is None or v.n_values > m_v.n_values:
                m_v = v
        self._Q = m_v.n_values
        self._find_J()
        self._M = self._Q ** self._J
        self._OA = np.full((self._M, self._N), 2, dtype=int)
        self._OA_values = np.full((self._M, self._N), None)
        self._fill_OA()
        return self

    def _fill_OA(self):
        # Basic columns
        for k in range(1, self._J + 1):
            j = int((self._Q ** (k - 1) - 1) / (self._Q - 1)) + 1
            max_i = int(self._Q ** self._J)
            for i in range(1, max_i + 1):
                den = self._Q ** (self._J - k)
                # i, j shifted down for Python's zero-based arrays
                self._OA[i - 1, j - 1] = int((i - 1) / den) % self._Q

        # Non-basic columns
        for k in range(2, self._J + 1):
            j = int((self._Q ** (k - 1) - 1) / (self._Q - 1)) + 1
            for s in range(1, j):
                for t in range(1, self._Q):
                    a_s = self._OA[:, s - 1]
                    a_j = self._OA[:, j - 1]
                    a_jj = int(j + (s - 1) * (self._Q - 1) + t) - 1
                    if a_jj < self._N:
                        self._OA[:, a_jj] = np.mod(a_s * t + a_j, self._Q)

        self._OA = self._OA[:, 0:self._N]

        # Map level indices back to the actual parameter values
        for i in range(self._M):
            for j in range(self._N):
                v = self._variables[j]
                vi = self._OA[i, j]
                try:
                    self._OA_values[i, j] = v.values[vi]
                except IndexError:
                    print(f"Error at {v}, {vi}, ({i}, {j})")

    def _calc_N_for_QJ(self):
        return int((self._Q ** self._J - 1) / (self._Q - 1))

    def _find_J(self):
        # Smallest J such that the array has enough columns for all N factors
        self._J = int(np.log(self._N * (self._Q - 1) + 1) / np.log(self._Q))
        n = self._calc_N_for_QJ()
        if n > self._N:
            self._J -= 1
        elif n < self._N:
            self._J += 1

    @property
    def OA(self):
        return self._OA_values

    def get_params(self, n=0):
        return {self._variables[i].name: self._OA_values[n][i]
                for i in range(self._N)}

    def __str__(self):
        return f"(Q = {self._Q}, N = {self._N}, J = {self._J}, M = {self._M})"
```
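As a sanity check on the class above, the paper's two-step construction (basic columns, then non-basic columns) can also be written as a standalone function; for Q = 3 and J = 2 it should reproduce the classic L9(3^4) array, whose column pairs each contain all 9 level combinations exactly once. The indexing here is my own zero-based rewrite of the paper's 1-based formulas:

```python
import numpy as np
from itertools import combinations

def build_oa(Q: int, J: int) -> np.ndarray:
    """Orthogonal array with M = Q**J rows and (Q**J - 1)//(Q - 1) columns."""
    M = Q ** J
    n_cols = (Q ** J - 1) // (Q - 1)
    A = np.zeros((M, n_cols), dtype=int)
    for k in range(1, J + 1):                     # basic columns
        j = (Q ** (k - 1) - 1) // (Q - 1) + 1     # 1-based column index
        for i in range(M):
            A[i, j - 1] = (i // Q ** (J - k)) % Q
    for k in range(2, J + 1):                     # non-basic columns
        j = (Q ** (k - 1) - 1) // (Q - 1) + 1
        for s in range(1, j):
            for t in range(1, Q):
                A[:, j + (s - 1) * (Q - 1) + t - 1] = (A[:, s - 1] * t + A[:, j - 1]) % Q
    return A

L9 = build_oa(3, 2)                               # 9 runs, 4 columns
for a, b in combinations(range(L9.shape[1]), 2):  # every column pair balanced
    assert len(set(zip(L9[:, a].tolist(), L9[:, b].tolist()))) == 9
```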

I opted for some classical methods, avoiding as much interference as possible. The training set was artificially generated with scikit-learn using a fixed random seed: 5 thousand records with 100 features, 30 of them informative. That gives a quantifiable training time while avoiding an out-of-the-box 100% accuracy.

The classifier was also a very commonly used model, RandomForest, with no parameters set outside the parameter grid. Finally, cross-validation with k = 5 folds, to avoid testing on overfitted results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {'max_depth': [5, 10, 20],
              'min_samples_split': [5, 10, 20],
              'max_features': ['sqrt', 'log2', None]}

base_estimator = RandomForestClassifier(random_state=42)
X, y = make_classification(n_samples=5_000, n_features=100,
                           n_informative=30, random_state=42)

sh = GridSearchCV(base_estimator, param_grid, cv=5, verbose=2)
sh_fit = sh.fit(X, y)
```

```python
from sklearn.model_selection import cross_val_score

TgOpt = TaguchiOpt.from_dict(param_grid)

results = {
    'params': [],
    'split0_test_score': [],
    'split1_test_score': [],
    'split2_test_score': [],
    'split3_test_score': [],
    'split4_test_score': [],
    'mean_test_score': [],
    'std_test_score': [],
    'max_depth': [],
    'min_samples_split': [],
    'max_features': [],
}

best_model_params = None
best_model_score = 0
for e in range(len(TgOpt.OA)):
    params = TgOpt.get_params(e)
    print(params)
    estimator = RandomForestClassifier(random_state=42, **params)
    split_scores = cross_val_score(estimator, X, y, cv=5)
    results['params'].append(params)
    for k, v in params.items():
        results[k].append(v)
    for i in range(5):
        results[f'split{i}_test_score'].append(split_scores[i])
    mean_score = np.mean(split_scores)
    results['mean_test_score'].append(mean_score)
    if mean_score >= best_model_score:
        best_model_params = params
        best_model_score = mean_score
    results['std_test_score'].append(np.std(split_scores))
    print()
```

To find the best model, we can build upon the experiments run with the Taguchi arrays: sum the impacts of each parameter value and remove those values that didn’t give us much of an improvement:

```python
import pandas as pd

df_results_oa = pd.DataFrame(results)
df_results_oa.sort_values('mean_test_score', ascending=False).head(20)

for k in params.keys():
    print(df_results_oa[[k, 'mean_test_score']]
          .fillna('None')
          .groupby(k)
          .sum()
          .reset_index()
          .sort_values('mean_test_score', ascending=False))
```
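To make the pruning step concrete, here is a self-contained sketch with made-up scores (none of these numbers come from the actual experiment): since each level appears the same number of times in the design, summing `mean_test_score` per level gives comparable totals, and the weakest levels can be dropped before the final +1 run:

```python
import pandas as pd

# Hypothetical per-run results for one parameter of a 9-run Taguchi design.
df = pd.DataFrame({
    'max_depth':       [5, 5, 5, 10, 10, 10, 20, 20, 20],
    'mean_test_score': [0.80, 0.82, 0.81, 0.88, 0.87, 0.89, 0.90, 0.91, 0.90],
})

# Each level appears 3 times, so the summed scores are directly comparable.
marginal = (df.groupby('max_depth')['mean_test_score']
              .sum()
              .sort_values(ascending=False))
print(marginal)

# Keep the strongest level(s) for the follow-up pruned experiment.
best_depth = marginal.index[0]
print(best_depth)  # 20
```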

The overall execution and implementation were fun in themselves; reading a paper from well outside my domain and bringing its idea to things we take for granted has value (at least for me 😉).

As previously mentioned, this is not something that will substitute current methods of hyperparameter optimization/search. But, as Taguchi proposed, it can serve as a method to generate robust experiments with some level of explainability, which is not very common in other methods.

At first, we had a reduction of almost 3 times in the number of jobs run by the orthogonal array method compared to GridSearchCV, and, correspondingly, the time was also about a third of the exhaustive method's. The score was close to the best one in the grid search but still some points below; that was not final, though. The time to spare allowed pruning the current set of parameters based on the results of the Taguchi method, ending up with only 2 more experiments to execute.

In the end, the method yielded a score comparable to the classical method and ran in under a third of its time. I found that very promising, and I have ideas of porting it to feature selection and maybe even as an optimizer for an ensemble of classifiers.

I’ll leave a simple notebook over here, just in case: taguchi_parameter_search/Experiments.ipynb at main · wanermiranda/taguchi_parameter_search (github.com)

So long and Thanks for all the Fish!