In the last two weeks, I’ve clocked another 44 hours on my machine learning journey, diving into unsupervised learning algorithms and their role in data preprocessing. I also took some time to tweak my Vivino web scraper to harvest more data, with the goal of improving the model’s accuracy in the near future, and I started reading “How AI Works” to strengthen my theoretical knowledge.

- Unsupervised Learning Algorithms: I’ve been wrapping my head around unsupervised learning algorithms and their super important role in data preprocessing.
- Data Scraper Upgrade: I tweaked the scraper from the previous project.
- Book Learning: I dug into “How AI Works,” which, aside from a history lesson, gave me a better handle on vectors — quite the unexpected bonus.

Here’s the rundown of the most important insights from these two weeks:

- *Standardization vs. Normalization:* These preprocessing techniques are pivotal. Standardization rescales data to zero mean and unit variance, while normalization scales data within a bounded interval, typically [0, 1].
- *Principal Component Analysis (PCA):* PCA is a dimensionality reduction technique that identifies the directions (principal components) that maximize variance in high-dimensional data, effectively reducing the number of dimensions without significant loss of information.
- *Whitening:* Post-PCA, whitening is applied to equalize the variances of the principal components, ensuring that they contribute equally to the subsequent analysis.
- *Masking in Python:* This technique filters data arrays, allowing for the selection of data points that meet specific criteria, thereby enhancing the focus on relevant patterns.
- *Non-Negative Matrix Factorization (NMF):* NMF is a factorization method that approximates a matrix with non-negative elements, facilitating interpretable feature extraction.
- *PCA vs. NMF:* PCA provides orthogonal components that maximize variance, whereas NMF offers a parts-based representation, which can be more interpretable, especially in applications like image decomposition.
- *Manifold Learning Algorithms:* These algorithms seek to uncover the low-dimensional manifold within high-dimensional data, facilitating visualization and noise reduction.
- *KMeans Clustering:* This algorithm partitions data into k distinct clusters, each represented by a centroid, and assigns data points to the nearest cluster.
- *Vector Quantization:* A technique in signal processing that approximates a set of data points with a smaller set of representative vectors, aiding in data compression.
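To make a few of these ideas concrete, here’s a minimal sketch on synthetic data (the toy dataset and parameter choices are my own, not from the book): it contrasts standardization with normalization, applies PCA with whitening, filters samples with a boolean mask, and uses KMeans as a simple vector quantizer.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(200, 5))  # toy data

# Standardization: each feature gets zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# Normalization: each feature is rescaled into [0, 1]
X_norm = MinMaxScaler().fit_transform(X)

# PCA with whitening: the principal components are rescaled
# so they all contribute with (roughly) equal variance
pca = PCA(n_components=3, whiten=True)
X_pca = pca.fit_transform(X_std)

# Masking: boolean indexing keeps only the samples whose
# first principal component is positive
mask = X_pca[:, 0] > 0
X_subset = X_pca[mask]
print(f"{mask.sum()} of {len(X_pca)} samples selected")

# KMeans as vector quantization: replace every point by the
# centroid of its cluster (a smaller set of representative vectors)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_std)
X_quantized = km.cluster_centers_[km.labels_]
print(X_quantized.shape)  # same shape as X_std, but only 4 distinct rows
```

The last step is what “vector quantization” means in the clustering context: the dataset is compressed to four representative vectors, at the cost of within-cluster detail.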

## Favorite Code Block:

This fortnight’s highlight was using NMF for image decomposition on the faces dataset. It’s like having X-ray vision, seeing beyond the surface to what each component of the data focuses on. These examples were built following the “Introduction to Machine Learning with Python” book.

This code shows the first 15 components found using NMF:

```python
from sklearn.decomposition import NMF
import matplotlib.pyplot as plt
import numpy as np

nmf = NMF(n_components=15, random_state=0)
nmf.fit(X_train)
X_train_nmf = nmf.transform(X_train)
X_test_nmf = nmf.transform(X_test)

fig, axes = plt.subplots(3, 5, figsize=(15, 12),
                         subplot_kw={'xticks': (), 'yticks': ()})
for i, (component, ax) in enumerate(zip(nmf.components_, axes.ravel())):
    ax.imshow(component.reshape(image_shape))
    ax.set_title(f"{i}. component")
```

Here I observed that component 4 mostly represented faces oriented to the left, and component 7 faces oriented to the right.

In this code I plot the first 10 images for each of those components, and indeed the faces it shows are mostly turned to the left and to the right:

```python
# sort by 4th component, plot first 10 images
compn = 4
inds = np.argsort(X_train_nmf[:, compn])[::-1]
fig, axes = plt.subplots(2, 5, figsize=(15, 8),
                         subplot_kw={'xticks': (), 'yticks': ()})
for ind, ax in zip(inds, axes.ravel()):
    ax.imshow(X_train[ind].reshape(image_shape))

# sort by 7th component, plot first 10 images
compn = 7
inds = np.argsort(X_train_nmf[:, compn])[::-1]
fig, axes = plt.subplots(2, 5, figsize=(15, 8),
                         subplot_kw={'xticks': (), 'yticks': ()})
for ind, ax in zip(inds, axes.ravel()):
    ax.imshow(X_train[ind].reshape(image_shape))
```

## Next Steps:

In the upcoming two weeks, my main goal is to complete the section on unsupervised algorithms. With this foundation, I will embark on building a new model that integrates the insights and techniques I’ve been acquiring.

It’s also time for some introspection — homing in on the specific ML path I want to pursue so I can align my projects with my career aspirations. It’s about getting strategic with my learning to carve out a niche in the ML landscape. This will involve a more careful selection of projects that not only broaden my expertise but also steer my portfolio towards a niche that resonates with my professional goals.

Interested in a friendly brainstorm or in sharing ideas about AI, machine learning, or linguistics? Let’s connect on LinkedIn or Upwork. I’m all for exchanging thoughts and learning together in this exciting field!