Victor Solis 12/13/24

How I Passed the AWS Certified Machine Learning Specialty Without Taking Any Prior AWS Certifications

Obtaining the AWS Machine Learning Specialty Certification is a demanding but highly rewarding achievement for data science and machine learning professionals. While many pursue other AWS certifications first, attaining this certification without prior AWS experience is entirely feasible.

Victor Solis 4/2/24

Understanding Clustering Models: A Simple Guide with Examples

Data science practitioners benefit from knowing how to apply clustering algorithms, which group similar data points without requiring labeled data. Clustering is instrumental in revealing hidden patterns within data, supporting applications such as customer segmentation and anomaly detection.

Victor Solis 2/16/24

Understanding Key Statistical Tests for Data Scientists

In statistics, understanding how to choose and apply the right test is crucial for analyzing data effectively. That is also true for data science use cases. A proper statistical test can help you determine if there is a significant difference between numerical or categorical variables. This blog post will delve into five fundamental statistical tests I have used throughout my data science and analytics career.

Victor Solis 1/3/24

Interpreting Machine Learning Models in Python with SHAP

In machine learning, understanding how models arrive at their predictions is crucial. A common way to determine feature contribution is by looking at feature importance. This measure is based on the decrease in model performance when removing a feature. It is useful, but it yields only a single score per feature and says nothing about how that feature pushes individual predictions up or down.
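The removal-based measure described above can be sketched in a few lines. This is a minimal illustration, not the method used in the post: the toy dataset, the 1-nearest-neighbour classifier, and the helper names `loo_accuracy` and `drop_feature_importance` are all assumptions made for the example.

```python
def loo_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in feature_idx."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # never use a point as its own neighbour
            dist = sum((row[k] - other[k]) ** 2 for k in feature_idx)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if labels[best_j] == labels[i]:
            correct += 1
    return correct / len(rows)


def drop_feature_importance(rows, labels):
    """Importance of each feature = baseline accuracy minus the
    accuracy obtained when that feature is removed."""
    all_idx = list(range(len(rows[0])))
    baseline = loo_accuracy(rows, labels, all_idx)
    importances = []
    for f in all_idx:
        kept = [k for k in all_idx if k != f]
        importances.append(baseline - loo_accuracy(rows, labels, kept))
    return importances


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [
    [1.0, 0.3], [1.2, 0.7], [0.8, 0.5],   # class 0
    [3.0, 0.4], [3.2, 0.6], [2.8, 0.2],   # class 1
]
labels = [0, 0, 0, 1, 1, 1]

importances = drop_feature_importance(rows, labels)
print(importances)  # feature 0 receives the larger score
```

Each feature ends up with one number, which is the limitation noted above: the score ranks features but does not say in which direction a feature moves any given prediction.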

Victor Solis 11/30/23

Harnessing the Power of Apache Spark for Data Scientists

As a data scientist, you will often work with big data. In such situations, a tool that can handle high volumes of data is essential for extracting key insights. Combining Python with Apache Spark offers a simple and versatile way to run big data analytics at scale.

Victor Solis 10/31/23

Topic Modeling Using LDA and Topic Coherence

Topic modeling is a powerful technique used in natural language processing to identify topics in a text corpus automatically. Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling techniques, and in this tutorial, we'll explore how to implement it using the Gensim library in Python.

Victor Solis 9/30/23

Simple and Efficient Machine Learning Prototyping in Python Using Sweetviz and PyCaret

Prototyping new models can be a time-consuming process. It combines business understanding, data understanding, data preparation, model creation, and evaluation. Luckily, there are a few ways to speed up the process to enable fast decision-making in a business setting.

Victor Solis 8/31/23

Common Time Series Metrics Using Darts in Python

Time series forecasting has many applications across various industries. In my current role, we use it to forecast service case volumes for existing customers, which helps us address allocation and capacity problems.

Victor Solis 7/31/23

BERT Transformer Text Similarity in Python

Sentence similarity involves determining how closely related two sentences are in meaning. It measures the similarity between two or more sentences using a similarity score. We often want to measure text similarity in various NLP tasks. Some of those tasks include information retrieval, text summarization, and question answering. One way of achieving that is by using a BERT Transformer model.
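As a rough illustration of what a similarity score looks like, the sketch below computes cosine similarity between sentence vectors. Cosine similarity is one common choice and is assumed here, since the teaser does not name a metric. The three short vectors are hypothetical stand-ins for embeddings; real BERT sentence embeddings have hundreds of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional "embeddings" (made up for this example).
emb_cat = [0.9, 0.1, 0.3, 0.0]      # "The cat sleeps on the sofa."
emb_kitten = [0.8, 0.2, 0.4, 0.1]   # "A kitten naps on the couch."
emb_invoice = [0.0, 0.9, 0.1, 0.7]  # "Please pay the attached invoice."

related = cosine_similarity(emb_cat, emb_kitten)
unrelated = cosine_similarity(emb_cat, emb_invoice)
print(related, unrelated)  # the related pair scores higher
```

For non-negative vectors like these, the score falls between 0 and 1, and values closer to 1 indicate sentences that point in nearly the same direction in the embedding space.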

Victor Solis 6/30/23

Common Supervised Machine Learning Algorithms

Supervised machine learning is a branch of artificial intelligence that involves training algorithms to make predictions or classifications based on input data. In supervised learning, we have a labeled dataset, meaning we have input features and corresponding output values. The goal is to train a model that accurately predicts output values for new, unseen input data.