How I Passed the AWS Certified Machine Learning Specialty Without Taking Any Prior AWS Certifications
Obtaining the AWS Machine Learning Specialty Certification is a demanding but highly rewarding achievement for data science and machine learning professionals. While many candidates earn other AWS certifications first, passing this exam without holding any prior AWS certification is entirely feasible.
Understanding Clustering Models: A Simple Guide with Examples
Data science practitioners benefit from knowing how to apply clustering algorithms, which discern and group similar data points without needing labeled data. Clustering is instrumental in revealing hidden patterns within data and supports applications such as customer segmentation and anomaly detection.
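To make "grouping without labels" concrete, here is a minimal pure-Python sketch of k-means, one widely used clustering algorithm. It assumes 2-D numeric points and uses a deterministic farthest-first initialization so the result is reproducible; the function name `kmeans` and the sample points are illustrative and not taken from the post (`math.dist` needs Python 3.8+).

```python
import math


def kmeans(points, k, max_iter=100):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm."""
    # Farthest-first initialization (deterministic): start from the first point,
    # then repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Two obvious groups of unlabeled points (made-up data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are supplied anywhere: the cluster assignments come purely from distances between points.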
Understanding Key Statistical Tests for Data Scientists
Choosing and applying the right statistical test is crucial for analyzing data effectively, both in statistics and in data science use cases. A suitable test can help you determine whether an observed difference between groups, or an association between numerical or categorical variables, is statistically significant. This blog post covers five fundamental statistical tests I have used throughout my data science and analytics career.
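As one illustration of testing for an association between two categorical variables, the sketch below computes Pearson's chi-square test of independence for a 2x2 table in plain Python. It is not necessarily one of the five tests in the post; the function name and the counts are hypothetical. In practice, `scipy.stats.chi2_contingency` handles tables of any size, though it applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this one unless you pass `correction=False`.

```python
import math


def chi_square_2x2(table):
    """Pearson's chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value); no continuity correction is applied.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom; for 1 degree of
    # freedom the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = customer segment A/B, columns = churned yes/no.
observed = [[30, 10], [20, 40]]
statistic, p_value = chi_square_2x2(observed)
print(f"chi2 = {statistic:.2f}, p = {p_value:.5f}")
```

A small p-value (conventionally below 0.05) suggests the two categorical variables are not independent.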
Interpreting Machine Learning Models in Python with SHAP
In machine learning, understanding how models arrive at their predictions is crucial. A common way to gauge each feature's contribution is feature importance. This measure is based on the decrease in model performance when a feature is removed. It is useful, but it only tells you how much a feature matters overall: it says nothing about whether the feature pushes predictions up or down, or how its effect varies from one prediction to the next.
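Permutation importance is a widely used way to compute this kind of performance-based importance: instead of retraining without a feature, it shuffles that feature's values and measures how much the error grows. The sketch below assumes an already fitted model exposed as a plain `predict(row)` function and uses mean squared error; all names and data are hypothetical. For real models, scikit-learn provides `sklearn.inspection.permutation_importance`.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[j] for row in X]
            rng.shuffle(shuffled)
            # Swap in the shuffled column, breaking feature j's link to the target.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, shuffled)]
            permuted_error = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical "fitted model" that uses only the first feature.
def predict(row):
    return 3.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * row[0] for row in X]
importances = permutation_importance(predict, X, y)
print(importances)
```

The result is a single number per feature, which is exactly the limitation of this measure: it shows how much a feature matters, not in which direction it moves any individual prediction.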
Harnessing the Power of Apache Spark for Data Scientists
As a data scientist, you will often work with big data. In such situations, a tool that can handle high volumes of data is essential for extracting key insights. Combining Python with Apache Spark (via PySpark) pairs Python's simplicity with Spark's distributed processing, making it a versatile choice for big data analytics.
Topic Modeling Using LDA and Topic Coherence
Topic modeling is a powerful technique used in natural language processing to identify topics in a text corpus automatically. Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling techniques, and in this tutorial, we'll explore how to implement it using the Gensim library in Python.
Simple and Efficient Machine Learning Prototyping in Python Using Sweetviz and PyCaret
Prototyping new models can be a time-consuming process. It involves business understanding, data understanding, data preparation, model creation, and evaluation. Fortunately, there are a few ways to speed up the process and enable faster decision-making in a business setting.
Common Time Series Metrics Using Darts in Python
Time series forecasting has many applications across various industries. In my current role, we use it to forecast service case volumes for existing customers, which helps us tackle resource allocation and capacity problems.
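To show what common forecast error metrics actually measure, here is a plain-Python sketch of three of them: MAE, RMSE, and MAPE. Which metrics the post itself covers is an assumption here, and the numbers are made up. Darts ships ready-made versions in its `darts.metrics` module that operate on `TimeSeries` objects, whereas these helpers take plain lists.

```python
import math


def mae(actual, forecast):
    """Mean absolute error: average size of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up actual values and forecasts for four periods.
actual = [100, 120, 130, 110]
forecast = [90, 125, 130, 100]
print(mae(actual, forecast))                # → 6.25
print(rmse(actual, forecast))               # → 7.5
print(f"{mape(actual, forecast):.2f}%")     # → 5.81%
```

MAE and RMSE are in the units of the series (for example, cases), while MAPE is scale-free, which makes it easier to compare across customers with very different volumes.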
BERT Transformer Text Similarity in Python
Sentence similarity measures how closely related two or more sentences are in meaning, typically expressed as a similarity score. Measuring text similarity is useful in many NLP tasks, including information retrieval, text summarization, and question answering. One way to achieve this is with a BERT Transformer model.
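A BERT-style model maps each sentence to a dense embedding vector, and a common choice of similarity score is the cosine similarity between those vectors. The sketch below shows only the scoring step in plain Python; the embedding values are invented placeholders standing in for a model's output, and libraries such as sentence-transformers provide both the encoding step and a cosine-similarity helper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Placeholder 4-dimensional "sentence embeddings"; a BERT-base model typically
# produces 768-dimensional vectors. These values are invented for illustration.
query = [0.9, 0.1, 0.3, 0.0]  # "How do I reset my password?"
candidates = {
    "I forgot my password, how can I change it?": [0.8, 0.2, 0.4, 0.1],
    "What is the weather like today?": [0.1, 0.9, 0.0, 0.4],
}

# Rank candidate sentences by similarity to the query, most similar first.
ranked = sorted(
    candidates.items(),
    key=lambda item: cosine_similarity(query, item[1]),
    reverse=True,
)
for sentence, vector in ranked:
    print(f"{cosine_similarity(query, vector):.3f}  {sentence}")
```

Ranking by this score is the core of similarity-based information retrieval: the candidate whose embedding points in the most similar direction is returned first.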
Common Supervised Machine Learning Algorithms
Supervised machine learning is a branch of artificial intelligence that involves training algorithms to make predictions or classifications based on input data. In supervised learning, we have a labeled dataset, meaning we have input features and corresponding output values. The goal is to train a model that accurately predicts output values for new, unseen input data.
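A minimal illustration of that workflow: fit a model on labeled examples, then predict the output for an input it has not seen. The sketch uses simple linear regression fit by least squares, chosen here only because it fits in a few lines; the data is made up and the function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Learn a slope and intercept from labeled (x, y) pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Labeled training data: input features xs with known outputs ys.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
model = fit_simple_linear_regression(xs, ys)
print(predict(model, 6))  # → 13.0 (x = 6 was never in the training data)
```

Classification follows the same pattern; only the model and the type of output (a category instead of a number) change.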