    Latest

    Choosing a Data-Governance Framework for Your Organization

    What is Data Governance? Data governance refers to the process of managing enterprise data with the aim of making data more accessible, reliable, usable, secure, and...

    Fitting Support Vector Machines via Quadratic Programming

    In this blog post we take a deep dive into the internals of Support Vector Machines. We derive a Linear SVM classifier, explain its advantages, and...

    ML internals: Synthetic Minority Oversampling (SMOTE) Technique

    In this article we discuss why fitting models on imbalanced datasets is problematic, and how class imbalance is typically addressed. We present the...

    The Future of Data Science - Mining GTC 2021 for Trends

    Deep learning enthusiasts are increasingly putting NVIDIA’s GTC at the top of their gotta-be-there conference list. I enjoyed mining this year’s...

    Credit Card Fraud Detection using XGBoost, SMOTE, and threshold moving

    In this article, we'll discuss the challenge organizations face around fraud detection, how machine learning can be used to identify and spot...

    Trending Toward Concept Building - A Review of Model Interpretability for Deep Neural Networks

    We are at an interesting time in our industry when it comes to validating models - a crossroads of sorts when you think about it. There is an...

    Fireside Chat: Stig Pedersen from Topdanmark

    "In having one or two very successful algorithmic deployments, the business then begins coming to you to ask for assistance. It becomes a mutual...

    Defining Metrics to Drive Machine Learning Model Adoption & Value

    One of the biggest ironies of enterprise data science is that although data science teams are masters at using probabilistic models and diagnostic...

    Enterprise-class NLP with spaCy v3

    spaCy is a Python library that provides capabilities to conduct advanced natural language processing analysis and build models that can underpin...

    How to Supercharge Data Exploration with Pandas Profiling

    Producing insights from raw data is a time-consuming process. Predictive modeling efforts rely on dataset profiles, whether consisting of summary...

    Density-Based Clustering

    Original content by Manojit Nandi - Updated by Josh Poduska. Cluster analysis is an important problem in data analysis. Data scientists use...

    Analyzing Large P Small N Data - Examples from Microbiome

    Guest Post by Bill Shannon, Co-Founder and Managing Partner of BioRankings. Introduction: High-throughput screening technologies have been developed to...

    The Curse of Dimensionality

    Guest Post by Bill Shannon, Founder and Managing Partner of BioRankings. Danger of Big Data: Big data is the rage. This could be lots of rows (samples)...
