    Latest

    Data Exploration with Pandas Profiler and D-Tale

    We have all heard that data is the new oil. I always say that if that is the case, we need to go through some refinement process before that raw oil is converted into useful...

    How to run PySpark on a 32-core cluster with Domino

    In this post we will show you two different ways to get up and running with Spark. The first is to use Domino, which has Spark pre-installed and...

    Social Network Analysis with NetworkX

    Many types of real-world problems involve dependencies between records in the data. For example, sociologists are eager to understand how people...

    Pandas Categoricals

    Guest post by Matthew Rocklin. Pandas Categoricals efficiently encode and dramatically improve performance on data with text categories. Disclaimer:...

    Geographic visualization with R's ggmap

    Have you ever crunched some numbers on data that involved spatial locations? If the answer is no, then boy are you missing out! So much spatial data...

    Deep Learning with h2o.ai

    This post provides a brief history lesson and overview of deep learning, coupled with a quick "how to" guide for dipping your toes into the water...

    Reflections on "Buy vs Build" for Data Science Tools

    “Buy vs build”, “not-invented-here syndrome” and even “invented-here-syndrome” have been written about extensively. I want to share a few reflections...

    Topic modeling in 9/11 news articles

    This is a guest post by Dan Morris. The interactive dashboard and the code are also available. This post describes a project to visualize topics in...

    How to Do Factor Analysis in R

    This is a guest post by Evan Warfel. What is Factor Analysis? P-values. T-tests. Categorical variables. All are contenders for the most misused...

    Interactive dashboards with knitr and htmlwidgets

    htmlwidgets for R is a nifty R package that lets you easily generate interactive visualizations. There are already widgets for...

    How to Get Started with the Data Science Bowl

    I am thrilled to share a Domino project we’ve created with starter code in R and Python for participating in the Data Science Bowl. The...

    Visualizing home ownership with small multiples and R

    This is a guest post by Antonio Sánchez Chinchón. The "small multiples" visualization technique was introduced by Edward Tufte, one of the current...

    Cloud Security: The right way to worry

    Here’s a question we hear a lot: We’re not that comfortable with the cloud from a security perspective -- can you install Domino on premise? The...

    Getting error bounds on classification metrics

    This is a guest post by Casson Stallings. Error bounds, or lack thereof: calculating error bounds on metrics derived from very large data sets has...

    Using data science to get a good deal on a Macbook

    This is a guest post by Ajay Sharma. I created a Python project that grabs Craigslist postings in New York for Macbook Air 13" and...

    40-percent faster R without any code changes

    Starting today, anyone running R code on Domino can use Revolution R Open to dramatically improve performance without any code changes. ...

    User stories: how Domino helps a data scientist create "unicorn-level deliverables"

    We asked our users to tell us stories about how they're using Domino. This is what we heard from Laura Lorenz, a Data Scientist at StockUp. I...
