    Latest

    Data Exploration with Pandas Profiler and D-Tale

    We all have heard how data is the new oil. I always say that if that is the case, we need to go through some refinement process before that raw oil is converted into useful...

    How Atlassian Uses Data Science to Improve Collaboration

    Christy Bergman, Data Scientist at Atlassian, gives an inside look at how they use data science to improve user onboarding and collaboration. This...

    Fitting Gaussian Process Models in Python

    Written by Chris Fonnesbeck, Assistant Professor of Biostatistics, Vanderbilt University Medical Center. A common applied statistics task involves...

    Achieving Reproducibility with Conda and Domino Environments

    Managing “environments” (i.e., the set of packages, configuration, etc.) is a critical capability of any Data Science Platform. Not only does...

    Enabling Data Science Agility with Docker

    This post describes how Domino uses Docker to solve a number of interconnected problems for data scientists and researchers, related to environment...

    Python 3.6 with Domino in Minutes

    For Pythonistas like me, the holidays started a little early with today's release of Python 3.6. In case you haven't heard, Python 3.6 has a number...

    Python for SAS Users: The Pandas Data Analysis Library

    This post is a chapter from Randy Betancourt's Python for SAS Users quick start guide. Randy wrote this guide to familiarize SAS users with Python and...

    23 Visualizations and When to Use Them

    This talk was presented live at PLOTCON 2016 in NYC on November 18, 2016. Scatterplot or bubble chart – what visualization makes the most sense for...

    Python vs. R for Data Science

    R and Python are both popular open source programming languages for data scientists. Each has its advantages for performing data science tasks. So,...

    Exploring the Limits of Parallelized Machine Learning

    This week, Domino’s Chief Data Scientist, Eduardo Ariño de la Rubia, presented a webinar: Machine Learning at Scale with Amazon's X1 Instance. If you...

    Gain Shell Access To Your Domino Instances

    Note: Please be advised that direct access to containers via SSH has been deprecated for Domino versions above 4.x. Indirect SSH access via Workspace...

    How Buzzfeed Uses Real-Time Machine Learning to Choose Their Viral Content

    This talk took place at the Domino Data Science Pop-up in Los Angeles, CA on September 14, 2016. In this presentation, Jane Kelly, Director of Data...

    Wisdom From Machine Learning at Netflix

    At Data By The Bay in May, we saw a great talk by Netflix's Justin Basilico: Recommendations for Building Machine Learning Software. Justin describes...

    Using k-Nearest Neighbors (k-NN) in Production

    What is k-Nearest Neighbors (k-NN)? k-Nearest Neighbors is a simple algorithm that stores all available cases and classifies new cases based on a...

    Choosing Content for Netflix: How Data Leads the Way

    This talk took place at the Domino Data Science Pop-up in Los Angeles, CA on September 14, 2016. In this presentation, Paul Ellwood, VP of Data...

    Using Apache Spark to Analyze Large Neuroimaging Datasets

    This article was written by Sergul Aydore, Ph.D., and Syed Ashrafulla, Ph.D. Sergul and Syed received their Ph.D.s in Electrical Engineering in 2014...

    The "Joel Test" for Data Science

    It's the sixteenth anniversary of Joel Spolsky's "Joel Test," which he described as a "highly irresponsible, sloppy test to rate the quality of a...
