Data Science Blog | Practical Techniques (3)

Manual Feature Engineering

Many thanks to AWP Pearson for the permission to excerpt "Manual Feature Engineering: Manipulating Data for Fun and Profit" from the book, Machine...

Addison-Wesley Professional Data Science Code Engineering Model Management Machine Learning Practical Techniques Feature Engineering data model

A Practitioner's Guide to Deep Learning with Ludwig

Joshua Poduska provides a distilled overview of Ludwig including when to use Ludwig’s command-line syntax and when to use its Python API. ...

Data Science Code Machine Learning Domino Product Practical Techniques Deep Learning Ludwig

Themes and Conferences per Pacoid, Episode 11

Paco Nathan's latest article covers program synthesis, AutoPandas, model-driven data queries, and more. Welcome back to our monthly...

Data Science Code autopandas Jupyter Practical Techniques sql Program Synthesis Pandas Model Context Paco Nathan Column Model Development

MNIST Expanded: 50,000 New Samples Added

This post provides a distilled overview of the rediscovery of 50,000 samples within the MNIST dataset. MNIST: The Potential Danger of...

Data Science Machine Learning Practical Techniques MNIST Featured Reproducibility Domino Data Science Field Note

Addressing Irreproducibility in the Wild

This Domino Data Science Field Note provides highlights and excerpted slides from Chloe Mawer’s "The Ingredients of a Reproducible Machine Learning...

Data Science Leaders At Work Code Model Management Machine Learning Seek Truth Speak Truth Models Featured Practical Techniques Model Deployment Model Interpretability Reproducibility Model Development Domino Data Science Field Note

Can Data Science Help Us Make Sense of the Mueller Report?

This blog post provides insights on how to apply Natural Language Processing (NLP) techniques. The Mueller Report, officially...

Data Science Code Machine Learning Practical Techniques R NLP

Machine Learning in Production: Software Architecture

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Software Architecture" chapter from the book, Machine Learning...

Addison-Wesley Professional Engineering Model Management Machine Learning Software Architecture Data Science and Data Engineering Alignment Practical Techniques Model Production Data Engineering Data Infrastructure Data Scientists

Comparing the Functionality of Open Source NLP Libraries

In this guest post, Maziyar Panahi and David Talby provide a cheat sheet for choosing open source NLP libraries. What do Natural Language Processing...

Data Science Machine Learning CoreNLP Practical Techniques NLTK spaCy Spark NLP NLP Open Source OpenNLP

Manipulating Data with dplyr

Special thanks to Addison-Wesley Professional for permission to excerpt the following "Manipulating data with dplyr" chapter from the book, ...

Addison-Wesley Professional Data Science Code Practical Techniques R dplyr Model Development

Highlights from the Maryland Data Science Conference: Deep Learning on Imagery and Text

Niels Kasch, cofounder of Miner & Kasch, an AI and Data Science consulting firm, provides insight from a deep learning session that occurred at the...

Data Science Code Machine Learning Practical Techniques Deep Learning data

Themes and Conferences per Pacoid, Episode 5

In Paco Nathan's latest column, he explores the theme of "learning data science" by diving into education programs, learning materials, educational...

Data Science Leaders At Work Jupyter Practical Techniques Data Science Leaders Data Science Education Paco Nathan Column

Creating Multi-language Pipelines with Apache Spark or Avoid Having to Rewrite spaCy into Java

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to create multi-language pipelines with Apache Spark and avoid...

Code Featured Sparkling ML Practical Techniques Data Engineering spaCy Python Machine Learning Engineer Spark

Making PySpark Work with spaCy: Overcoming Serialization Errors

In this guest post, Holden Karau, Apache Spark Committer, provides insights on how to use spaCy to process text data. Karau is a Developer Advocate...

Data Science Code Arrow Machine Learning Sparkling ML PySpark Practical Techniques spaCy Python NLP Serialization Spark

Item Response Theory in R for Survey Analysis

In this guest blog post, Derrick Higgins, of American Family Insurance, covers item response theory (IRT) and how data scientists can apply it within...

Data Science Code Model Management Practical Techniques Predictive Models R Item Response Theory Generative Models IRT Models Model Development

Benchmarking NVIDIA CUDA 9 and Amazon EC2 P3 Instances Using Fashion MNIST

In this post, Josh Poduska, Chief Data Scientist at Domino Data Lab, writes about benchmarking NVIDIA CUDA 9 and Amazon EC2 P3 Instances Using...

Benchmark Data Science Code Machine Learning Domino Product MNIST Practical Techniques EC2 P3 CUDA Deep Learning GPU

Learn from the Reproducibility Crisis in Science

Key highlights from Clare Gollnick’s talk, “The limits of inference: what data scientists can learn from the reproducibility crisis in science”, are...

Data Science Machine Learning Practical Techniques Data Scientists Inference Featured Reproducibility Bias Domino Data Science Field Note

Data Exploration with Pandas Profiler and D-Tale