In most companies, Data Engineers support the Data Scientists in various ways. Often this means translating or productionizing the notebooks and scripts that a Data Scientist has written. A large portion of the Data Engineer’s role could be replaced with better tooling for Data Scientists, freeing Data Engineers to do more impactful (and scalable) work.
Scaling Interactive Pandas Workflows with Modin – Talk at PyData NYC 2018
In this talk, we will present Modin, a middle layer for DataFrames and interactive data science. Modin, formerly Pandas on Ray, is a library that allows users to speed up their Pandas workflows by changing a single line of code. During the presentation, we will discuss interesting ways Modin is being used and show how we improve the performance of the most popular Pandas operations. Modin is an early-stage project at UC Berkeley’s RISELab designed to facilitate the use of distributed computing for Data Science. A common challenge with tools for large-scale data is their significant learning overhead. Modin is designed to expose a set of familiar APIs (Pandas, SQL, etc.) and internally…
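Modin's documentation describes the "single line of code" as swapping `import pandas as pd` for `import modin.pandas as pd`. The sketch below illustrates that swap, assuming Modin is installed. The `try`/`except` fallback to plain Pandas and the toy `city`/`sales` data are illustrative additions, not part of the talk. Everything after the import is ordinary Pandas code:

```python
# Illustrative sketch: the only Modin-specific line is the import.
try:
    import modin.pandas as pd  # Modin's drop-in, parallel DataFrame implementation
except ImportError:
    import pandas as pd  # hypothetical fallback so the sketch still runs without Modin

# Toy data (made up for illustration); this code is unchanged under either import.
df = pd.DataFrame({"city": ["NYC", "SF", "NYC"], "sales": [10, 20, 30]})
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Modin mirrors the Pandas API, so the `groupby` line is identical under either import. Any speedup from Modin's parallel execution would be expected on much larger data than this toy frame.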
Modin (Pandas on Ray) – October 2018
Pandas on Ray – Early Lessons from Parallelizing Pandas
Michael I. Jordan: Artificial Intelligence — The Revolution Hasn’t Happened Yet
(This article was originally published on Medium.com.) Artificial Intelligence (AI) is the mantra of the current era. The phrase is intoned by technologists, academicians, journalists and venture capitalists alike. As with many phrases that cross over from technical academic fields into general circulation, there is significant misunderstanding accompanying the use of the phrase. But this is not the classical case of the public not understanding the scientists — here the scientists are often as befuddled as the public. The idea that our era is somehow seeing the emergence of an intelligence in silicon that rivals our own entertains all of us — enthralling us and frightening us in equal measure. And, unfortunately, it distracts us. There is a different narrative that one can…
Pandas on Ray