Blog Archives

A Modern Approach to Python & PySpark Application Development

2/9/2018

Python has become the darling language of the data science and data engineering world. It's versatile and powerful, yet easy enough for beginners to use. While we encounter Python developers in every area of IT from web development to network management, we're really seeing the boom right now in machine learning and deep learning application development.

But there is a problem where data science and big data intersect: Hadoop does not have native support for Python. On a filesystem like MapR-XD, this is less of an issue, since any library that supports parallel computation can use MapR-XD as a storage layer through Direct NFS. If you want to leverage Apache Hadoop YARN for distributed computation, however, you are limited to the Spark Python API (PySpark).
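To make the NFS point concrete, here is a minimal sketch of what "any library can use the filesystem" might look like in practice. It assumes the cluster filesystem is mounted over NFS and therefore shows up as an ordinary directory; the mount path mentioned in the comments, the `part-*.txt` file layout, and the helper names `count_lines` and `count_lines_parallel` are all hypothetical, and a temporary directory stands in for the mount so the example runs anywhere. It uses only the Python standard library, not PySpark.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines in one file using ordinary POSIX file I/O."""
    with path.open("r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_lines_parallel(data_dir: Path, workers: int = 4) -> int:
    """Sum line counts over every *.txt file in data_dir, using a thread pool."""
    files = sorted(data_dir.glob("*.txt"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_lines, files))


if __name__ == "__main__":
    # On a real cluster, data_dir would be a directory under the NFS mount
    # (hypothetically something like /mapr/<cluster-name>/data).
    # A temporary directory stands in for that mount here.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        for i in range(3):
            (data_dir / f"part-{i}.txt").write_text("a\nb\n", encoding="utf-8")
        print(count_lines_parallel(data_dir))
```

Because the mount behaves like a local path, nothing in the code is specific to Hadoop or MapR. Running the same work under YARN would instead mean expressing it through PySpark.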
