Thursday, January 22, 2015

Top Python mistakes when dealing with Big Data

Interesting article.
  1. Reinventing the wheel. For example, writing one-off code to read a CSV instead of using a convenient purpose-built library that offers deeper functionality. (Python Pandas, to be specific, in this case; see the first sketch after this list. Interesting stuff, actually!)
  2. Failing to tune for performance. Slow code cuts down on how many testing cycles you can run in a day.
  3. Failing to understand time and timezones. Ain't that the truth; there's a small sketch of the usual trap after the list.
  4. Manual integration of different technologies in a solution (copying results files back and forth by hand, etc.).
  5. Not keeping track of data types and schemata.
  6. Failing to include data provenance tracking. Oooh, I like this notion.
  7. No testing, especially no regression testing; a tiny example of what a regression test can look like is sketched below.
All good points.
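
On point 1, a minimal sketch of the purpose-built route (the file name and column names here are hypothetical): a single pandas read_csv call already handles the type inference, quoting, missing values, and date parsing that one-off CSV code ends up reimplementing badly.

    import pandas as pd

    # Hypothetical file and column names, purely for illustration.
    df = pd.read_csv("results.csv", parse_dates=["timestamp"])

    # One call yields typed columns, NaN handling, and parsed dates.
    print(df.dtypes)
    print(df.head())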
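On point 3, the usual trap is mixing naive and timezone-aware datetimes. A small standard-library sketch (the timestamps are made up):

    from datetime import datetime, timezone

    # A naive datetime carries no timezone at all; its meaning depends
    # on whatever clock produced it.
    naive = datetime(2015, 1, 22, 9, 30)
    print(naive.tzinfo)  # None

    # Storing and comparing everything as timezone-aware UTC avoids the
    # ambiguity (comparing naive with aware values raises TypeError).
    aware = datetime(2015, 1, 22, 9, 30, tzinfo=timezone.utc)
    print(aware.isoformat())  # 2015-01-22T09:30:00+00:00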
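And on point 7, a regression test can be tiny. This is a pytest-style sketch with a made-up clean_data step; the idea is just to pin down today's correct output so a later change can't silently alter it.

    import pandas as pd

    def clean_data(df):
        # Stand-in for whatever pipeline step you want to protect.
        return df.dropna().reset_index(drop=True)

    def test_clean_data_drops_missing_rows():
        raw = pd.DataFrame({"value": [1.0, None, 3.0]})
        cleaned = clean_data(raw)
        assert list(cleaned["value"]) == [1.0, 3.0]
        # Pinning the dtype also guards the schema (point 5).
        assert cleaned["value"].dtype == "float64"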
