Tips for Data Science Newbs: Setting Flags for Feature Engineering
What Are Feature Flags?
Feature flags (also called toggles or controls) are a software engineering best practice that, as a data scientist, I wish I had learned long ago. Developers use them to enable or disable parts of their code depending on certain variables and conditions. This helps with testing and deployment: a developer can switch a feature on, test it, and switch it off again without jeopardizing the rest of the code. I have come to realize that the same practice is just as useful for data scientists who want their feature engineering to stay correct and safe as their code is run again and again.
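To make the general idea concrete, here is a minimal sketch of flags switching individual feature-engineering steps on and off. The FEATURE_FLAGS dictionary, the engineer_features function and the step names are hypothetical, chosen only for illustration; in practice such flags might live in a config file instead of the code.

```python
import numpy as np

# Hypothetical flag configuration: each key switches one feature-engineering step on or off
FEATURE_FLAGS = {
    "log_transform": True,
    "min_max_scale": False,
}

def engineer_features(X, flags):
    """Apply only the feature-engineering steps whose flags are enabled."""
    X = np.asarray(X, dtype=float)
    if flags.get("log_transform", False):
        # log(1 + x), a common transform for skewed, non-negative features
        X = np.log1p(X)
    if flags.get("min_max_scale", False):
        # Rescale to [0, 1]; assumes X is not constant
        X = (X - X.min()) / (X.max() - X.min())
    return X

X_raw = np.array([0.0, 1.0, 3.0, 7.0])
X_features = engineer_features(X_raw, FEATURE_FLAGS)  # only the log transform runs
```

Turning min_max_scale on or log_transform off then becomes a one-line configuration change rather than an edit to the transformation code itself.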
Lead by Example
Suppose we have a function that normalizes some feature X. A flag called is_feature_normalized ensures that, even if the code is executed multiple times, X is altered only during the first execution: if the flag is False, the normalizing function runs and the flag is switched to True; if the flag is True, the function is skipped. See the example below.
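import numpy as np

# Example feature values (made up for illustration)
X = np.array([2.0, 4.0, 6.0, 10.0])

# The flag starts as False because X has not been normalized yet
is_feature_normalized = False

def feature_normalization(X):
    # Min-max normalization: rescales X to the range [0, 1]
    return (X - np.min(X)) / (np.max(X) - np.min(X))

# Normalize only if it has not been done already, then flip the flag
if not is_feature_normalized:
    X = feature_normalization(X)
    is_feature_normalized = True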
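This tip is especially helpful when working in Jupyter notebooks, where cells are easily re-run or run out of order and a transformation can silently be applied twice. We all know how messy notebooks can get...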
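Two details are worth knowing when applying this in a notebook. First, min-max scaling leaves already-normalized values unchanged (their minimum is already 0 and their maximum 1), so a transform that compounds on every run, such as np.log1p, shows more clearly why the flag matters. Second, if the line that sets the flag to False sits in the same cell as the transform, re-running that cell resets the flag. The sketch below assumes a hypothetical flag named is_feature_logged and creates it only if it does not exist yet; the loop merely stands in for running the same cell three times.

```python
import numpy as np

def log_transform(X):
    # log(1 + x): unlike min-max scaling, applying it twice keeps changing the values
    return np.log1p(X)

# Hypothetical raw feature values, loaded once in an earlier notebook cell
X = np.array([0.0, 1.0, 3.0, 7.0])

# Simulate executing the same notebook cell three times
for _ in range(3):
    # Create the flag only if it does not exist yet, so re-running the cell
    # does not reset it to False
    if "is_feature_logged" not in globals():
        is_feature_logged = False

    if not is_feature_logged:
        X = log_transform(X)
        is_feature_logged = True
```

In a real notebook you would drop the loop and keep only its body in the cell; the globals() check works there because IPython executes cells in the notebook's global namespace.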
Motivation
Given data scientists' reputation for lacking software engineering skills and best practices, I believe it is my duty, as a member of this community, to pass on any knowledge that will make us better coders and, in turn, better data scientists.