So, you want to get rid of any row that has a ‘NaN’ (null, or "not a number") value, because some functions don’t work with NaN and can’t be told to ignore it.

obs1.dropna(how = 'all', subset = ['wind_mph'], inplace = True)
obs1 = obs1.reset_index(drop=True)
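
To see what those two lines do, here is a small sketch on a made-up frame. Only the 'wind_mph' column name comes from the code above; the 'station' column and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'wind_mph' column name comes from the post.
obs1 = pd.DataFrame({
    "station": ["A", "B", "C", "D"],
    "wind_mph": [5.0, np.nan, 12.5, np.nan],
})

# Drop rows whose 'wind_mph' is NaN. With a single column in `subset`,
# how='all' and how='any' select the same rows.
obs1.dropna(how="all", subset=["wind_mph"], inplace=True)

# The drop leaves gaps in the index (0, 2), so renumber it from 0.
obs1 = obs1.reset_index(drop=True)

print(obs1)
```

Without the reset_index call, the surviving rows keep their old labels, which can be surprising if later code assumes the index runs 0..n-1.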

Another important point: Pandas only treats a value as missing if it is loaded as NaN, so whatever placeholder your data uses has to be converted to the canonical ‘NaN’. Here is a way to convert text to ‘NaN’ at load time. This also helps the automatic type conversion for columns: if 99.9% of a column is float but one value is text or something else, the conversion throws an internal exception and Pandas keeps the column as an object. The data my program generates uses “<no_value_provided>” for missing values; the parameter to use is na_values (and it can be a list if your data has more than one notation).

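import dateutil.parser
import pandas as pd

# Parse the first 20 characters of the timestamp and ignore any time zone.
def date_utc(x): return dateutil.parser.parse(x[:20], ignoretz=True)
obs1 = pd.read_csv(target_csv, parse_dates=[9], date_parser=date_utc,
                    dtype = { 'wind_mph': 'float64'},
                    na_values = "<no_value_provided>")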
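
To check the effect of na_values in isolation, here is a rough sketch that reads a tiny in-memory CSV instead of a real file. The CSV contents, the 'station' column, and the second marker "missing" are made up for illustration; only 'wind_mph' and "<no_value_provided>" come from the code above.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real file.
csv_text = (
    "station,wind_mph\n"
    "A,5.0\n"
    "B,<no_value_provided>\n"
    "C,12.5\n"
)

# Without na_values the placeholder stays as text, so the column is not numeric.
raw = pd.read_csv(io.StringIO(csv_text))

# With na_values the placeholder becomes NaN and the column parses as float64.
# A list works when the data uses more than one notation ("missing" is made up).
clean = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["<no_value_provided>", "missing"],
)

print(raw["wind_mph"].dtype)
print(clean["wind_mph"].dtype)
```

Comparing the two dtypes is a quick way to confirm that every placeholder in a column has been caught.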

 
