DataFrame.
median
Return the median of the values for the requested axis.
Note
Unlike pandas’, the median in pandas-on-Spark is an approximated median based upon approximate percentile computation because computing median across a large dataset is extremely expensive.
Axis for the function to be applied on.
Include only float, int, boolean columns. False is not supported. This parameter is mainly for pandas compatibility.
Default accuracy of approximation. Larger value means better accuracy. The relative error can be deduced by 1.0 / accuracy.
Examples
>>> df = ps.DataFrame({ ... 'a': [24., 21., 25., 33., 26.], 'b': [1, 2, 3, 4, 5]}, columns=['a', 'b']) >>> df a b 0 24.0 1 1 21.0 2 2 25.0 3 3 33.0 4 4 26.0 5
On a DataFrame:
>>> df.median() a 25.0 b 3.0 dtype: float64
On a Series:
>>> df['a'].median() 25.0 >>> (df['b'] + 100).median() 103.0
For multi-index columns,
>>> df.columns = pd.MultiIndex.from_tuples([('x', 'a'), ('y', 'b')]) >>> df x y a b 0 24.0 1 1 21.0 2 2 25.0 3 3 33.0 4 4 26.0 5
>>> df.median() x a 25.0 y b 3.0 dtype: float64
>>> df.median(axis=1) 0 12.5 1 11.5 2 14.0 3 18.5 4 15.5 dtype: float64
>>> df[('x', 'a')].median() 25.0 >>> (df[('y', 'b')] + 100).median() 103.0