pyspark.pandas.window.Rolling.sum
Rolling.sum() → FrameLike
Calculate rolling summation of given DataFrame or Series.
Note
The current implementation of this API uses Spark’s Window without specifying a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid using this method on very large datasets.
Returns
Series or DataFrame
Same type as the input, with the same index, containing the rolling summation.
See also
pyspark.pandas.Series.rolling
Calling object with Series data.
pyspark.pandas.DataFrame.rolling
Calling object with DataFrames.
pyspark.pandas.Series.sum
Reducing sum for Series.
pyspark.pandas.DataFrame.sum
Reducing sum for DataFrame.
Examples
>>> s = ps.Series([4, 3, 5, 2, 6])
>>> s
0    4
1    3
2    5
3    2
4    6
dtype: int64

>>> s.rolling(2).sum()
0    NaN
1    7.0
2    8.0
3    7.0
4    8.0
dtype: float64

>>> s.rolling(3).sum()
0     NaN
1     NaN
2    12.0
3    10.0
4    13.0
dtype: float64
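The leading NaN values in the outputs above follow from the window not yet being full. As a rough illustration of that behaviour, the plain-Python sketch below reproduces the same numbers without Spark. The helper name `rolling_sum` is hypothetical and is not part of pyspark.pandas; the sketch assumes a fixed-size window that yields NaN until `window` observations are available, which is consistent with the outputs shown here.

```python
import math
from typing import List, Sequence


def rolling_sum(values: Sequence[float], window: int) -> List[float]:
    """Plain-Python illustration of a fixed-size rolling sum.

    Assumption: a position yields NaN until `window` observations are
    available, matching the Series outputs shown in the examples.
    This is a hypothetical helper, not the pyspark.pandas implementation.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    result: List[float] = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough rows seen so far to fill the window.
            result.append(math.nan)
        else:
            # Sum the current row and the (window - 1) rows before it.
            result.append(float(sum(values[i - window + 1 : i + 1])))
    return result


print(rolling_sum([4, 3, 5, 2, 6], 2))  # [nan, 7.0, 8.0, 7.0, 8.0]
print(rolling_sum([4, 3, 5, 2, 6], 3))  # [nan, nan, 12.0, 10.0, 13.0]
```

The sketch re-sums each window from scratch for readability; it is meant only to make the window semantics concrete, not to reflect how Spark evaluates the window.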
For DataFrame, each rolling summation is computed column-wise.
>>> df = ps.DataFrame({"A": s.to_numpy(), "B": s.to_numpy() ** 2})
>>> df
   A   B
0  4  16
1  3   9
2  5  25
3  2   4
4  6  36

>>> df.rolling(2).sum()
     A     B
0  NaN   NaN
1  7.0  25.0
2  8.0  34.0
3  7.0  29.0
4  8.0  40.0

>>> df.rolling(3).sum()
      A     B
0   NaN   NaN
1   NaN   NaN
2  12.0  50.0
3  10.0  38.0
4  13.0  65.0