RidgeRegressionModel

class pyspark.mllib.regression.RidgeRegressionModel(weights: pyspark.mllib.linalg.Vector, intercept: float)

A linear regression model derived from a least-squares fit with an L2 penalty term (ridge regression).

New in version 0.9.0.
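
To make "least-squares fit with an L2 penalty" concrete, the sketch below evaluates a textbook ridge objective in plain Python. The function name `ridge_loss`, the `(label, features)` tuple layout, and the 1/(2n) and 1/2 scaling factors are assumptions chosen for illustration; this page does not state the exact scaling MLlib uses.

```python
def ridge_loss(weights, intercept, points, reg_param):
    """Textbook ridge objective: squared error / (2n) plus (reg_param / 2) * ||w||^2."""
    n = len(points)
    squared_error = 0.0
    for label, features in points:
        prediction = sum(w * x for w, x in zip(weights, features)) + intercept
        squared_error += (prediction - label) ** 2
    # The penalty covers the weights only; the intercept is left unpenalized here.
    penalty = sum(w * w for w in weights)
    return squared_error / (2 * n) + 0.5 * reg_param * penalty


# Same labels and features as the doctest data in the Examples section.
points = [(0.0, [0.0]), (1.0, [1.0]), (3.0, [2.0]), (2.0, [3.0])]
unpenalized = ridge_loss([1.0], 0.0, points, reg_param=0.0)
penalized = ridge_loss([1.0], 0.0, points, reg_param=0.01)
print(unpenalized, penalized)
```

With `reg_param=0.0` the objective reduces to ordinary least squares; a positive value adds a cost that grows with the squared size of the weights.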

Examples

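>>> # `sc` is assumed to be an active SparkContext.
>>> import numpy as np
>>> from pyspark.mllib.linalg import SparseVector
>>> from pyspark.mllib.regression import (
...     LabeledPoint, LinearRegressionWithSGD, RidgeRegressionModel, RidgeRegressionWithSGD)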
>>> data = [
...     LabeledPoint(0.0, [0.0]),
...     LabeledPoint(1.0, [1.0]),
...     LabeledPoint(3.0, [2.0]),
...     LabeledPoint(2.0, [3.0])
... ]
>>> lrm = RidgeRegressionWithSGD.train(sc.parallelize(data), iterations=10,
...     initialWeights=np.array([1.0]))
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(np.array([1.0])) - 1) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> abs(lrm.predict(sc.parallelize([[1.0]])).collect()[0] - 1) < 0.5
True
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> lrm.save(sc, path)
>>> sameModel = RidgeRegressionModel.load(sc, path)
>>> abs(sameModel.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(sameModel.predict(np.array([1.0])) - 1) < 0.5
True
>>> abs(sameModel.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass
>>> data = [
...     LabeledPoint(0.0, SparseVector(1, {0: 0.0})),
...     LabeledPoint(1.0, SparseVector(1, {0: 1.0})),
...     LabeledPoint(3.0, SparseVector(1, {0: 2.0})),
...     LabeledPoint(2.0, SparseVector(1, {0: 3.0}))
... ]
>>> lrm = LinearRegressionWithSGD.train(sc.parallelize(data), iterations=10,
...     initialWeights=np.array([1.0]))
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> lrm = RidgeRegressionWithSGD.train(sc.parallelize(data), iterations=10, step=1.0,
...     regParam=0.01, miniBatchFraction=1.0, initialWeights=np.array([1.0]), intercept=True,
...     validateData=True)
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
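
The `iterations`, `step`, and `regParam` arguments above control an iterative optimizer. The sketch below is a minimal full-batch gradient descent on a textbook ridge objective for a single feature, written in plain Python. It is not MLlib's implementation: the function name `ridge_gradient_descent`, the constant step size, the unpenalized intercept, and the chosen values of `step` and `iterations` are all assumptions made for illustration.

```python
def ridge_gradient_descent(points, iterations, step, reg_param):
    """Full-batch gradient descent on a textbook ridge objective (one feature)."""
    weight, intercept = 1.0, 0.0  # start from weight 1.0, mirroring initialWeights above
    n = len(points)
    for _ in range(iterations):
        grad_w = 0.0
        grad_b = 0.0
        for label, x in points:
            error = weight * x + intercept - label
            grad_w += error * x
            grad_b += error
        # Average the data gradient, then add the L2 penalty gradient for the weight.
        grad_w = grad_w / n + reg_param * weight
        grad_b = grad_b / n
        weight -= step * grad_w
        intercept -= step * grad_b
    return weight, intercept


# (label, feature) pairs matching the doctest data.
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 2.0), (2.0, 3.0)]
weight, intercept = ridge_gradient_descent(points, iterations=1000, step=0.1, reg_param=0.01)
print(weight, intercept)
```

A larger `reg_param` pulls the fitted weight toward zero, and a `step` that is too large for the data makes the updates diverge instead of settling.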

Methods

`load`(sc, path)	Load a RidgeRegressionModel.
`predict`(x)	Predict the value of the dependent variable given a vector or an RDD of vectors containing values for the independent variables.
`save`(sc, path)	Save a RidgeRegressionModel.

Attributes

`intercept`	Intercept computed for this model.
`weights`	Weights computed for every feature.

Methods Documentation

classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.regression.RidgeRegressionModel: Load a RidgeRegressionModel.

New in version 1.4.0.

predict(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[float, pyspark.rdd.RDD[float]]: Predict the value of the dependent variable given a vector or an RDD of vectors containing values for the independent variables.

New in version 0.9.0.
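
For a linear model, a prediction is the dot product of the weights with the input plus the intercept. The sketch below shows that arithmetic in plain Python for a dense list and for a sparse `{index: value}` mapping. The helper `predict_linear` and the weight and intercept values are hypothetical, not taken from a trained model or from the library's source.

```python
def predict_linear(weights, intercept, x):
    """Return dot(weights, x) + intercept for a dense list or a sparse {index: value} dict."""
    if isinstance(x, dict):
        # Sparse input: only the stored indices contribute to the dot product.
        return sum(weights[i] * v for i, v in x.items()) + intercept
    return sum(w * v for w, v in zip(weights, x)) + intercept


weights, intercept = [0.9], 0.1  # hypothetical values for illustration
dense = predict_linear(weights, intercept, [1.0])
sparse = predict_linear(weights, intercept, {0: 1.0})
# Applying the same function to each row mirrors passing an RDD of vectors.
batch = [predict_linear(weights, intercept, row) for row in [[0.0], [1.0]]]
print(dense, sparse, batch)
```

The dense and sparse forms of the same vector give the same result, which is why the doctests above can check `np.array([1.0])` and `SparseVector(1, {0: 1.0})` against the same tolerance.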

save(sc: pyspark.context.SparkContext, path: str) → None: Save a RidgeRegressionModel.

New in version 1.4.0.

Attributes Documentation

intercept: Intercept computed for this model.

New in version 1.0.0.

weights: Weights computed for every feature.

New in version 1.0.0.
