pyspark.sql.functions.xxhash64

pyspark.sql.functions.xxhash64(*cols)

Calculates the hash code of the given columns using the 64-bit variant of the xxHash algorithm and returns the result as a long (signed 64-bit integer) column. The hash computation uses an initial seed of 42.

New in version 3.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols : Column or column name

one or more columns to compute on.

Returns
Column

hash value as long column.

Examples

>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
>>> df.select('*', sf.xxhash64('c1')).show()
+---+---+-------------------+
| c1| c2|       xxhash64(c1)|
+---+---+-------------------+
|ABC|DEF|4105715581806190027|
+---+---+-------------------+
>>> df.select('*', sf.xxhash64('c1', df.c2)).show()
+---+---+-------------------+
| c1| c2|   xxhash64(c1, c2)|
+---+---+-------------------+
|ABC|DEF|3233247871021311208|
+---+---+-------------------+
>>> df.select('*', sf.xxhash64('*')).show()
+---+---+-------------------+
| c1| c2|   xxhash64(c1, c2)|
+---+---+-------------------+
|ABC|DEF|3233247871021311208|
+---+---+-------------------+
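
Because the result is a long column, hash values are signed 64-bit integers and may be negative. The following is a minimal pure-Python sketch, not part of the PySpark API, of how a collected hash value could be reinterpreted as an unsigned 64-bit integer, for instance when comparing against tools that report unsigned values. The helper names `to_unsigned64` and `to_signed64` are hypothetical and introduced only for illustration.

```python
# Hypothetical helpers (not part of PySpark) for reinterpreting
# 64-bit hash values between signed and unsigned form.
MASK64 = (1 << 64) - 1


def to_unsigned64(signed_value: int) -> int:
    """Reinterpret a signed 64-bit integer as an unsigned one."""
    return signed_value & MASK64


def to_signed64(unsigned_value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as signed (two's complement)."""
    unsigned_value &= MASK64
    if unsigned_value >= (1 << 63):
        return unsigned_value - (1 << 64)
    return unsigned_value


# Value taken from the xxhash64('c1') example above; it is non-negative,
# so its unsigned form is the same number.
h = 4105715581806190027
print(to_unsigned64(h) == h)

# A negative long maps to the upper half of the unsigned range.
print(f"{to_unsigned64(-1):016x}")  # ffffffffffffffff
```

The conversion only changes how the same 64 bits are read, so applying `to_signed64` to the output of `to_unsigned64` returns the original value.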