pyspark.sql.functions.xxhash64
- pyspark.sql.functions.xxhash64(*cols)
Calculates the hash code of the given columns using the 64-bit variant of the xxHash algorithm and returns the result as a long column. The hash computation uses an initial seed of 42.
New in version 3.0.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- cols
Column or column name
One or more columns to compute on.
- Returns
Column
hash value as long column.
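Because the result is a long column, values are signed 64-bit integers and can be negative, whereas many tools display 64-bit hashes as unsigned integers or hexadecimal strings. The following is a minimal sketch of converting between those representations in plain Python after collecting a value. The helper names (`to_unsigned64`, `to_signed64`, `to_hex64`) are hypothetical and not part of PySpark, and the sketch makes no claim that other xxHash implementations produce the same value for the same input.

```python
MASK_64 = (1 << 64) - 1  # all 64 bits set


def to_unsigned64(signed_value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (two's complement)."""
    return signed_value & MASK_64


def to_signed64(unsigned_value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as signed (two's complement)."""
    unsigned_value &= MASK_64
    if unsigned_value >= (1 << 63):
        return unsigned_value - (1 << 64)
    return unsigned_value


def to_hex64(signed_value: int) -> str:
    """Format a signed 64-bit integer as a 16-digit hexadecimal string."""
    return format(to_unsigned64(signed_value), "016x")
```

For example, `to_hex64(-1)` gives `'ffffffffffffffff'`, and `to_signed64(to_unsigned64(v))` returns `v` for any value `v` in the signed 64-bit range.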
Examples
>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([('ABC', 'DEF')], ['c1', 'c2'])
>>> df.select('*', sf.xxhash64('c1')).show()
+---+---+-------------------+
| c1| c2|       xxhash64(c1)|
+---+---+-------------------+
|ABC|DEF|4105715581806190027|
+---+---+-------------------+

>>> df.select('*', sf.xxhash64('c1', df.c2)).show()
+---+---+-------------------+
| c1| c2|   xxhash64(c1, c2)|
+---+---+-------------------+
|ABC|DEF|3233247871021311208|
+---+---+-------------------+

>>> df.select('*', sf.xxhash64('*')).show()
+---+---+-------------------+
| c1| c2|   xxhash64(c1, c2)|
+---+---+-------------------+
|ABC|DEF|3233247871021311208|
+---+---+-------------------+
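One possible use of the hash is to assign rows to a fixed number of buckets. The sketch below is an illustration, not a documented PySpark recipe. It assumes that `%` on a Column can return a negative remainder for a negative hash, so it shifts the remainder into the range `[0, num_buckets)`. The helpers `non_negative_bucket` and `add_bucket_column` and the column name `bucket` are hypothetical.

```python
def non_negative_bucket(hash_value, num_buckets):
    """Map a signed hash to a bucket index in [0, num_buckets).

    Uses only the % and + operators, so it accepts plain ints as well
    as Spark Column objects.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # The inner remainder may be negative for a Column; adding
    # num_buckets and taking the remainder again makes it non-negative.
    return ((hash_value % num_buckets) + num_buckets) % num_buckets


def add_bucket_column(df, key_cols, num_buckets, name="bucket"):
    """Return df with an extra column holding the bucket of key_cols."""
    # Imported here so the helper above stays usable without Spark installed.
    import pyspark.sql.functions as sf

    hashed = sf.xxhash64(*key_cols)
    return df.withColumn(name, non_negative_bucket(hashed, num_buckets))
```

With the DataFrame from the examples above, `add_bucket_column(df, ['c1', 'c2'], 8).show()` would add a `bucket` column with values from 0 to 7.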