pyspark.sql.functions.hll_sketch_agg
- pyspark.sql.functions.hll_sketch_agg(col, lgConfigK=None)
Aggregate function: returns the updatable binary representation of the DataSketches HllSketch, configured with the lgConfigK argument.
New in version 3.5.0.
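- Parameters
col : Column or str
The column (or column name) whose values are aggregated into the sketch.
lgConfigK : int or Column, optional
The log-base-2 of K, where K is the number of buckets (slots) in the HllSketch.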
- Returns
Column
The binary representation of the HllSketch.
Examples
>>> from pyspark.sql.functions import col, hll_sketch_agg, hll_sketch_estimate, lit
>>> df = spark.createDataFrame([1,2,2,3], "INT")
>>> df1 = df.agg(hll_sketch_estimate(hll_sketch_agg("value")).alias("distinct_cnt"))
>>> df1.show()
+------------+
|distinct_cnt|
+------------+
|           3|
+------------+
>>> df2 = df.agg(hll_sketch_estimate(
...     hll_sketch_agg("value", lit(12))
... ).alias("distinct_cnt"))
>>> df2.show()
+------------+
|distinct_cnt|
+------------+
|           3|
+------------+
>>> df3 = df.agg(hll_sketch_estimate(
...     hll_sketch_agg(col("value"), lit(12))).alias("distinct_cnt"))
>>> df3.show()
+------------+
|distinct_cnt|
+------------+
|           3|
+------------+
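Notes
To give a feel for what lgConfigK controls, the sketch below is a toy HyperLogLog in plain Python. It is an illustration only and not the DataSketches implementation that Spark uses: the function name toy_hll_estimate, the SHA-256 hashing, and the accepted range of lg_config_k are all assumptions made for this example. It shows the general idea that a sketch keeps K = 2 ** lgConfigK small registers, so a larger lgConfigK costs more memory and tends to give a tighter estimate.

```python
import hashlib
import math


def toy_hll_estimate(values, lg_config_k=12):
    """Toy HyperLogLog distinct-count estimate (hypothetical helper).

    Illustration only: this is NOT the DataSketches HllSketch algorithm.
    """
    # The bias constant used below is only valid for K >= 128 registers.
    if not 7 <= lg_config_k <= 16:
        raise ValueError("this toy example supports 7 <= lg_config_k <= 16")

    k = 1 << lg_config_k              # K = 2 ** lgConfigK registers
    tail_bits = 64 - lg_config_k      # hash bits left after picking a register
    registers = [0] * k

    for v in values:
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha256(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")

        idx = h >> tail_bits                  # top bits choose the register
        tail = h & ((1 << tail_bits) - 1)     # remaining bits
        # 1-based position of the first 1-bit in the tail.
        rank = tail_bits - tail.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    # Standard HyperLogLog raw estimate with bias constant alpha.
    alpha = 0.7213 / (1 + 1.079 / k)
    raw = alpha * k * k / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting.
    zeros = registers.count(0)
    if raw <= 2.5 * k and zeros:
        return k * math.log(k / zeros)
    return raw


if __name__ == "__main__":
    data = [1, 2, 2, 3]
    print(toy_hll_estimate(data, lg_config_k=12))
    print(toy_hll_estimate(range(100_000), lg_config_k=8))
    print(toy_hll_estimate(range(100_000), lg_config_k=14))
```

Duplicate inputs hash to the same register and rank, so they never change the estimate; that is why the sketch approximates a distinct count rather than a row count.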