Computes the absolute value.
1.3.0
Computes the cosine inverse of the given column; the returned angle is in the range 0.0 through pi.
1.4.0
Computes the cosine inverse of the given value; the returned angle is in the range 0.0 through pi.
1.4.0
Returns the date that is numMonths after startDate.
1.5.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Creates a new array column. The input columns must all have the same data type.
1.4.0
Creates a new array column. The input columns must all have the same data type.
1.4.0
Returns true if the array contains the value.
1.5.0
Returns a sort expression based on ascending order of the column.
// Sort by dept in ascending order, and then age in descending order.
df.sort(asc("dept"), desc("age"))
1.3.0
Computes the numeric value of the first character of the string column, and returns the result as an int column.
1.5.0
Computes the sine inverse of the given column; the returned angle is in the range -pi/2 through pi/2.
1.4.0
Computes the sine inverse of the given value; the returned angle is in the range -pi/2 through pi/2.
1.4.0
Computes the tangent inverse of the given column.
1.4.0
Computes the tangent inverse of the given value.
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Aggregate function: returns the average of the values in a group.
1.3.0
Aggregate function: returns the average of the values in a group.
1.3.0
Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.
1.5.0
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".
1.5.0
An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".
1.5.0
Computes bitwise NOT.
1.4.0
Marks a DataFrame as small enough for use in broadcast joins.
The following example marks the right DataFrame for broadcast hash join using joinKey.
// left and right are DataFrames
left.join(broadcast(right), "joinKey")
1.5.0
Calls a user-defined function. Example:
import org.apache.spark.sql._
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val sqlContext = df.sqlContext
sqlContext.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", callUDF("simpleUDF", $"value"))
1.5.0
Computes the cube-root of the given column.
1.4.0
Computes the cube-root of the given value.
1.4.0
Computes the ceiling of the given column.
1.4.0
Computes the ceiling of the given value.
1.4.0
Returns the first column that is not null, or null if all inputs are null.
For example, coalesce(a, b, c) will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.
1.3.0
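The selection rule for coalesce can be modelled outside Spark. The following is a plain-Scala sketch, not Spark code: it represents SQL null as None, and the names CoalesceSketch and firstNonNull are hypothetical.

```scala
object CoalesceSketch {
  // Models SQL NULL as None: returns the first defined value,
  // or None when every input is None.
  def firstNonNull[A](values: Option[A]*): Option[A] =
    values.flatten.headOption
}
```

For instance, firstNonNull(None, Some(2), Some(3)) yields Some(2), mirroring the case where a is null and b is not.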
Returns a Column based on the given column name.
1.3.0
Returns a Column based on the given column name.
Concatenates multiple input string columns together into a single string column.
1.5.0
Concatenates multiple input string columns together into a single string column, using the given separator.
1.5.0
Convert a number in a string column from one base to another.
1.5.0
Computes the cosine of the given column.
1.4.0
Computes the cosine of the given value.
1.4.0
Computes the hyperbolic cosine of the given column.
1.4.0
Computes the hyperbolic cosine of the given value.
1.4.0
Aggregate function: returns the number of items in a group.
1.3.0
Aggregate function: returns the number of items in a group.
1.3.0
Aggregate function: returns the number of distinct items in a group.
1.3.0
Aggregate function: returns the number of distinct items in a group.
1.3.0
Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
1.5.0
Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are at or below the current row.
N = total number of rows in the partition
cumeDist(x) = number of values before (and including) x / N
This is equivalent to the CUME_DIST function in SQL.
1.4.0
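The cumeDist formula can be checked by hand on a small partition. This is a plain-Scala sketch of the arithmetic only, not Spark's window implementation; CumeDistSketch and its Int-valued partition are assumptions made for illustration.

```scala
object CumeDistSketch {
  // Number of values before (and including) x, divided by N,
  // where N is the total number of rows in the partition.
  def cumeDist(partition: Seq[Int], x: Int): Double =
    partition.count(_ <= x).toDouble / partition.size
}
```

With the partition Seq(10, 20, 20, 30), both rows holding 20 get 3 / 4 = 0.75, since three of the four values are at or below 20.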
Returns the current date as a date column.
1.5.0
Returns the current timestamp as a timestamp column.
1.5.0
Returns the date that is a given number of days (days) after start.
1.5.0
Converts a date/timestamp/string to a string in the format specified by the date format given as the second argument.
A pattern could be, for instance, dd.MM.yyyy, which could return a string like '18.03.1993'. All pattern letters of java.text.SimpleDateFormat can be used.
NOTE: Whenever possible, use specialized functions such as year; these benefit from a specialized implementation.
1.5.0
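Because the pattern letters are those of java.text.SimpleDateFormat, a pattern can be tried out directly against that class before using it in a query. A minimal sketch, assuming an input string in yyyy-MM-dd form; DateFormatSketch and format are hypothetical names, and this does not call Spark.

```scala
import java.text.SimpleDateFormat
import java.util.Locale

object DateFormatSketch {
  // Parses a yyyy-MM-dd string, then renders it with the given
  // SimpleDateFormat pattern. Locale.ROOT keeps digits locale-independent.
  def format(isoDate: String, pattern: String): String = {
    val parsed = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT).parse(isoDate)
    new SimpleDateFormat(pattern, Locale.ROOT).format(parsed)
  }
}
```

Calling format("1993-03-18", "dd.MM.yyyy") produces the string shown in the description, 18.03.1993.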
Returns the date that is a given number of days (days) before start.
1.5.0
Returns the number of days from start to end.
1.5.0
Extracts the day of the month as an integer from a given date/timestamp/string.
1.5.0
Extracts the day of the year as an integer from a given date/timestamp/string.
1.5.0
Decodes the first argument from binary into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and denseRank is that denseRank leaves no gaps in ranking sequence when there are ties. That is, if you were ranking a competition using denseRank and had three people tie for second place, you would say that all three were in second place and that the next person came in third.
This is equivalent to the DENSE_RANK function in SQL.
1.4.0
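The tie-handling difference between rank and denseRank is easy to see on a small ordered list. The sketch below is plain Scala that imitates the two numbering schemes; it is not Spark's window implementation, and RankSketch and its methods are hypothetical names.

```scala
object RankSketch {
  // rank: 1 + number of values strictly smaller. Ties share a rank,
  // and the ranks that the ties "used up" are skipped afterwards.
  def rank(values: Seq[Int]): Seq[Int] =
    values.map(v => values.count(_ < v) + 1)

  // denseRank: 1 + number of DISTINCT values strictly smaller,
  // so no ranks are skipped after a tie.
  def denseRank(values: Seq[Int]): Seq[Int] = {
    val distinct = values.distinct
    values.map(v => distinct.count(_ < v) + 1)
  }
}
```

For Seq(1, 2, 2, 2, 3), where three entries tie for second place, rank gives 1, 2, 2, 2, 5 while denseRank gives 1, 2, 2, 2, 3.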
Returns a sort expression based on the descending order of the column.
// Sort by dept in ascending order, and then age in descending order.
df.sort(asc("dept"), desc("age"))
1.3.0
Encodes the first argument from a string into binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.
1.5.0
Computes the exponential of the given column.
1.4.0
Computes the exponential of the given value.
1.4.0
Creates a new row for each element in the given array or map column.
1.3.0
Computes the exponential of the given column minus one.
1.4.0
Computes the exponential of the given value minus one.
1.4.0
Parses the expression string into the column that it represents, similar to DataFrame.selectExpr.
// get the number of words of each length
df.groupBy(expr("length(word)")).count()
Computes the factorial of the given value.
1.5.0
Aggregate function: returns the first value of a column in a group.
1.3.0
Aggregate function: returns the first value in a group.
1.3.0
Computes the floor of the given column.
1.4.0
Computes the floor of the given value.
1.4.0
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
If d is 0, the result has no decimal point or fractional part. If d < 0, the result will be null.
1.5.0
Formats the arguments in printf-style and returns the result as a string column.
1.5.0
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
1.5.0
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
1.5.0
Assumes given timestamp is UTC and converts to given timezone.
1.5.0
Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Computes hex value of the given column.
1.5.0
Extracts the hours as an integer from a given date/timestamp/string.
1.5.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
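To see why "without intermediate overflow" matters for the hypot entries above, compare the textbook formula with the JVM's math.hypot on large inputs. This is a plain-Scala sketch of the numerical issue, not a statement about how Spark evaluates the expression; HypotSketch is a hypothetical name.

```scala
object HypotSketch {
  // Textbook formula: a * a can overflow to Infinity for large inputs,
  // even though the true result is representable as a Double.
  def naive(a: Double, b: Double): Double = math.sqrt(a * a + b * b)

  // math.hypot computes the same quantity without squaring directly.
  def safe(a: Double, b: Double): Double = math.hypot(a, b)
}
```

For a = 3e200 and b = 4e200 the naive version returns Infinity, while the hypot version returns a value close to 5e200.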
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
For example, "hello world" will become "Hello World".
1.5.0
Creates a string column for the file name of the current Spark task.
Locates the position of the first occurrence of the substr column in the given string. Returns null if either of the arguments is null.
NOTE: The position is 1-based, not 0-based; returns 0 if substr could not be found in str.
1.5.0
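The 1-based convention can be related to the 0-based String.indexOf that Scala and Java programmers are used to. A plain-Scala sketch under that analogy (null handling is not modelled; InstrSketch is a hypothetical name, not the Spark function):

```scala
object InstrSketch {
  // indexOf is 0-based and returns -1 when the substring is absent,
  // so adding 1 yields a 1-based position with 0 meaning "not found".
  def instr(str: String, substr: String): Int = str.indexOf(substr) + 1
}
```

Here instr("hello world", "world") is 7, and a substring that does not occur gives 0.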
Returns true iff the column is NaN.
1.5.0
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
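The lag variants above differ only in what is returned when no earlier row exists. The following plain-Scala sketch models one ordered partition as a Seq and null as None; LagSketch is a hypothetical name and this is not Spark's implementation.

```scala
object LagSketch {
  // For each row index i, take the value `offset` rows earlier,
  // or `default` (None by default, standing in for null) when
  // fewer than `offset` rows precede row i.
  def lag[A](rows: Seq[A], offset: Int, default: Option[A] = None): Seq[Option[A]] =
    rows.indices.map(i => if (i - offset >= 0) Some(rows(i - offset)) else default)
}
```

With rows Seq(10, 20, 30) and an offset of one, the result is None, Some(10), Some(20); passing Some(0) as the default replaces the leading None with Some(0).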
Aggregate function: returns the last value of the column in a group.
1.3.0
Aggregate function: returns the last value in a group.
1.3.0
Given a date column, returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.
1.5.0
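The same calendar arithmetic can be reproduced with java.time (Java 8 or later) to sanity-check expected results. This is a plain-Scala sketch, not the Spark function; LastDaySketch and lastDay are hypothetical names.

```scala
import java.time.LocalDate
import java.time.temporal.TemporalAdjusters

object LastDaySketch {
  // Moves a yyyy-MM-dd date to the last day of its month.
  // `with` is a Scala keyword, hence the backticks.
  def lastDay(isoDate: String): String =
    LocalDate.parse(isoDate).`with`(TemporalAdjusters.lastDayOfMonth()).toString
}
```

As in the description, lastDay("2015-07-27") gives 2015-07-31; leap years are handled by java.time, so a date in February 2016 maps to 2016-02-29.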
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
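The lead variants above look forward instead of backward. A plain-Scala sketch over a single ordered partition, with None standing in for null; LeadSketch is a hypothetical name and this is not Spark's implementation.

```scala
object LeadSketch {
  // For each row index i, take the value `offset` rows later,
  // or `default` when fewer than `offset` rows follow row i.
  def lead[A](rows: Seq[A], offset: Int, default: Option[A] = None): Seq[Option[A]] =
    rows.indices.map(i => if (i + offset < rows.size) Some(rows(i + offset)) else default)
}
```

With rows Seq(10, 20, 30) and an offset of one, the result is Some(20), Some(30), None.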
Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.
1.5.0
Computes the length of a given string or binary column.
1.5.0
Computes the Levenshtein distance of the two given string columns.
1.5.0
Creates a Column of literal value.
Locates the position of the first occurrence of substr in a string column, after position pos.
NOTE: The position is 1-based, not 0-based; returns 0 if substr could not be found in str.
1.5.0
Locates the position of the first occurrence of substr. NOTE: The position is 1-based, not 0-based; returns 0 if substr could not be found in str.
1.5.0
Returns the first argument-base logarithm of the second argument.
1.4.0
Returns the first argument-base logarithm of the second argument.
1.4.0
Computes the natural logarithm of the given column.
1.4.0
Computes the natural logarithm of the given value.
1.4.0
Computes the logarithm of the given value in base 10.
1.4.0
Computes the logarithm of the given value in base 10.
1.4.0
Computes the natural logarithm of the given column plus one.
1.4.0
Computes the natural logarithm of the given value plus one.
1.4.0
Computes the logarithm of the given value in base 2.
1.5.0
Computes the logarithm of the given column in base 2.
1.5.0
Converts a string column to lower case.
1.3.0
Left-pad the string column with pad to a length of len.
1.5.0
Trim the spaces from left end for the specified string value.
1.5.0
Aggregate function: returns the maximum value of the column in a group.
1.3.0
Aggregate function: returns the maximum value of the expression in a group.
1.3.0
Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.
1.5.0
Aggregate function: returns the average of the values in a group. Alias for avg.
1.4.0
Aggregate function: returns the average of the values in a group. Alias for avg.
1.4.0
Aggregate function: returns the minimum value of the column in a group.
1.3.0
Aggregate function: returns the minimum value of the expression in a group.
1.3.0
Extracts the minutes as an integer from a given date/timestamp/string.
1.5.0
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
1.4.0
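The bit layout described above can be reproduced with ordinary Long arithmetic. The following plain-Scala sketch composes an ID from a partition ID and a record number as the description states; MonotonicIdSketch and its methods are hypothetical names, and the code does not call Spark.

```scala
object MonotonicIdSketch {
  // Partition ID in the upper bits, record number in the lower 33 bits.
  def id(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber

  // IDs for a frame with `partitions` partitions of equal size,
  // listed partition by partition.
  def idsFor(partitions: Int, recordsPerPartition: Int): Seq[Long] =
    for (p <- 0 until partitions; r <- 0 until recordsPerPartition)
      yield id(p, r.toLong)
}
```

idsFor(2, 3) yields 0, 1, 2, 8589934592, 8589934593, 8589934594, matching the two-partition example in the description.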
Extracts the month as an integer from a given date/timestamp/string.
1.5.0
Returns col1 if it is not NaN, or col2 if col1 is NaN.
Both inputs should be floating point columns (DoubleType or FloatType).
1.5.0
Unary minus, i.e. negate the expression.
// Select the amount column and negate all values.
// Scala:
df.select( -df("amount") )
// Java:
df.select( negate(df.col("amount")) );
1.3.0
Given a date column, returns the first date which is later than the value of the date column that is on the specified day of the week.
For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.
Day of the week parameter is case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun".
1.5.0
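The "first date which is later" rule corresponds to the strictly-after adjuster in java.time (Java 8 or later), which can be used to check expected results. A plain-Scala sketch, not the Spark function; NextDaySketch and nextDay are hypothetical names, and the day is passed as a DayOfWeek rather than a string.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.TemporalAdjusters

object NextDaySketch {
  // TemporalAdjusters.next returns the first matching day strictly
  // after the given date, even if the date itself already matches.
  def nextDay(isoDate: String, day: DayOfWeek): String =
    LocalDate.parse(isoDate).`with`(TemporalAdjusters.next(day)).toString
}
```

nextDay("2015-07-27", DayOfWeek.SUNDAY) gives 2015-08-02, as in the example above.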
Inversion of boolean expression, i.e. NOT.
// Scala: select rows that are not active (isActive === false)
df.filter( !df("isActive") )
// Java:
df.filter( not(df.col("isActive")) );
1.3.0
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
1.4.0
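The group assignment can be sketched for a partition of a known size. The plain-Scala code below assumes the usual SQL NTILE convention that, when the rows do not divide evenly, the earlier groups receive one extra row each; that convention, and the name NtileSketch, are assumptions rather than something stated in the description.

```scala
object NtileSketch {
  // Group ids for `rowCount` ordered rows split into `n` groups.
  // Assumption: the first (rowCount % n) groups get one extra row.
  def ntile(rowCount: Int, n: Int): Seq[Int] = {
    val base = rowCount / n
    val extra = rowCount % n
    (1 to n).flatMap(g => Seq.fill(base + (if (g <= extra) 1 else 0))(g))
  }
}
```

For eight rows and n = 4 the ids are 1, 1, 2, 2, 3, 3, 4, 4: each quarter of the rows gets its own value.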
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
This is computed by:
(rank of row in its partition - 1) / (number of rows in the partition - 1)
This is equivalent to the PERCENT_RANK function in SQL.
1.4.0
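The percentRank formula can be worked through on a small partition. This plain-Scala sketch computes the rank as one plus the number of strictly smaller values and then applies the formula; it assumes a partition with more than one row, and PercentRankSketch is a hypothetical name, not Spark code.

```scala
object PercentRankSketch {
  // (rank of row in its partition - 1) / (number of rows in the partition - 1)
  // Assumes partition.size > 1; a single-row partition would divide by zero.
  def percentRank(partition: Seq[Int], x: Int): Double = {
    val rank = partition.count(_ < x) + 1
    (rank - 1).toDouble / (partition.size - 1)
  }
}
```

For the partition Seq(10, 20, 20, 30), the value 10 has percent rank 0.0, both 20s have 1/3, and 30 has 1.0.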
Returns the positive value of dividend mod divisor.
1.5.0
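The difference from the ordinary % operator shows up for negative dividends. A plain-Scala sketch of a positive modulus, assuming a positive divisor; PmodSketch is a hypothetical name and the formula is a common idiom, not necessarily how Spark evaluates pmod.

```scala
object PmodSketch {
  // Scala's % keeps the sign of the dividend (-7 % 3 == -1).
  // Adding the divisor and reducing again gives a non-negative result
  // when the divisor is positive.
  def pmod(dividend: Int, divisor: Int): Int =
    ((dividend % divisor) + divisor) % divisor
}
```

pmod(-7, 3) is 2, whereas -7 % 3 is -1; for non-negative dividends the two agree.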
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Extracts the quarter as an integer from a given date/timestamp/string.
1.5.0
Generate a random column with i.i.d. samples from U[0.0, 1.0].
1.4.0
Generate a random column with i.i.d. samples from U[0.0, 1.0].
1.4.0
Generate a column with i.i.d. samples from the standard normal distribution.
1.4.0
Generate a column with i.i.d. samples from the standard normal distribution.
1.4.0
Window function: returns the rank of rows within a window partition.
The difference between rank and denseRank is that denseRank leaves no gaps in ranking sequence when there are ties. That is, if you were ranking a competition using denseRank and had three people tie for second place, you would say that all three were in second place and that the next person came in third.
This is equivalent to the RANK function in SQL.
1.4.0
Extracts a specific group (idx) matched by a Java regex from the specified string column.
1.5.0
Replace all substrings of the specified string value that match regexp with rep.
1.5.0
Repeats a string column n times, and returns it as a new string column.
1.5.0
Reverses the string column and returns it as a new string column.
1.5.0
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
1.4.0
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
1.4.0
Round the value of e to scale decimal places if scale >= 0, or at the integral part when scale < 0.
1.5.0
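The effect of a negative scale is easiest to see with scala.math.BigDecimal, whose setScale accepts negative values in the same spirit. This is a plain-Scala sketch; the HALF_UP rounding mode is an assumption made for illustration, and RoundSketch is a hypothetical name, not the Spark function.

```scala
import scala.math.BigDecimal.RoundingMode

object RoundSketch {
  // scale >= 0 keeps that many decimal places; scale < 0 rounds
  // within the integral part (e.g. -2 rounds to the nearest hundred).
  // Assumption: HALF_UP rounding.
  def round(value: Double, scale: Int): Double =
    BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toDouble
}
```

round(1234.567, 2) gives 1234.57, and round(1234.567, -2) gives 1200.0.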
Returns the value of the column e rounded to 0 decimal places.
1.5.0
Window function: returns a sequential number starting at 1 within a window partition.
Window function: returns a sequential number starting at 1 within a window partition.
This is equivalent to the ROW_NUMBER function in SQL.
1.4.0
Right-pads the string column with pad to a length of len.
Right-pads the string column with pad to a length of len.
1.5.0
Trims the spaces from the right end of the specified string value.
Trims the spaces from the right end of the specified string value.
1.5.0
Extracts the seconds as an integer from a given date/timestamp/string.
Extracts the seconds as an integer from a given date/timestamp/string.
1.5.0
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.
1.5.0
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.
column to compute SHA-2 on.
one of 224, 256, 384, or 512.
1.5.0
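To make the relationship between the bit length and the resulting hex string concrete, here is a minimal plain-Scala sketch built on `java.security.MessageDigest`. It is not Spark's implementation; `sha2Hex` is a hypothetical helper, and lower-case hex output is an assumption of the sketch.

```scala
import java.security.MessageDigest

object Sha2Sketch {
  // Hypothetical helper: hashes `bytes` with SHA-<numBits> and returns a hex string.
  // numBits must be one of 224, 256, 384, or 512, as listed above.
  def sha2Hex(bytes: Array[Byte], numBits: Int): String = {
    require(Set(224, 256, 384, 512).contains(numBits), s"unsupported bit length: $numBits")
    val digest = MessageDigest.getInstance(s"SHA-$numBits").digest(bytes)
    // Two hex characters per byte, so the string has numBits / 4 characters.
    digest.map(b => f"${b & 0xff}%02x").mkString
  }
}
```

A 256-bit digest therefore produces a 64-character string and a 512-bit digest a 128-character string.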
Shifts the given value numBits left.
Shifts the given value numBits left. If the given value is a long value, this function will return a long value; otherwise it will return an integer value.
1.5.0
Shifts the given value numBits right.
Shifts the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
1.5.0
Unsigned-shifts the given value numBits right.
Unsigned-shifts the given value numBits right. If the given value is a long value, it will return a long value; otherwise it will return an integer value.
1.5.0
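The three shift entries above correspond to the JVM operators `<<`, `>>`, and `>>>`. The plain-Scala sketch below shows how signed and unsigned right shifts differ on a negative value, and how the result width follows the input type; the object and value names are hypothetical.

```scala
object ShiftSketch {
  val left: Int = 1 << 4               // 16
  val signedRight: Int = -8 >> 1       // -4: the sign bit is preserved
  val unsignedRight: Int = -8 >>> 1    // 2147483644: zeros are shifted in from the left
  val unsignedRightLong: Long = -8L >>> 1 // 9223372036854775804: a long input gives a long result
}
```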
Computes the signum of the given column.
Computes the signum of the given column.
1.4.0
Computes the signum of the given value.
Computes the signum of the given value.
1.4.0
Computes the sine of the given column.
Computes the sine of the given column.
1.4.0
Computes the sine of the given value.
Computes the sine of the given value.
1.4.0
Computes the hyperbolic sine of the given column.
Computes the hyperbolic sine of the given column.
1.4.0
Computes the hyperbolic sine of the given value.
Computes the hyperbolic sine of the given value.
1.4.0
Returns length of array or map.
Returns length of array or map.
1.5.0
Sorts the input array for the given column in ascending / descending order, according to the natural ordering of the array elements.
Sorts the input array for the given column in ascending / descending order, according to the natural ordering of the array elements.
1.5.0
Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements.
Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements.
1.5.0
Returns the soundex code for the specified expression.
Returns the soundex code for the specified expression.
1.5.0
Partition ID of the Spark task.
Partition ID of the Spark task.
Note that this is non-deterministic because it depends on data partitioning and task scheduling.
1.4.0
Splits str around pattern (pattern is a regular expression).
Splits str around pattern (pattern is a regular expression). NOTE: pattern is a string representation of the regular expression.
1.5.0
Computes the square root of the specified float value.
Computes the square root of the specified float value.
1.5.0
Computes the square root of the specified float value.
Computes the square root of the specified float value.
1.3.0
Creates a new struct column that composes multiple input columns.
Creates a new struct column that composes multiple input columns.
1.4.0
Creates a new struct column.
Creates a new struct column. If the input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is retained as the StructField's name; otherwise, the newly generated StructField's name is auto-generated as col${index + 1}, i.e. col1, col2, col3, ...
1.4.0
Substring starts at pos and is of length len when str is String type, or returns the slice of the byte array that starts at pos (in bytes) and is of length len when str is Binary type.
Substring starts at pos and is of length len when str is String type, or returns the slice of the byte array that starts at pos (in bytes) and is of length len when str is Binary type.
1.5.0
Returns the substring from string str before count occurrences of the delimiter delim.
Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
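The positive and negative count rules for substring_index can be traced with a plain-Scala sketch. This is not Spark's implementation; `substringIndex` is a hypothetical helper, and two behaviours are assumptions of the sketch: a count of 0 (or an empty delimiter) returns an empty string, and the whole string is returned when it contains fewer delimiters than requested.

```scala
object SubstringIndexSketch {
  // Hypothetical helper mirroring the rules described for substring_index.
  def substringIndex(str: String, delim: String, count: Int): String = {
    if (count == 0 || delim.isEmpty) return "" // assumption of this sketch
    if (count > 0) {
      // Walk forward to the count-th delimiter and keep everything to its left.
      var idx = -1
      var found = 0
      while (found < count) {
        idx = str.indexOf(delim, idx + 1)
        if (idx < 0) return str // fewer delimiters than requested (assumption)
        found += 1
      }
      str.substring(0, idx)
    } else {
      // Walk backward to the |count|-th delimiter and keep everything to its right.
      var idx = str.length
      var found = 0
      while (found < -count) {
        idx = str.lastIndexOf(delim, idx - 1)
        if (idx < 0) return str
        found += 1
      }
      str.substring(idx + delim.length)
    }
  }
}
```

For the input "www.apache.org" with delimiter ".", a count of 2 keeps "www.apache" and a count of -2 keeps "apache.org". Because `indexOf` compares characters exactly, the match is case-sensitive.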
Aggregate function: returns the sum of all values in the given column.
Aggregate function: returns the sum of all values in the given column.
1.3.0
Aggregate function: returns the sum of all values in the expression.
Aggregate function: returns the sum of all values in the expression.
1.3.0
Aggregate function: returns the sum of distinct values in the expression.
Aggregate function: returns the sum of distinct values in the expression.
1.3.0
Aggregate function: returns the sum of distinct values in the expression.
Aggregate function: returns the sum of distinct values in the expression.
1.3.0
Computes the tangent of the given column.
Computes the tangent of the given column.
1.4.0
Computes the tangent of the given value.
Computes the tangent of the given value.
1.4.0
Computes the hyperbolic tangent of the given column.
Computes the hyperbolic tangent of the given column.
1.4.0
Computes the hyperbolic tangent of the given value.
Computes the hyperbolic tangent of the given value.
1.4.0
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
1.4.0
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
1.4.0
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
1.4.0
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
1.4.0
Converts the column into DateType.
Converts the column into DateType.
1.5.0
Assumes given timestamp is in given timezone and converts to UTC.
Assumes given timestamp is in given timezone and converts to UTC.
1.5.0
Translates any character in src that matches a character in matchingString to the corresponding character in replaceString.
Translates any character in src that matches a character in matchingString to the corresponding character in replaceString. The characters in replaceString correspond positionally to the characters in matchingString. A character in src is translated whenever it matches a character in matchingString.
1.5.0
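The positional correspondence between matchingString and replaceString can be sketched in plain Scala. This is an illustration rather than Spark's implementation: `translate` here is a hypothetical helper, and it assumes the two strings have equal length, since the text above does not say what happens otherwise.

```scala
object TranslateSketch {
  // Hypothetical helper: replaces each character of `src` found in `matchingString`
  // with the character at the same position in `replaceString`.
  def translate(src: String, matchingString: String, replaceString: String): String = {
    require(matchingString.length == replaceString.length, "this sketch assumes equal lengths")
    val mapping = matchingString.zip(replaceString).toMap
    src.map(c => mapping.getOrElse(c, c)) // unmatched characters pass through unchanged
  }
}
```

For example, `TranslateSketch.translate("AaBbCc", "abc", "123")` returns "A1B2C3".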
Trims the spaces from both ends of the specified string column.
Trims the spaces from both ends of the specified string column.
1.5.0
Returns date truncated to the unit specified by the format.
Returns date truncated to the unit specified by the format.
1.5.0
Defines a user-defined function (UDF) of 10 arguments.
Defines a user-defined function (UDF) of 10 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 9 arguments.
Defines a user-defined function (UDF) of 9 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 8 arguments.
Defines a user-defined function (UDF) of 8 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 7 arguments.
Defines a user-defined function (UDF) of 7 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 6 arguments.
Defines a user-defined function (UDF) of 6 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 5 arguments.
Defines a user-defined function (UDF) of 5 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 4 arguments.
Defines a user-defined function (UDF) of 4 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 3 arguments.
Defines a user-defined function (UDF) of 3 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 2 arguments.
Defines a user-defined function (UDF) of 2 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 1 argument.
Defines a user-defined function (UDF) of 1 argument. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 0 arguments.
Defines a user-defined function (UDF) of 0 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Decodes a BASE64 encoded string column and returns it as a binary column.
Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.
1.5.0
Inverse of hex.
Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.
1.5.0
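The pairwise interpretation described for unhex can be shown with a short plain-Scala sketch. It is not Spark's implementation; `unhex` here is a hypothetical helper that assumes an even number of valid hexadecimal digits.

```scala
object UnhexSketch {
  // Hypothetical helper: reads each pair of characters as one hexadecimal number
  // and converts it to the corresponding byte.
  def unhex(hex: String): Array[Byte] = {
    require(hex.length % 2 == 0, "this sketch assumes an even number of hex digits")
    hex.grouped(2).map(pair => Integer.parseInt(pair, 16).toByte).toArray
  }
}
```

For example, the ten characters "537061726B" decode to the five bytes of the ASCII string "Spark".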
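Converts a time string with the given pattern (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]) to a Unix timestamp (in seconds); returns null on failure.
Converts a time string with the given pattern (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]) to a Unix timestamp (in seconds); returns null on failure.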
1.5.0
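The pattern-based conversion can be approximated in plain Scala with `java.text.SimpleDateFormat`, the class the linked tutorial describes. This is a sketch under stated assumptions, not Spark's implementation: `toUnixSeconds` is a hypothetical helper, it returns `None` where the function above returns null, and it takes an explicit time zone (UTC by default) so that the result is reproducible.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone
import scala.util.Try

object UnixTimestampSketch {
  // Hypothetical helper: parses `time` with `pattern` and returns seconds since the epoch,
  // or None if the string cannot be parsed.
  def toUnixSeconds(time: String, pattern: String, zone: String = "UTC"): Option[Long] = Try {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setTimeZone(TimeZone.getTimeZone(zone)) // assumption: explicit zone instead of the JVM default
    fmt.setLenient(false)                       // reject out-of-range fields rather than rolling them over
    fmt.parse(time).getTime / 1000L             // milliseconds to seconds
  }.toOption
}
```

For example, `UnixTimestampSketch.toUnixSeconds("1970-01-02 00:00:00", "yyyy-MM-dd HH:mm:ss")` returns `Some(86400)`, and an unparseable input returns `None`.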
Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), using the default timezone and the default locale; returns null on failure.
Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), using the default timezone and the default locale; returns null on failure.
1.5.0
Gets current Unix timestamp in seconds.
Gets current Unix timestamp in seconds.
1.5.0
Converts a string column to upper case.
Converts a string column to upper case.
1.3.0
Extracts the week number as an integer from a given date/timestamp/string.
Extracts the week number as an integer from a given date/timestamp/string.
1.5.0
Evaluates a list of conditions and returns one of multiple possible result expressions.
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.

// Scala:
people.select(when(people("gender") === "male", 0)
  .when(people("gender") === "female", 1)
  .otherwise(2))

// Java:
people.select(when(col("gender").equalTo("male"), 0)
  .when(col("gender").equalTo("female"), 1)
  .otherwise(2))
1.4.0
Extracts the year as an integer from a given date/timestamp/string.
Extracts the year as an integer from a given date/timestamp/string.
1.5.0
Calls a Scala function of 10 arguments as a user-defined function (UDF).
Calls a Scala function of 10 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 9 arguments as a user-defined function (UDF).
Calls a Scala function of 9 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 8 arguments as a user-defined function (UDF).
Calls a Scala function of 8 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 7 arguments as a user-defined function (UDF).
Calls a Scala function of 7 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 6 arguments as a user-defined function (UDF).
Calls a Scala function of 6 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 5 arguments as a user-defined function (UDF).
Calls a Scala function of 5 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 4 arguments as a user-defined function (UDF).
Calls a Scala function of 4 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 3 arguments as a user-defined function (UDF).
Calls a Scala function of 3 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 2 arguments as a user-defined function (UDF).
Calls a Scala function of 2 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 1 argument as a user-defined function (UDF).
Calls a Scala function of 1 argument as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a Scala function of 0 arguments as a user-defined function (UDF).
Calls a Scala function of 0 arguments as a user-defined function (UDF). This requires you to specify the return data type.
(Since version 1.5.0) Use udf
1.3.0
Calls a user-defined function.
Calls a user-defined function. Example:
import org.apache.spark.sql._

val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val sqlContext = df.sqlContext
sqlContext.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", callUdf("simpleUDF", $"value"))
(Since version 1.5.0) Use callUDF
1.4.0
:: Experimental :: Functions available for DataFrame.
1.3.0