public final class DataFrameNaFunctions
extends java.lang.Object

Functionality for working with missing data in DataFrames.
Modifier and Type | Method and Description |
---|---|
DataFrame | `drop()`: Returns a new DataFrame that drops rows containing any null or NaN values. |
DataFrame | `drop(int minNonNulls)`: Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values. |
DataFrame | `drop(int minNonNulls, scala.collection.Seq<java.lang.String> cols)`: (Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns. |
DataFrame | `drop(int minNonNulls, java.lang.String[] cols)`: Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns. |
DataFrame | `drop(scala.collection.Seq<java.lang.String> cols)`: (Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns. |
DataFrame | `drop(java.lang.String how)`: Returns a new DataFrame that drops rows containing null or NaN values. |
DataFrame | `drop(java.lang.String[] cols)`: Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns. |
DataFrame | `drop(java.lang.String how, scala.collection.Seq<java.lang.String> cols)`: (Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns. |
DataFrame | `drop(java.lang.String how, java.lang.String[] cols)`: Returns a new DataFrame that drops rows containing null or NaN values in the specified columns. |
DataFrame | `fill(double value)`: Returns a new DataFrame that replaces null or NaN values in numeric columns with value. |
DataFrame | `fill(double value, scala.collection.Seq<java.lang.String> cols)`: (Scala-specific) Returns a new DataFrame that replaces null or NaN values in the specified numeric columns. |
DataFrame | `fill(double value, java.lang.String[] cols)`: Returns a new DataFrame that replaces null or NaN values in the specified numeric columns. |
DataFrame | `fill(java.util.Map<java.lang.String,java.lang.Object> valueMap)`: Returns a new DataFrame that replaces null values. |
DataFrame | `fill(scala.collection.immutable.Map<java.lang.String,java.lang.Object> valueMap)`: (Scala-specific) Returns a new DataFrame that replaces null values. |
DataFrame | `fill(java.lang.String value)`: Returns a new DataFrame that replaces null values in string columns with value. |
DataFrame | `fill(java.lang.String value, scala.collection.Seq<java.lang.String> cols)`: (Scala-specific) Returns a new DataFrame that replaces null values in the specified string columns. |
DataFrame | `fill(java.lang.String value, java.lang.String[] cols)`: Returns a new DataFrame that replaces null values in the specified string columns. |
`<T> DataFrame` | `replace(scala.collection.Seq<java.lang.String> cols, scala.collection.immutable.Map<T,T> replacement)`: (Scala-specific) Replaces values matching keys in the replacement map. |
`<T> DataFrame` | `replace(java.lang.String[] cols, java.util.Map<T,T> replacement)`: Replaces values matching keys in the replacement map with the corresponding values. |
`<T> DataFrame` | `replace(java.lang.String col, java.util.Map<T,T> replacement)`: Replaces values matching keys in the replacement map with the corresponding values. |
`<T> DataFrame` | `replace(java.lang.String col, scala.collection.immutable.Map<T,T> replacement)`: (Scala-specific) Replaces values matching keys in the replacement map. |

public DataFrame drop()
Returns a new DataFrame that drops rows containing any null or NaN values.

public DataFrame drop(java.lang.String how)
Returns a new DataFrame that drops rows containing null or NaN values.
If how is "any", drop rows containing any null or NaN values.
If how is "all", drop rows only if every column is null or NaN for that row.
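The any/all rule can be checked without a cluster. The sketch below is a hypothetical plain-Java model, not Spark code: `DropHowModel` is an invented name, a row is assumed to be an `Object[]`, and a cell counts as missing when it is `null` or a `Float`/`Double` NaN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of drop("any") / drop("all"); not Spark code.
// A row is an Object[] of cell values.
final class DropHowModel {

    // A cell is "missing" if it is null or a floating-point NaN.
    static boolean isMissing(Object v) {
        if (v == null) return true;
        if (v instanceof Double) return ((Double) v).isNaN();
        if (v instanceof Float) return ((Float) v).isNaN();
        return false;
    }

    static List<Object[]> drop(List<Object[]> rows, String how) {
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            long missing = Arrays.stream(row).filter(DropHowModel::isMissing).count();
            boolean dropRow;
            switch (how) {
                case "any": // one missing cell is enough to drop the row
                    dropRow = missing > 0;
                    break;
                case "all": // drop only rows in which every cell is missing
                    dropRow = missing == row.length;
                    break;
                default:
                    throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
            }
            if (!dropRow) kept.add(row);
        }
        return kept;
    }
}
```

With the rows `{1, "a"}`, `{null, "b"}` and `{NaN, null}`, "any" keeps only the first row, while "all" keeps the first two.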
Parameters:
how - (undocumented)

public DataFrame drop(java.lang.String[] cols)
Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
Parameters:
cols - (undocumented)

public DataFrame drop(scala.collection.Seq<java.lang.String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
Parameters:
cols - (undocumented)

public DataFrame drop(java.lang.String how, java.lang.String[] cols)
Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
If how is "any", drop rows containing any null or NaN values in the specified columns.
If how is "all", drop rows only if every specified column is null or NaN for that row.
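Restricting the check to cols only changes which cells are inspected. Another hypothetical plain-Java sketch makes that explicit; `DropSubsetModel` and its `row` helper are invented names, rows are assumed to be `Map<String, Object>` keyed by column name, and unknown column names simply read as null in this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of drop(how, cols); not Spark code.
// A row maps column name -> value; only the names in cols are inspected.
final class DropSubsetModel {

    static boolean isMissing(Object v) {
        return v == null
            || (v instanceof Double && ((Double) v).isNaN())
            || (v instanceof Float && ((Float) v).isNaN());
    }

    static List<Map<String, Object>> drop(List<Map<String, Object>> rows, String how, String[] cols) {
        if (!how.equals("any") && !how.equals("all")) {
            throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
        }
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            int missing = 0;
            for (String c : cols) {
                // Unknown names read as null in this model.
                if (isMissing(row.get(c))) missing++;
            }
            boolean dropRow = how.equals("any") ? missing > 0 : missing == cols.length;
            if (!dropRow) kept.add(row);
        }
        return kept;
    }

    // Convenience builder for a three-column row (a, b, c).
    static Map<String, Object> row(Object a, Object b, Object c) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("a", a);
        m.put("b", b);
        m.put("c", c);
        return m;
    }
}
```

A row whose listed columns are all missing is dropped under "all" even if an unlisted column still holds a value.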
Parameters:
how - (undocumented)
cols - (undocumented)

public DataFrame drop(java.lang.String how, scala.collection.Seq<java.lang.String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
If how is "any", drop rows containing any null or NaN values in the specified columns.
If how is "all", drop rows only if every specified column is null or NaN for that row.
Parameters:
how - (undocumented)
cols - (undocumented)

public DataFrame drop(int minNonNulls)
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values.
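The threshold form counts present values per row. In this hypothetical plain-Java model (`DropMinNonNullsModel` is an invented name, and rows are assumed to be `Object[]`), a row survives only when at least minNonNulls of its cells are neither null nor NaN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of drop(int minNonNulls); not Spark code.
final class DropMinNonNullsModel {

    // A cell is "present" if it is neither null nor a floating-point NaN.
    static boolean isPresent(Object v) {
        if (v == null) return false;
        if (v instanceof Double && ((Double) v).isNaN()) return false;
        if (v instanceof Float && ((Float) v).isNaN()) return false;
        return true;
    }

    static List<Object[]> drop(List<Object[]> rows, int minNonNulls) {
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            int present = 0;
            for (Object v : row) {
                if (isPresent(v)) present++;
            }
            // Fewer than minNonNulls present values means the row is dropped.
            if (present >= minNonNulls) kept.add(row);
        }
        return kept;
    }
}
```

Under these semantics, for rows of n columns, minNonNulls = n behaves like drop("any") and minNonNulls = 1 behaves like drop("all").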
Parameters:
minNonNulls - (undocumented)

public DataFrame drop(int minNonNulls, java.lang.String[] cols)
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
Parameters:
minNonNulls - (undocumented)
cols - (undocumented)

public DataFrame drop(int minNonNulls, scala.collection.Seq<java.lang.String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
Parameters:
minNonNulls - (undocumented)
cols - (undocumented)

public DataFrame fill(double value)
Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
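The sketch below models fill(double) in plain Java to show which cells change. It is hypothetical, not Spark's implementation: `FillDoubleModel` is an invented name, the schema is assumed to be a `LinkedHashMap` from column name to Java type whose iteration order matches the row's `Object[]` positions, and any `Number` subtype counts as numeric.

```java
import java.util.LinkedHashMap;

// Hypothetical plain-Java model of fill(double value); not Spark code.
// schema maps column name -> Java type, in the same order as the row's cells.
final class FillDoubleModel {

    static Object[] fill(LinkedHashMap<String, Class<?>> schema, Object[] row, double value) {
        Object[] out = row.clone();  // the input row is left unchanged
        int i = 0;
        for (Class<?> type : schema.values()) {
            Object v = out[i];
            boolean numeric = Number.class.isAssignableFrom(type);
            boolean missing = v == null
                || (v instanceof Double && ((Double) v).isNaN())
                || (v instanceof Float && ((Float) v).isNaN());
            if (numeric && missing) {
                out[i] = value;  // simplification: always stored as a Double in this model
            }
            i++;
        }
        return out;
    }
}
```

Non-numeric columns are never touched, so a null in a string column survives this call.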
Parameters:
value - (undocumented)

public DataFrame fill(java.lang.String value)
Returns a new DataFrame that replaces null values in string columns with value.
Parameters:
value - (undocumented)

public DataFrame fill(double value, java.lang.String[] cols)
Returns a new DataFrame that replaces null or NaN values in the specified numeric columns.
If a specified column is not a numeric column, it is ignored.
Parameters:
value - (undocumented)
cols - (undocumented)

public DataFrame fill(double value, scala.collection.Seq<java.lang.String> cols)
(Scala-specific) Returns a new DataFrame that replaces null or NaN values in the specified numeric columns.
If a specified column is not a numeric column, it is ignored.
Parameters:
value - (undocumented)
cols - (undocumented)

public DataFrame fill(java.lang.String value, java.lang.String[] cols)
Returns a new DataFrame that replaces null values in the specified string columns.
If a specified column is not a string column, it is ignored.
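Column selection for the string variant combines two filters: the column must be listed in cols and must be a string column. A hypothetical plain-Java model shows a listed non-string column being skipped; `FillStringColsModel` is an invented name, and both the schema and the row are assumed to be maps keyed by column name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java model of fill(String value, String[] cols); not Spark code.
// schema maps column name -> Java type; a row maps column name -> value.
final class FillStringColsModel {

    static Map<String, Object> fill(Map<String, Class<?>> schema, Map<String, Object> row,
                                    String value, String[] cols) {
        Set<String> listed = new HashSet<>(Arrays.asList(cols));
        Map<String, Object> out = new HashMap<>(row);  // copy; the input row is not modified
        for (Map.Entry<String, Class<?>> col : schema.entrySet()) {
            // Filled only if listed AND a string column; other listed columns are skipped silently.
            boolean eligible = listed.contains(col.getKey()) && col.getValue() == String.class;
            if (eligible && out.get(col.getKey()) == null) {
                out.put(col.getKey(), value);
            }
        }
        return out;
    }
}
```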
Parameters:
value - (undocumented)
cols - (undocumented)

public DataFrame fill(java.lang.String value, scala.collection.Seq<java.lang.String> cols)
(Scala-specific) Returns a new DataFrame that replaces null values in the specified string columns.
If a specified column is not a string column, it is ignored.
Parameters:
value - (undocumented)
cols - (undocumented)

public DataFrame fill(java.util.Map<java.lang.String,java.lang.Object> valueMap)
Returns a new DataFrame that replaces null values.
The key of the map is the column name, and the value of the map is the replacement value.
The value must be of one of the following types: Integer, Long, Float, Double, String, Boolean.
For example, the following replaces null values in column "A" with the string "unknown", and null values in column "B" with the numeric value 1.0.
```java
import com.google.common.collect.ImmutableMap;

// From Java, `na` is called as a method: df.na()
df.na().fill(ImmutableMap.<String, Object>of("A", "unknown", "B", 1.0));
```
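A plain-Java sketch of fill(Map) with the documented type restriction follows. `FillMapModel` is a hypothetical name and this is not Spark code: rows are assumed to be `Map<String, Object>`, every replacement value is checked up front, and column names the row does not contain are skipped (whether the real method rejects unknown names is not modeled here).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java model of fill(Map<String, Object> valueMap); not Spark code.
final class FillMapModel {

    // Allowed replacement types, as listed in the documentation above.
    static void checkType(String column, Object v) {
        boolean ok = v instanceof Integer || v instanceof Long || v instanceof Float
            || v instanceof Double || v instanceof String || v instanceof Boolean;
        if (!ok) {
            String found = (v == null) ? "null" : v.getClass().getName();
            throw new IllegalArgumentException("Unsupported value for column " + column + ": " + found);
        }
    }

    static Map<String, Object> fill(Map<String, Object> row, Map<String, Object> valueMap) {
        valueMap.forEach(FillMapModel::checkType);  // validate everything before filling anything
        Map<String, Object> out = new HashMap<>(row);
        for (Map.Entry<String, Object> e : valueMap.entrySet()) {
            // Only existing columns that currently hold null are filled.
            if (out.containsKey(e.getKey()) && out.get(e.getKey()) == null) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```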
Parameters:
valueMap - (undocumented)

public DataFrame fill(scala.collection.immutable.Map<java.lang.String,java.lang.Object> valueMap)
(Scala-specific) Returns a new DataFrame that replaces null values.
The key of the map is the column name, and the value of the map is the replacement value.
The value must be of one of the following types: Int, Long, Float, Double, String, Boolean.
For example, the following replaces null values in column "A" with the string "unknown", and null values in column "B" with the numeric value 1.0.
```scala
df.na.fill(Map(
  "A" -> "unknown",
  "B" -> 1.0
))
```
Parameters:
valueMap - (undocumented)

public <T> DataFrame replace(java.lang.String col, java.util.Map<T,T> replacement)
Replaces values matching keys in the replacement map with the corresponding values.
Key and value of the replacement map must have the same type, and can only be doubles, strings or booleans.
If col is "*", the replacement is applied to all string and numeric columns.
```java
import com.google.common.collect.ImmutableMap;

// Replaces all occurrences of 1.0 with 2.0 in column "height".
df.na().replace("height", ImmutableMap.of(1.0, 2.0));

// Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
df.na().replace("name", ImmutableMap.of("UNKNOWN", "unnamed"));

// Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
df.na().replace("*", ImmutableMap.of("UNKNOWN", "unnamed"));
```
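How col selects columns, including the "*" wildcard, can be modeled in plain Java. In this hypothetical sketch (`ReplaceModel` is an invented name; the schema and rows are assumed to be maps keyed by column name), a value is replaced when it `equals()` a key of the replacement map, and "*" targets every `String` or `Number` column as described above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java model of replace(String col, Map<T, T> replacement); not Spark code.
final class ReplaceModel {

    static boolean isStringOrNumeric(Class<?> type) {
        return type == String.class || Number.class.isAssignableFrom(type);
    }

    static <T> Map<String, Object> replace(Map<String, Class<?>> schema, Map<String, Object> row,
                                          String col, Map<T, T> replacement) {
        Map<String, Object> out = new HashMap<>(row);
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            // "*" targets every string or numeric column; otherwise only the named column.
            boolean targeted = col.equals("*")
                ? isStringOrNumeric(field.getValue())
                : col.equals(field.getKey());
            Object current = out.get(field.getKey());
            // Matching uses equals(), so the Double 1.0 does not match the Integer 1 in this model.
            if (targeted && replacement.containsKey(current)) {
                out.put(field.getKey(), replacement.get(current));
            }
        }
        return out;
    }
}
```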
Parameters:
col - name of the column to apply the value replacement
replacement - value replacement map, as explained above

public <T> DataFrame replace(java.lang.String[] cols, java.util.Map<T,T> replacement)
Replaces values matching keys in the replacement map with the corresponding values.
Key and value of the replacement map must have the same type, and can only be doubles, strings or booleans.
```java
import com.google.common.collect.ImmutableMap;

// Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
df.na().replace(new String[] {"height", "weight"}, ImmutableMap.of(1.0, 2.0));

// Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
df.na().replace(new String[] {"firstname", "lastname"}, ImmutableMap.of("UNKNOWN", "unnamed"));
```
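The type rule on the replacement map can be expressed as a runtime check. This hypothetical helper (`ReplacementMapCheck` is an invented name, not part of Spark) enforces the rule literally as documented: all keys and values share one class, which must be `Double`, `String` or `Boolean`. It also rejects null entries, which is a simplification of this sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical validator for the documented replacement-map rule; not Spark code.
final class ReplacementMapCheck {

    // Returns the shared class of all keys and values, or null for an empty map.
    static Class<?> check(Map<?, ?> replacement) {
        Class<?> type = null;
        for (Map.Entry<?, ?> e : replacement.entrySet()) {
            for (Object v : new Object[] {e.getKey(), e.getValue()}) {
                if (v == null) {
                    throw new IllegalArgumentException("null keys/values are rejected by this sketch");
                }
                if (type == null) type = v.getClass();  // first element fixes the type
                if (v.getClass() != type) {
                    throw new IllegalArgumentException(
                        "mixed types: " + type.getSimpleName() + " and " + v.getClass().getSimpleName());
                }
            }
        }
        if (type != null && type != Double.class && type != String.class && type != Boolean.class) {
            throw new IllegalArgumentException("unsupported type: " + type.getSimpleName());
        }
        return type;
    }
}
```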
Parameters:
cols - list of columns to apply the value replacement
replacement - value replacement map, as explained above

public <T> DataFrame replace(java.lang.String col, scala.collection.immutable.Map<T,T> replacement)
(Scala-specific) Replaces values matching keys in the replacement map.
Key and value of the replacement map must have the same type, and can only be doubles, strings or booleans.
If col is "*", the replacement is applied to all string, numeric or boolean columns.
```scala
// Replaces all occurrences of 1.0 with 2.0 in column "height".
df.na.replace("height", Map(1.0 -> 2.0))

// Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
df.na.replace("name", Map("UNKNOWN" -> "unnamed"))

// Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
df.na.replace("*", Map("UNKNOWN" -> "unnamed"))
```
Parameters:
col - name of the column to apply the value replacement
replacement - value replacement map, as explained above

public <T> DataFrame replace(scala.collection.Seq<java.lang.String> cols, scala.collection.immutable.Map<T,T> replacement)
(Scala-specific) Replaces values matching keys in the replacement map.
Key and value of the replacement map must have the same type, and can only be doubles, strings or booleans.
```scala
// Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
df.na.replace("height" :: "weight" :: Nil, Map(1.0 -> 2.0))

// Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
df.na.replace("firstname" :: "lastname" :: Nil, Map("UNKNOWN" -> "unnamed"))
```
Parameters:
cols - list of columns to apply the value replacement
replacement - value replacement map, as explained above