How do you delete a column by name in data.table?
Any of the following will remove column foo from the data.table df3:
# Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
df3[,foo:=NULL]
df3[, c("foo","bar"):=NULL] # remove two columns
myVar = "foo"
df3[, (myVar):=NULL] # lookup myVar contents
# Method 2a -- A safe idiom for excluding (possibly multiple)
# columns matching a regex
df3[, grep("^foo$", colnames(df3)):=NULL]
# Method 2b -- An alternative to 2a, also "safe" in the sense described below
df3[, which(grepl("^foo$", colnames(df3))):=NULL]
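As a quick check of Method 1, here is a minimal sketch on a toy table. The table contents and the column name baz are made up for illustration, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table; only the column names matter here.
df3 <- data.table(foo = 1:3, bar = letters[1:3], baz = c(1.5, 2.5, 3.5))

# Method 1: delete by reference, so no reassignment is needed.
df3[, foo := NULL]
print(names(df3))  # "bar" "baz"

# Delete a column whose name is stored in a variable.
myVar <- "baz"
df3[, (myVar) := NULL]
print(names(df3))  # "bar"
```

Note that df3 itself changes in both calls; nothing is assigned back with <- or =.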
data.table also supports the following syntax:
# Method 3 (returns a new data.table; assign the result back to df3 to keep it)
df3[, !"foo"]
though if you actually want to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo"), you'd really want to use Method 1 instead.
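To make the difference between Method 3 and Method 1 concrete, the sketch below shows that the negated selection leaves df3 untouched. The toy table and the name without_foo are illustrative assumptions, and it assumes a data.table version recent enough to accept this syntax without with=FALSE.

```r
library(data.table)

# Hypothetical toy table for illustration.
df3 <- data.table(foo = 1:3, bar = 4:6)

# Method 3 returns a new data.table without "foo" ...
without_foo <- df3[, !"foo"]
print(names(without_foo))  # "bar"

# ... while the original still has both columns.
print(names(df3))          # "foo" "bar"
```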
(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo" if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)
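The effect of anchoring the pattern can be seen with base R alone. This is a small sketch on a made-up vector of column names:

```r
# Hypothetical column names for illustration.
cols <- c("foo", "fool", "buffoon", "bar")

# Unanchored: matches every name containing "foo" as a substring.
loose <- cols[grep("foo", cols)]
print(loose)   # "foo" "fool" "buffoon"

# Anchored: matches only the name that is exactly "foo".
strict <- cols[grep("^foo$", cols)]
print(strict)  # "foo"
```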
Less safe options, fine for interactive use:
The following idioms will also work if df3 contains a column matching "foo", but will fail in a probably unexpected way if it does not. If, for instance, you use any of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.
As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you want to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.
# Method 4:
df3[, .SD, .SDcols = !patterns("^foo$")]
Lastly, there are approaches using with=FALSE, though data.table is gradually moving away from this argument, so it is now discouraged where you can avoid it. They are shown here so you know the option exists in case you really do need it:
# Method 5a (like Method 3)
df3[, !"foo", with=FALSE]
# Method 5b (like Method 4)
df3[, !grep("^foo$", names(df3)), with=FALSE]
# Method 5c (another like Method 4)
df3[, !grepl("^foo$", names(df3)), with=FALSE]
Drop data frame columns by name
There's also the subset command, useful if you know which columns you want:
df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))
UPDATED after comment by @hadley: To drop columns a,c you could do:
df <- subset(df, select = -c(a, c))
Remove Column from Data Table C#
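// Assumes t refers to an existing, populated DataTable.
DataTable t;
t.Columns.Remove("columnName");   // remove by column name
t.Columns.RemoveAt(columnIndex);  // remove by zero-based index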
Remove columns from DataTable which are not in List string
If you just want to remove all the columns not found in list 'A':
var A = new List<string> { "a", "b", "c" };
var toRemove = dt.Columns.Cast<DataColumn>().Select(x => x.ColumnName).Except(A).ToList();
foreach (var col in toRemove) dt.Columns.Remove(col);
Keeping Data Columns by name and removing the others within a DataTable
You can't change a collection while you are enumerating it. You can change your code to a standard for loop that indexes backward, like this:
for (int x = dataTable.Columns.Count - 1; x >= 0; x--)
{
    DataColumn dc = dataTable.Columns[x];
    if (dc.ColumnName != "Cat" && dc.ColumnName != "Dog" &&
        dc.ColumnName != "Turtle" && dc.ColumnName != "Lion")
    {
        // Remove from the table's column collection, not from the column itself.
        dataTable.Columns.Remove(dc);
    }
}
Looping in reverse is required to avoid skipping columns when you remove an item from the collection. Also, as explained in the comment below, you need to use the && logical operator to remove ALL the columns whose names are not one of the four you want to preserve. Using the || logical operator would remove all of your columns: the column named "Lion", for example, would be removed because its name is not "Cat" (or any of the other names in the if condition).
It is also possible to use a DataView to extract only the columns you need, but this has the drawback of requiring a second DataTable in memory, which could cause problems if your data set is really big.
DataTable datatable = CSVReader.CSVInput(filepath);
DataView dv = new DataView(datatable);
DataTable newTable = dv.ToTable(false, new string[] {"cat", "dog", "turtle", "lion"});
Delete multiple columns by reference using reverse selection in data.table
We can use setdiff to get the names of the dataset that are not in list_to_keep and assign (:=) them to NULL:
df[, setdiff(names(df), list_to_keep) := NULL]
As @rosscova mentioned, which on the logical vector can be used to get the positions of the columns, which are then assigned to NULL:
df[, which(!names(df)%in%list_to_keep):=NULL]
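A minimal sketch of the setdiff approach on a toy table follows. The table contents and the values in list_to_keep are assumptions made for illustration, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy table and keep-list for illustration.
df <- data.table(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
list_to_keep <- c("a", "c")

# Names present in df but absent from list_to_keep.
to_drop <- setdiff(names(df), list_to_keep)
print(to_drop)    # "b" "d"

# Drop them by reference; df itself is modified.
df[, setdiff(names(df), list_to_keep) := NULL]
print(names(df))  # "a" "c"
```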
How to drop columns by name in a data frame
You should use either indexing or the subset function. For example:
R> df <- data.frame(x=1:5, y=2:6, z=3:7, u=4:8)
R> df
  x y z u
1 1 2 3 4
2 2 3 4 5
3 3 4 5 6
4 4 5 6 7
5 5 6 7 8
Then you can use the which function and the - operator in column indexing:
R> df[ , -which(names(df) %in% c("z","u"))]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
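One caveat with the -which() idiom is worth sketching: when no name matches, which() returns an empty vector and the negated index selects no columns at all. The name "missing" below is a hypothetical non-existent column, and the logical-index variant is one common alternative rather than the only fix.

```r
df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)

# No column is called "missing", so which() returns integer(0),
# and df[, integer(0)] is a data frame with zero columns.
none_left <- df[, -which(names(df) %in% "missing")]
print(ncol(none_left))  # 0

# A logical index does not have this problem: nothing matches,
# so every column is kept.
all_kept <- df[, !(names(df) %in% "missing"), drop = FALSE]
print(names(all_kept))  # "x" "y" "z" "u"

# The same logical index drops z and u as intended.
xy <- df[, !(names(df) %in% c("z", "u")), drop = FALSE]
print(names(xy))        # "x" "y"
```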
Or, much simpler, use the select argument of the subset function: you can then use the - operator directly on a vector of column names, and you can even omit the quotes around the names!
R> subset(df, select=-c(z,u))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
Note that you can also select the columns you want instead of dropping the others:
R> df[ , c("x","y")]
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
R> subset(df, select=c(x,y))
  x y
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
Remove columns from DataTable in C#
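Aside from limiting the columns selected in the first place (which reduces bandwidth and memory), you can remove columns from the table:
// Assumes t refers to an existing, populated DataTable.
DataTable t;
t.Columns.Remove("columnName");   // remove by column name
t.Columns.RemoveAt(columnIndex);  // remove by zero-based index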