Reordering Columns in a Large Dataframe

If you're just moving certain columns to the end, you can write a small helper function like the following:

movetolast <- function(data, move) {
  data[c(setdiff(names(data), move), move)]
}

movetolast(df, c("b", "c"))
#   a      d      e      f      b      c
# 1 1 Rabbit    Cat    Cat    Cat    Dog
# 2 2    Cat    Dog    Dog    Dog Rabbit
# 3 3    Dog    Dog    Dog Rabbit    Cat
# 4 4    Dog Rabbit Rabbit    Cat    Dog
# 5 5 Rabbit    Cat    Cat    Dog    Dog

I would not recommend getting into the habit of referring to columns by position, especially in code that runs unattended, since those positions might change.


"For fun" update

Here's an extended interpretation of the above function. It allows you to move columns to either the first or last position, or to be before or after another column.

moveMe <- function(data, tomove, where = "last", ba = NULL) {
  temp <- setdiff(names(data), tomove)
  x <- switch(
    where,
    first = data[c(tomove, temp)],
    last = data[c(temp, tomove)],
    before = {
      if (is.null(ba)) stop("must specify ba column")
      if (length(ba) > 1) stop("ba must be a single character string")
      data[append(temp, values = tomove, after = (match(ba, temp) - 1))]
    },
    after = {
      if (is.null(ba)) stop("must specify ba column")
      if (length(ba) > 1) stop("ba must be a single character string")
      data[append(temp, values = tomove, after = (match(ba, temp)))]
    })
  x
}

Try it with the following.

moveMe(df, c("b", "c"))
moveMe(df, c("b", "c"), "first")
moveMe(df, c("b", "c"), "before", "e")
moveMe(df, c("b", "c"), "after", "e")

You'll need to adapt it to add some error checking. For instance, if you try to move columns "b" and "c" to "before c", you'll get an error, because "c" has already been removed from the set of columns it is looked up in.
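The kind of error checking mentioned above could look roughly like the following. This is only a sketch in Python that works on a plain list of column names; the function name `move_columns` and its parameters are made up for illustration and are not part of the original answer.

```python
def move_columns(columns, to_move, where="last", anchor=None):
    """Return a new column order with `to_move` relocated.

    Hypothetical helper: `where` is one of "first", "last", "before", "after";
    `anchor` plays the role of `ba` in the R function.
    """
    columns = list(columns)
    to_move = list(to_move)

    # Every column we are asked to move must exist.
    missing = [c for c in to_move if c not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")

    rest = [c for c in columns if c not in to_move]

    if where == "first":
        return to_move + rest
    if where == "last":
        return rest + to_move
    if where not in ("before", "after"):
        raise ValueError(f"unknown position: {where!r}")

    # "before"/"after" need a single anchor that is not itself being moved.
    if anchor is None:
        raise ValueError("an anchor column is required for 'before'/'after'")
    if anchor in to_move:
        raise ValueError(f"cannot place columns relative to {anchor!r}: it is being moved")
    if anchor not in rest:
        raise KeyError(f"anchor column not found: {anchor!r}")

    pos = rest.index(anchor) + (1 if where == "after" else 0)
    return rest[:pos] + to_move + rest[pos:]


cols = ["a", "b", "c", "d", "e", "f"]
print(move_columns(cols, ["b", "c"], "before", "e"))
print(move_columns(cols, ["b", "c"], "after", "e"))
```

With a pandas DataFrame, the returned list could be used as `df[move_columns(df.columns, ["b", "c"], "after", "e")]`.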

Reordering columns in large data frame

A better option would be to read the desired column order from a file with scan and use that to rearrange the columns:

test <- scan('order.txt', sep=",", quiet = TRUE)
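For comparison, here is a rough Python sketch of the same idea. It assumes the file holds comma-separated column names (the original does not say what order.txt contains), and it uses an in-memory `io.StringIO` in place of the real file so the example is self-contained. The helper names are hypothetical.

```python
import io


def read_column_order(handle, sep=","):
    """Parse a separator-delimited list of column names from an open file."""
    return [name.strip() for name in handle.read().split(sep) if name.strip()]


def apply_order(columns, wanted):
    """Put `wanted` first, then any columns the file did not mention."""
    columns = list(columns)
    unknown = [c for c in wanted if c not in columns]
    if unknown:
        raise KeyError(f"order file names unknown columns: {unknown}")
    remaining = [c for c in columns if c not in wanted]
    return list(wanted) + remaining


# Stand-in for open("order.txt"); the contents are made up for illustration.
order_file = io.StringIO("d, f, a\n")
wanted = read_column_order(order_file)
new_order = apply_order(["a", "b", "c", "d", "e", "f"], wanted)
print(new_order)
```

Columns not listed in the file keep their original relative order at the end, which is one possible policy; raising an error for unlisted columns would be another.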

Set order of columns in pandas dataframe

Just select the order yourself by typing in the column names. Note the double brackets:

frame = frame[['column I want first', 'column I want second'...etc.]]

How to change the column order in a pandas dataframe when there are too many columns?

You could use a column mask:

>>> mysubset = ["d","f"]
>>> mask = df.columns.isin(mysubset)
>>> pd.concat([df.loc[:,mask], df.loc[:,~mask]], axis=1)
   d  f  a  b  c  e  g  h  i
0  2  4  5  8  7  1  1  2  3
1  2  4  1  4  2  3  1  5  3

or use sorted:

>>> mysubset = ["d","f"]
>>> df[sorted(df, key=lambda x: x not in mysubset)]
   d  f  a  b  c  e  g  h  i
0  2  4  5  8  7  1  1  2  3
1  2  4  1  4  2  3  1  5  3

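This works because x not in mysubset is False for d and f and True for every other column, and False < True. Since sorted is stable, the columns keep their original relative order within each group.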
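The sorting trick can be tried on a plain list of column names without pandas. One caveat worth seeing in action: the subset columns come out in their original frame order, not in the order given in mysubset. The second function, `subset_first`, is a hypothetical variant that honours the subset's own order.

```python
columns = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]

# Boolean key: False (subset members) sorts before True (everything else).
mysubset = ["f", "d"]
by_membership = sorted(columns, key=lambda name: name not in mysubset)
print(by_membership)  # 'd' still precedes 'f', as in the original column order


def subset_first(columns, subset):
    """Illustrative variant: subset columns first, in the order listed in `subset`."""
    rank = {name: i for i, name in enumerate(subset)}
    # Columns outside the subset all share the same rank, so the stable sort
    # leaves them in their original order.
    return sorted(columns, key=lambda name: rank.get(name, len(subset)))


by_subset_order = subset_first(columns, mysubset)
print(by_subset_order)
```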

pandas how to swap or reorder columns

Two column Swapping

cols = list(df.columns)
a, b = cols.index('LastName'), cols.index('MiddleName')
cols[b], cols[a] = cols[a], cols[b]
df = df[cols]

Reorder column Swapping (2 swaps)

cols = list(df.columns)
a, b, c, d = cols.index('LastName'), cols.index('MiddleName'), cols.index('Contact'), cols.index('EmployeeID')
cols[a], cols[b], cols[c], cols[d] = cols[b], cols[a], cols[d], cols[c]
df = df[cols]

Swapping Multiple

Now it comes down to how you can play with list slices:

cols = list(df.columns)
cols = cols[1::2] + cols[::2]
df = df[cols]
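To see what the slice expression above actually produces, it helps to run it on a small list first. The column names below are placeholders; the rotation is just one more slice pattern in the same spirit, not something from the original answer.

```python
cols = ["c0", "c1", "c2", "c3", "c4", "c5", "c6"]

# Columns at odd positions first, then those at even positions.
odd_then_even = cols[1::2] + cols[::2]
print(odd_then_even)

# Rotate the first two columns to the end.
rotated = cols[2:] + cols[:2]
print(rotated)

# Sanity check before indexing a DataFrame with the result: a reorder must
# keep every column exactly once.
assert sorted(odd_then_even) == sorted(cols)
assert sorted(rotated) == sorted(cols)
```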

Reorder Pandas Columns

Given some DataFrame that looks like this:

df = pd.DataFrame(columns=range(1, 11))
df

Empty DataFrame
Columns: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Index: []

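You can reorder the DataFrame's columns by position using iloc. I use np.r_ (with numpy imported as np) to make selection easy. Make sure the slices do not overlap, otherwise a column is selected twice:

df.iloc[:, np.r_[0:2, 7:10, 2:7]]

Empty DataFrame
Columns: [1, 2, 8, 9, 10, 3, 4, 5, 6, 7]
Index: []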
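Position lists assembled from several slices are easy to get wrong: if two slices overlap, a column is selected twice, and if they leave a gap, a column is dropped. A small check like the following can catch that before the list is passed to iloc. This is a stdlib-only sketch; both function names are hypothetical, and the ranges are half-open like Python slices.

```python
from itertools import chain


def positions_from_ranges(*ranges):
    """Concatenate half-open (start, stop) ranges into one list of positions."""
    return list(chain.from_iterable(range(start, stop) for start, stop in ranges))


def check_permutation(positions, n_columns):
    """Raise if `positions` does not use each of 0..n_columns-1 exactly once."""
    if sorted(positions) != list(range(n_columns)):
        duplicated = sorted({p for p in positions if positions.count(p) > 1})
        missing = sorted(set(range(n_columns)) - set(positions))
        raise ValueError(f"duplicated positions {duplicated}, missing positions {missing}")
    return positions


good = check_permutation(positions_from_ranges((0, 2), (7, 10), (2, 7)), 10)
print(good)

try:
    # The last range overlaps the middle one, so position 7 appears twice.
    check_permutation(positions_from_ranges((0, 2), (7, 10), (2, 8)), 10)
except ValueError as err:
    print("rejected:", err)
```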

Reorder certain columns in pandas dataframe

You can try to reorder like this:

first_cols = ['A','B','C']
last_cols = [col for col in df.columns if col not in first_cols]

df = df[first_cols+last_cols]

Reordering columns in data frame once again

We can use mixedsort from gtools to arrange the 'q' columns.

library(gtools)
i1 <- grep("q\\d+", names(mayData))
nm1 <- mixedsort(names(mayData)[i1])
mayData[c(setdiff(names(mayData), nm1), nm1)]
#  age   Country   Bank year q1 q6 q9 q10 q11
#1  10 Country 1 bank 1 1950  1  1  1   1   1
#2  12 Country 2 bank 2 1960  1  1  1   1   1
#3  13 Country 3 bank 3 1970  1  1  1   1   1
#4  10 Country 1 bank 1 1980  2  2  2   2   2
#5  11 Country 2 bank 2 1990  2  2  2   2   2
#6  15 Country 3 bank 3 2000  2  2  2   2   2

NOTE: This uses only base R functions plus a single package (gtools).

Or, as @Cath mentioned, removing the "q" prefix with sub leaves plain numbers, which can be used for ordering as well:

sort(as.numeric(sub("^q", "", names(mayData)[i1])))
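The same "sort by the embedded number" idea can be sketched in Python with only the standard library. The column list below is made up to resemble the example (its starting order is an assumption), and `natural_key` is a hypothetical helper, not an equivalent of mixedsort in every detail.

```python
import re


def natural_key(name):
    """Split a name into text and integer parts so that q9 sorts before q10."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


# Hypothetical starting order of the columns.
names = ["age", "q1", "q10", "q11", "q6", "q9", "Country", "Bank", "year"]

q_cols = [n for n in names if re.fullmatch(r"q\d+", n)]
other_cols = [n for n in names if n not in q_cols]

# Non-q columns keep their order; q columns follow in numeric order.
new_order = other_cols + sorted(q_cols, key=natural_key)
print(new_order)
```

A plain `sorted(q_cols)` would compare the names as text and place q10 and q11 before q6, which is the problem the numeric key avoids.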

