Select a sequence of columns: `:` works but not `seq`
On recent versions of data.table, numbers can be used in `j` to specify columns, including ranges such as `DT[, 1:2]`. (Note that this syntax does not work on older versions of data.table.)
So why does `DT[, 1:2]` work, but `DT[, seq(1:2)]` does not? The answer is buried in the code for `data.table:::[.data.table`, which includes these lines:
if (!missing(j)) {
    jsub = replace_dot_alias(substitute(j))
    root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
    if (root == ":" ||
        (root %chin% c("-", "!") && is.call(jsub[[2L]]) &&
           jsub[[2L]][[1L]] == "(" && is.call(jsub[[2L]][[2L]]) &&
           jsub[[2L]][[2L]][[1L]] == ":") ||
        (!length(all.vars(jsub)) &&
           root %chin% c("", "c", "paste", "paste0", "-", "!") &&
           missing(by))) {
        with = FALSE
    }
We can see here that data.table automatically sets `with = FALSE` for you when it detects the use of the function `:` in `j`. It doesn't have the same special case built in for `seq`, so we have to specify `with = FALSE` ourselves if we want to use the `seq` syntax:
DT[ , seq(1:2), with = FALSE]
Spark: get a column as a sequence for use in a Zeppelin select form
You can try getting a tuple of object and string from the DataFrame, and use `toIterable` to convert to `Iterable[(Object, String)]`:
val testIter = data.select("file", "id").collect().map(
x => (x.getAs[Object](0), x.getAs[String](1))
).toIterable
How to use DISTINCT while selecting all columns, including a sequence number column?
For two columns, this query is enough:
SELECT name, min(seq_num)
FROM table
GROUP BY name
For more columns, use the `row_number` analytic function:
SELECT name, col1, col2, .... col500, seq_num
FROM (
SELECT t.*, row_number() over (partition by name order by seq_num ) As rn
FROM table t
)
WHERE rn = 1
Both queries pick a single row per name: the one with the smallest seq_num value.
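The dedup-by-row_number pattern above can be sketched with an in-memory SQLite database (window functions require SQLite 3.25+; the table and sample data here are illustrative, not from the original question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (name TEXT, col1 TEXT, seq_num INTEGER);
INSERT INTO t VALUES
  ('a', 'x', 2), ('a', 'y', 1), ('b', 'z', 5);
""")

# Keep exactly one row per name: the one with the smallest seq_num.
rows = conn.execute("""
    SELECT name, col1, seq_num
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY seq_num) AS rn
        FROM t
    )
    WHERE rn = 1
    ORDER BY name
""").fetchall()
print(rows)  # [('a', 'y', 1), ('b', 'z', 5)]
```

`PARTITION BY name` restarts the numbering for each name, so `rn = 1` always lands on the smallest `seq_num` within that name.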
Scala Spark DataFrame : dataFrame.select multiple columns given a Sequence of column names
val columnNames = Seq("col1","col2",....."coln")
// using the string column names:
val result = dataframe.select(columnNames.head, columnNames.tail: _*)
// or, equivalently, using Column objects:
val result = dataframe.select(columnNames.map(c => col(c)): _*)
Get table and column owning a sequence
Get the "owning" table and column
ALTER SEQUENCE seqName OWNED BY table.id;
Your `ALTER SEQUENCE` statement creates an entry in the system catalog `pg_depend` with the dependency type (`deptype`) `'a'` and a `refobjsubid` greater than 0, pointing to the attribute number (`attnum`) in `pg_attribute`. With that knowledge you can devise a simple query:
SELECT d.refobjid::regclass, a.attname
FROM pg_depend d
JOIN pg_attribute a ON a.attrelid = d.refobjid
AND a.attnum = d.refobjsubid
WHERE d.objid = 'public."seqName"'::regclass -- your sequence here
AND d.refobjsubid > 0
AND d.classid = 'pg_class'::regclass;
- Double quotes (`""`) are only needed for otherwise illegal names (mixed case, reserved words, ...).
- No need to assert that `refclassid` is of type `regclass`, since the join to `pg_attribute` does that automatically.
- No need to assert that the sequence is a sequence, since schema-qualified object names are unique across the database. No need to join to `pg_class` or `pg_namespace` at all.
- The schema name is only needed to disambiguate or if it's not in the `search_path`. The same table name (or sequence name, for that matter) can be used in multiple schemas. A cast to the object identifier type `regclass` observes the current `search_path` to pick the best match if you omit the schema qualification. If the table is not visible, you get an error message.
- What's more, a `regclass` value is displayed as `text` to the user automatically. (If not, cast to `text`.) The schema name is prepended automatically where necessary to be unambiguous in your session.
Get the actual "owner" (the role)
To get the role owning a specific sequence, as requested:
SELECT c.relname, u.usename
FROM pg_class c
JOIN pg_user u ON u.usesysid = c.relowner
WHERE c.oid = '"seqName"'::regclass; -- your sequence here
Run table for all columns in sequence
Excluding the Id column, reshape from wide to long using `stack`, then `table` to get counts including NAs, transpose to have the column names as rows, then convert the table object to a data frame:
data.frame(rbind(t(table(stack(d[, -1]), useNA = "always"))))
# X82 X87 X88 NA.
# Col_A_1 1 2 2 0
# Col_A_2 1 3 1 0
# Col_A_3 3 1 0 1
# Col_A_100 1 1 3 0
# NA. 0 0 0 0
Spark DataFrame: how to select columns using Seq[String]
Use:
.select(
(colsWithoutPlanWeekData.map(c => col(c)) ++ Seq(
col("bbDemoImpsAttribute.bbDemoImpsAttributes.demoId").as("bbDemoId"),
col("demoValuesAttribute.demoAttributes.demoId").as("demoId"),
col("hhDemoAttribute.demoId").as("hhDemoId"))): _*
)
Concatenate the two Seqs before applying the syntactic sugar `: _*`.
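For comparison, the same "concatenate first, then expand into varargs" idea looks like this in Python, where `*` plays the role of Scala's `: _*` (the `select` function here is a stand-in for illustration, not a Spark API):

```python
def select(*cols):
    # Stand-in for a varargs API like DataFrame.select(cols: Column*)
    return list(cols)

base_cols = ["planId", "weekId"]
extra_cols = ["bbDemoId", "demoId", "hhDemoId"]

# Concatenate the two lists first, then unpack into varargs with *.
result = select(*(base_cols + extra_cols))
print(result)  # ['planId', 'weekId', 'bbDemoId', 'demoId', 'hhDemoId']
```

As in Scala, passing the two collections separately would not typecheck against a varargs parameter; they must be joined into one sequence before expansion.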
Add a sequence column in a query
I suspect that you want to number the rows based on the rate, so use an analytic function like this:
select ref_leger_code, rate, sumbalance, due_date,
ROW_NUMBER() OVER (PARTITION BY rate ORDER BY due_date asc ) AS sequence
from (
select ref_leger_code, rate, sum(balance) sumbalance, to_char(due_date,'yyyymm') due_date
from tbl_value_temp
group by ref_leger_code, rate, to_char(due_date,'yyyymm')
);
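A runnable sketch of the same numbering logic, using an in-memory SQLite database with made-up sample data (SQLite 3.25+ for window functions; `strftime` stands in for Oracle's `to_char`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_value_temp (
    ref_leger_code TEXT, rate REAL, balance REAL, due_date TEXT
);
INSERT INTO tbl_value_temp VALUES
  ('A', 1.5, 100, '2023-01-10'),
  ('A', 1.5, 50,  '2023-01-20'),
  ('A', 1.5, 200, '2023-02-05'),
  ('B', 2.0, 300, '2023-01-15');
""")

# Sum balances per ledger code, rate, and month, then number the rows
# within each rate by month using ROW_NUMBER().
rows = conn.execute("""
    SELECT ref_leger_code, rate, sumbalance, due_month,
           ROW_NUMBER() OVER (PARTITION BY rate ORDER BY due_month) AS sequence
    FROM (
        SELECT ref_leger_code, rate, SUM(balance) AS sumbalance,
               strftime('%Y%m', due_date) AS due_month
        FROM tbl_value_temp
        GROUP BY ref_leger_code, rate, strftime('%Y%m', due_date)
    )
    ORDER BY rate, sequence
""").fetchall()
for r in rows:
    print(r)
```

Each rate gets its own numbering restarted at 1, ordered by month, which is exactly what the `PARTITION BY rate ORDER BY due_date` clause produces in the Oracle query above.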