Add column with constant value to pandas dataframe
The reason this puts NaN into a column is that df.index and the Index of your right-hand-side object are different. @zach shows the proper way to assign a new column of zeros. In general, pandas tries to do as much alignment of indices as possible. One downside is that when indices are not aligned you get NaN wherever they aren't aligned. Play around with the reindex and align methods to gain some intuition for how alignment works with objects that have partially, totally, and not-at-all aligned indices. For example, here's how DataFrame.align() works with partially aligned indices:
In [7]: from pandas import DataFrame
In [8]: from numpy.random import randint
In [9]: df = DataFrame({'a': randint(3, size=10)})
In [10]: df
Out[10]:
a
0 0
1 2
2 0
3 1
4 0
5 0
6 0
7 0
8 0
9 0
In [11]: s = df.a[:5]
In [12]: dfa, sa = df.align(s, axis=0)
In [13]: dfa
Out[13]:
a
0 0
1 2
2 0
3 1
4 0
5 0
6 0
7 0
8 0
9 0
In [14]: sa
Out[14]:
0 0
1 2
2 0
3 1
4 0
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
Name: a, dtype: float64
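To see the same alignment effect in a column assignment, here is a minimal sketch. The frame, the series, and the column names (misaligned, scalar, by_position) are made up for illustration and are not from the original question.

```python
import pandas as pd

# Illustrative data: the frame is indexed 0..2, the series 5..7,
# so the two indices share no labels at all.
df = pd.DataFrame({"a": [10, 20, 30]}, index=[0, 1, 2])
shifted = pd.Series([0, 0, 0], index=[5, 6, 7])

# Assigning a Series aligns on index labels first; no label matches,
# so every row of the new column is NaN.
df["misaligned"] = shifted

# A scalar has no index, so it is broadcast to every row.
df["scalar"] = 0

# Dropping the index (here via to_numpy) assigns by position instead.
df["by_position"] = shifted.to_numpy()

print(df)
```

The scalar form is usually the simplest choice for a constant column; the positional form only works when the lengths match.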
Add column to dataframe with constant value
df['Name'] = 'abc' will add the new column and set all rows to that value:
In [79]:
df
Out[79]:
Date, Open, High, Low, Close
0 01-01-2015, 565, 600, 400, 450
In [80]:
df['Name'] = 'abc'
df
Out[80]:
Date, Open, High, Low, Close Name
0 01-01-2015, 565, 600, 400, 450 abc
How to add a constant value to a column in python pandas?
You can assign the scalar directly:
user['UID'] = 1
If only one row is getting filled, you can use ffill(). It propagates the last valid value forward, so a value in the first row is replicated in all the rows below it.
user.UID = user.UID.ffill()
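A small sketch of the ffill() behaviour, using an invented user frame (the name column and its values are assumptions). It also shows the limitation that rows before the first valid value stay empty.

```python
import numpy as np
import pandas as pd

# Illustrative frame: only the first row of UID is filled.
user = pd.DataFrame({"name": ["ann", "bob", "cy"],
                     "UID": [1, np.nan, np.nan]})

# Forward fill copies the last valid value into the NaN rows after it.
user["UID"] = user["UID"].ffill()

# ffill only looks backwards: a NaN before the first valid value stays NaN.
late_start = pd.Series([np.nan, 1, np.nan]).ffill()

print(user)
print(late_start)
```

If the value is known up front, the plain scalar assignment is more direct and does not depend on which row happens to be filled.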
Add a column with a constant value to a DataFrame
A more general alternative is:
julia> insertcols!(df, :z => 1)
10×3 DataFrame
Row │ x y z
│ Int64 Char Int64
─────┼────────────────────
1 │ 1 a 1
2 │ 2 b 1
3 │ 3 c 1
4 │ 4 d 1
5 │ 5 e 1
6 │ 6 f 1
7 │ 7 g 1
8 │ 8 h 1
9 │ 9 i 1
10 │ 10 j 1
which by default does the same, but it additionally:
- allows you to specify the location of the new column;
- by default makes sure that you do not accidentally overwrite an existing column
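For readers working in pandas rather than Julia, DataFrame.insert offers roughly comparable behaviour: it takes a position and, by default, refuses to add a column whose name already exists. This is a sketch of that analogy with made-up data, not a claim that the two functions are equivalent.

```python
import pandas as pd

# Illustrative frame, loosely mirroring the x/y columns above.
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Insert a constant column at position 0; the scalar is broadcast to all rows.
df.insert(0, "z", 1)

# A second insert under the same name raises ValueError by default,
# which guards against accidentally adding a duplicate column.
try:
    df.insert(1, "z", 2)
    raised = False
except ValueError:
    raised = True

print(df)
print("duplicate insert rejected:", raised)
```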
How to add a constant value column to an empty dataframe?
You can do this if, instead of relying on R to "recycle" the values the right number of times, you explicitly use rep:
df = data.frame(x = numeric())
df['Country'] = rep("CHL", nrow(df))
df
# [1] x Country
# <0 rows> (or 0-length row.names)
df = data.frame(x = 1:3)
df['Country'] = rep("CHL", nrow(df))
df
# x Country
# 1 1 CHL
# 2 2 CHL
# 3 3 CHL
How to add a constant column in a Spark DataFrame?
Spark 2.2+
Spark 2.2 introduces typedLit to support Seq, Map, and Tuples (SPARK-19254), and the following calls should be supported (Scala):
import org.apache.spark.sql.functions.typedLit
df.withColumn("some_array", typedLit(Seq(1, 2, 3)))
df.withColumn("some_struct", typedLit(("foo", 1, 0.3)))
df.withColumn("some_map", typedLit(Map("key1" -> 1, "key2" -> 2)))
Spark 1.3+ (lit), 1.4+ (array, struct), 2.0+ (map):
The second argument for DataFrame.withColumn should be a Column, so you have to use a literal:
from pyspark.sql.functions import lit
df.withColumn('new_column', lit(10))
If you need complex columns you can build these using blocks like array:
from pyspark.sql.functions import array, create_map, struct
df.withColumn("some_array", array(lit(1), lit(2), lit(3)))
df.withColumn("some_struct", struct(lit("foo"), lit(1), lit(.3)))
df.withColumn("some_map", create_map(lit("key1"), lit(1), lit("key2"), lit(2)))
Exactly the same methods can be used in Scala.
import org.apache.spark.sql.functions.{array, lit, map, struct}
df.withColumn("new_column", lit(10))
df.withColumn("map", map(lit("key1"), lit(1), lit("key2"), lit(2)))
To provide names for structs use either alias on each field:
df.withColumn(
"some_struct",
struct(lit("foo").alias("x"), lit(1).alias("y"), lit(0.3).alias("z"))
)
or cast on the whole object:
df.withColumn(
"some_struct",
struct(lit("foo"), lit(1), lit(0.3)).cast("struct<x: string, y: integer, z: double>")
)
It is also possible, although slower, to use a UDF.
Note:
The same constructs can be used to pass constant arguments to UDFs or SQL functions.
how to add a constant column to a dataframe without rows
You can use .loc, specifying the row index and column label, as follows:
df.loc[0, 'foo'] = 'bar'
Result:
print(df)
a b c foo
0 NaN NaN NaN bar
You can also use:
df['foo'] = ['bar']
Result:
print(df)
a b c foo
0 NaN NaN NaN bar
If you have a mix of empty and non-empty dataframes and you want to assign a new column to each of them, you can try the following code:
df['foo'] = ['bar'] * (df.shape[0] if df.shape[0] else 1)
For non-empty dataframes this assigns the constant to every existing row; for an empty dataframe it adds one new row holding the constant in the new column.
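The same idea can be wrapped in a small helper so it can be applied to several frames. This is a sketch only: add_constant_column is a hypothetical name, and returning a copy instead of modifying the frame in place is a choice made here, not something the original answer specifies.

```python
import pandas as pd

def add_constant_column(df, name, value):
    """Return a copy of df with a constant column (hypothetical helper).

    An empty frame gains a single row so the constant has somewhere to live.
    """
    n_rows = len(df) if len(df) else 1
    out = df.copy()
    out[name] = [value] * n_rows
    return out

# Illustrative inputs: one frame without rows, one with two rows.
empty = pd.DataFrame(columns=["a", "b", "c"])
filled = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

empty_out = add_constant_column(empty, "foo", "bar")
filled_out = add_constant_column(filled, "foo", "bar")

print(empty_out)
print(filled_out)
```

Whether an empty frame should gain a row at all depends on the use case; if it should stay empty, assigning the scalar directly keeps zero rows.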
How to add column to a dataframe which remains constant with respect to date?
Let's try mapping the dict of df1.date: df1.C onto the date extracted from df2.time.
df2['C']=(pd.to_datetime(df2.time).dt.date).astype(str).map(dict(zip(df1.date,df1.C)))
How it works
#Extract the date from df2.time as a string, so it matches the keys taken from df1.date
df2['temp']=pd.to_datetime(df2.time).dt.date.astype(str)
#Create dict from df1.date and df1.C
D=dict(zip(df1.date,df1.C))
#Create new column df2['C'] by mapping D to df2.temp
df2['C']=df2.temp.map(D)
Outcome
time open high low C
0 2020-09-16 22:54:00 1.29708 1.29711 1.29695 1.287623
1 2020-09-16 22:55:00 1.29698 1.29703 1.29681 1.287623
2 2020-09-17 22:56:00 1.29701 1.29709 1.29689 1.294943
3 2020-09-17 22:57:00 1.29702 1.29720 1.29701 1.294943
4 2020-09-17 22:58:00 1.29717 1.29720 1.29715 1.294943
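The mapping steps can be tried end to end with a self-contained sketch. The original df1 is not shown, so the frame below is an assumption: its date column is taken to hold strings, and its values are reconstructed from the outcome table. Series.map is given a Series here instead of a dict, which behaves the same way for this lookup.

```python
import pandas as pd

# Assumed shape of df1: one constant per calendar date, dates stored as strings.
df1 = pd.DataFrame({"date": ["2020-09-16", "2020-09-17"],
                    "C": [1.287623, 1.294943]})

# A few timestamped rows in the style of df2.
df2 = pd.DataFrame({"time": ["2020-09-16 22:54:00",
                             "2020-09-16 22:55:00",
                             "2020-09-17 22:56:00"]})

# Reduce each timestamp to a YYYY-MM-DD string so it matches df1.date.
date_key = pd.to_datetime(df2["time"]).dt.strftime("%Y-%m-%d")

# Look up C by date; dates missing from df1 would come back as NaN.
lookup = df1.set_index("date")["C"]
df2["C"] = date_key.map(lookup)

print(df2)
```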
Alternatively, as suggested by @Erfan:
#Rename columns of df1 as follows
df1=df1[["date", "C"]].rename(columns={"date": "time"})
#Coerce df2.time to date
df2['time']=pd.to_datetime(df2['time']).dt.date
#Merge df2 and df1
df2.merge(df1, how='left')