Add Column to Dataframe with Constant Value

Add column with constant value to pandas dataframe

This puts NaN into the column because df.index and the Index of your right-hand-side object are different. @zach shows the proper way to assign a new column of zeros. In general, pandas tries to align indices as much as possible. One downside is that you get NaN wherever the indices do not match. Play around with the reindex and align methods to gain some intuition for how alignment works with objects that have partially aligned, fully aligned, and completely unaligned indices. For example, here's how DataFrame.align() works with partially aligned indices:

In [7]: from pandas import DataFrame

In [8]: from numpy.random import randint

In [9]: df = DataFrame({'a': randint(3, size=10)})

In [10]: df
Out[10]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [11]: s = df.a[:5]

In [12]: dfa, sa = df.align(s, axis=0)

In [13]: dfa
Out[13]:
   a
0  0
1  2
2  0
3  1
4  0
5  0
6  0
7  0
8  0
9  0

In [14]: sa
Out[14]:
0      0
1      2
2      0
3      1
4      0
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
Name: a, dtype: float64
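
To connect the alignment behaviour back to the original problem, here is a minimal sketch of how a mismatched index produces a column of NaN and two ways to avoid it. The frame, its string labels, and the column names are made up for illustration.

```python
import pandas as pd

# Illustrative frame with string labels (not taken from the question).
df = pd.DataFrame({"a": [10, 20, 30]}, index=["x", "y", "z"])

# A Series with the default 0..2 index shares no labels with df.index,
# so alignment leaves every row of the new column as NaN.
df["misaligned"] = pd.Series([0, 0, 0])

# A scalar is broadcast to every row; no alignment is involved.
df["zeros"] = 0

# Building the Series on df.index makes the labels line up.
df["aligned"] = pd.Series(0, index=df.index)

print(df)
```

The scalar form is usually the simplest choice for a constant column; the explicit index is only needed when the right-hand side really has to be a Series.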

Add column to dataframe with constant value

df['Name']='abc' will add the new column and set all rows to that value:

In [79]: df
Out[79]:
Date, Open, High, Low, Close
0 01-01-2015, 565, 600, 400, 450

In [80]: df['Name'] = 'abc'
    ...: df
Out[80]:
Date, Open, High, Low, Close Name
0 01-01-2015, 565, 600, 400, 450 abc
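
As a self-contained sketch of the same idea, the snippet below rebuilds a small frame and adds a constant column in two ways: in place with plain assignment, and as a new frame with assign(). The shortened column set and the Exchange/"NYSE" values are assumptions for illustration only.

```python
import pandas as pd

# Small stand-in for the frame shown above (columns shortened for brevity).
df = pd.DataFrame({"Date": ["01-01-2015"], "Open": [565], "Close": [450]})

# Plain assignment modifies df in place; the scalar is repeated for every row.
df["Name"] = "abc"

# assign() returns a new frame and leaves df itself untouched.
# "Exchange" and "NYSE" are hypothetical values.
with_exchange = df.assign(Exchange="NYSE")

print(df)
print(with_exchange)
```

assign() is convenient inside method chains, while plain assignment is the shorter form when mutating the frame is acceptable.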

How to add a constant value to a column in python pandas?

You can assign the constant directly to the column:

user['UID'] = 1

If only the first row ends up filled, you can use ffill(). It propagates the last valid value forward, so the first row's value is copied into all the rows below it.

user.UID = user.UID.ffill()
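
A minimal sketch of the ffill() case, assuming a frame in which only the first UID is populated. The name column and the sample values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the first row of UID has a value.
user = pd.DataFrame({
    "name": ["ann", "bob", "cy"],
    "UID": [1, np.nan, np.nan],
})

# Forward-fill copies the last valid value into the missing rows below it.
user["UID"] = user["UID"].ffill()

print(user)
```

Because the column contained NaN, it stays a float column after filling; cast it with astype(int) if integer values are required.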

Add a column with a constant value to a DataFrame

In Julia (DataFrames.jl), a more general alternative to plain column assignment is insertcols!:

julia> insertcols!(df, :z => 1)
10×3 DataFrame
 Row │ x      y     z
     │ Int64  Char  Int64
─────┼────────────────────
   1 │     1  a         1
   2 │     2  b         1
   3 │     3  c         1
   4 │     4  d         1
   5 │     5  e         1
   6 │     6  f         1
   7 │     7  g         1
   8 │     8  h         1
   9 │     9  i         1
  10 │    10  j         1

which by default does the same, but it additionally:

  1. allows you to specify the location of the new column;
  2. by default makes sure that you do not accidentally overwrite an existing column.
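
For readers working in pandas rather than Julia, DataFrame.insert seems to be the closest counterpart: it takes a position for the new column and, by default, raises an error if the name already exists. The sketch below uses a small made-up frame, and the overwritten flag is only there to make the outcome visible.

```python
import pandas as pd

# Illustrative frame mirroring the x/y columns in the Julia output.
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Insert a constant column "z" at position 0, making it the first column.
df.insert(0, "z", 1)

# Inserting the same name again raises ValueError unless
# allow_duplicates=True is passed, which guards against accidental reuse.
try:
    df.insert(0, "z", 2)
    overwritten = True
except ValueError:
    overwritten = False

print(df)
print("overwritten:", overwritten)
```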

How to add a constant value column to an empty dataframe?

This works if, instead of relying on R to "recycle" the value the right number of times, you call rep explicitly:

df = data.frame(x = numeric())
df['Country'] = rep("CHL", nrow(df))
df
# [1] x Country
# <0 rows> (or 0-length row.names)

df = data.frame(x = 1:3)
df['Country'] = rep("CHL", nrow(df))
df
# x Country
# 1 1 CHL
# 2 2 CHL
# 3 3 CHL

How to add a constant column in a Spark DataFrame?

Spark 2.2+

Spark 2.2 introduces typedLit to support Seq, Map, and Tuples (SPARK-19254), so the following calls should be supported (Scala):

import org.apache.spark.sql.functions.typedLit

df.withColumn("some_array", typedLit(Seq(1, 2, 3)))
df.withColumn("some_struct", typedLit(("foo", 1, 0.3)))
df.withColumn("some_map", typedLit(Map("key1" -> 1, "key2" -> 2)))

Spark 1.3+ (lit), 1.4+ (array, struct), 2.0+ (map):

The second argument for DataFrame.withColumn should be a Column so you have to use a literal:

from pyspark.sql.functions import lit

df.withColumn('new_column', lit(10))

If you need complex columns you can build these using blocks like array:

from pyspark.sql.functions import array, create_map, struct

df.withColumn("some_array", array(lit(1), lit(2), lit(3)))
df.withColumn("some_struct", struct(lit("foo"), lit(1), lit(.3)))
df.withColumn("some_map", create_map(lit("key1"), lit(1), lit("key2"), lit(2)))

Exactly the same methods can be used in Scala.

import org.apache.spark.sql.functions.{array, lit, map, struct}

df.withColumn("new_column", lit(10))
df.withColumn("map", map(lit("key1"), lit(1), lit("key2"), lit(2)))

To provide names for structs use either alias on each field:

df.withColumn(
    "some_struct",
    struct(lit("foo").alias("x"), lit(1).alias("y"), lit(0.3).alias("z"))
)

or cast on the whole object:

df.withColumn(
    "some_struct",
    struct(lit("foo"), lit(1), lit(0.3)).cast("struct<x: string, y: integer, z: double>")
)

It is also possible, although slower, to use a UDF.

Note:

The same constructs can be used to pass constant arguments to UDFs or SQL functions.

How to add a constant column to a dataframe without rows

You can use .loc specifying the row index and column label, as follows:

df.loc[0, 'foo'] = 'bar'

Result:

print(df)

     a    b    c  foo
0  NaN  NaN  NaN  bar

You can also use:

df['foo'] = ['bar']

Result:

print(df)

     a    b    c  foo
0  NaN  NaN  NaN  bar

If you have a mix of empty and non-empty dataframes and want to assign a new column to each of them, you can try the following code:

df['foo'] = ['bar'] * (df.shape[0] if df.shape[0] else 1)

For a non-empty dataframe, this assigns the constant to every existing row. For an empty dataframe, it adds a single new row holding the constant in that column.
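
The list-multiplication trick above can be wrapped in a small function and checked against both kinds of input. This is only a sketch: add_constant is a hypothetical helper name, and the sample frames are invented. The last lines show, for contrast, that a plain scalar assignment keeps an empty frame empty.

```python
import pandas as pd

def add_constant(df, name, value):
    """Hypothetical helper: return a copy of df with a constant column.

    Empty frames gain one row, mirroring the list-multiplication trick.
    """
    out = df.copy()
    n_rows = out.shape[0] if out.shape[0] else 1
    out[name] = [value] * n_rows
    return out

empty = pd.DataFrame(columns=["a", "b", "c"])
filled = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

one_row = add_constant(empty, "foo", "bar")    # empty input: one new row
two_rows = add_constant(filled, "foo", "bar")  # non-empty input: row count kept

# For contrast, a scalar assignment adds the column but no rows.
still_empty = empty.copy()
still_empty["foo"] = "bar"

print(one_row)
print(two_rows)
print(len(still_empty), list(still_empty.columns))
```

Which behaviour is preferable depends on whether an empty frame should stay empty or should carry the constant in a placeholder row.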

How to add column to a dataframe which remains constant with respect to date?

Let's map a dict of df1.date to df1.C onto the date extracted from df2.time.

df2['C']=(pd.to_datetime(df2.time).dt.date).astype(str).map(dict(zip(df1.date,df1.C)))

How it works

#Extract date from df2.time as a string so it matches the keys of the dict

df2['temp']=pd.to_datetime(df2.time).dt.date.astype(str)

#Create dict from df1.date and df1.C
D=dict(zip(df1.date,df1.C))

#Create new column df2['C'] by mapping D to df2.temp

df2['C']=df2.temp.map(D)

Outcome

               time     open     high      low         C
0 2020-09-16 22:54:00 1.29708 1.29711 1.29695 1.287623
1 2020-09-16 22:55:00 1.29698 1.29703 1.29681 1.287623
2 2020-09-17 22:56:00 1.29701 1.29709 1.29689 1.294943
3 2020-09-17 22:57:00 1.29702 1.29720 1.29701 1.294943
4 2020-09-17 22:58:00 1.29717 1.29720 1.29715 1.294943
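
Here is a self-contained sketch of the mapping approach, useful for trying it without the original data. The contents of df1 are an assumption (one C value per date stored as a string), with the numbers borrowed from the outcome shown above; df2 is trimmed to three rows and two columns.

```python
import pandas as pd

# Assumed shape of df1: one constant C per calendar date (dates as strings).
df1 = pd.DataFrame({
    "date": ["2020-09-16", "2020-09-17"],
    "C": [1.287623, 1.294943],
})

# Trimmed version of df2 with timestamps stored as strings.
df2 = pd.DataFrame({
    "time": ["2020-09-16 22:54:00", "2020-09-16 22:55:00", "2020-09-17 22:56:00"],
    "open": [1.29708, 1.29698, 1.29701],
})

# Reduce each timestamp to a YYYY-MM-DD string so it matches df1.date.
day = pd.to_datetime(df2["time"]).dt.strftime("%Y-%m-%d")

# Look each day up in a date -> C Series; unmatched days would become NaN.
df2["C"] = day.map(df1.set_index("date")["C"])

print(df2)
```

Mapping through a Series indexed by date behaves like the dict lookup and leaves the original time column unchanged.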

Alternatively, as suggested by @Erfan:

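#Rename columns of df1 as follows

df1=df1[["date", "C"]].rename(columns={"date": "time"})

#Coerce df2.time to date (append .astype(str) if df1.date holds strings)
df2['time']=pd.to_datetime(df2['time']).dt.date

#Merge df2 and df1 on the shared time column and keep the result
df2=df2.merge(df1, how='left', on='time')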

