Create an Empty Data.Frame

Creating an empty Pandas DataFrame, then filling it?

Here are a couple of suggestions:

Use date_range for the index:

import datetime
import pandas as pd
import numpy as np

todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')

columns = ['A','B', 'C']

Note: we could create an empty DataFrame (with NaNs) simply by writing:

df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs

To do these types of calculations on the data, use a NumPy array:

data = np.array([np.arange(10)]*3).T

Hence we can create the DataFrame:

In [10]: df = pd.DataFrame(data, index=index, columns=columns)

In [11]: df
Out[11]:
            A  B  C
2012-11-29  0  0  0
2012-11-30  1  1  1
2012-12-01  2  2  2
2012-12-02  3  3  3
2012-12-03  4  4  4
2012-12-04  5  5  5
2012-12-05  6  6  6
2012-12-06  7  7  7
2012-12-07  8  8  8
2012-12-08  9  9  9
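The steps above can be put together in one short script. This is only a sketch: the fixed start date is an assumption chosen to match the output shown, and the per-column multipliers are a made-up stand-in for whatever calculation you actually need.

```python
import numpy as np
import pandas as pd

# Fixed start date (an assumption, picked to match the output above).
index = pd.date_range("2012-11-29", periods=10, freq="D")
columns = ["A", "B", "C"]

# Do the calculation in NumPy first. Here column k holds i * (k + 1);
# the multipliers are purely illustrative.
data = np.arange(10).reshape(-1, 1) * np.array([1, 2, 3])

# Wrap the finished array once, instead of filling an empty frame cell by cell.
df = pd.DataFrame(data, index=index, columns=columns)
print(df.head())
```

Building the array first and constructing the DataFrame once at the end avoids repeated cell-by-cell assignment into a frame that starts out full of NaNs.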

How to create an empty pandas DataFrame where df.empty returns True

The simplest is pd.DataFrame():

df = pd.DataFrame()   

df.empty
# True

If you want to create a DataFrame with a specific set of columns:

df = pd.DataFrame(columns=['A', 'B'])

df.empty
# True

Also, an array of shape (1, 1) is not empty (it has one row), which is why you get empty = False. To create an empty DataFrame from an array, the array needs to be of shape (0, n):

import numpy as np  # pd.np was removed in pandas 2.0

df = pd.DataFrame(np.empty((0, 3)))

df.empty
# True
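To make the rule concrete, here is a small sketch comparing a few shapes. The variable names are just illustrative. The point is that df.empty looks only at the shape: it is True when either axis has length 0, and a frame full of NaNs does not count as empty.

```python
import numpy as np
import pandas as pd

one_cell = pd.DataFrame(np.empty((1, 1)))                   # 1 row, 1 column
no_rows = pd.DataFrame(np.empty((0, 3)))                    # 0 rows, 3 columns
no_cols = pd.DataFrame(index=range(3))                      # 3 rows, 0 columns
all_nan = pd.DataFrame(index=range(3), columns=["A", "B"])  # 3 rows of NaN

for name, frame in [("one_cell", one_cell), ("no_rows", no_rows),
                    ("no_cols", no_cols), ("all_nan", all_nan)]:
    print(name, frame.shape, frame.empty)
```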

Pandas create empty DataFrame with only column names

You can create an empty DataFrame with either column names or an Index:

In [4]: import pandas as pd
In [5]: df = pd.DataFrame(columns=['A','B','C','D','E','F','G'])
In [6]: df
Out[6]:
Empty DataFrame
Columns: [A, B, C, D, E, F, G]
Index: []

Or

In [7]: df = pd.DataFrame(index=range(1,10))
In [8]: df
Out[8]:
Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Edit:
Even after your amendment with the .to_html, I can't reproduce. This:

df = pd.DataFrame(columns=['A','B','C','D','E','F','G'])
df.to_html('test.html')

Produces:

<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>F</th>
<th>G</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
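One thing to be aware of with the column-names approach: every column starts out with dtype object. If you want the empty frame to carry column types from the start, one option is to build it from empty Series with explicit dtypes. A minimal sketch, with made-up column names:

```python
import pandas as pd

# Column names only: every column gets dtype object.
untyped = pd.DataFrame(columns=["name", "age", "joined"])

# Empty Series with explicit dtypes: still zero rows, but typed columns.
# The column names and dtypes here are hypothetical examples.
typed = pd.DataFrame({
    "name": pd.Series(dtype="object"),
    "age": pd.Series(dtype="int64"),
    "joined": pd.Series(dtype="datetime64[ns]"),
})

print(untyped.dtypes)
print(typed.dtypes)
print(typed.empty)
```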

Create an empty data.frame

Just initialize it with empty vectors:

df <- data.frame(Date=as.Date(character()),
                 File=character(),
                 User=character(),
                 stringsAsFactors=FALSE)

Here's another example with different column types:

df <- data.frame(Doubles=double(),
                 Ints=integer(),
                 Factors=factor(),
                 Logicals=logical(),
                 Characters=character(),
                 stringsAsFactors=FALSE)

> str(df)
'data.frame': 0 obs. of 5 variables:
$ Doubles : num
$ Ints : int
$ Factors : Factor w/ 0 levels:
$ Logicals : logi
$ Characters: chr

N.B. :

Initializing a data.frame with an empty column of the wrong type does not prevent further additions of rows having columns of different types.

This method is just a bit safer in the sense that you'll have the correct column types from the beginning, hence if your code relies on some column type checking, it will work even with a data.frame with zero rows.

Initializing an empty DataFrame and appending rows

pd.concat() over a list of DataFrames is probably the way to go, especially for clean CSVs. But if you suspect your CSVs are dirty, or that read_csv() could infer different types for the same column in different files, you may want to explicitly create each DataFrame in a loop.

You can initialize a DataFrame for the first file, and then start each subsequent file with an empty DataFrame based on the first.

df2 = pd.DataFrame(data=None, columns=df1.columns,index=df1.index)

This takes the structure of df1 (its columns and index) but none of its data, and creates df2. If you want to force data types on the columns, do it to df1 when it is created, before its structure is copied.
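As a rough sketch of both ideas, the snippet below reads a couple of in-memory CSVs with forced dtypes and combines them with pd.concat(). The CSV contents and the column names a and b are invented for the example. It also shows df1.iloc[0:0], a different technique from the constructor call above: it gives a zero-row copy of df1 that keeps df1's column dtypes.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for files on disk.
csv_texts = ["a,b\n1,2\n3,4\n", "a,b\n5,6\n"]

# Force the same dtypes on every file so the columns cannot drift between files.
dtypes = {"a": "int64", "b": "float64"}
frames = [pd.read_csv(io.StringIO(text), dtype=dtypes) for text in csv_texts]

# Concatenate once at the end rather than appending row by row.
combined = pd.concat(frames, ignore_index=True)
print(combined)

# Zero-row template with the same columns and dtypes as the first frame.
template = frames[0].iloc[0:0]
print(template.dtypes)
```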


Create empty data frame with column names by assigning a string vector?

How about:

df <- data.frame(matrix(ncol = 3, nrow = 0))
x <- c("name", "age", "gender")
colnames(df) <- x

To do all of this in a one-liner:

setNames(data.frame(matrix(ncol = 3, nrow = 0)), c("name", "age", "gender"))

#[1] name age gender
#<0 rows> (or 0-length row.names)

Or

data.frame(matrix(ncol=3,nrow=0, dimnames=list(NULL, c("name", "age", "gender"))))

How do I initialize an empty data frame *with a Date column* in R?

The issue is most likely due to the concatenation: a vector cannot hold multiple classes, so combining a Date with numbers using c() coerces them to a single type. Instead, we can pass a data.frame:

rbind(test, setNames(data.frame(as.Date("2020-01-01"), 123, 456), 
names(test)))

-output

        date var1 var2
1 2020-01-01 123 456

How to make pandas read empty data file as an empty dataframe?

You can catch the exception and create an empty DataFrame:

import pandas as pd

path = '/tmp/test.txt'  # an existing but empty file

try:
    newzero = pd.read_csv(path, header=None)
except pd.errors.EmptyDataError:
    # raised by read_csv when the file has no columns to parse
    newzero = pd.DataFrame()
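For a version you can run without preparing a file first, here is a small sketch that wraps the same try/except in a helper and exercises it on a temporary empty file. The helper name read_csv_or_empty is made up for this example; it is not part of pandas.

```python
import os
import tempfile

import pandas as pd


def read_csv_or_empty(path):
    """Return the parsed CSV, or an empty DataFrame if the file has no data."""
    try:
        return pd.read_csv(path, header=None)
    except pd.errors.EmptyDataError:
        return pd.DataFrame()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    open(path, "w").close()  # create a zero-byte file
    result = read_csv_or_empty(path)

print(result.empty)
```

Catching only pd.errors.EmptyDataError means that other problems, such as a missing file or a malformed CSV, still raise as usual.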


