Creating an empty Pandas DataFrame, then filling it?
Here are a couple of suggestions. Use date_range for the index:
import datetime
import pandas as pd
import numpy as np
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')
columns = ['A','B', 'C']
Note: we could create an empty DataFrame (filled with NaNs) simply by writing:
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0) # with 0s rather than NaNs
To do these types of calculations on the data, use a NumPy array:
data = np.array([np.arange(10)]*3).T
Hence we can create the DataFrame:
In [10]: df = pd.DataFrame(data, index=index, columns=columns)
In [11]: df
Out[11]:
A B C
2012-11-29 0 0 0
2012-11-30 1 1 1
2012-12-01 2 2 2
2012-12-02 3 3 3
2012-12-03 4 4 4
2012-12-04 5 5 5
2012-12-05 6 6 6
2012-12-06 7 7 7
2012-12-07 8 8 8
2012-12-08 9 9 9
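As a rough sketch of the same idea, the values can be filled into a preallocated NumPy array first and wrapped in a DataFrame only once at the end. The fixed start date and the loop body below are illustrative assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Fixed start date (an assumption, so the result is reproducible)
index = pd.date_range("2012-11-29", periods=10, freq="D")
columns = ["A", "B", "C"]

# Preallocate the array, then fill it row by row
data = np.zeros((len(index), len(columns)), dtype=np.int64)
for i in range(len(index)):
    data[i, :] = i  # placeholder calculation: every column gets the row number

# Build the DataFrame once, after the data is complete
df = pd.DataFrame(data, index=index, columns=columns)
print(df.head())
```

Filling the array and constructing the frame once avoids growing a DataFrame row by row.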
how to create python empty dataframe where df.empty results in True
The simplest is pd.DataFrame():
df = pd.DataFrame()
df.empty
# True
If you want to create a data frame with a specific set of columns:
df = pd.DataFrame(columns=['A', 'B'])
df.empty
# True
Besides, an array of shape (1, 1) is not empty (it has one row), which is why you get empty = False. To create an empty array, it needs to be of shape (0, n):
import numpy as np  # pd.np was removed in pandas 2.0, so import NumPy directly
df = pd.DataFrame(np.empty((0, 3)))
df.empty
# True
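To make the shape rule concrete, here is a small sketch comparing a few frames. The variable names are made up for illustration. df.empty is True whenever either axis has length zero, so a frame with rows of NaN still counts as non-empty:

```python
import numpy as np
import pandas as pd

one_row = pd.DataFrame(np.empty((1, 1)))                # shape (1, 1)
no_rows = pd.DataFrame(np.empty((0, 3)))                # shape (0, 3)
no_cols = pd.DataFrame(index=range(1, 10))              # shape (9, 0)
nan_rows = pd.DataFrame(index=range(3), columns=["A"])  # shape (3, 1), all NaN

for name, frame in [("one_row", one_row), ("no_rows", no_rows),
                    ("no_cols", no_cols), ("nan_rows", nan_rows)]:
    print(name, frame.shape, frame.empty)
```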
Pandas create empty DataFrame with only column names
You can create an empty DataFrame with either column names or an Index:
In [4]: import pandas as pd
In [5]: df = pd.DataFrame(columns=['A','B','C','D','E','F','G'])
In [6]: df
Out[6]:
Empty DataFrame
Columns: [A, B, C, D, E, F, G]
Index: []
Or
In [7]: df = pd.DataFrame(index=range(1,10))
In [8]: df
Out[8]:
Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Edit:
Even after your amendment with the .to_html, I can't reproduce. This:
df = pd.DataFrame(columns=['A','B','C','D','E','F','G'])
df.to_html('test.html')
Produces:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>F</th>
<th>G</th>
</tr>
</thead>
<tbody>
</tbody>
</table>
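A quick way to check this without writing a file is to call to_html() with no arguments, which returns the markup as a string. This is only a sketch of that check:

```python
import pandas as pd

columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
df = pd.DataFrame(columns=columns)

# With no buffer argument, to_html returns the HTML as a string
html = df.to_html()

# Every column name should appear as a header cell even though there are no rows
missing = [c for c in columns if "<th>{}</th>".format(c) not in html]
print(missing)
```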
Create an empty data.frame
Just initialize it with empty vectors:
df <- data.frame(Date=as.Date(character()),
File=character(),
User=character(),
stringsAsFactors=FALSE)
Here's another example with different column types:
df <- data.frame(Doubles=double(),
Ints=integer(),
Factors=factor(),
Logicals=logical(),
Characters=character(),
stringsAsFactors=FALSE)
> str(df)
'data.frame': 0 obs. of 5 variables:
$ Doubles : num
$ Ints : int
$ Factors : Factor w/ 0 levels:
$ Logicals : logi
$ Characters: chr
N.B.: Initializing a data.frame with an empty column of the wrong type does not prevent further additions of rows having columns of different types.
This method is just a bit safer in the sense that you'll have the correct column types from the beginning; hence, if your code relies on some column type checking, it will work even with a data.frame with zero rows.
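For readers working in pandas rather than R, a comparable effect can be sketched by building the frame from empty, typed Series. This is a pandas counterpart under assumed column names, not a translation of the R code above:

```python
import pandas as pd

# Each column is an empty Series with an explicit dtype
# (the column names here are illustrative)
df = pd.DataFrame({
    "Date": pd.Series(dtype="datetime64[ns]"),
    "File": pd.Series(dtype="object"),
    "Count": pd.Series(dtype="int64"),
    "Score": pd.Series(dtype="float64"),
    "Flag": pd.Series(dtype="bool"),
})

print(df.dtypes)
```

As in the R case, the frame has zero rows but already carries the intended column types.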
Initializing an empty DataFrame and appending rows
pd.concat() over a list of dataframes is probably the way to go, especially for clean CSVs. But if you suspect your CSVs are either dirty or could be parsed by read_csv() with mixed types between files, you may want to explicitly create each dataframe in a loop.
You can initialize a dataframe for the first file, and then start each subsequent file with an empty dataframe based on the first:
df2 = pd.DataFrame(data=None, columns=df1.columns,index=df1.index)
This takes the structure of dataframe df1 (its columns and index) but none of its data, and creates df2, whose cells are all NaN. If you want to force data types on the columns, you can do so on df1 when it is created, before its structure is copied.
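The difference between copying the shape and copying the column types can be sketched as follows. The sample df1 is made up, and the df1.iloc[:0] variant is an alternative not mentioned in the original answer. It keeps the dtypes of df1 and has zero rows, whereas the constructor form keeps the index and typically comes back with object columns:

```python
import pandas as pd

# Hypothetical first file, already parsed
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

# Same columns and index as df1, but every cell is NaN
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

# Alternative: a zero-row slice, which preserves the dtypes of df1
template = df1.iloc[:0].copy()

print(df2.shape, df2.empty)
print(template.shape, template.empty)
print(template.dtypes)
```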
Create empty data frame with column names by assigning a string vector?
How about:
df <- data.frame(matrix(ncol = 3, nrow = 0))
x <- c("name", "age", "gender")
colnames(df) <- x
To do all of this in a one-liner:
setNames(data.frame(matrix(ncol = 3, nrow = 0)), c("name", "age", "gender"))
#[1] name age gender
#<0 rows> (or 0-length row.names)
Or
data.frame(matrix(ncol=3,nrow=0, dimnames=list(NULL, c("name", "age", "gender"))))
How do I initialize an empty data frame *with a Date column* in R?
The issue is most likely due to concatenation, as a vector cannot have multiple classes. We can pass a data.frame instead:
rbind(test, setNames(data.frame(as.Date("2020-01-01"), 123, 456),
names(test)))
-output
date var1 var2
1 2020-01-01 123 456
How to make pandas read empty data file as an empty dataframe?
You can catch the exception and create an empty data frame:
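import pandas as pd
zero = pd.DataFrame()
# Create an empty file to read back
fh = open('/tmp/test.txt', 'w')
fh.close()
try:
    newzero = pd.read_csv('/tmp/test.txt', header=None)
except pd.errors.EmptyDataError:
    # Nothing to parse in the file, so fall back to an empty frame
    newzero = pd.DataFrame()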
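The same pattern can be wrapped in a small helper. The function name read_csv_or_empty and the temporary file names are hypothetical, chosen only for this sketch:

```python
import os
import tempfile

import pandas as pd


def read_csv_or_empty(path, **kwargs):
    """Read a CSV file, returning an empty DataFrame if the file has no data."""
    try:
        return pd.read_csv(path, **kwargs)
    except pd.errors.EmptyDataError:
        return pd.DataFrame()


with tempfile.TemporaryDirectory() as tmp:
    # A zero-byte file and a file with two rows
    empty_path = os.path.join(tmp, "empty.csv")
    open(empty_path, "w").close()

    full_path = os.path.join(tmp, "full.csv")
    with open(full_path, "w") as fh:
        fh.write("1,2\n3,4\n")

    empty_df = read_csv_or_empty(empty_path, header=None)
    full_df = read_csv_or_empty(full_path, header=None)

print(empty_df.empty, full_df.shape)
```

Catching only EmptyDataError means other problems, such as a missing file, still surface as errors.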