How to produce time series for each row of a data frame with an unnamed first column
Not sure what you need, maybe something like this?
library(reshape2)
library(ggplot2)
df$metadata <- row.names(df)
df <- melt(df, "metadata")
ggplot(df, aes(variable, value, group = metadata, color = metadata)) +
  geom_line()
Find first column for each row in a data set that has a specific value
Here is one approach with tidyverse. First, use pivot_longer to gather the date columns into a single date column. Use mutate to convert the date strings to Date format; the column names are not clean dates, so the format string below assumes they look like X%m.%d.%Y (depending on how your data frame looks, yours may differ). Then filter the rows that have values of interest (3 or 4), arrange the dates in order, and select the first row for each Country. Finally, group_by(date) and count the number of countries found for each date.
library(tidyverse)
df %>%
  pivot_longer(cols = -Country, names_to = "date") %>%
  mutate(date = as.Date(date, format = "X%m.%d.%Y")) %>%
  filter(value == 3 | value == 4) %>%
  arrange(date) %>%
  group_by(Country) %>%
  slice(1) %>%
  group_by(date) %>%
  summarise(num_countries = n())
Output
# A tibble: 2 x 2
  date       num_countries
  <date>             <int>
1 2020-02-01             2
2 2020-04-01             1
How to get the first row of a dataframe with names when there is only one column?
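When you extract from a data frame using [, the drop argument decides whether the result is simplified. For data frames the default is to drop when only one column is left, which is why a single-column data frame comes back as a vector.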
From ?Extract
drop - If TRUE the result is coerced to the lowest possible dimension.
You can also check the class of each data frame after extracting the row.
df1 = data.frame(A=c(12,13), B=c(24,25))
df2 = data.frame(A=c(12,13))
class(df1[1, ])
#[1] "data.frame"
class(df2[1, ])
#[1] "numeric"
As we can see, df2 is coerced to a vector. Using drop = FALSE keeps it as a data frame and does not drop the dimensions.
df2[1, , drop = FALSE]
#   A
#1 12
class(df2[1, , drop = FALSE])
#[1] "data.frame"
Make first row and first column as column header and row index respectively in a dataframe
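Promote the first row to the column names, drop that row, then set the first column as the index. DataFrame.rename_axis removes the leftover axis names:
a.columns = a.iloc[0]   # first row becomes the header
a = a[1:]               # drop that row from the data
a = a.set_index(a.columns[0]).rename_axis(index=None, columns=None)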
Or, for a one-line solution, use DataFrame.set_axis to set the column names:
a = (a.iloc[1:]
      .set_axis(a.iloc[0], axis=1)
      .set_index(a.iat[0, 0])
      .rename_axis(index=None, columns=None))
print(a)
     A    B   C
T1  33  2.1  22
T2  52  2.1  23
T3  55  2.1  14
T4  21  2.1  19
Why is a pandas DataFrame displaying column names as 'Unnamed: 1', 'Unnamed: 2', ..., 'Unnamed: n'?
Are you sure you have the right encoding?
I see your data file starts with ÿþ when read in a cp1252 encoding. That looks like a UTF-16 byte order mark (BOM). Wikipedia has a table of these, and if you look at that table, you'll see it's a match for UTF-16 LE (little endian).
Once you figure out the right encoding, you can tell pandas which encoding to use by calling pd.read_csv(..., encoding='...'). To figure out what to put in the encoding field, you can consult Python's table of standard encodings. If you want UTF-16 LE, that's 'utf_16_le'.
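As a rough illustration of the BOM check described above, here is a minimal sketch using only the Python standard library. The helper name guess_encoding_from_bom and the sample bytes are made up for this example, and it only recognises files that actually begin with a BOM. It returns 'utf_16' rather than 'utf_16_le' because Python's 'utf_16' codec reads the BOM and strips it while decoding, whereas 'utf_16_le' leaves it in the text.

```python
import codecs

# BOMs to check, longest first so that UTF-32 LE is not mistaken for
# UTF-16 LE (both start with the bytes FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF8, "utf_8_sig"),
    (codecs.BOM_UTF16_LE, "utf_16"),
    (codecs.BOM_UTF16_BE, "utf_16"),
]


def guess_encoding_from_bom(raw, default="utf_8"):
    """Hypothetical helper: pick a codec name from the BOM at the start of raw."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return default  # no BOM found, fall back to the caller's default


# Made-up sample: a tiny CSV encoded as UTF-16 LE with a BOM in front.
raw = codecs.BOM_UTF16_LE + "A,B\n1,2\n".encode("utf_16_le")

print(raw[:2].decode("cp1252"))      # the two BOM bytes as cp1252 shows them
encoding = guess_encoding_from_bom(raw)
text = raw.decode(encoding)          # the "utf_16" codec consumes the BOM
print(encoding)
print(text)
```

In practice you would read the first few bytes of the file in binary mode, pass them to the helper, and hand the resulting name to pd.read_csv(..., encoding=...).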
More information:
Pandas docs on read_csv
What is this "ÿþA"? This is the same question, but about R instead of Python.
How to get rid of Unnamed: 0 column in a pandas DataFrame read in from CSV file?
It's the index column. Pass index=False to df.to_csv(...) so that the unnamed index column is not written out in the first place; see the to_csv() docs.
Example:
In [37]:
import io
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))
Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335
compare with:
In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))
Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
Alternatively, you can tell read_csv that the first column is the index column by passing index_col=0:
In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335