How to Produce Time Series for Each Row of a Data Frame with an Unnamed First Column

Not sure what you need, maybe something like this?

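library(reshape2)
library(ggplot2)

# Keep the row names as a regular column so they survive the reshape
df$metadata <- row.names(df)
# Long format: one row per (metadata, variable) pair
df <- melt(df, "metadata")
# One line per original row
ggplot(df, aes(variable, value, group = metadata, color = metadata)) +
  geom_line()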

Find first column for each row in a data set that has a specific value

Here is one approach with tidyverse. First, use pivot_longer to gather the date columns into a single date column (long format). Then use mutate to convert the date strings to Date format. Note that the column names are not clean dates: they carry an "X" prefix, so the format string has to account for it (depending on how your data frame looks, this may differ from what I have). Next, filter the rows that have the values of interest (3 or 4), arrange them by date, and take the first row for each Country. Finally, group_by(date) and count the number of countries found for each date.

library(tidyverse)

df %>%
  pivot_longer(cols = -Country, names_to = "date") %>%
  mutate(date = as.Date(date, format = "X%m.%d.%Y")) %>%
  filter(value == 3 | value == 4) %>%
  arrange(date) %>%
  group_by(Country) %>%
  slice(1) %>%
  group_by(date) %>%
  summarise(num_countries = n())

Output

# A tibble: 2 x 2
  date       num_countries
  <date>             <int>
1 2020-02-01             2
2 2020-04-01             1
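If you are working in pandas rather than R, the same steps can be sketched roughly as follows. The data frame `wide` and its values are made up for illustration, and the "X%m.%d.%Y" column naming is assumed to match the R example above; adjust both to your own data.

```python
import pandas as pd

# Hypothetical wide data: one row per country, one column per date
wide = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "X2.1.2020": [3, 0, 4, 0],
    "X3.1.2020": [3, 1, 4, 2],
    "X4.1.2020": [4, 3, 4, 2],
})

# Long format: one row per (Country, date) pair
long = wide.melt(id_vars="Country", var_name="date", value_name="value")
long["date"] = pd.to_datetime(long["date"], format="X%m.%d.%Y")

# Keep only the values of interest, then the earliest date per country
hits = long[long["value"].isin([3, 4])].sort_values("date")
first_hit = hits.drop_duplicates("Country")

# Count how many countries reached the value for the first time on each date
counts = (first_hit.groupby("date").size()
          .rename("num_countries").reset_index())
print(counts)
```

Country D never reaches 3 or 4 in this made-up data, so it does not contribute to any count.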

How to get the first row of a dataframe with names when there is only one column?

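When you subset a data frame with [, the drop argument decides whether the result is simplified. For data frames the default is to drop when only one column is left, which is why a one-column data frame gives back a plain vector.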

From ?Extract

drop - If TRUE the result is coerced to the lowest possible dimension.

You can also check the class of each result after extracting the row.

df1 = data.frame(A=c(12,13), B=c(24,25))
df2 = data.frame(A=c(12,13))

class(df1[1, ])
#[1] "data.frame"
class(df2[1, ])
#[1] "numeric"

As we can see, the row taken from df2 is coerced to a vector. Using drop = FALSE keeps it as a data frame and does not drop the dimensions.

df2[1, , drop = FALSE]
#   A
#1 12

class(df2[1, , drop = FALSE])
#[1] "data.frame"

Make first row and first column as column header and row index respectively in a dataframe

Use DataFrame.rename_axis:

a.columns = a.iloc[0]
a = a[1:]
a.set_index(a.columns[0]).rename_axis(index=None, columns=None)

Or, for a one-line solution, use DataFrame.set_axis to set the column names:

a = (a.iloc[1:]
      .set_axis(a.iloc[0], axis=1)
      .set_index(a.iat[0,0])
      .rename_axis(index=None, columns=None))
print(a)
     A    B   C
T1  33  2.1  22
T2  52  2.1  23
T3  55  2.1  14
T4  21  2.1  19
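For a version you can run end to end, here is a minimal sketch. The frame `raw`, its "ID" corner label, and the helper name `promote_header_and_index` are made up for illustration; only the pandas calls themselves come from the answer above.

```python
import pandas as pd

# Hypothetical raw frame: the real header sits in row 0, the row labels in column 0
raw = pd.DataFrame([
    ["ID", "A", "B", "C"],
    ["T1", 33, 2.1, 22],
    ["T2", 52, 2.1, 23],
])

def promote_header_and_index(frame):
    """Use the first row as column names and the first column as the index."""
    out = frame.iloc[1:].copy()            # drop the header row from the data
    out.columns = frame.iloc[0].tolist()   # first row becomes the column names
    out = out.set_index(out.columns[0])    # first column becomes the index
    # Clear the leftover axis names so no extra label is printed
    return out.rename_axis(index=None, columns=None)

a = promote_header_and_index(raw)
print(a)
```

Because the header row shared columns with the data, the remaining columns are typically left with object dtype; calling a.infer_objects() afterwards may restore numeric types.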

Why pandas dataframe displaying column names as 'unnamed: 1', 'unnamed: 2', ..., 'unnamed: n'

Are you sure you have the right encoding?

I see your data file starts with ÿþ when read in a cp1252 encoding. That looks like a UTF-16 byte order mark (BOM). Wikipedia has a table of these, and if you look at that table, you'll see it's a match for UTF-16-LE (little endian).

Once you figure out the right encoding, you can tell Pandas which one to use by calling pd.read_csv(..., encoding='...'). To figure out what to put in the encoding field, you can consult the table of standard encodings in the Python codecs documentation. If you want UTF-16-LE, that's 'utf_16_le'.

More information:

Pandas docs on read_csv

What is this "ÿþA"? This is the same question, but about R instead of Python.
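If you would rather check for a BOM programmatically than eyeball it, something like the sketch below could work. The helper name `guess_encoding_from_bom` and the sample bytes are made up for illustration; the BOM constants come from Python's standard codecs module. It only recognises files that actually start with a BOM and falls back to a default otherwise.

```python
import codecs

# BOM signatures mapped to a codec that consumes the BOM while decoding.
# UTF-32 is checked before UTF-16 because the UTF-32-LE BOM starts with
# the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding_from_bom(raw, default="utf-8"):
    """Return a codec name based on the leading bytes (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default

# Made-up sample: a tiny CSV saved as UTF-16-LE with a BOM
sample = codecs.BOM_UTF16_LE + "a,b\n1,2\n".encode("utf-16-le")

print(sample[:2].decode("cp1252"))   # the BOM bytes as seen through cp1252
encoding = guess_encoding_from_bom(sample)
text = sample.decode(encoding)       # BOM is stripped by the codec
print(encoding, repr(text))
```

For a real file you would read the first few bytes with open(path, 'rb') and pass the returned name as encoding= to pd.read_csv.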

How to get rid of Unnamed: 0 column in a pandas DataFrame read in from CSV file?

It's the index column. Pass index=False to df.to_csv(...) so that the unnamed index column is not written out in the first place; see the to_csv() docs.

Example:

In [37]:
import io
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also optionally tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335
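If the frame has already been read in and you can't control how the CSV was written, one more option is to drop the stray column after the fact. This is only a sketch: the small frame below is made up, and matching on the "Unnamed:" prefix assumes none of your real columns start with that text.

```python
import io
import pandas as pd

# Made-up frame, written with its index and read back in
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
loaded = pd.read_csv(io.StringIO(df.to_csv()))
print(list(loaded.columns))   # includes the stray "Unnamed: 0" column

# Boolean mask of columns whose name starts with "Unnamed:"
unnamed = loaded.columns.str.startswith("Unnamed:")
cleaned = loaded.loc[:, ~unnamed]
print(cleaned)
```

Fixing it at the source with index=False or index_col=0 is usually cleaner, since a name-based filter like this would also remove any other unnamed columns.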

