Fastest Way to Parse a Date-Time String to Class Date

Note that as.Date ignores any characters after the date, so the following takes less than 10 seconds on my not particularly fast laptop:

xx <- rep("10/17/2017 12:00:00 AM", 5000000) # test input
system.time(as.Date(xx, "%m/%d/%Y"))
## user system elapsed
## 9.57 0.20 9.82

Fastest way to parse String from java.util.Date

tl;dr

The back-port of the java.time classes takes under a microsecond to generate a String from a LocalDate in your desired pattern.

String output = myLocalDate.toString() ;  // Takes less than a microsecond.

Using this library, I would not worry about date-time strings being a bottleneck.

ThreeTen-Backport

The modern approach uses the java.time classes that supplanted the terrible old date-time classes such as Date & Calendar. For Java 6 & 7, most of that functionality is back-ported in the ThreeTen-Backport project, with a nearly identical API. Add the library and import: import org.threeten.bp.*;

Your example format of YYYY-MM-DD is the default used by the LocalDate class when parsing/generating text.
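A minimal sketch of that default in both directions, written against java.time (Java 8 and later); with ThreeTen-Backport only the imports change to org.threeten.bp. The class IsoDefaultDemo and its two helpers are hypothetical names for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class IsoDefaultDemo {

    // Generate text: LocalDate::toString emits ISO 8601 YYYY-MM-DD.
    static String generate(LocalDate date) {
        return date.toString();
    }

    // Parse text: LocalDate.parse with no formatter expects ISO 8601 YYYY-MM-DD.
    static LocalDate parse(String text) {
        return LocalDate.parse(text);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2017, 10, 17);
        String text = generate(date);
        System.out.println(text); // 2017-10-17
        // Same text as formatting explicitly with the pre-defined constant.
        System.out.println(text.equals(date.format(DateTimeFormatter.ISO_LOCAL_DATE))); // true
        // Round trip back to an equal LocalDate.
        System.out.println(parse(text).equals(date)); // true
    }
}
```

No DateTimeFormatter needs to be built or cached for this particular format.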

Example code.

Set up a list of LocalDate objects.

import org.threeten.bp.LocalDate;  // java.time.LocalDate on Java 8 and later
import java.util.ArrayList;
import java.util.List;

long years = 1000;
LocalDate today = LocalDate.now();
LocalDate lastDate = today.plusYears( years );
int initialCapacity = ( int ) ( ( years + 1 ) * 366 );
List < LocalDate > dates = new ArrayList <>( initialCapacity );
LocalDate localDate = today;
System.out.println( "From: " + today + " to: " + lastDate );
while ( localDate.isBefore( lastDate ) ) {
    dates.add( localDate );
    // Set up the next iteration.
    localDate = localDate.plusDays( 1 );
}

Run the test.

long start = System.nanoTime();

for ( LocalDate date : dates ) {
    String output = date.toString(); // Generate text in standard ISO 8601 format.
}

long stop = System.nanoTime();
long elapsed = ( stop - start );
long nanosEach = elapsed / dates.size();

System.out.println( "nanosEach: " + nanosEach );

Results: Under a microsecond each

When running on a MacBook Pro (Retina, 15-inch, Late 2013), 2.3 GHz Intel Core i7, 16 GB 1600 MHz DDR3, in IntelliJ 2018.3, using Java 10.0.2 from the OpenJDK-based Zulu JVM from Azul Systems…

When running a batch of 100 years, I get about 650 nanoseconds each. That is about two-thirds of a microsecond.

When running a batch of 1,000 years, I get about 260 nanoseconds each. That is about a quarter of a microsecond.

I doubt processing date strings with this library will prove to be a bottleneck in your app’s performance.

Thread-safety

The java.time classes are designed to be inherently thread-safe, including the use of immutable objects.

You can cache a single DateTimeFormatter object, and use it repeatedly, even across threads.

Your desired format, defined by the ISO 8601 standard, is pre-defined as a constant in both java.time and the ThreeTen-Backport library: DateTimeFormatter.ISO_LOCAL_DATE.

DateTimeFormatter f = DateTimeFormatter.ofPattern( "uuuu-MM-dd" ) ;  // Or just use the pre-defined constant for that particular pattern, `DateTimeFormatter.ISO_LOCAL_DATE`, also used by default in `LocalDate::toString`.

String output = localDate.format( f ) ;
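To illustrate the thread-safety point, here is a sketch in which one cached formatter is shared by all worker threads of a parallel stream. It assumes java.time on Java 8 or later; the class SharedFormatterDemo and the method formatInParallel are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SharedFormatterDemo {

    // DateTimeFormatter is immutable and thread-safe, so a single cached instance can be shared.
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    // Format `days` consecutive dates starting at `start`, using the common fork-join pool.
    static List<String> formatInParallel(LocalDate start, int days) {
        return IntStream.range(0, days)
                .parallel()
                .mapToObj(start::plusDays)          // int offset widened to long for plusDays
                .map(d -> d.format(FORMATTER))      // every worker thread uses the same formatter
                .collect(Collectors.toList());      // encounter order is preserved
    }

    public static void main(String[] args) {
        List<String> out = formatInParallel(LocalDate.of(2018, 12, 30), 4);
        System.out.println(out); // [2018-12-30, 2018-12-31, 2019-01-01, 2019-01-02]
    }
}
```

The same code with the legacy SimpleDateFormat would need one instance per thread, because that class keeps mutable parsing state.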

About java.time

The java.time framework is built into Java 8 and later. These classes supplant the troublesome old legacy date-time classes such as java.util.Date, Calendar, & SimpleDateFormat.

The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

To learn more, see the Oracle Tutorial, and search Stack Overflow for many examples and explanations. The specification is JSR 310.

You may exchange java.time objects directly with your database. Use a JDBC driver compliant with JDBC 4.2 or later. No need for strings, no need for java.sql.* classes.

Where to obtain the java.time classes?

  • Java SE 8, Java SE 9, Java SE 10, Java SE 11, and later - Part of the standard Java API with a bundled implementation.
    • Java 9 adds some minor features and fixes.
  • Java SE 6 and Java SE 7
    • Most of the java.time functionality is back-ported to Java 6 & 7 in ThreeTen-Backport.
  • Android
    • Later versions of Android bundle implementations of the java.time classes.
    • For earlier Android (<26), the ThreeTenABP project adapts ThreeTen-Backport (mentioned above). See How to use ThreeTenABP….

convert string date to R Date FAST for all dates

I can get a little speedup by using the date package:

library(date)
set.seed(21)
x <- as.character(Sys.Date()-sample(40000, 1e6, TRUE))
system.time(dDate <- as.Date(x))
# user system elapsed
# 6.54 0.01 6.56
system.time(ddate <- as.Date(as.date(x,"ymd")))
# user system elapsed
# 3.42 0.22 3.64

You might want to look at the C code it uses and see if you can modify it to be faster for your specific situation.

What is the fastest way to convert string to DateTime (python)?

As you are using pandas, you can do this easily with the to_datetime() function:

import pandas
df['date'] = pandas.to_datetime(df['date'])

Convert date-time string to class Date

You may be overcomplicating things. Is there any reason you need the stringr package? You can use as.Date and its format argument to specify the input format of your string.

df <- data.frame(Date = c("10/9/2009 0:00:00", "10/15/2009 0:00:00"))
as.Date(df$Date, format = "%m/%d/%Y %H:%M:%S")
# [1] "2009-10-09" "2009-10-15"

Note the Details section of ?as.Date:

Character strings are processed as far as necessary for the format specified: any trailing characters are ignored

Thus, this also works:

as.Date(df$Date, format =  "%m/%d/%Y")
# [1] "2009-10-09" "2009-10-15"

All the conversion specifications that can be used to specify the input format are listed in the Details section of ?strptime. Make sure that the order of the conversion specifications, as well as any separators, corresponds exactly to the format of your input string.


More generally, if you need the time component as well, use as.POSIXct or strptime:

as.POSIXct(df$Date, format = "%m/%d/%Y %H:%M:%S")  # name the format: the second positional argument of as.POSIXct is tz
strptime(df$Date, "%m/%d/%Y %H:%M:%S")

I'm guessing at what your actual data might look like from the partial results you give.

How can I convert a specific character string to date-time in R?

as.Date returns only the calendar date, because class Date represents calendar dates only, with no time-of-day component.

In addition to as.POSIXct, you can also use parse_date_time from the lubridate package:

library(lubridate)

parse_date_time("23/8/22 12:45", "dmy HM", tz="")  # orders: dmy = day, month, year; HM = hour, minute

# Or the equivalent shorthand function:
dmy_hm("23/8/22 12:45", tz=Sys.timezone())

Fastest way to parse a YYYYMMdd date in Java

As you can see below, the performance of date processing only matters when you look at millions of iterations. Instead, you should choose a solution that is easy to read and maintain.

Although you could use SimpleDateFormat, it is not thread-safe (not reentrant), so it should be avoided. The best solution is to use the great Joda-Time classes:

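import java.util.Date;
import org.joda.time.format.DateTimeFormatter;
import org.joda.time.format.DateTimeFormatterBuilder;

// Joda-Time formatters are immutable and thread-safe, so one static instance can be shared.
private static final DateTimeFormatter DATE_FORMATTER = new DateTimeFormatterBuilder()
        .appendYear(4, 4).appendMonthOfYear(2).appendDayOfMonth(2).toFormatter();
...
Date date = DATE_FORMATTER.parseDateTime(dateOfBirth).toDate();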
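Joda-Time is now in maintenance mode and its project advises migrating to java.time. As a swapped-in alternative (not the Joda-Time approach above), java.time in Java 8 and later already ships the yyyyMMdd layout as the constant DateTimeFormatter.BASIC_ISO_DATE. The sketch below assumes that constant is acceptable for your input; the class BasicIsoDateDemo and its helpers are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;

public class BasicIsoDateDemo {

    // BASIC_ISO_DATE is the pre-defined ISO 8601 "basic" yyyyMMdd format.
    // It is immutable, thread-safe, and resolves strictly (e.g. 20170230 is rejected).
    static LocalDate parse(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Only if legacy code still needs java.util.Date: choose the zone for the start of day explicitly.
    static Date toLegacyDate(LocalDate date, ZoneId zone) {
        return Date.from(date.atStartOfDay(zone).toInstant());
    }

    public static void main(String[] args) {
        LocalDate dob = parse("19850412");
        System.out.println(dob); // 1985-04-12
        System.out.println(toLegacyDate(dob, ZoneId.of("UTC")).toInstant());
    }
}
```

Keeping the value as a LocalDate avoids the time-zone question entirely until a legacy API forces a conversion.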

If we are talking about your math functions, the first thing to point out is that there were bugs in your math code, which I've fixed. That's the problem with doing it by hand. That said, the versions that process the string only once will be the fastest. A quick test run shows that:

year = Integer.parseInt(dateString.substring(0, 4));
month = Integer.parseInt(dateString.substring(4, 6));
day = Integer.parseInt(dateString.substring(6));

Takes ~800ms while:

int date = Integer.parseInt(dateString);
year = date / 10000;
month = (date % 10000) / 100;
day = date % 100;
total += year + month + day;

Takes ~400ms.
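A self-contained sketch of the single-parse arithmetic approach, wrapped in a hypothetical ArithmeticYmd.parse helper. Returning a java.time LocalDate (Java 8 and later) is an assumption added here so that impossible values are rejected; the strict 8-digit check is also an assumption about what input should be accepted.

```java
import java.time.LocalDate;

public class ArithmeticYmd {

    // Hypothetical helper: parse "yyyyMMdd" with one Integer.parseInt plus integer arithmetic.
    static LocalDate parse(String yyyymmdd) {
        // Assumption: accept exactly 8 ASCII digits (no sign, no spaces).
        if (yyyymmdd.length() != 8) {
            throw new IllegalArgumentException("expected 8 digits: " + yyyymmdd);
        }
        for (int i = 0; i < 8; i++) {
            char c = yyyymmdd.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("expected 8 digits: " + yyyymmdd);
            }
        }
        int n = Integer.parseInt(yyyymmdd); // at most 99999999, well within int range
        int year = n / 10000;               // 20171017 -> 2017
        int month = (n % 10000) / 100;      // 1017 -> 10
        int day = n % 100;                  // 17
        // LocalDate.of throws DateTimeException for values such as month 13 or February 30.
        return LocalDate.of(year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(parse("20171017")); // 2017-10-17
    }
}
```

The validation loop costs a little of the speed advantage, which is the trade-off the readability argument is about.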

However, again, you need to take into account that this is after 10 million iterations. This is a perfect example of premature optimization. I'd choose the one that is the most readable and the easiest to maintain. That's why the Joda-Time answer is the best.

Is there a fast parser for date

Given

## the following two (here three) lines are all of fasttime's R/time.R
fastPOSIXct <- function(x, tz=NULL, required.components = 3L)
    .POSIXct(if (is.character(x)) .Call("parse_ts", x, required.components)
             else .Call("parse_ts", as.character(x), required.components), tz)

hence

## so we suggest to just use it, and convert later
fastDate <- function(x, tz=NULL)
    as.Date(fastPOSIXct(x, tz=tz))

which at least beats as.Date():

R> library(microbenchmark)
R> library(fasttime)
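R> d <- rep("2010-11-12", n=1e4)  # caution: rep() has no 'n' argument, so this yields a single string; use times = 1e4 for 10,000 copies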
R> microbenchmark(fastDate(d), as.Date(d), times=100)
Unit: microseconds
        expr    min      lq    mean  median      uq     max neval cld
 fastDate(d) 47.469 48.8605 54.3232 55.7270 57.1675 104.447   100   a
  as.Date(d) 77.194 79.4120 85.3020 85.2585 87.3135 121.979   100   b

R>

If you wanted to go super fast, you could start with tparse.c to create the date-only subset you want.

Python date string to date object

You can use strptime from Python's datetime module:

>>> import datetime
>>> datetime.datetime.strptime('24052010', "%d%m%Y").date()
datetime.date(2010, 5, 24)

Easiest way to parse date in String format to GregorianCalendar

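import java.text.*;
import java.util.*;

DateFormat format = new SimpleDateFormat( "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" );
// The quoted 'Z' is matched as a literal, so state explicitly that the text is in UTC.
format.setTimeZone( TimeZone.getTimeZone( "UTC" ) );
Date date = format.parse( "2011-10-07T08:51:52.006Z" ); // throws java.text.ParseException
Calendar calendar = new GregorianCalendar();
calendar.setTime( date );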
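On Java 8 and later, a possible alternative is to let java.time read the ISO 8601 text, where the trailing "Z" is understood as a UTC offset rather than a quoted literal, and then bridge to the legacy class with GregorianCalendar.from. This is a sketch under that assumption; the class IsoToCalendar is a hypothetical name.

```java
import java.time.ZonedDateTime;
import java.util.Calendar;
import java.util.GregorianCalendar;

public class IsoToCalendar {

    static GregorianCalendar parse(String iso) {
        // ZonedDateTime.parse accepts ISO 8601 text with an offset such as "Z", so no pattern is needed.
        ZonedDateTime zdt = ZonedDateTime.parse(iso);
        // Bridge to the legacy class; the calendar keeps the parsed zone (UTC for "Z").
        return GregorianCalendar.from(zdt);
    }

    public static void main(String[] args) {
        GregorianCalendar cal = parse("2011-10-07T08:51:52.006Z");
        System.out.println(cal.get(Calendar.HOUR_OF_DAY)); // 8
        System.out.println(cal.get(Calendar.MONTH));       // 9 (Calendar months are zero-based)
    }
}
```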

