Calculate Readmission Rate

Calculating readmission rate

You can try this one also ( Don't know why upper one was giving false readmission flags for me):
After sorting on visit_start_date

visits_pandas_df.groupby('PatientId').apply(lambda x: (((x['visit_start_date'].shift(-1)-x['visit_end_date']).dt.days.shift(1).le(30)) ).astype(int)).values

Visits having only difference of one day are not counted in readmissions. So you will also need to check in your logic.

readmission rates in R

OK, I've got an answer for you that uses the dplyr package. I'll try to explain what's going on but you might have to read up a little bit. If you don't know the pipe-operator (%>%) - just read it as a "then".

# we need 2 libraries
library(tidyverse) # this provides several useful packages
library(lubridate) # this lets you deal more easily with dates

# first, we create a table that contains the csn of the relevant cases
# and the days since last admission

parkDf <- df %>% # we take our data frame
    mutate(admit_date = as_date(admit_date)) %>% # turn the dates into date format
    filter(encounter_type == "Inpatient") %>% # filter out cases with "Inpatient"
    arrange(mrn, admit_date) %>% # sort them first by mrn and then by admit_date
    group_by(mrn) %>% # group them by mrn so we can for each patient...
    mutate(daysSinceLastAdmit = admit_date - lag(admit_date)) %>% # ...get the days since last admit
    mutate(daysSinceLastAdmit = as.integer(daysSinceLastAdmit)) %>% # turn this into an integer
    ungroup() %>% # ungroup it (must be done - don't ask)
    select(csn, daysSinceLastAdmit) # and keep only these two columns

# now we left-join this to our original dataframe
df %>%
    left_join(parkDf, by = "csn") %>%
    mutate(readmission_30day = if_else(is.na(daysSinceLastAdmit), "no", "yes")) %>% # create the wanted variable
    select(-daysSinceLastAdmit) # and remove the unwanted one

You can read more about the used functions if you google "R for data science" or "r4ds" - this is a book written by Hadley Wickham which is a great author. If you don't know what a left join is, just google "sql left join". It's basically: take what's on the right (our newly created dataframe), add its information to what's on the left (original dataframe) and do so by matching a column that exists in both dataframes (by).

Hope this helps.

How to calculate monthly 30-days readmission rate from historical inpatient data in SQL?

Looks, like I am a bit late, but nonetheless - here is my suggestion:

WITH mnths AS (
  select cast(1 as int) m 
  UNION ALL select m+1 FROM mnths WHERE m<60 
), admdis AS (
  SELECT *, 12*(year(admdate)-2014)+month(admdate)a, 12*(year(disdate)-2014)+month(disdate) d 
  FROM tbl t
  WHERE exists (SELECT 1 FROM tbl WHERE id=t.id AND disdate>DATEADD(day, -30, t.admdate) and sn<t.sn) 
)
SELECT (m-1)/12+2014 yr, 1+(m-1)%12 mn,m, count(id) readm 
FROM mnths LEFT JOIN admdis ON a=m
GROUP BY m

I set up a number table mnths to generate a regular grid of months for which I am then counting the readmissions. The CTE admdis only lists those patient admissions that happened less than 30 days after a previous discharge.

A demo can be found here: https://rextester.com/TLM57882

Edit:

When looking at forpas's solution I realised that I did not calculate the actual readmission rate. Here is a modified version that does exactly that:

WITH mnths AS (
  select cast(1 as int) m 
  UNION ALL select m+1 FROM mnths WHERE m<60 
), admdis AS (
  SELECT *, 12*(year(admdate)-2014)+month(admdate)a, 12*(year(disdate)-2014)+month(disdate) d,
        (SELECT 1 FROM tbl WHERE id=t.id AND disdate>DATEADD(day, -30, t.admdate) and sn<t.sn) readm
  FROM tbl t
)

SELECT (m-1)/12+2014 yr, 1+(m-1)%12 mn, count(id) totl ,count(readm) readm, 
       case when count(id)>0 THEN (0.+count(readm))/ count(id) ELSE 0 END rate
FROM mnths LEFT JOIN admdis ON a=m
GROUP BY m

When calculating the rate I had to make sure that I did not "divide by zero", so , in those months where no admissions happened the rate is assumed to be zero (but not actually calculated).

demo: https://rextester.com/NFCXQ24711

And here is the generated result:

yr    mn totl readm rate
2014    1   3   1   0.333333333333
2014    2   0   0   0.000000000000
2014    3   0   0   0.000000000000
2014    4   0   0   0.000000000000
2014    5   0   0   0.000000000000
2014    6   0   0   0.000000000000
2014    7   0   0   0.000000000000
2014    8   0   0   0.000000000000
2014    9   0   0   0.000000000000
2014    10  0   0   0.000000000000
2014    11  0   0   0.000000000000
2014    12  0   0   0.000000000000
2015    1   0   0   0.000000000000
2015    2   0   0   0.000000000000
2015    3   0   0   0.000000000000
2015    4   0   0   0.000000000000
2015    5   0   0   0.000000000000
2015    6   0   0   0.000000000000
2015    7   0   0   0.000000000000
2015    8   0   0   0.000000000000
2015    9   0   0   0.000000000000
2015    10  0   0   0.000000000000
2015    11  0   0   0.000000000000
2015    12  0   0   0.000000000000
2016    1   0   0   0.000000000000
2016    2   0   0   0.000000000000
2016    3   0   0   0.000000000000
2016    4   0   0   0.000000000000
2016    5   0   0   0.000000000000
2016    6   0   0   0.000000000000
2016    7   0   0   0.000000000000
2016    8   0   0   0.000000000000
2016    9   0   0   0.000000000000
2016    10  0   0   0.000000000000
2016    11  0   0   0.000000000000
2016    12  1   0   0.000000000000
2017    1   0   0   0.000000000000
2017    2   0   0   0.000000000000
2017    3   0   0   0.000000000000
2017    4   0   0   0.000000000000
2017    5   0   0   0.000000000000
2017    6   1   0   0.000000000000
2017    7   0   0   0.000000000000
2017    8   0   0   0.000000000000
2017    9   0   0   0.000000000000
2017    10  0   0   0.000000000000
2017    11  0   0   0.000000000000
2017    12  0   0   0.000000000000
2018    1   1   0   0.000000000000
2018    2   1   1   1.000000000000
2018    3   0   0   0.000000000000
2018    4   0   0   0.000000000000
2018    5   0   0   0.000000000000
2018    6   0   0   0.000000000000
2018    7   0   0   0.000000000000
2018    8   0   0   0.000000000000
2018    9   0   0   0.000000000000
2018    10  1   0   0.000000000000
2018    11  0   0   0.000000000000
2018    12  0   0   0.000000000000

The column totl contains the total admissions in that period, readm the number of readmissions and rate is the ratio of readm/totl (or 0 in those cases where totl is 0).

How to mark episodes in which patient will be readmitted in 30-days?

would go with something like this:

require(tidyverse)
df %>% 
  arrange(patientPersonalNumber, admission.date) %>% 
  group_by(patientPersonalNumber) %>% 
  mutate(re.admin = (lag(discharge.date) + 30) >= admission.date) %>% 
  mutate(re.admin = ifelse(is.na(re.admin), FALSE, re.admin ))

# A tibble: 14 x 4
# Groups:   patientPersonalNumber [10]
   patientPersonalNumber admission.date discharge.date re.admin
   <chr>                 <date>         <date>         <lgl>   
 1 001                   2013-06-23     2013-06-25     FALSE   
 2 002                   2013-06-30     2014-07-03     FALSE   
 3 004                   2013-07-22     2014-08-01     FALSE   
 4 004                   2014-07-12     2014-07-17     TRUE    
 5 005                   2013-06-24     2013-06-30     FALSE   
 6 005                   2013-07-02     2013-07-05     TRUE    
 7 005                   2013-07-09     2013-07-12     TRUE    
 8 005                   2013-09-08     2013-10-12     FALSE   
 9 006                   2013-06-28     2013-06-30     FALSE   
10 007                   2013-06-29     2013-07-02     FALSE   
11 008                   2013-06-23     2013-06-29     FALSE   
12 009                   2013-06-24     2013-06-29     FALSE   
13 010                   2013-06-24     2013-06-27     FALSE   
14 011                   2013-06-24     2013-06-28     FALSE

Creating a dummy variable in R that indicates whether an ID is subsequently duplicated

We could use duplicated

df1$ReEnters <- +(duplicated(df1$ID, fromLast = TRUE))

-output

df1
#  ID Year ReEnters
#1 A1 2007        0
#2 B3 2007        1
#3 B3 2009        0
#4 C6 2000        1
#5 C6 2007        1
#6 C6 2010        0

data

df1 <- structure(list(ID = c("A1", "B3", "B3", "C6", "C6", "C6"), Year = c(2007L, 
2007L, 2009L, 2000L, 2007L, 2010L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Program calculating tax rate per line item equaling into ExclVAT

There is a bit of a problem here with rounding, no combination of taxrates actually gives us exactly the right answer.

I've updated the code to find the combination that gives us the most accurate number that can be achieved:

from itertools import product

# get all possible tax rate combinations
x = [0.00, 0.09, 0.21]
combinations = np.array(list(product(*[x]*10)))

# get amount columns
amounts = dfcalc.filter(like='line amount')

# calculate amounts excluding VAT for each row for each tax rate combination
exclvat = amounts.fillna(0).dot((1 + combinations.T)**-1)

# for each row find the combination that gives amounts excluding VAT
# that is equal to the value in ExclVAT column for that row
ix = np.abs(exclvat.sub(dfcalc['ExclVAT'].squeeze(), 0)).idxmin(1)
taxrates = np.where(amounts.notna(), combinations[ix], np.nan)

# subtract tax from line amounts
dfcalc[amounts.columns] /= (1 + taxrates)
dfcalc['line amount sum'] = dfcalc.filter(like='line amount').sum(1)
dfcalc.T

Output:

                         0
line amount#1     4.917355
line amount#2     4.917355
line amount#3    14.449541
line amount#4    14.449541
line amount#5     3.223140
line amount#6     2.396694
line amount#7          NaN
line amount#8          NaN
line amount#9          NaN
line amount#10         NaN
BTW               5.850000
ExclVAT          44.350000
Totaal           50.200000
line amount sum  44.353628