Percentage Value with Gnu Diff


Something like this perhaps?

Two files, A1 and A2.

Running

$ sdiff -B -b -s A1 A2 | wc -l

gives you how many lines differ. wc -l A1 gives the total number of lines, so just divide the first count by the second.

-b ignores changes in the amount of whitespace, -B ignores changes whose lines are all blank, and -s suppresses the lines the two files have in common.
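The same measurement can be sketched in Python with the standard difflib module. This is only an approximation: difflib's matching algorithm is not GNU diff's, and the whitespace handling merely mimics -b and -B. The function name is hypothetical.

```python
import difflib


def percent_lines_differing(a_lines, b_lines):
    """Rough analogue of `sdiff -B -b -s A1 A2 | wc -l` divided by `wc -l A1`, as a percentage."""

    def normalize(lines):
        # Approximate -b (treat runs of whitespace as equal) and -B (ignore blank lines)
        return [" ".join(line.split()) for line in lines if line.strip()]

    a, b = normalize(a_lines), normalize(b_lines)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    differing = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # sdiff prints changed lines side by side, one row per pair,
            # so a hunk occupies max(left lines, right lines) rows
            differing += max(i2 - i1, j2 - j1)
    total = len(a_lines)  # like `wc -l A1`, blank lines included
    return 100.0 * differing / total if total else 0.0
```

For two files on disk you would call it as percent_lines_differing(open("A1").readlines(), open("A2").readlines()).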

I'm trying to make a function to calculate percent difference for all pair combinations within a group in a vector

Using tidyverse:

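library(tidyverse)

# df is assumed to have a character column Levelname (e.g. "B 1") and a numeric column y
df %>%
  # group by the first word of Levelname, e.g. "B" from "B 1"
  group_by(grp = str_extract(Levelname, "\\w+")) %>%
  # one row per pair; dplyr >= 1.1.0 warns here and suggests reframe() instead
  summarise(pair = combn(Levelname, 2, str_c, collapse = " - "),
            # |a - b| divided by the pair's mean, as a percentage
            perc_diff = combn(y, 2, function(x) 200*abs(diff(x))/sum(x)),
            .groups = 'drop')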

# A tibble: 12 x 3
   grp   pair      perc_diff
   <chr> <chr>         <dbl>
 1 B     B 1 - B 2     45.1 
 2 B     B 1 - B 3     26.1 
 3 B     B 1 - B 4     15.3 
 4 B     B 2 - B 3     19.7 
 5 B     B 2 - B 4     30.4 
 6 B     B 3 - B 4     10.9 
 7 D     D 1 - D 2     39.8 
 8 D     D 1 - D 3      9.42
 9 D     D 1 - D 4     38.6 
10 D     D 2 - D 3     30.6 
11 D     D 2 - D 4      1.24
12 D     D 3 - D 4     29.4
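For readers outside R, here is a Python sketch of the same calculation: group rows by the first word of the level name and compute 200 * |a - b| / (a + b) for every pair with itertools.combinations. The function name and the (name, value) input format are assumptions, and values are assumed positive so that a + b is never 0.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_percent_diff(rows):
    """rows: iterable of (level_name, y) pairs, e.g. ("B 1", 10.0).
    Returns (group, "name1 - name2", percent_difference) for every pair within a group."""
    groups = defaultdict(list)
    for name, y in rows:
        # Like str_extract(Levelname, "\\w+") for names such as "B 1"
        groups[name.split()[0]].append((name, y))

    result = []
    for grp in sorted(groups):  # group_by() also returns groups in sorted order
        for (n1, y1), (n2, y2) in combinations(groups[grp], 2):
            # |a - b| relative to the mean of a and b, as a percentage
            pct = 200 * abs(y1 - y2) / (y1 + y2)
            result.append((grp, f"{n1} - {n2}", pct))
    return result
```

combinations() yields pairs in the same order as R's combn(), so the rows line up with the tibble above.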

Calculate percentage between two values

I'm not familiar with the (x,0) syntax you're using, but I see that you have

(COUNT(ApprovalProvision.ClaimNumber),0) - (COUNT(Submitted.ClaimNumber),0)
              /COUNT(Submitted.ClaimNumber) * 100

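Shouldn't it be:

( COUNT(ApprovalProvision.ClaimNumber) - COUNT(Submitted.ClaimNumber) ) * 100.0
              / COUNT(Submitted.ClaimNumber)

(Multiplying by 100.0 before dividing also keeps the result from being truncated in databases where COUNT returns an integer and integer division discards the fraction.)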

As written, it computes COUNT(ApprovalProvision.ClaimNumber) - 100: the division and multiplication bind more tightly than the subtraction, so COUNT(Submitted.ClaimNumber) divided by itself is 1, and 1 times 100 is 100.

The 4900 figure actually sounds right. Take the following example: you have 2 apples, you are given 98 more, and you end up with 100 apples.

An increase of 98% would only have taken you from 2 apples to 3.96 apples.

An increase of 100% takes you from 2 apples to 4. An increase of 1000% takes you from 2 apples to 22. So 4000% means you end with 82 apples, and 5000% means you reach 102 apples.

(100 - 2) / 2 * 100 = 98 / 2 * 100 = 49 * 100 = 4900, so going from 2 apples to 100 is a 4900% increase.

Now flip the 2 and the 100: you start with 100 apples and end with 2.
(2 - 100) / 100 * 100 = -98, so that is a -98% change, or a 98% decrease.
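The apple arithmetic can be checked with a small Python sketch. Both function names are hypothetical; the first is the (new - old) / old * 100 formula from the query, the second applies a percentage increase to a starting value.

```python
def percent_change(old, new):
    """Percentage change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) * 100 / old


def apply_percent_change(old, pct):
    """Value reached after changing old by pct percent."""
    return old * (1 + pct / 100)


print(percent_change(2, 100))  # 4900.0
print(percent_change(100, 2))  # -98.0
print(apply_percent_change(2, 1000))  # 22.0
```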

Hope this solves your problem.

How to check how much data has changed without storing two versions

I don't know of a generic algorithm that does this, but given your constraints I think it's pretty straightforward.

Calculate a 32-bit hash of every line in the CSV and store the hashes in a sorted array. When the file changes, hash the new version the same way and compare the two arrays: if 10% of the hashes have changed, then roughly 10% of the file (as a percentage of lines) has changed.
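A minimal Python sketch of the sorted-array comparison, assuming CRC-32 (zlib.crc32) as the 32-bit hash; any 32-bit hash would do, and the function names are hypothetical. The comparison walks both sorted arrays in a single merge pass and counts old hashes that no longer appear in the new version.

```python
import zlib


def sorted_line_hashes(lines):
    """Sorted list of 32-bit hashes, one per line (CRC-32 is an assumed choice)."""
    return sorted(zlib.crc32(line.encode("utf-8")) for line in lines)


def percent_hashes_missing(old_hashes, new_hashes):
    """Percentage of the old version's line hashes with no match in the new version.
    Both lists must be sorted; duplicate hashes are matched one-to-one."""
    i = j = missing = 0
    while i < len(old_hashes):
        if j == len(new_hashes) or old_hashes[i] < new_hashes[j]:
            missing += 1  # this old line has no partner in the new version
            i += 1
        elif old_hashes[i] > new_hashes[j]:
            j += 1  # this hash exists only in the new version
        else:
            i += 1  # present in both versions
            j += 1
    return 100.0 * missing / len(old_hashes) if old_hashes else 0.0
```

Only the old sorted array has to be kept between runs; the new one is rebuilt from the current file.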

If that is too large, still calculate the 32-bit hash of each CSV line, but keep only a histogram of the last 8 bits of each hash. For example, if 10 hashes have 0 as their last byte, then hist[0] = 10. Comparing the old and new histograms then tells you roughly how many lines have changed.

This structure is very small: 256 32-bit counters, or about 1 KB.

This is not perfect: when a line changes, its hash moves to another bucket, but other changed lines may leave that bucket at the same time, masking the ones that went in. This is the usual hash-collision problem. Storing more bits makes the data structure larger but more accurate, because there are fewer collisions.

You can lower the odds of a collision by using more hash bits for the histogram. For example, with the lower 12 bits of each hash there would be far fewer collisions, and the structure would hold 4096 32-bit counters, or 16 KB.
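A sketch of the histogram variant in Python, again assuming CRC-32 as the hash and using hypothetical function names. The estimate sums the absolute per-bucket differences and halves them, since an edited line usually removes one count from its old bucket and adds one to its new bucket. Because of the masking described above, it can undercount but not overcount edited lines.

```python
import zlib


def hash_histogram(lines, bits=8):
    """Count lines by the low `bits` bits of their CRC-32 hash (an assumed hash choice)."""
    mask = (1 << bits) - 1
    hist = [0] * (1 << bits)  # 256 buckets for bits=8, 4096 for bits=12
    for line in lines:
        hist[zlib.crc32(line.encode("utf-8")) & mask] += 1
    return hist


def estimate_changed_lines(old_hist, new_hist):
    """Rough count of edited lines from two histograms built with the same `bits`."""
    # Each edited line typically moves the total absolute difference by 2;
    # collisions (a line leaving and another entering the same bucket) hide changes
    return sum(abs(o - n) for o, n in zip(old_hist, new_hist)) // 2
```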

calculate percentage difference - python

You are losing precision in (val_2)/val_1 because, in Python 2, dividing one int by another truncates the result. Convert either operand to float so the division is done in floating point, then round the result and convert it to int.

values = [0.11889, 0.07485, 0.01070, 0.03076, 0.01606]
values = [int(round(i*100)) for i in values]  # [12, 7, 1, 3, 2]

conversion_values = []
for x in range(1, len(values)):
    val_1 = values[x-1]
    if val_1 == 0:  # avoid dividing by zero
        conversion_values.append('-')
    else:
        val_2 = values[x]
        # float() forces true division (Python 2 truncates int / int), then round and convert to int
        diff = int(round((float(val_2)/val_1)*100))
        conversion_values.append(diff)

print(conversion_values)

Output:

[58, 14, 300, 67]
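In Python 3 the float() cast is unnecessary because / always performs true division, and round() with one argument already returns an int. A more compact sketch of the same loop (the function name is hypothetical) pairs each value with its predecessor using zip:

```python
def ratio_percentages(values):
    """Each value as a whole-number percentage of the value before it ('-' when that value is 0)."""
    result = []
    for prev, curr in zip(values, values[1:]):
        result.append('-' if prev == 0 else round(curr / prev * 100))
    return result


scaled = [round(v * 100) for v in [0.11889, 0.07485, 0.01070, 0.03076, 0.01606]]
print(ratio_percentages(scaled))  # [58, 14, 300, 67]
```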

Kusto query to get percentage value of events over time

There are a couple of ways to do this. The first is to calculate the hourly average as an additional column and then compute each minute's difference from that hourly average:

// Per-minute event counts for each channel
let minuteValues = customEvents
| where name == "EventICareAbout"
| extend channel = tostring(customDimensions["ChannelName"])
| summarize events = count() by bin(timestamp, 1m), channel
| extend Day = startofday(timestamp), hour = hourofday(timestamp);
// Average of those minute counts for each channel and hour
let hourlyAverage = minuteValues
| summarize hourlyAvgEvents = avg(events) by Day, hour, channel;
// Match each minute to its own channel's hourly average
minuteValues
| lookup hourlyAverage on channel, Day, hour
| extend Diff = events - hourlyAvgEvents
| extend DiffPercent = 100.0 * Diff / hourlyAvgEvents  // difference as a percentage of the hourly average
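To make the logic of this first approach concrete, here is a plain-Python sketch of the same computation over an in-memory list of (timestamp, channel) events. The function name, the input shape and the sample data are hypothetical. Like the Kusto query, minutes with no events never appear in the per-minute counts, so they do not pull the hourly average down.

```python
from collections import Counter, defaultdict
from datetime import datetime


def minute_diffs(events):
    """events: iterable of (timestamp, channel), timestamp a datetime.
    Returns {(minute, channel): (events, hourly_avg, diff, diff_percent)}."""
    # Count events per one-minute bucket and channel, like bin(timestamp, 1m)
    per_minute = Counter((ts.replace(second=0, microsecond=0), ch) for ts, ch in events)

    # Sum and count the minute buckets within each (hour, channel)
    sums, counts = defaultdict(int), defaultdict(int)
    for (minute, ch), n in per_minute.items():
        hour_key = (minute.replace(minute=0), ch)
        sums[hour_key] += n
        counts[hour_key] += 1

    result = {}
    for (minute, ch), n in per_minute.items():
        hour_key = (minute.replace(minute=0), ch)
        avg = sums[hour_key] / counts[hour_key]
        diff = n - avg
        result[(minute, ch)] = (n, avg, diff, 100.0 * diff / avg)
    return result


# Hypothetical sample: three events at 10:00 and one at 10:01 on channel "A"
sample = [(datetime(2024, 1, 1, 10, 0, s), "A") for s in (5, 30, 59)]
sample.append((datetime(2024, 1, 1, 10, 1, 10), "A"))
for key, row in sorted(minute_diffs(sample).items()):
    print(key, row)
```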

Another option is to use Kusto's built-in anomaly detection functions, such as series_decompose_anomalies().

How do I calculate percentages from two tables in Django

When you define a ForeignKey, Django creates a "reverse" relation on the related model. You can name it with related_name; otherwise it defaults to <modelname>_set, with the model name lowercased. In this case, that is donation_set.

That's probably what you were missing. The code will be something like:

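@property
def percent_raised(self):
    # Add up the "raised" amount of every Donation that points at this object
    total = 0.0
    for donation in self.donation_set.all():
        total += float(donation.raised)

    # Express the total as a percentage of the target
    return total / float(self.target_donation) * 100.0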
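If raised and target_donation are DecimalFields (an assumption, since the models aren't shown), Django returns them as decimal.Decimal, and converting to float discards their exact decimal representation. A framework-free sketch of the same arithmetic kept in Decimal and guarded against a zero target; the function name and the choice to return None for a zero target are hypothetical:

```python
from decimal import Decimal


def percent_raised(raised_amounts, target_donation):
    """Percentage of target_donation covered by the sum of raised_amounts.
    Returns None when the target is zero."""
    target = Decimal(target_donation)
    if target == 0:
        return None
    total = sum((Decimal(a) for a in raised_amounts), Decimal("0"))
    return total / target * 100
```

Pass Decimal values or strings rather than floats, because Decimal(float) captures the float's binary rounding error.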

In this case it's more efficient, though much less generalizable, to calculate the sum of donations in the database query with an aggregation function. See the aggregation cheat sheet (its third example uses Avg, but here you'd want Sum instead of Avg).

percent symbol in Bash, what's it used for?

Delete the shortest match of string (a glob pattern) from the beginning of $var:

${var#string}

Delete the longest match of string from the beginning of $var:

${var##string}

Delete the shortest match of string from the end of $var:

${var%string}

Delete the longest match of string from the end of $var:

${var%%string}

Try:

var=foobarbar
echo "${var%b*r}"    # prints foobar
echo "${var%%b*r}"   # prints foo
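To show how "shortest" and "longest" match are decided, here is a rough Python sketch of the four expansions using the standard fnmatch module. fnmatch's glob syntax is close to, but not identical to, Bash patterns (no extglob, for instance), and the function names are hypothetical. The search order is what picks the shortest or the longest match.

```python
from fnmatch import fnmatchcase


def remove_prefix(var, pattern, longest=False):
    """Approximates ${var#pattern} (shortest) and ${var##pattern} (longest)."""
    # Try prefix lengths from short to long, or from long to short for ##
    ends = range(len(var), -1, -1) if longest else range(len(var) + 1)
    for end in ends:
        if fnmatchcase(var[:end], pattern):
            return var[end:]
    return var  # no match: the value is left unchanged


def remove_suffix(var, pattern, longest=False):
    """Approximates ${var%pattern} (shortest) and ${var%%pattern} (longest)."""
    # Try suffixes from short to long, or from long to short for %%
    starts = range(len(var) + 1) if longest else range(len(var), -1, -1)
    for start in starts:
        if fnmatchcase(var[start:], pattern):
            return var[:start]
    return var


print(remove_suffix("foobarbar", "b*r"))  # foobar
print(remove_suffix("foobarbar", "b*r", longest=True))  # foo
```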
