Obtaining Number of Block Parameters

When you materialize a block with &, it becomes a Proc object, which has an arity method. Just be careful: it returns the one's complement of the number of required arguments (-n-1) if the proc takes a *splat arg.

def foobar(x, y, z, &block)
  p block.arity
end
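
For instance, the splat case looks like this (checking arity on a Proc directly; hypothetical one-liners):

proc { |*a| }.arity     # => -1  (one's complement of 0 required arguments)
proc { |a, *b| }.arity  # => -2  (one's complement of 1 required argument)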

(Answer via "The Ruby Programming Language" book.)

Ruby: how to check how many parameters a block accepts?

You can use the Proc#arity method to check how many arguments the block accepts:

def foo(&block)
  puts block.arity
end

foo { } # => 0
foo { |a| } # => 1
foo { |a, b| } # => 2

From the documentation:

Returns the number of arguments that would not be ignored. If the
block is declared to take no arguments, returns 0. If the block is
known to take exactly n arguments, returns n. If the block has
optional arguments, returns -n-1, where n is the number of mandatory
arguments. A proc with no argument declarations is the same as a block
declaring || as its arguments.
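
For example, optional block parameters make the result negative (using the same foo as above):

foo { |a, b = 1| }    # => -2  (one mandatory argument, so -1-1)
foo { |a, b, c = 1| } # => -3  (two mandatory arguments)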

Block with two parameters

If you look at the documentation of Enumerable#find, you see that it accepts only one parameter to the block. The reason you can send it two is that Ruby conveniently lets you do this with blocks, based on its "parallel assignment" structure:

[[1,2,3], [4,5,6]].each {|x,y,z| puts "#{x}#{y}#{z}"}
# 123
# 456

So basically, each yields an array element to the block, and because Ruby block syntax allows "expanding" array elements to their components by providing a list of arguments, it works.

You can find more tricks with block arguments here.

a.combination(2) enumerates arrays, where each sub-array consists of 2 elements. So:

a = [1, 2, 3, 4]
a.combination(2).to_a
# => [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]

As a result, you are sending one array like [1,2] to find's block, and Ruby performs the parallel assignment to assign 1 to x and 2 to y.
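
So, for example, find can destructure each pair directly (a small hypothetical query):

a = [1, 2, 3, 4]
a.combination(2).find { |x, y| x + y == 5 }
# => [1, 4]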

Also see this SO question, which shows other powerful examples of parallel assignment, such as this statement:

a,(b,(c,d)) = [1,[2,[3,4]]]
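
After that assignment, each variable is bound to the matching nested element:

a  # => 1
b  # => 2
c  # => 3
d  # => 4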

Number of arguments in a ruby block

This works because Ruby supports destructuring.

Destructuring allows you to bind a set of variables to a corresponding set of values anywhere that you can normally bind a value to a single variable.

This allows the following to hold true:

arr = [1, 2]
x = arr
x == [1, 2] # true

y, z = arr
y == 1 # true
z == 2 # true

You can see from the following code that destructuring in arguments to blocks isn't unique to the built-in methods that take a block:

def my_method(arr)
  yield arr
end

my_method([1, 2, 3]) {|x| puts x.inspect }
# => [1, 2, 3]
my_method([1, 2, 3]) {|x, y, z| puts x.inspect }
# => 1
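
Parentheses in the block's parameter list let you destructure nested arrays explicitly as well (same hypothetical my_method):

my_method([1, [2, 3]]) { |x, (y, z)| puts [x, y, z].inspect }
# => [1, 2, 3]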

Check out Destructuring with Ruby for more information.

Is there a way to know how many parameters are needed for a method?

You can use the method Method#arity:

"string".method(:strip).arity
# => 0

From the Ruby documentation:

Returns an indication of the number of arguments accepted by a method.
Returns a nonnegative integer for methods that take a fixed number of
arguments. For Ruby methods that take a variable number of arguments,
returns -n-1, where n is the number of required arguments. For methods
written in C, returns -1 if the call takes a variable number of
arguments.

So, for example:

# Variable number of arguments, one is required
def foo(a, *b); end
method(:foo).arity
# => -2

# Variable number of arguments, none required
def bar(*a); end
method(:bar).arity
# => -1

# Accepts no argument, implemented in C
"0".method(:to_f).arity
# => 0

# Variable number of arguments (0 or 1), implemented in C
"0".method(:to_i).arity
# => -1


Update: I've just discovered the existence of Method#parameters; it could be quite useful:

def foo(a, *b); end
method(:foo).parameters
# => [[:req, :a], [:rest, :b]]
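
Since Method#parameters labels every parameter with its kind, you can count exactly the ones you care about (a small sketch with the foo defined above):

method(:foo).parameters.count { |kind, _name| kind == :req }
# => 1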

Get number of parameters passed to a method?

Can you pass the parameters as a collection? This way it is unlimited and easy to use.
This is how I do it in my own projects:

public void AddSomethingToDatabase(Dictionary<string, object> parameters)
{
    // sp and conn are assumed to be existing fields here:
    // a stored-procedure helper and an open database connection.
    foreach (KeyValuePair<string, object> param in parameters)
    {
        string paramName = param.Key;
        object paramValue = param.Value;
        sp.AddParameter(paramName, paramValue);
    }
    conn.Execute(...);
}

EDIT: I'd like to clarify how I use this method in my own programs.

I specify the database procedure parameters in the method itself, and pass the parameters like you do. I do realise there is a better way, like using DTOs:

public void AddSomethingToDatabase(string param1, int param2)
{
    Dictionary<string, object> parameters = new Dictionary<string, object>();
    parameters.Add("pID", param1);
    parameters.Add("pName", param2);

    ModifyDatabase(parameters, "update_myTable");
}

public void ModifyDatabase(Dictionary<string, object> parameters, string procedure)
{
    // Do necessary checks on parameters here
    // Check database availability
    // And many other checks that would be recurring for every database transaction
    // ... that's why I have them all in one place. Executing queries is the same
    // ... every time. Why would you write the error handling twice? :-)

    // Loop parameters and fill procedure parameters

    // Execute the lot
}

Check the total number of parameters in a PyTorch model

PyTorch doesn't have a function to calculate the total number of parameters as Keras does, but it's possible to sum the number of elements for every parameter group:

pytorch_total_params = sum(p.numel() for p in model.parameters())

If you want to calculate only the trainable parameters:

pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

Answer inspired by this answer on PyTorch Forums.

How is the number of parameters calculated in the BERT model?

Transformer Encoder-Decoder Architecture
The BERT model contains only the encoder block of the transformer architecture. Let's look at the individual elements of an encoder block for BERT to visualize the number of weight matrices as well as the bias vectors. The given configuration L = 12 means there will be 12 layers of self-attention, H = 768 means that the embedding dimension of individual tokens will be 768, and A = 12 means there will be 12 attention heads in one layer of self-attention. The encoder block performs the following sequence of operations:

  1. The input will be the sequence of tokens as a matrix of dimension S × d, where S is the sequence length and d is the embedding dimension. The resultant input sequence will be the sum of token embeddings, token type embeddings, and position embeddings as a d-dimensional vector for each token. In the BERT model, the first set of parameters is the vocabulary embeddings. BERT uses WordPiece[2] embeddings with a vocabulary of 30522 tokens, each of 768 dimensions.

  2. Embedding layer normalization. One weight matrix and one bias vector.

  3. Multi-head self attention. There will be h heads, and for each head there will be three matrices, corresponding to the query matrix, the key matrix, and the value matrix. The first dimension of these matrices will be the embedding dimension and the second dimension will be the embedding dimension divided by the number of attention heads. Apart from this, there will be one more matrix to transform the concatenated values generated by the attention heads into the final token representation.

  4. Residual connection and layer normalization. One weight matrix and one bias vector.

  5. The position-wise feedforward network will have one hidden layer, corresponding to two weight matrices and two bias vectors. In the paper, it is mentioned that the number of units in the hidden layer will be four times the embedding dimension.

  6. Residual connection and layer normalization. One weight matrix and one bias vector.

Let's calculate the actual number of parameters by associating the right dimensions to the weight matrices and bias vectors for the BERT base model.

Embedding Matrices:

  • Word Embedding Matrix size [Vocabulary size, embedding dimension] = [30522, 768] = 23440896
  • Position embedding matrix size, [Maximum sequence length, embedding dimension] = [512, 768] = 393216
  • Token Type Embedding matrix size [2, 768] = 1536
  • Embedding Layer Normalization, weight and Bias [768] + [768] = 1536
  • Total Embedding parameters = 23440896 + 393216 + 1536 + 1536 = 23837184 ≈ 24M

Attention Head:

  • Query Weight Matrix size [768, 64] = 49152 and Bias [64] = 64

  • Key Weight Matrix size [768, 64] = 49152 and Bias [64] = 64

  • Value Weight Matrix size [768, 64] = 49152 and Bias [64] = 64

  • Total parameters for one layer of attention with 12 heads = 12∗(3∗(49152 + 64)) = 1771776

  • Dense weight for projection after concatenation of heads [768, 768] = 589824 and Bias [768] = 768, (589824+768 = 590592)

  • Layer Normalization weight and Bias [768], [768] = 1536

  • Position-wise feedforward network weight matrices and bias [3072, 768] = 2359296, [3072] = 3072 and [768, 3072] = 2359296, [768] = 768, (2359296 + 3072 + 2359296 + 768 = 4722432)

  • Layer Normalization weight and Bias [768], [768] = 1536

  • Total parameters for one complete attention layer (1771776 + 590592 + 1536 + 4722432 + 1536 = 7087872 ≈ 7M)

  • Total parameters for 12 layers of attention (12 ∗ 7087872 = 85054464 ≈ 85M)

Output layer of BERT Encoder:

  • Dense Weight Matrix and Bias [768, 768] = 589824, [768] = 768, (589824 + 768 = 590592)

Total Parameters in BERT Base = 23837184 + 85054464 + 590592 = 109482240 ≈ 110M
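
As a quick sanity check, the whole tally above can be reproduced with a few lines of arithmetic (a minimal sketch; the constants are just the BERT base hyperparameters quoted above):

h = 768        # embedding dimension (H)
heads = 12     # attention heads (A)
layers = 12    # encoder layers (L)

embeddings = 30522 * h + 512 * h + 2 * h + (h + h)      # word + position + token type + LayerNorm
attention  = heads * 3 * (h * (h / heads) + h / heads)  # per-head Q/K/V weights and biases
attention += h * h + h                                  # projection after concatenating the heads
ffn        = (h * 4 * h + 4 * h) + (4 * h * h + h)      # position-wise feedforward, two dense layers
layer      = attention + ffn + 2 * (h + h)              # plus the two layer normalizations
pooler     = h * h + h                                  # dense output layer of the encoder

puts embeddings + layers * layer + pooler               # => 109482240 ≈ 110M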


