How to Input Integer Values into an Array Based on Preceding Row and Column Values

How to calculate a column's value in a row based on the previous row's value in the same column for a PySpark DataFrame?

Tracking the previously calculated value from the same column is hard to do in Spark. It is not impossible, and there certainly are ways (hacks) to achieve it. One way is to use an array of structs and the aggregate function.

Two assumptions about your data:

  • There is an ID column that holds the sort order of the data. Spark does not guarantee DataFrame row order because of its distributed nature.
  • There is a grouping key, so that the calculation can run per group.

# input data with aforementioned assumptions
data_sdf.show()

# +---+---+-------+---------+----+----+
# | gk|idx|   name|     dept|   a|   b|
# +---+---+-------+---------+----+----+
# | gk|  1|  James|    Sales|3000|2500|
# | gk|  2|Michael|    Sales|4600|1650|
# | gk|  3| Robert|    Sales|4100|1100|
# | gk|  4|  Maria|  Finance|3000|7000|
# | gk|  5|  James|  Finance|3000|5000|
# | gk|  6|  Scott|Marketing|3300|4300|
# | gk|  7|    Jen|Marketing|3900|3700|
# +---+---+-------+---------+----+----+

from pyspark.sql import functions as func

# create structs with all columns and collect them into an array
# use the array of structs to do the val calcs
# NOTE - keep the ID field ahead of the other fields for the `array_sort` to work as reqd
#        (`gk` comes first here, but it is constant within a group, so `idx` decides the order)
arr_of_structs_sdf = data_sdf. \
    withColumn('allcol_struct', func.struct(*data_sdf.columns)). \
    groupBy('gk'). \
    agg(func.array_sort(func.collect_list('allcol_struct')).alias('allcol_struct_arr'))

# function to create struct schema string
struct_fields = lambda x: ', '.join([str(x)+'.'+k+' as '+k for k in data_sdf.columns])

# use `aggregate` to do the val calc
# the first row seeds the result with a+b; every later row stores (a+b) minus the previous val
arr_of_structs_sdf. \
    withColumn('new_allcol_struct_arr',
               func.expr('''
                         aggregate(slice(allcol_struct_arr, 2, size(allcol_struct_arr)),
                                   array(struct({0}, (allcol_struct_arr[0].a+allcol_struct_arr[0].b) as val)),
                                   (x, y) -> array_union(x,
                                                         array(struct({1}, ((y.a+y.b)-element_at(x, -1).val) as val))
                                                         )
                                   )
                         '''.format(struct_fields('allcol_struct_arr[0]'), struct_fields('y'))
                         )
               ). \
    selectExpr('inline(new_allcol_struct_arr)'). \
    show(truncate=False)

# +---+---+-------+---------+----+----+----+
# |gk |idx|name   |dept     |a   |b   |val |
# +---+---+-------+---------+----+----+----+
# |gk |1  |James  |Sales    |3000|2500|5500|
# |gk |2  |Michael|Sales    |4600|1650|750 |
# |gk |3  |Robert |Sales    |4100|1100|4450|
# |gk |4  |Maria  |Finance  |3000|7000|5550|
# |gk |5  |James  |Finance  |3000|5000|2450|
# |gk |6  |Scott  |Marketing|3300|4300|5150|
# |gk |7  |Jen    |Marketing|3900|3700|2450|
# +---+---+-------+---------+----+----+----+
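
The recurrence behind the val column may be easier to see outside Spark. The following is only a plain Java sketch of the arithmetic, not a replacement for the PySpark code above. The class and method names (RunningDifference, compute) are made up for illustration, and the sketch assumes the rows are already sorted by idx and that both arrays have the same length.

```java
import java.util.Arrays;

class RunningDifference {
    // val[0] = a[0] + b[0]; val[i] = (a[i] + b[i]) - val[i - 1]
    // Assumes a and b have the same length and are already in idx order.
    static long[] compute(int[] a, int[] b) {
        long[] val = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            long sum = (long) a[i] + b[i];
            // The first row has no predecessor, so it just keeps the sum
            val[i] = (i == 0) ? sum : sum - val[i - 1];
        }
        return val;
    }

    public static void main(String[] args) {
        // Columns a and b from the sample data above
        int[] a = {3000, 4600, 4100, 3000, 3000, 3300, 3900};
        int[] b = {2500, 1650, 1100, 7000, 5000, 4300, 3700};
        System.out.println(Arrays.toString(compute(a, b)));
        // prints [5500, 750, 4450, 5550, 2450, 5150, 2450]
    }
}
```

The printed values match the val column in the output table, which is a quick way to check that the aggregate expression does what is intended.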

User-input values in first column with squares displayed in second column for multi-dimensional arrays

Actually, "invalid operands to binary (have 'int' and 'int*')" is caused by the expression square[row]*square[row]. With only one index, square[row] is a whole row of the array (it decays to int*), so the code is trying to multiply pointers. A second index is required to get a value of type int.

But ...

Why do you use temp as an index?

Instead of

temp = square[row]*square[row];
printf("%5d %15d\n", square[row][temp]);

should be

square[row][0] = square[row][1]*square[row][1];
printf("%5d %15d\n", square[row][1], square[row][0]);

Note that my example stores the square in square[row][0] and reads the value from square[row][1]. I suppose you actually intended to use square[row][0] for VALUE and square[row][1] for SQUARED, but forgot that indexing in C starts from 0. So, to get the last part of your code working properly, change the "input part" and move the printing of the table header to immediately before the "output part":

#include <stdio.h>
#define ROWS 5 // number of defined rows
#define COLS 2 // number of defined columns

int main(void)
{
    int square[ROWS][COLS];
    int row, col, temp;

    // "input part"
    for(row=0; row<ROWS; row++)
    {
        // TODO: type this again after reading my answer above
    }
    // "output part"
    printf("VALUE SQUARED\n");
    for(row=0; row<ROWS; row++)
    {
        for(col=0; col<COLS; col++)
            printf("%10d", square[row][col]);
        printf("\n");
    }

    return 0;
}

Sorting 2D array of integers by column

If I understood correctly:

IN:
{{124, 188, 24, 254, 339},
{0, 7, 77, 145, 159},
{206, 340, 280, 523, 433},
{310, 265, 151, 411, 398},
{24, 104, 0, 183, 198}}

OUT:
{{1, 1, 4, 1, 1},
{4, 4, 0, 4, 4},
{0, 0, 1, 0, 0},
{2, 3, 3, 3, 3},
{3, 2, 2, 2, 2}}

Each column of OUT lists the row indices of IN ordered by that column's values, smallest first. Here's the code:

// Assumes a rectangular array whose values are all below Integer.MAX_VALUE
public static int[][] createArray(int[][] a) {
    int rows = a.length;
    int cols = a[0].length;
    int[][] nA = new int[rows][cols];
    int[] col = new int[rows];
    int minIndex = -1;
    for (int i = 0; i < cols; i++) {
        // First get the col out
        for (int j = 0; j < rows; j++) {
            col[j] = a[j][i];
        }
        // Loop through the col, once per row
        for (int k = 0; k < rows; k++) {
            int min = Integer.MAX_VALUE;
            // Loop through the remaining numbers of the col
            for (int j = 0; j < col.length; j++) {
                // Find the remaining lowest number
                if (min > col[j]) {
                    min = col[j];
                    minIndex = j;
                }
            }
            // Delete the number from the array
            col[minIndex] = Integer.MAX_VALUE;
            // Set the row index of this number in the final array
            nA[k][i] = minIndex;
        }
    }
    return nA;
}

There might be an easier way, but it works!
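
One possible "easier way" is to sort an array of row indices per column with a comparator, instead of repeatedly scanning for the minimum. This is only a sketch under the same reading of the question as above. The names ColumnRanker and rowOrderPerColumn are made up for illustration, and the input is assumed to be a non-empty rectangular array.

```java
import java.util.Arrays;
import java.util.Comparator;

class ColumnRanker {
    // For every column, returns the row indices ordered by that column's values (ascending).
    // Assumes a is non-empty and rectangular.
    static int[][] rowOrderPerColumn(int[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        int[][] result = new int[rows][cols];
        for (int c = 0; c < cols; c++) {
            final int column = c;
            // Start with the identity order 0, 1, ..., rows - 1
            Integer[] order = new Integer[rows];
            for (int r = 0; r < rows; r++) {
                order[r] = r;
            }
            // Sort the indices by the value they point to in this column;
            // the sort is stable, so equal values keep the lower row index first
            Arrays.sort(order, Comparator.comparingInt((Integer r) -> a[r][column]));
            for (int k = 0; k < rows; k++) {
                result[k][c] = order[k];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] in = {
            {124, 188, 24, 254, 339},
            {0, 7, 77, 145, 159},
            {206, 340, 280, 523, 433},
            {310, 265, 151, 411, 398},
            {24, 104, 0, 183, 198}
        };
        System.out.println(Arrays.deepToString(rowOrderPerColumn(in)));
        // prints [[1, 1, 4, 1, 1], [4, 4, 0, 4, 4], [0, 0, 1, 0, 0], [2, 3, 3, 3, 3], [3, 2, 2, 2, 2]]
    }
}
```

Unlike the loop above, this version does not overwrite values with Integer.MAX_VALUE, so it also handles inputs that contain that value.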

JAVA. My nested for loop which I made to insert integers in array from another array skips the whole part except the first and the last values

I think you intended this:

public void setCells(int rowsArray[ ], int columnsArray[ ], int valuesArray[ ]) {
    for (int i= 0; i< rowsArray.length; i++) {
        setValue(rowsArray[i], columnsArray[i], valuesArray[i]);
    }
}

Also in your test code the rowsArray and valuesArray are both filled with zeroes since you did not set any values.

I would expect test code like this:

int[] valuesArray = new int[] {12, 222, 31, 45, 42, 99};
int[] rowsArray = new int[] {5, 4, 3, 2, 1, 0};
int[] columnsArray = new int[] {5, 4, 3, 2, 1, 0};
Grid lol = new Grid(11);
lol.setCells(rowsArray , columnsArray, valuesArray);
lol.getValue(5,5);
lol.getValue(4,4);
lol.getValue(3,3);
lol.getValue(2,2);
lol.getValue(1,1);
lol.getValue(0,0);

BTW a nicer way to model this would be using a separate class to represent one set of user input:

class UserInput {
    private final int row;
    private final int col;
    private final int value;

    UserInput (int row, int col, int value) {
        this.row = row;
        this.col = col;
        this.value = value;
    }

    // getters (the fields are final, so no setters are needed)
}

Every time the user enters a new row/col/val you create a new UserInput instance and add it to a list:

List<UserInput> inputs = new ArrayList<>();

// for each set of user input do:

inputs.add(new UserInput(row, col, val));

Then to set the entries in the grid:

for (UserInput i: inputs) {
    setValue(i.getRow(), i.getCol(), i.getValue());
}
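
Put together, the idea could look roughly like the sketch below. Since your Grid class is not shown in full, SimpleGrid here is a hypothetical stand-in with only a square int array, setValue and getValue. UserInputDemo and fill are likewise names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the asker's Grid class: a square grid of ints
class SimpleGrid {
    private final int[][] cells;

    SimpleGrid(int size) {
        cells = new int[size][size];
    }

    void setValue(int row, int col, int value) {
        cells[row][col] = value;
    }

    int getValue(int row, int col) {
        return cells[row][col];
    }
}

class UserInputDemo {
    // One set of user input: where to write and what to write
    static final class UserInput {
        final int row;
        final int col;
        final int value;

        UserInput(int row, int col, int value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    // Applies every collected input to a fresh grid of the given size
    static SimpleGrid fill(int size, List<UserInput> inputs) {
        SimpleGrid grid = new SimpleGrid(size);
        for (UserInput in : inputs) {
            grid.setValue(in.row, in.col, in.value);
        }
        return grid;
    }

    public static void main(String[] args) {
        List<UserInput> inputs = new ArrayList<>();
        inputs.add(new UserInput(5, 5, 12));
        inputs.add(new UserInput(4, 4, 222));
        SimpleGrid grid = fill(11, inputs);
        System.out.println(grid.getValue(5, 5)); // prints 12
        System.out.println(grid.getValue(4, 4)); // prints 222
    }
}
```

Keeping row, column and value in one object means the three values can never get out of step, which is the risk with three parallel arrays.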

You can remove the 2-args constructor since your grid must always be square.

Your code is pretty flawed. I think isValid should probably also check for the grid size. What is the setValue method supposed to do? Increase both the row and column index if a cell is already filled? And what if that cell is also already filled? You don't need the second !isValid check in the else (you just checked for isValid in the if).

Please take some more time to think about the logic of your code.

Next time, post the full code straight away to save us all some time.

Pushing user input numbers into a 2D Array to form 4 columns with 3 rows

Please try this code:

// needs: import java.util.Scanner;
public static void main(String[] args) {
    //set up the array and assign variable name and table size
    int[][] startNum = new int[3][4];

    //set user input variable for the array
    Scanner userInput = new Scanner(System.in);

    //read the first row from the user
    for (int i = 0; i < startNum[0].length; i++) {
        System.out.print("please enter your value: ");
        int inputValue = userInput.nextInt();
        startNum[0][i] = inputValue;
    }

    //fill rows 2 and 3 with multiples of the first row
    for (int i = 1; i < startNum.length; i++) {
        for (int j = 0; j < startNum[0].length; j++) {
            startNum[i][j] = (i + 1) * startNum[0][j];
        }
    }
}

In the first loop you read the values from the user and set the first row of the 2D array.

In the second, nested pair of loops you set the values of the 2nd and 3rd rows for each column, based on the first row. That's why the outer loop starts from i=1, while the inner loop starts from j=0 so that every column is filled.
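
To check the fill logic without typing values in, it can be pulled out of the Scanner code. The sketch below assumes that is acceptable for your use case. MultiplesTable and build are made-up names, and rows is assumed to be at least 1.

```java
import java.util.Arrays;

class MultiplesTable {
    // Row 0 is a copy of firstRow; row i holds (i + 1) times the first row.
    // Assumes rows >= 1.
    static int[][] build(int[] firstRow, int rows) {
        int[][] table = new int[rows][firstRow.length];
        table[0] = firstRow.clone();
        for (int i = 1; i < rows; i++) {
            for (int j = 0; j < firstRow.length; j++) {
                table[i][j] = (i + 1) * table[0][j];
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Stand-in for four numbers entered by the user
        for (int[] row : build(new int[] {1, 2, 3, 4}, 3)) {
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [1, 2, 3, 4]
        // [2, 4, 6, 8]
        // [3, 6, 9, 12]
    }
}
```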


