Melt/Reshape in Excel Using Vba

melt / reshape in excel using VBA?

I've got two posts, with usable code and downloadable workbook, on doing this in Excel/VBA on my blog:

http://yoursumbuddy.com/data-normalizer

http://yoursumbuddy.com/data-normalizer-the-sql/

Here's the code:

'Arguments
'List: The range to be normalized.
'RepeatingColsCount: The number of columns, starting with the leftmost,
' whose headings remain the same.
'NormalizedColHeader: The column header for the rolled-up category.
'DataColHeader: The column header for the normalized data.
'NewWorkbook: Put the sheet with the data in a new workbook?
'
'NOTE: The data must be in a contiguous range and the
'columns that will be repeated must be to the left,
'with the columns to be normalized to the right.

Sub NormalizeList(List As Excel.Range, RepeatingColsCount As Long, _
NormalizedColHeader As String, DataColHeader As String, _
Optional NewWorkbook As Boolean = False)

Dim FirstNormalizingCol As Long, NormalizingColsCount As Long
Dim ColsToRepeat As Excel.Range, ColsToNormalize As Excel.Range
Dim NormalizedRowsCount As Long
Dim RepeatingList() As String
Dim NormalizedList() As Variant
Dim ListIndex As Long, i As Long, j As Long
Dim wbSource As Excel.Workbook, wbTarget As Excel.Workbook
Dim wsTarget As Excel.Worksheet

With List
'If the normalized list won't fit, you must quit.
If .Rows.Count * (.Columns.Count - RepeatingColsCount) > .Parent.Rows.Count Then
MsgBox "The normalized list will be too many rows.", _
vbExclamation + vbOKOnly, "Sorry"
Exit Sub
End If

'You have the range to be normalized and the count of leftmost rows to be repeated.
'This section uses those arguments to set the two ranges to parse
'and the two corresponding arrays to fill
FirstNormalizingCol = RepeatingColsCount + 1
NormalizingColsCount = .Columns.Count - RepeatingColsCount
Set ColsToRepeat = .Cells(1).Resize(.Rows.Count, RepeatingColsCount)
Set ColsToNormalize = .Cells(1, FirstNormalizingCol).Resize(.Rows.Count, NormalizingColsCount)
NormalizedRowsCount = ColsToNormalize.Columns.Count * .Rows.Count
ReDim RepeatingList(1 To NormalizedRowsCount, 1 To RepeatingColsCount)
ReDim NormalizedList(1 To NormalizedRowsCount, 1 To 2)
End With

'Fill in every i elements of the repeating array with the repeating row labels.
For i = 1 To NormalizedRowsCount Step NormalizingColsCount
ListIndex = ListIndex + 1
For j = 1 To RepeatingColsCount
RepeatingList(i, j) = List.Cells(ListIndex, j).Value2
Next j
Next i

'We stepped over most rows above, so fill in other repeating array elements.
For i = 1 To NormalizedRowsCount
For j = 1 To RepeatingColsCount
If RepeatingList(i, j) = "" Then
RepeatingList(i, j) = RepeatingList(i - 1, j)
End If
Next j
Next i

'Fill in each element of the first dimension of the normalizing array
'with the former column header (which is now another row label) and the data.
With ColsToNormalize
For i = 1 To .Rows.Count
For j = 1 To .Columns.Count
NormalizedList(((i - 1) * NormalizingColsCount) + j, 1) = .Cells(1, j)
NormalizedList(((i - 1) * NormalizingColsCount) + j, 2) = .Cells(i, j)
Next j
Next i
End With

'Put the normal data in the same workbook, or a new one.
If NewWorkbook Then
Set wbTarget = Workbooks.Add
Set wsTarget = wbTarget.Worksheets(1)
Else
Set wbSource = List.Parent.Parent
With wbSource.Worksheets
Set wsTarget = .Add(after:=.Item(.Count))
End With
End If

With wsTarget
'Put the data from the two arrays in the new worksheet.
.Range("A1").Resize(NormalizedRowsCount, RepeatingColsCount) = RepeatingList
.Cells(1, FirstNormalizingCol).Resize(NormalizedRowsCount, 2) = NormalizedList

'At this point there will be repeated header rows, so delete all but one.
.Range("1:" & NormalizingColsCount - 1).EntireRow.Delete

'Add the headers for the new label column and the data column.
.Cells(1, FirstNormalizingCol).Value = NormalizedColHeader
.Cells(1, FirstNormalizingCol + 1).Value = DataColHeader
End With
End Sub

You’d call it like this:

Sub TestIt()
NormalizeList ActiveSheet.UsedRange, 4, "Variable", "Value", False
End Sub

Reshaping table Excel PowerQuery

Columns are harder to work with, so I'd first transform each column into a new row as a list.

ColumnsToRows =
Table.FromColumns(
{
Table.ToColumns(Source),
Table.ColumnNames(Source)
},
{"ColumnValues","ColumnName"}
)

This should give you a table as follows where each list consists of values in the corresponding column. For example, the top list is {1/1/2019,1/2/2019}. (The from columns part is to add the ColumnName column.)

| ColumnValues | ColumnName |
|--------------|------------|
| [List] | DateA |
| [List] | ValueA |
| [List] | DateB |
| [List] | ValueB |
| [List] | DateZ |
| [List] | ValueZ |

We can then filter this based on the data type in each list. To get the date rows you can write:

DataRows =
Table.SelectRows(
ColumnsToRows,
each Value.Type(List.First([ColumnValues])) = type date
)

Which gets you the following filtered table:

| ColumnValues | ColumnName |
|--------------|------------|
| [List] | DateA |
| [List] | DateB |
| [List] | DateZ |

If you expand the first column with Table.ExpandListColumn(DataRows, "ColumnValues"), then you get

| ColumnValues | ColumnName |
|--------------|------------|
| 1/1/2019 | DateA |
| 1/2/2019 | DateA |
| 1/1/2019 | DateB |
| 1/4/2019 | DateB |
| 1/4/2019 | DateZ |
| 1/5/2019 | DateZ |

The logic is analogous to filter and expand the value rows.

ValueRows =
Table.ExpandListColumn(
Table.SelectRows(
ColumnsToRows,
each Value.Type(List.First([ColumnValues])) = type number
),
"ColumnValues"
)

Which gets you a similar looking table:

| ColumnValues | ColumnName |
|--------------|------------|
| 3 | ValueA |
| 1 | ValueA |
| 6 | ValueB |
| 2 | ValueB |
| 7 | ValueZ |
| 3 | ValueZ |

Now we just need to combine together the columns we want into a single table:

Combine Columns =
Table.FromColumns(
{
DateRows[ColumnValues],
ValueRows[ColumnValues],
ValueRows[ColumnName]
},
{"Date", "Value", "Type"}
)

and then extract the text following Value in the column names.

ExtractType =
Table.TransformColumns(
CombineColumnns,
{{"Type", each Text.AfterDelimiter(_, "Value"), type text}}
)

The final table should be just as specified:

| Date     | Value | Type |
|----------|-------|------|
| 1/1/2019 | 3 | A |
| 1/2/2019 | 1 | A |
| 1/1/2019 | 6 | B |
| 1/4/2019 | 2 | B |
| 1/4/2019 | 7 | Z |
| 1/5/2019 | 3 | Z |

All in a single query, the M code looks like this:

let
Source = <Source Goes Here>,
ColumnsToRows = Table.FromColumns({Table.ToColumns(Source), Table.ColumnNames(Source)}, {"ColumnValues","ColumnName"}),
DateRows = Table.ExpandListColumn(Table.SelectRows(ColumnsToRows, each Value.Type(List.First([ColumnValues])) = type date), "ColumnValues"),
ValueRows = Table.ExpandListColumn(Table.SelectRows(ColumnsToRows, each Value.Type(List.First([ColumnValues])) = type number), "ColumnValues"),
CombineColumnns = Table.FromColumns({DateRows[ColumnValues], ValueRows[ColumnValues], ValueRows[ColumnName]},{"Date", "Value", "Type"}),
ExtractType = Table.TransformColumns(CombineColumnns, {{"Type", each Text.AfterDelimiter(_, "Value"), type text}})
in
ExtractType

How to reshape a Pivot Table?

Try:

brincando3 = df4.pivot_table('OCCUPANCY %', 'NAME', 'NEW_TIME', aggfunc='median') \
.rename_axis(columns=None)
  1. Don't use ['OCCUPANCY %'] if you don't want an outer level, prefer 'OCCUPANCY %' if you have only one variable to pivot.

  2. NEW_TIME is the column label. To remove it, use rename_axis(columns=None).

Melt function (R/reshape) delivering an error

This is what I eventually went with, should anyone else have this problem:

rawMelt <- melt(df, id.vars = c("Year", "Month")) %>%
mutate(
theSource = ifelse(grepl("test", variable), "test", "control"),
metric = ifelse(grepl("alpha", variable), "alpha", "beta"),
monthText = paste0(Year, "_", ifelse(Month < 10, "0", ""), Month)
) %>%
select(-variable)

g_maker <- function(theMetric) {
theChart <- rawMelt %>%
filter(metric == theMetric)
g <- ggplot(theChart, aes(x = as.factor(monthText), y = value, group = theSource)) +
geom_path(aes(color = theSource)) +
scale_color_manual(values = c("red", "black")) +
theme_minimal() +
xlab(NULL) +
theme(axis.text.x = element_text(angle = 75, hjust = 1))
return(g)
}

alpha_graph <- g_maker("alpha")
beta_graph <- g_maker("beta")
alpha_graph
beta_graph

Normalizing Excel Grid Intersection data into a flat list

I've got two posts, with usable code and downloadable workbook, on doing this in Excel/VBA on my blog:

http://yoursumbuddy.com/data-normalizer

http://yoursumbuddy.com/data-normalizer-the-sql/

Here's the code:

'Arguments
'List: The range to be normalized.
'RepeatingColsCount: The number of columns, starting with the leftmost,
' whose headings remain the same.
'NormalizedColHeader: The column header for the rolled-up category.
'DataColHeader: The column header for the normalized data.
'NewWorkbook: Put the sheet with the data in a new workbook?
'
'NOTE: The data must be in a contiguous range and the
'rows that will be repeated must be to the left,
'with the rows to be normalized to the right.

Sub NormalizeList(List As Excel.Range, RepeatingColsCount As Long, _
NormalizedColHeader As String, DataColHeader As String, _
Optional NewWorkbook As Boolean = False)

Dim FirstNormalizingCol As Long, NormalizingColsCount As Long
Dim ColsToRepeat As Excel.Range, ColsToNormalize As Excel.Range
Dim NormalizedRowsCount As Long
Dim RepeatingList() As String
Dim NormalizedList() As Variant
Dim ListIndex As Long, i As Long, j As Long
Dim wbSource As Excel.Workbook, wbTarget As Excel.Workbook
Dim wsTarget As Excel.Worksheet

With List
'If the normalized list won't fit, you must quit.
If .Rows.Count * (.Columns.Count - RepeatingColsCount) > .Parent.Rows.Count Then
MsgBox "The normalized list will be too many rows.", _
vbExclamation + vbOKOnly, "Sorry"
Exit Sub
End If

'You have the range to be normalized and the count of leftmost rows to be repeated.
'This section uses those arguments to set the two ranges to parse
'and the two corresponding arrays to fill
FirstNormalizingCol = RepeatingColsCount + 1
NormalizingColsCount = .Columns.Count - RepeatingColsCount
Set ColsToRepeat = .Cells(1).Resize(.Rows.Count, RepeatingColsCount)
Set ColsToNormalize = .Cells(1, FirstNormalizingCol).Resize(.Rows.Count, NormalizingColsCount)
NormalizedRowsCount = ColsToNormalize.Columns.Count * .Rows.Count
ReDim RepeatingList(1 To NormalizedRowsCount, 1 To RepeatingColsCount)
ReDim NormalizedList(1 To NormalizedRowsCount, 1 To 2)
End With

'Fill in every i elements of the repeating array with the repeating row labels.
For i = 1 To NormalizedRowsCount Step NormalizingColsCount
ListIndex = ListIndex + 1
For j = 1 To RepeatingColsCount
RepeatingList(i, j) = List.Cells(ListIndex, j).Value2
Next j
Next i

'We stepped over most rows above, so fill in other repeating array elements.
For i = 1 To NormalizedRowsCount
For j = 1 To RepeatingColsCount
If RepeatingList(i, j) = "" Then
RepeatingList(i, j) = RepeatingList(i - 1, j)
End If
Next j
Next i

'Fill in each element of the first dimension of the normalizing array
'with the former column header (which is now another row label) and the data.
With ColsToNormalize
For i = 1 To .Rows.Count
For j = 1 To .Columns.Count
NormalizedList(((i - 1) * NormalizingColsCount) + j, 1) = .Cells(1, j)
NormalizedList(((i - 1) * NormalizingColsCount) + j, 2) = .Cells(i, j)
Next j
Next i
End With

'Put the normal data in the same workbook, or a new one.
If NewWorkbook Then
Set wbTarget = Workbooks.Add
Set wsTarget = wbTarget.Worksheets(1)
Else
Set wbSource = List.Parent.Parent
With wbSource.Worksheets
Set wsTarget = .Add(after:=.Item(.Count))
End With
End If

With wsTarget
'Put the data from the two arrays in the new worksheet.
.Range("A1").Resize(NormalizedRowsCount, RepeatingColsCount) = RepeatingList
.Cells(1, FirstNormalizingCol).Resize(NormalizedRowsCount, 2) = NormalizedList

'At this point there will be repeated header rows, so delete all but one.
.Range("1:" & NormalizingColsCount - 1).EntireRow.Delete

'Add the headers for the new label column and the data column.
.Cells(1, FirstNormalizingCol).Value = NormalizedColHeader
.Cells(1, FirstNormalizingCol + 1).Value = DataColHeader
End With
End Sub

You’d call it like this:

Sub TestIt()
NormalizeList ActiveSheet.UsedRange, 1, "Name", "Count", False
End Sub

How can I explode a single row of columns into multiple rows in Excel VBA

I've got two posts, with usable code and downloadable workbook, on doing this in Excel/VBA on my blog:

http://yoursumbuddy.com/data-normalizer

http://yoursumbuddy.com/data-normalizer-the-sql/

Here's the code:

'Arguments
'List: The range to be normalized.
'RepeatingColsCount: The number of columns, starting with the leftmost,
' whose headings remain the same.
'NormalizedColHeader: The column header for the rolled-up category.
'DataColHeader: The column header for the normalized data.
'NewWorkbook: Put the sheet with the data in a new workbook?
'
'NOTE: The data must be in a contiguous range and the
'rows that will be repeated must be to the left,
'with the rows to be normalized to the right.

Sub NormalizeList(List As Excel.Range, RepeatingColsCount As Long, _
NormalizedColHeader As String, DataColHeader As String, _
Optional NewWorkbook As Boolean = False)

Dim FirstNormalizingCol As Long, NormalizingColsCount As Long
Dim ColsToRepeat As Excel.Range, ColsToNormalize As Excel.Range
Dim NormalizedRowsCount As Long
Dim RepeatingList() As String
Dim NormalizedList() As Variant
Dim ListIndex As Long, i As Long, j As Long
Dim wbSource As Excel.Workbook, wbTarget As Excel.Workbook
Dim wsTarget As Excel.Worksheet

With List
'If the normalized list won't fit, you must quit.
If .Rows.Count * (.Columns.Count - RepeatingColsCount) > .Parent.Rows.Count Then
MsgBox "The normalized list will be too many rows.", _
vbExclamation + vbOKOnly, "Sorry"
Exit Sub
End If

'You have the range to be normalized and the count of leftmost rows to be repeated.
'This section uses those arguments to set the two ranges to parse
'and the two corresponding arrays to fill
FirstNormalizingCol = RepeatingColsCount + 1
NormalizingColsCount = .Columns.Count - RepeatingColsCount
Set ColsToRepeat = .Cells(1).Resize(.Rows.Count, RepeatingColsCount)
Set ColsToNormalize = .Cells(1, FirstNormalizingCol).Resize(.Rows.Count, NormalizingColsCount)
NormalizedRowsCount = ColsToNormalize.Columns.Count * .Rows.Count
ReDim RepeatingList(1 To NormalizedRowsCount, 1 To RepeatingColsCount)
ReDim NormalizedList(1 To NormalizedRowsCount, 1 To 2)
End With

'Fill in every i elements of the repeating array with the repeating row labels.
For i = 1 To NormalizedRowsCount Step NormalizingColsCount
ListIndex = ListIndex + 1
For j = 1 To RepeatingColsCount
RepeatingList(i, j) = List.Cells(ListIndex, j).Value2
Next j
Next i

'We stepped over most rows above, so fill in other repeating array elements.
For i = 1 To NormalizedRowsCount
For j = 1 To RepeatingColsCount
If RepeatingList(i, j) = "" Then
RepeatingList(i, j) = RepeatingList(i - 1, j)
End If
Next j
Next i

'Fill in each element of the first dimension of the normalizing array
'with the former column header (which is now another row label) and the data.
With ColsToNormalize
For i = 1 To .Rows.Count
For j = 1 To .Columns.Count
NormalizedList(((i - 1) * NormalizingColsCount) + j, 1) = .Cells(1, j)
NormalizedList(((i - 1) * NormalizingColsCount) + j, 2) = .Cells(i, j)
Next j
Next i
End With

'Put the normal data in the same workbook, or a new one.
If NewWorkbook Then
Set wbTarget = Workbooks.Add
Set wsTarget = wbTarget.Worksheets(1)
Else
Set wbSource = List.Parent.Parent
With wbSource.Worksheets
Set wsTarget = .Add(after:=.Item(.Count))
End With
End If

With wsTarget
'Put the data from the two arrays in the new worksheet.
.Range("A1").Resize(NormalizedRowsCount, RepeatingColsCount) = RepeatingList
.Cells(1, FirstNormalizingCol).Resize(NormalizedRowsCount, 2) = NormalizedList

'At this point there will be repeated header rows, so delete all but one.
.Range("1:" & NormalizingColsCount - 1).EntireRow.Delete

'Add the headers for the new label column and the data column.
.Cells(1, FirstNormalizingCol).Value = NormalizedColHeader
.Cells(1, FirstNormalizingCol + 1).Value = DataColHeader
End With
End Sub

You’d call it like this:

Sub TestIt()
NormalizeList ActiveSheet.UsedRange, 4, "Variable", "Value", False
End Sub

Does Excel/VBA optimize redundant variables away?

These helper variables will have negligible effect on the the performance. Using an enumeration would be more efficient.

Public Enum NamedArrayColumns
output = 1
Sensor_Top = 2
Age = 4
Material = 78
End Enum

Sub Foo()
Rem Some Code

For n = 1 To 1000000
arr(n, output) = arr(n, Sensor_Top) + arr(n, Age) * arr(n, Material)
Next n

End Sub


Related Topics



Leave a reply



Submit