Filter a Content File to Table

Try this:

BEGIN {
    # set records separated by empty lines
    RS=""
    # set fields separated by newline, each record has 3 fields
    FS="\n"
}
{
    # remove undesired parts of every first line of a record
    sub("the course of ", "", $1)
    sub(" is :", "", $1)
    sub("on ", "", $1)
    # now store the rest in time and course
    time=$1
    course=$1
    # remove time from string to extract the course title
    sub("^[^ ]* ", "", course)
    # keep only the first word to retrieve time (no trailing space left behind)
    sub(" .*", "", time)
    # get theory info from second line per record
    sub("course:theory:", "", $2)
    # get application info from third line
    sub("course:applicaton:", "", $3)
    # if new course
    if (! (course in header)) {
        # save header information (first words of each line in output)
        header[course] = course
        theory[course] = "theory"
        app[course] = "application"
    }
    # append the relevant info to the output strings
    header[course] = header[course] "," time
    theory[course] = theory[course] "," $2
    app[course] = app[course] "," $3

}
END {
    # now for each course found
    for (key in header) {
        # print the strings constructed
        print header[key]
        print theory[key]
        print app[key]
        print ""
    }
}

I hope the comments are self-explanatory. If you have questions about the script, be sure to ask them.
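
For comparison, here is a rough Python sketch of the same idea: split the input on blank lines, treat each record as three lines, and collect the values per course. The sample input is hypothetical (the real file isn't shown here), and records_to_lines is just a name made up for illustration.

```python
import re

def records_to_lines(text):
    """Group blank-line-separated, three-line records by course."""
    courses = {}  # course -> [header cells, theory cells, application cells]
    for record in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in record.splitlines()]
        if len(lines) != 3:
            continue  # skip records that don't have exactly three lines
        first = lines[0]
        # same clean-up as the three sub() calls on $1
        for junk in ("the course of ", " is :", "on "):
            first = first.replace(junk, "", 1)
        if " " not in first:
            continue  # no "<time> <course>" pair to split
        time, course = first.split(" ", 1)
        # keep what follows the second colon, e.g. "course:theory:80" -> "80"
        theory = lines[1].split(":", 2)[-1]
        application = lines[2].split(":", 2)[-1]
        if course not in courses:
            courses[course] = [[course], ["theory"], ["application"]]
        header, theory_row, app_row = courses[course]
        header.append(time)
        theory_row.append(theory)
        app_row.append(application)
    output = []
    for rows in courses.values():
        output.extend(",".join(cells) for cells in rows)
        output.append("")  # blank line between courses
    return output

# Hypothetical input, shaped the way the awk script expects it
sample = """on Monday the course of math is :
course:theory:80
course:application:70

on Tuesday the course of math is :
course:theory:75
course:application:65
"""

print("\n".join(records_to_lines(sample)))
```

One difference worth noting: a Python dict keeps insertion order, so courses come out in the order they first appear, whereas awk's for (key in header) makes no promise about order.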

Filter Text File into HTML Table in PHP

You could combine file_get_contents(), foreach and preg_replace() for this, as the commented code below illustrates:

CONTENTS OF TXT FILE:

    Name        : MyName1
    State       : Apt
    PathName    : C:\xx\MyName1.txt

    Name        : MyName2
    State       : Apf
    PathName    : C:\xx\MyName2.txt

    Name        : MyName3
    State       : Apf
    PathName    : C:\xx\MyName3.txt

TABLE CREATION ALGORITHM:

    <?php

        $txtFile        = __DIR__ . "/test.txt"; //<== PATH TO THE TXT FILE
        $txtFileData    = file_get_contents($txtFile);

        // CONVERT $txtFileData TO ARRAY BY SPLITTING AT THE NEW-LINE BOUNDARY.
        $arrTxtData     = explode("\n", $txtFileData);
        $sections       = array();
        $index          = 0;

        // LOOP THROUGH THE ARRAY (FROM ABOVE) AND BUILD SECTION GROUPS
        // USING THE EMPTY LINE AS THE CUE-POINT FOR THE NEXT SECTION GROUP
        foreach($arrTxtData as $line){
            if(!empty(trim($line))){
                $sections[$index][] = trim($line);
            }else{
                $index++;
            }
        }

        // EXTRACT THE VALUES FOR THE TABLE HEADINGS...
        $extract    = $sections[0];
        $heading1   = preg_replace("#(\t|\s)*?(\:.*)$#", "", $extract[0]);
        $heading2   = preg_replace("#(\t|\s)*?(\:.*)$#", "", $extract[1]);
        $heading3   = preg_replace("#(\t|\s)*?(\:.*)$#", "", $extract[2]);

        // BUILD THE HEAD-SECTION OF YOUR TABLE (HEADER ROW GOES INSIDE <THEAD>)...
        $output     = "<table class=''>" . PHP_EOL;
        $output    .= "<thead>" . PHP_EOL;
        $output    .= "<tr class=''>" . PHP_EOL;
        $output    .= "<th class=''>{$heading1}</th>" . PHP_EOL;
        $output    .= "<th class=''>{$heading2}</th>" . PHP_EOL;
        $output    .= "<th class=''>{$heading3}</th>" . PHP_EOL;
        $output    .= "</tr>" . PHP_EOL;
        $output    .= "</thead>" . PHP_EOL;
        $output    .= "<tbody>" . PHP_EOL;

        // LOOP THROUGH ALL THE SECTIONS AND BUILD YOUR ROWS + CELLS
        foreach($sections as $section){
            $output.= "<tr>" . PHP_EOL;
            foreach($section as $data){
                $strVal = preg_replace("#(^.*?\:\s*?)#", "", $data);
                $output.= "<td class=''>{$strVal}</td>" . PHP_EOL;
            }
            $output.= "</tr>" . PHP_EOL;
        }
        // CLOSE OFF THE <TBODY> AND <TABLE> TAGS
        $output    .= "</tbody>" . PHP_EOL;
        $output    .= "</table>" . PHP_EOL;

        // ECHO THE RESULTING OUTPUT [THE TABLE]
        echo $output;

RESULT OF THE ECHO STATEMENT:

        Name        State   PathName
        MyName1     Apt     C:\xx\MyName1.txt
        MyName2     Apf     C:\xx\MyName2.txt
        MyName3     Apf     C:\xx\MyName3.txt
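
If you want something to compare against, the same steps can be sketched in Python. This is only an illustration under a few assumptions: text_to_html_table is a made-up name, the header row is placed in a <thead>, and every value is HTML-escaped, which the PHP version does not do.

```python
import html
import re

def text_to_html_table(text):
    """Turn blank-line-separated 'Key : Value' sections into an HTML table."""
    # one section per group of lines, using the empty line as the boundary
    sections = [s for s in re.split(r"\n\s*\n", text.strip()) if s.strip()]
    rows = []
    for section in sections:
        # split each line at the FIRST colon only, so "C:\..." values survive
        pairs = [line.split(":", 1) for line in section.splitlines() if ":" in line]
        rows.append([(key.strip(), value.strip()) for key, value in pairs])
    if not rows:
        return "<table></table>"
    out = ["<table>", "<thead>", "<tr>"]
    # headings come from the keys of the first section
    out += [f"<th>{html.escape(key)}</th>" for key, _ in rows[0]]
    out += ["</tr>", "</thead>", "<tbody>"]
    for row in rows:
        out.append("<tr>")
        out += [f"<td>{html.escape(value)}</td>" for _, value in row]
        out.append("</tr>")
    out += ["</tbody>", "</table>"]
    return "\n".join(out)

# The first two sections of the sample file
sample = r"""
Name        : MyName1
State       : Apt
PathName    : C:\xx\MyName1.txt

Name        : MyName2
State       : Apf
PathName    : C:\xx\MyName2.txt
"""

print(text_to_html_table(sample))
```

Splitting at the first colon does the same job as the non-greedy preg_replace() pattern: everything up to and including the first ":" is treated as the label.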

JTable Filtering and getting filtered data in text file

but getting data from model based on that filter applied above.

You don't get the data from the model, because the model still contains all the data. You need to get the data from the table, since the table displays only the currently filtered data. For this, just loop through all the rows and columns in the table and use:

table.getValueAt(row, column);

How can I filter out specific rows in tables in R loaded from an HTML file?

Let's do a bit more to help you out.

I'm using these packages:

library(rvest)
library(httr)
library(stringi)
library(hrbrthemes)
library(tidyverse)

We'll use this function to clean up column names:

mcga <- function(tbl) {

  x <- colnames(tbl)
  x <- tolower(x)
  x <- gsub("[[:punct:][:space:]]+", "_", x)
  x <- gsub("_+", "_", x)
  x <- gsub("(^_|_$)", "", x)
  x <- make.unique(x, sep = "_")

  colnames(tbl) <- x

  tbl

}

Since you may want/need to do this for other form combos, we'll start from the main form page:

eval_pg <- read_html("https://opir.fiu.edu/instructor_eval.asp")

We'll eventually grab the actual data that the form submission generates, but we need to "fill in the form" with the option values, so let's get them.

These are the valid parameters for the Term:

term_nodes <- html_nodes(eval_pg, "select[name='Term'] > option")
data_frame(
  name = html_text(term_nodes),
  id = html_attr(term_nodes, "value")
) -> Terms

Terms
## # A tibble: 42 x 2
##           name    id
##          <chr> <chr>
##  1 Summer 2017  1175
##  2 Spring 2017  1171
##  3   Fall 2016  1168
##  4 Summer 2016  1165
##  5 Spring 2016  1161
##  6   Fall 2015  1158
##  7 Summer 2015  1155
##  8 Spring 2015  1151
##  9   Fall 2014  1148
## 10 Summer 2014  1145
## # ... with 32 more rows

These are the valid parameters for the Coll:

college_nodes <- html_nodes(eval_pg, "select[name='Coll'] > option")
data_frame(
  name = html_text(college_nodes),
  id = html_attr(college_nodes, "value")
) -> Coll

Coll
## # A tibble: 12 x 2
##                                      name    id
##                                     <chr> <chr>
##  1                                    All     %
##  2 Communication, Architecture & the Arts CARTA
##  3            Arts, Sciences & Education   CASE
##  4                               Business CBADM
##  5                Engineering & Computing CENGR
##  6                         Honors College HONOR
##  7       Hospitality & Tourism Management SHMGT
##  8                                    Law  CLAW
##  9              Nursing & Health Sciences  CNHS
## 10            Public Health & Social Work CPHSW
## 11         International & Public Affairs  SIPA
## 12                Undergraduate Education UGRED

Make the request like a browser. The form creates an HTTP GET request with query parameters, which opens up a new browser tab/window. We'll make the same request programmatically using the values obtained above:

GET("https://opir.fiu.edu/instructor_evals/instr_eval_result.asp",
    query = list(
      Term = "1171",
      Coll = "CBADM",
      Dept = "",
      RefNum = "",
      Crse = "",
      Instr = ""
    )) -> res

report <- content(res, as="parsed", encoding="UTF-8")

The report variable holds the parsed HTML/XML document with all the data you want. Now, we'll extract and iterate over each table rather than yank them all out at once. This will let us associate metadata with each table.

We'll automagically get metadata fields using this helper vector:

fields <- c("Term:", "Instructor Name:", "Course:", "Department:", "Section:",
            "Ref#:", "Title:", "Completed Forms:")

This finds all the tables:

tables_found <- html_nodes(report, xpath=".//table[contains(., 'Term')]")

This sets up a progress bar (the operation takes ~1-2m):

pb <- progress_estimated(length(tables_found))

Now, we iterate over each table we found.

map(tables_found, ~{

  pb$tick()$print() # increment progress

  tab <- .x # this is just for naming sanity convenience

  # Extract the fields
  # - Iterate over each field string
  # - Find that table cell
  # - Extract the text
  # - Remove the field string
  # - Clean up whitespace

  map(fields, ~{
    html_nodes(tab, xpath=sprintf(".//td[contains(., '%s')]", .x)) %>% 
      html_text(trim = TRUE) %>% 
      stri_replace_first_regex(.x, "") %>% 
      stri_trim_both() %>% 
      as.list() %>% 
      set_names(.x)
  }) %>% 
    flatten() %>% 
    as_data_frame() %>% 
    mcga() -> table_meta

  # Extract the actual table
  # Remove cruft and just get the rows with header and data, turn it back into a table and
  # then make a data frame out of it

  html_nodes(tab, xpath=".//tr[contains(@class, 'question') or contains(@class, 'tableback')]")  %>% 
    as.character() %>% 
    paste0(collapse="") %>% 
    sprintf("<table>%s</table>", .) %>% 
    read_html() %>% 
    html_table(header=TRUE) %>% 
    .[[1]] %>% 
    mcga() -> table_vals

  # you may want to clean up % columns here

  # Associate the table values with the table metadata
  table_meta$values <-  list(table_vals)

  # return the combined table
  table_meta

}) %>% 
  bind_rows() -> scraped_tables # bind them all together

We now have a nice, compact nested data frame:

glimpse(scraped_tables)
## Observations: 595
## Variables: 9
## $ term            <chr> "1171 - Spring 2017", "1171 - Spring 2017", "1171 - Spring 2017", "1171 - Spring 2017", "1171...
## $ instructor_name <chr> "Elias, Desiree", "Sueiro, Alexander", "Kim, Myung Sub", "Islam, Mohammad Nazrul", "Ling, Ran...
## $ course          <chr> "ACG   2021", "ACG   2021", "ACG   2021", "ACG   2021", "ACG   2021", "ACG   2021", "ACG   20...
## $ department      <chr> "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOU...
## $ section         <chr> "RVC -1", "U01 -1", "U02 -1", "U03 -1", "U04 -1", "U05 -1", "U06 -1", "U07 -1", "RVC -1", "P8...
## $ ref             <chr> "15164 -1", "15393 -1", "15163 -1", "15345 -1", "15346 -1", "17299 -1", "17300 -1", "33841 -1...
## $ title           <chr> "ACC Decisions", "ACC Decisions", "ACC Decisions", "ACC Decisions", "ACC Decisions", "ACC Dec...
## $ completed_forms <chr> "57", "47", "48", "43", "21", "12", "48", "31", "44", "8", "82", "43", "20", "13", "59", "12"...
## $ values          <list> [<c("Description of course objectives and assignments", "Communication of ideas and informat...

We can "unnest" one "table" at a time:

unnest(scraped_tables[1,])
## # A tibble: 8 x 15
##                 term instructor_name     course department section      ref         title completed_forms
##                <chr>           <chr>      <chr>      <chr>   <chr>    <chr>         <chr>           <chr>
## 1 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## 2 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## 3 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## 4 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## 5 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## 6 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## 7 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## 8 1171 - Spring 2017  Elias, Desiree ACG   2021 SCHACCOUNT  RVC -1 15164 -1 ACC Decisions              57
## # ... with 7 more variables: question <chr>, no_response <chr>, excellent <chr>, very_good <chr>, good <chr>, fair <chr>,
## #   poor <chr>

Focus on "just" the question data:

unnest(scraped_tables[1,]) %>% 
  select(-c(1:8))
## # A tibble: 8 x 7
##                                                   question no_response excellent very_good  good  fair  poor
##                                                      <chr>       <chr>     <chr>     <chr> <chr> <chr> <chr>
## 1         Description of course objectives and assignments        0.0%     64.9%     14.0% 14.0%  3.5%  3.5%
## 2                   Communication of ideas and information        0.0%     56.1%     17.5% 15.8%  5.3%  5.3%
## 3 Expression of expectations for performance in this class        0.0%     63.2%     12.3% 14.0%  8.8%  1.8%
## 4       Availability to assist students in or out of class        3.5%     50.9%     21.1% 10.5% 14.0%  0.0%
## 5                         Respect and concern for students        1.8%     59.6%     10.5% 14.0% 10.5%  3.5%
## 6                        Stimulation of interest in course        1.8%     52.6%     12.3% 17.5%  7.0%  8.8%
## 7                                 Facilitation of learning        0.0%     52.6%     19.3% 10.5% 10.5%  7.0%
## 8                         Overall assessment of instructor        0.0%     54.4%     15.8% 12.3% 14.0%  3.5%

Or, "unnest" them all:

glimpse(unnest(scraped_tables))
## Observations: 4,760
## Variables: 15
## $ term            <chr> "1171 - Spring 2017", "1171 - Spring 2017", "1171 - Spring 2017", "1171 - Spring 2017", "1171...
## $ instructor_name <chr> "Elias, Desiree", "Elias, Desiree", "Elias, Desiree", "Elias, Desiree", "Elias, Desiree", "El...
## $ course          <chr> "ACG   2021", "ACG   2021", "ACG   2021", "ACG   2021", "ACG   2021", "ACG   2021", "ACG   20...
## $ department      <chr> "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOUNT", "SCHACCOU...
## $ section         <chr> "RVC -1", "RVC -1", "RVC -1", "RVC -1", "RVC -1", "RVC -1", "RVC -1", "RVC -1", "U01 -1", "U0...
## $ ref             <chr> "15164 -1", "15164 -1", "15164 -1", "15164 -1", "15164 -1", "15164 -1", "15164 -1", "15164 -1...
## $ title           <chr> "ACC Decisions", "ACC Decisions", "ACC Decisions", "ACC Decisions", "ACC Decisions", "ACC Dec...
## $ completed_forms <chr> "57", "57", "57", "57", "57", "57", "57", "57", "47", "47", "47", "47", "47", "47", "47", "47...
## $ question        <chr> "Description of course objectives and assignments", "Communication of ideas and information",...
## $ no_response     <chr> "0.0%", "0.0%", "0.0%", "3.5%", "1.8%", "1.8%", "0.0%", "0.0%", "0.0%", "0.0%", "0.0%", "2.1%...
## $ excellent       <chr> "64.9%", "56.1%", "63.2%", "50.9%", "59.6%", "52.6%", "52.6%", "54.4%", "66.0%", "59.6%", "66...
## $ very_good       <chr> "14.0%", "17.5%", "12.3%", "21.1%", "10.5%", "12.3%", "19.3%", "15.8%", "23.4%", "23.4%", "23...
## $ good            <chr> "14.0%", "15.8%", "14.0%", "10.5%", "14.0%", "17.5%", "10.5%", "12.3%", "8.5%", "8.5%", "8.5%...
## $ fair            <chr> "3.5%", "5.3%", "8.8%", "14.0%", "10.5%", "7.0%", "10.5%", "14.0%", "0.0%", "6.4%", "2.1%", "...
## $ poor            <chr> "3.5%", "5.3%", "1.8%", "0.0%", "3.5%", "8.8%", "7.0%", "3.5%", "2.1%", "2.1%", "0.0%", "0.0%...

We can also deal with the % here:

unnest(scraped_tables) %>% 
  mutate_all(~{gsub("%", "", .x)}) %>%
  type_convert() %>% 
  select(-c(1:8))
## # A tibble: 4,760 x 7
##                                                    question no_response excellent very_good  good  fair  poor
##                                                       <chr>       <dbl>     <dbl>     <dbl> <dbl> <dbl> <dbl>
##  1         Description of course objectives and assignments         0.0      64.9      14.0  14.0   3.5   3.5
##  2                   Communication of ideas and information         0.0      56.1      17.5  15.8   5.3   5.3
##  3 Expression of expectations for performance in this class         0.0      63.2      12.3  14.0   8.8   1.8
##  4       Availability to assist students in or out of class         3.5      50.9      21.1  10.5  14.0   0.0
##  5                         Respect and concern for students         1.8      59.6      10.5  14.0  10.5   3.5
##  6                        Stimulation of interest in course         1.8      52.6      12.3  17.5   7.0   8.8
##  7                                 Facilitation of learning         0.0      52.6      19.3  10.5  10.5   7.0
##  8                         Overall assessment of instructor         0.0      54.4      15.8  12.3  14.0   3.5
##  9         Description of course objectives and assignments         0.0      66.0      23.4   8.5   0.0   2.1
## 10                   Communication of ideas and information         0.0      59.6      23.4   8.5   6.4   2.1
## # ... with 4,750 more rows

Then, you can do interesting things like:

unnest(scraped_tables) %>% 
  mutate_all(~{gsub("%", "", .x)}) %>%
  type_convert() -> scraped_tables

group_by(scraped_tables, course) %>% 
  filter(question == "Description of course objectives and assignments") %>% 
  gather(resp_cat, resp_val, no_response, excellent, very_good, good, fair, poor) %>% 
  mutate(resp_val = resp_val/100) %>%
  mutate(resp_cat = factor(resp_cat, levels=unique(resp_cat))) %>% 
  filter(resp_val > 0) %>%
  ungroup() -> description_df

ggplot(description_df, aes(resp_val)) +
  geom_density(aes(y=..count..), fill="#b2abd288", color="#2d004b") +
  scale_x_percent() +
  scale_y_comma() +
  facet_wrap(~resp_cat) +
  labs(title="Description of course objectives and assignments distribution across courses") +
  theme_ipsum_rc(grid="XY")

[Image: the faceted density plot produced by the ggplot() call above]

UPDATE

The magrittr pipes can be daunting at first. Here's a version w/o them:

library(rvest)
library(httr)
library(stringi)
library(hrbrthemes)
library(tidyverse)

mcga <- function(tbl) {

  x <- colnames(tbl)
  x <- tolower(x)
  x <- gsub("[[:punct:][:space:]]+", "_", x)
  x <- gsub("_+", "_", x)
  x <- gsub("(^_|_$)", "", x)
  x <- make.unique(x, sep = "_")

  colnames(tbl) <- x

  tbl

}

eval_pg <- read_html("https://opir.fiu.edu/instructor_eval.asp")

term_nodes <- html_nodes(eval_pg, "select[name='Term'] > option")
data_frame(
  name = html_text(term_nodes),
  id = html_attr(term_nodes, "value")
) -> Terms

Terms

college_nodes <- html_nodes(eval_pg, "select[name='Coll'] > option")
data_frame(
  name = html_text(college_nodes),
  id = html_attr(college_nodes, "value")
) -> Coll

Coll

GET("https://opir.fiu.edu/instructor_evals/instr_eval_result.asp",
    query = list(
      Term = "1171",
      Coll = "CBADM",
      Dept = "",
      RefNum = "",
      Crse = "",
      Instr = ""
    )) -> res

report <- content(res, as="parsed", encoding="UTF-8")

fields <- c("Term:", "Instructor Name:", "Course:", "Department:", "Section:",
            "Ref#:", "Title:", "Completed Forms:")

tables_found <- html_nodes(report, xpath=".//table[contains(., 'Term')]")

pb <- progress_estimated(length(tables_found))
map(tables_found, function(.x) {

  pb$tick()$print()

  tab <- .x

  map(fields, function(.x) {

   tmp_field <- html_nodes(tab, xpath=sprintf(".//td[contains(., '%s')]", .x))
   tmp_field <- html_text(tmp_field, trim = TRUE) 
   tmp_field <- stri_replace_first_regex(tmp_field, .x, "") 
   tmp_field <- stri_trim_both(tmp_field)
   tmp_field <- as.list(tmp_field)
   tmp_field <- set_names(tmp_field, .x)
   tmp_field

  }) -> tmp_meta

  tmp_meta <- flatten(tmp_meta)
  tmp_meta <- as_data_frame(tmp_meta) 

  table_meta <- mcga(tmp_meta) 

  tmp_vals <- html_nodes(tab, xpath=".//tr[contains(@class, 'question') or contains(@class, 'tableback')]")
  tmp_vals <- as.character(tmp_vals)
  tmp_vals <- paste0(tmp_vals, collapse="") 
  tmp_vals <- sprintf("<table>%s</table>", tmp_vals) 
  tmp_vals <- read_html(tmp_vals) 
  tmp_vals <- html_table(tmp_vals, header=TRUE)[[1]]

  table_vals <- mcga(tmp_vals)

  table_meta$values <-  list(table_vals)

  table_meta

}) -> list_of_tables

scraped_tables <- bind_rows(list_of_tables)

glimpse(scraped_tables)

unnest(scraped_tables[1,])

tmp_df <- unnest(scraped_tables[1,])
select(tmp_df, -c(1:8))

glimpse(unnest(scraped_tables))

tmp_df <- unnest(scraped_tables)
tmp_df <- mutate_all(tmp_df, function(.x) { gsub("%", "", .x) })
scraped_tables <- type_convert(tmp_df)

(I stripped out the comments since the sections are still the same.)

Using PHP, Load content from a CSV file and filter it to meet user input criteria and output in to a table

You need to move the if statement to inside your loop, so that you are checking each book against the user's input.

I.e.

$author = $_GET['author'];
echo '<table border=1>'; //start table
$handle = fopen("books.csv", "r");
while (($books = fgetcsv($handle, 1000, ',')) !== FALSE)
{
    if (strtolower ($author)==strtolower($books[1]))
    {
        echo '<tr><td>',$books[0], //names
        '</td><td>',$books[1], //authors
        '</td><td>',$books[2], //ISBN
        '</td><td>',$books[3], //price
        '</td></tr>';
    }
}
fclose($handle);
echo '</table>'; //end table

You'll also need to add some logic for the case where there are no books which match the search parameters.
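
One way to handle the no-match case, sketched in Python rather than PHP to keep it short. The function names and the "No books found" message are my own placeholders, and the sample rows are invented; only the column order (name, author, ISBN, price) comes from the comments in the code above.

```python
import csv
import html
import io

def find_books(csv_file, author):
    """Return the rows whose second column matches the author, ignoring case."""
    wanted = author.lower()
    return [row for row in csv.reader(csv_file)
            if len(row) >= 4 and row[1].lower() == wanted]

def render_books(rows):
    """Build the table, or a short message when nothing matched."""
    if not rows:
        return "<p>No books found for that author.</p>"
    lines = ["<table border=1>"]
    for row in rows:
        # escape each cell so stray < or & in the data cannot break the markup
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row[:4])
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

# Hypothetical data standing in for books.csv
sample = io.StringIO(
    "Book A,Author One,111,9.99\n"
    "Book B,Author Two,222,5.00\n"
)

print(render_books(find_books(sample, "author one")))
print(render_books([]))  # what the user sees when nothing matches
```

Collecting the matches first and rendering afterwards is what makes the empty case easy to spot. In the PHP loop you could get the same effect by counting the rows you echo and printing a message after the loop if the count is still zero.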

Can I filter one excel table based on the content of another?

You can open both files in Excel and use a VLOOKUP to check whether the value exists. Wrap an IF(ISERROR()) around it and you can return a true or a false.

If, as above, your data spans across columns A, B and C, you can use the following formula (adjust for each file).

=IF(ISERROR(VLOOKUP(B1,<WorkbookToCheck>!B:B,1,False))=TRUE,"X","")

If the ID doesn't exist in the WorkbookToCheck, the formula will return X; otherwise it will return blank.
