How to Further Filter a Result of Resultset

How to continue filtering beyond BeautifulSoup find_all ResultSet?

You can use a CSS selector to select the <a> tags under specific <th> tags.

For example, th[attr="attr"].title a selects all <a> tags under <th> tags with attr="attr" and class="title":

txt = '''<table>
<tbody>
<tr>
<th attr="attr" class="title">
<a href="link.com/arhwth">Title Text</a>
</th>

<th attr="attr" class="title">
<a href="link.com/dfdsth">Title Text 2</a>
</th>

<th attr="attr" class="title">
<a href="link.com/gsfbf">Title Text 3</a>
</th>
</tr>
</tbody>
<a href="otherlink.com">Other link to throw you off</a>
</table>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

print([a.text for a in soup.select('th[attr="attr"].title a')])

Prints:

['Title Text', 'Title Text 2', 'Title Text 3']

Or using BeautifulSoup's own API:

print( [th.a.text for th in soup.find_all("th", {"attr": "attr"}, class_="title") if th.a] )

How to filter content of resultset in java

A better idea still is to only fetch the data you actually want to process, via a suitable WHERE clause.
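
As a rough illustration of the difference, here is a minimal sketch in Python, with the standard-library sqlite3 module standing in for a JDBC connection. The orders table, its columns, and the sample rows are assumptions made up for this example; the same idea applies to a Java PreparedStatement with a WHERE clause.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (status, total) VALUES (?, ?)",
    [("open", 10.0), ("closed", 25.5), ("open", 7.25), ("cancelled", 3.0)],
)

# Approach 1: fetch every row, then discard the unwanted ones in application code.
all_rows = conn.execute("SELECT id, status, total FROM orders ORDER BY id").fetchall()
filtered_in_app = [row for row in all_rows if row[1] == "open"]

# Approach 2: let the database filter, using a bound parameter in the WHERE clause.
filtered_in_db = conn.execute(
    "SELECT id, status, total FROM orders WHERE status = ? ORDER BY id",
    ("open",),
).fetchall()

print(filtered_in_db)
```

Both approaches produce the same rows here, but only the second avoids transferring rows that are immediately thrown away.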

LINQ - filtering a result set and joining the result set

You can try a join:

var result = (from laptop in DB.tblLaptops
join user in DB.Users
on laptop.DeviceId equals user.DeviceId // outer range variable goes on the left of "equals"
where user.DomainId == "MS\\aram"
select new { user.UserId, laptop.DeviceId }).ToList();

The above query returns the user IDs and laptop device IDs for a specific domain. To include DisplayName as well, we would need more information about where it is stored.

Update

Since the above is not going to work, because you access these tables through different contexts, here is my thought.

Provided that the laptops table is not very big, you could fetch it into memory and perform the join there. Admittedly, this is a workaround rather than an optimal solution, but it won't hurt you as long as the laptops table stays small.

In terms of code:

// fetch the user devices in the specific domain:
var usersDevices = (from user in DB.Users
where user.DomainId == "MS\\aram"
select new
{
user.UserId,
user.DeviceId
}).ToList();



// fetch **ALL** the laptops:
var laptops = DB.tblLaptops.ToList();

// perform the join:
var userLaptops = (from laptop in laptops
join userDevice in usersDevices
on laptop.DeviceId equals userDevice.DeviceId
select new
{
userDevice.UserId,
laptop.DeviceId
}).ToList();
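
For readers who want to see what the in-memory join amounts to, here is a minimal sketch in Python. The sample rows, the Model field, and the inner_join helper are all hypothetical and only illustrate the idea of a hash-based join on DeviceId; this is not what LINQ does internally.

```python
from collections import defaultdict

# Hypothetical rows standing in for the two query results.
users_devices = [
    {"UserId": 1, "DeviceId": "D-100"},
    {"UserId": 2, "DeviceId": "D-200"},
    {"UserId": 3, "DeviceId": "D-999"},  # no matching laptop
]
laptops = [
    {"DeviceId": "D-100", "Model": "A"},
    {"DeviceId": "D-200", "Model": "B"},
    {"DeviceId": "D-300", "Model": "C"},  # no matching user
]


def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key` using a hash lookup."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Only rows whose key appears on both sides survive.
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


user_laptops = [
    {"UserId": row["UserId"], "DeviceId": row["DeviceId"]}
    for row in inner_join(users_devices, laptops, "DeviceId")
]
print(user_laptops)
```

Building the lookup once keeps the join roughly linear in the size of both lists, which is why the workaround is tolerable only while the laptops table stays small enough to load.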

The correct approach would be to ask why this related information sits behind different DbContext classes. That essentially means the data lives in different databases. If so, are these databases on the same machine? If they are, and you don't plan to move them to different machines in the near future, you could quite probably run all of these queries in the database and fetch to the server where your application lives only the data you need, rather than fetching everything and then filtering and joining it. IMHO, just my 2 cents, based on many assumptions :)

Is it better to filter a resultset using a WHERE clause or using application code?

The rule of thumb for any application is to let the DB do the things it does well: filtering, sorting, and joining.

Separate the queries into their own functions or class methods:

$men = $foo->fetchMaleUsers();
$women = $foo->fetchFemaleUsers();

Update

I took Steven's PostgreSQL demonstration, in which a full table scan query performed twice as well as two separate indexed queries, and mimicked it using MySQL (which is used in the actual question):

Schema

CREATE TABLE `gender_test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`gender` enum('male','female') NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=26017396 DEFAULT CHARSET=utf8

I changed the gender column from VARCHAR(20) to an ENUM, as that is more realistic for the purpose of this column. I also provided a primary key, as you would expect on a table, instead of an arbitrary DOUBLE value.

Unindexed Results

mysql> select sql_no_cache * from gender_test WHERE gender = 'male';

12995993 rows in set (31.72 sec)

mysql> select sql_no_cache * from gender_test WHERE gender = 'female';

13004007 rows in set (31.52 sec)

mysql> select sql_no_cache * from gender_test;

26000000 rows in set (32.95 sec)

I trust this needs no explanation.

Indexed Results

ALTER TABLE gender_test ADD INDEX (gender);

...

mysql> select sql_no_cache * from gender_test WHERE gender = 'male';

12995993 rows in set (15.97 sec)

mysql> select sql_no_cache * from gender_test WHERE gender = 'female';

13004007 rows in set (15.65 sec)

mysql> select sql_no_cache * from gender_test;

26000000 rows in set (27.80 sec)

The results shown here are radically different from Steven's data. The indexed queries perform almost twice as fast as the full table scan. This is from a properly indexed table using common sense column definitions. I don't know PostgreSQL at all, but there must be some significant misconfiguration in Steven's example to not show similar results.

Given PostgreSQL's reputation for doing things at least as well as MySQL, I daresay that PostgreSQL would demonstrate similar performance if properly used.

Also note, on this same machine an overly simplified for loop doing 52 million comparisons takes an additional 7.3 seconds to execute.

<?php
$N = 52000000;
for($i = 0; $i < $N; $i++) {
if (true == true) {
}
}

I think it's rather obvious which approach is better given this data.

Filtering a Result Set based on changed columns SQL

You can use LAG to get the previous row's value to compare to.

SELECT
ClaimId,
adjustmentVersion,
ServiceDateFrom,
ServiceDateTo,
ProcedureCode,
PlaceOfService
FROM (
SELECT *,
ServiceDateFrom_prev = LAG(rs.ServiceDateFrom) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion),
ServiceDateTo_prev = LAG(rs.ServiceDateTo ) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion),
ProcedureCode_prev = LAG(rs.ProcedureCode ) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion),
PlaceOfService_prev = LAG(rs.PlaceOfService ) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion)
FROM #ResultSet_fields rs
) rs
WHERE (
@CompareFields LIKE '%ServiceDateFrom%' AND ServiceDateFrom <> ServiceDateFrom_prev
OR @CompareFields LIKE '%ServiceDateTo%' AND ServiceDateTo <> ServiceDateTo_prev
OR @CompareFields LIKE '%ProcedureCode%' AND ProcedureCode <> ProcedureCode_prev
OR @CompareFields LIKE '%PlaceOfService%' AND PlaceOfService <> PlaceOfService_prev
);
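
To make the LAG logic easier to follow, here is a rough equivalent in plain Python: partition by ClaimId, order by adjustmentVersion, and keep a row only when one of the compared fields differs from the previous row in the same partition. The sample rows and the changed_rows helper are hypothetical, and the sketch assumes the compared values are never NULL (SQL's NULL comparison rules are not reproduced).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data standing in for #ResultSet_fields.
rows = [
    {"ClaimId": 1, "adjustmentVersion": 1, "ProcedureCode": "A1", "PlaceOfService": "11"},
    {"ClaimId": 1, "adjustmentVersion": 2, "ProcedureCode": "A1", "PlaceOfService": "11"},
    {"ClaimId": 1, "adjustmentVersion": 3, "ProcedureCode": "B2", "PlaceOfService": "11"},
    {"ClaimId": 2, "adjustmentVersion": 1, "ProcedureCode": "C3", "PlaceOfService": "21"},
    {"ClaimId": 2, "adjustmentVersion": 2, "ProcedureCode": "C3", "PlaceOfService": "22"},
]


def changed_rows(rows, compare_fields):
    """Return rows where any field in compare_fields differs from the previous version."""
    ordered = sorted(rows, key=itemgetter("ClaimId", "adjustmentVersion"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("ClaimId")):
        prev = None  # like LAG, the first row of a partition has no predecessor
        for row in group:
            if prev is not None and any(row[f] != prev[f] for f in compare_fields):
                result.append(row)
            prev = row
    return result


changed = changed_rows(rows, ["ProcedureCode", "PlaceOfService"])
print([(r["ClaimId"], r["adjustmentVersion"]) for r in changed])
```

As in the SQL version, the first version of each claim is never returned, because there is nothing earlier to compare it with.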

If you want to use indexes, or you have a lot of columns to compare, you can use dynamic SQL:

DECLARE @lagCols nvarchar(max), @whereFilters nvarchar(max);

SELECT
@lagCols = STRING_AGG(CAST(
' ' + QUOTENAME(c.name + '_chg') + ' = LAG(rs.' + QUOTENAME(c.name) + ') OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion)
' AS nvarchar(max)), ',')

,@whereFilters = STRING_AGG(CAST(
QUOTENAME(c.name + '_chg') + ' <> ' + QUOTENAME(c.name)
AS nvarchar(max)), ' OR
')

FROM STRING_SPLIT(@CompareFields, ',') s
JOIN tempdb.sys.columns c ON c.name = TRIM(s.value) -- make sure to get the right database
WHERE c.object_id = OBJECT_ID('tempdb..#ResultSet_fields');

DECLARE @sql nvarchar(max) = '
SELECT
ClaimId,
adjustmentVersion,
ServiceDateFrom,
ServiceDateTo,
ProcedureCode,
PlaceOfService
FROM (
SELECT *,
' + @lagCols + '
FROM #ResultSet_fields rs
) rs
WHERE (
' + @whereFilters + '
);
';

PRINT @sql; -- for testing
EXEC sp_executesql @sql;

How to filter resultset rows by comparing malformed date string and today's date?

Here is the bad news: Your code is suffering from the ramifications of bad database design. If you are storing date values, never store them in anything other than a date-type column.

Here is the good news: Even if you don't repair your database structure (and you really, really should), you don't need any PHP to get your query working. In fact, you don't even need to employ a prepared statement because there are no variables to include.

Code:

SELECT discount,
description,
logouploader,
DATE_FORMAT(STR_TO_DATE(expirydate, '%Y,%m,%d'), '%Y-%m-%d') AS `reformatted`
FROM `table`
WHERE STR_TO_DATE(expirydate, '%Y,%m,%d') >= CURDATE()
ORDER BY expirydate

This will yield all rows of discounts that have not yet expired (ordered by date).

You may notice that:

  • You don't need to reformat expirydate in the ORDER BY clause because it sorts correctly as a string (as long as the month and day are zero-padded).
  • I could have just as easily called upon REPLACE() instead of STR_TO_DATE() to format your comma-separated date to hyphen-separated.
  • DATE_FORMAT() will offer you great flexibility in formatting the date in your resultset, so that your PHP can do less processing.
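
The REPLACE() variant mentioned above can be tried out with a small Python sketch. Note the assumptions: it uses SQLite through the standard-library sqlite3 module rather than MySQL, the discounts table and its rows are invented, today's date is passed in as a parameter so the result is repeatable, and it only works because the stored month and day are zero-padded.

```python
import sqlite3
from datetime import date

# Hypothetical table with dates stored as 'YYYY,MM,DD' strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discounts (discount TEXT, expirydate TEXT)")
conn.executemany(
    "INSERT INTO discounts VALUES (?, ?)",
    [
        ("10% off", "2024,06,14"),
        ("Free shipping", "2025,01,01"),
        ("2 for 1", "2024,06,15"),
        ("Clearance", "2023,12,31"),
    ],
)


def unexpired(conn, today):
    """Return discounts expiring on or after `today`, filtered inside the database."""
    return conn.execute(
        """
        SELECT discount, REPLACE(expirydate, ',', '-') AS reformatted
        FROM discounts
        WHERE REPLACE(expirydate, ',', '-') >= ?
        ORDER BY expirydate
        """,
        (today.isoformat(),),  # ISO 'YYYY-MM-DD' compares correctly as a string
    ).fetchall()


result = unexpired(conn, date(2024, 6, 15))
print(result)
```

In real use you would pass date.today() instead of a fixed date. The filtering still happens in the query, so the application code has nothing left to discard.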

