How to continue filtering beyond BeautifulSoup find_all ResultSet?
You can use a CSS selector to select <a> tags under specific <th> tags.
For example, th[attr="attr"].title a will select all <a> tags under <th> tags with attr="attr" and class="title":
from bs4 import BeautifulSoup

txt = '''<table>
<tbody>
<tr>
<th attr="attr" class="title">
<a href="link.com/arhwth">Title Text</a>
</th>
<th attr="attr" class="title">
<a href="link.com/dfdsth">Title Text 2</a>
</th>
<th attr="attr" class="title">
<a href="link.com/gsfbf">Title Text 3</a>
</th>
</tr>
</tbody>
<a href="otherlink.com">Other link to throw you off</a>
</table>'''
soup = BeautifulSoup(txt, 'html.parser')
print([a.text for a in soup.select('th[attr="attr"].title a')])
Prints:
['Title Text', 'Title Text 2', 'Title Text 3']
Or using BeautifulSoup's own API:
print([th.a.text for th in soup.find_all("th", {"attr": "attr"}, class_="title") if th.a])
How to filter content of resultset in java
A better idea still is to only fetch the data you actually want to process, via a suitable WHERE clause.
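As a minimal sketch of that idea (written in Python with the standard sqlite3 module rather than Java, and using a hypothetical users table and fetch_active_users helper that are not part of the original question), the filter is expressed as a parameterized WHERE clause so that only matching rows ever leave the database:

```python
import sqlite3


def fetch_active_users(conn, min_age):
    """Let the database filter: only matching rows cross the connection."""
    cur = conn.execute(
        "SELECT name, age FROM users WHERE active = 1 AND age >= ? ORDER BY name",
        (min_age,),
    )
    return cur.fetchall()


# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("ann", 31, 1), ("bob", 17, 1), ("cy", 45, 0), ("dee", 52, 1)],
)

rows = fetch_active_users(conn, 18)
print(rows)
```

The same shape should carry over to a JDBC PreparedStatement: bind the filter values as parameters and iterate over a result set that already contains only the rows you need.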
LINQ - filtering a result set and joining the result set
You can try a join:
var result = (from laptop in DB.tblLaptops
              join user in DB.Users
              on laptop.DeviceId equals user.DeviceId
              where user.DomainId == "MS\\aram"
              select new { user.UserId, laptop.DeviceId }).ToList();
The above query returns the laptop device ids and user ids for a specific domain. To also fetch the DisplayName in this query, we would need more information about where it is stored.
Update
Since the above is not going to work, because you access these tables through different contexts, here is my thought.
Provided that laptops is not a big table, you could fetch it into memory and perform the join there. This is not an optimal solution but rather a workaround, and it won't hurt you as long as the laptops table stays small.
In terms of code:
// fetch the user devices in the specific domain:
var usersDevices = (from user in DB.Users
                    where user.DomainId == "MS\\aram"
                    select new
                    {
                        user.UserId,
                        user.DeviceId
                    }).ToList();
// fetch **ALL** the laptops:
var laptops = DB.tblLaptops.ToList();
// perform the join in memory (outer key on the left of "equals", inner key on the right):
var userLaptops = (from laptop in laptops
                   join userDevice in usersDevices
                   on laptop.DeviceId equals userDevice.DeviceId
                   select new
                   {
                       userDevice.UserId,
                       laptop.DeviceId
                   }).ToList();
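To make the cost of the in-memory join concrete, here is a rough Python sketch of the same idea. The UserDevice and Laptop records and the join_in_memory helper are hypothetical stand-ins for the two query results, and the sketch assumes each device id appears at most once in the laptops list:

```python
from collections import namedtuple

# Hypothetical stand-ins for the rows returned by the two contexts.
UserDevice = namedtuple("UserDevice", "user_id device_id")
Laptop = namedtuple("Laptop", "device_id model")


def join_in_memory(user_devices, laptops):
    """Hash join: index laptops by device id, then probe once per user device."""
    laptops_by_id = {laptop.device_id: laptop for laptop in laptops}
    return [
        (ud.user_id, ud.device_id)
        for ud in user_devices
        if ud.device_id in laptops_by_id
    ]


user_devices = [UserDevice("u1", 10), UserDevice("u2", 11), UserDevice("u3", 99)]
laptops = [Laptop(10, "A"), Laptop(11, "B"), Laptop(12, "C")]

pairs = join_in_memory(user_devices, laptops)
print(pairs)
```

Every laptop row has to be loaded before the lookup table can be built, which is why this only makes sense while that table stays small.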
The correct approach would be to ask why this related information sits behind different DbContext classes. That essentially means the data lives in different databases. If so, are these databases on the same machine? If they are, and you don't plan to move them to different machines in the near future, you could most likely run all of these queries in the database, so that the server your application lives on fetches only the data it needs instead of fetching everything and then filtering/joining it. IMHO, just my 2 cents, based on many assumptions :)
Is it better to filter a resultset using a WHERE clause or using application code?
The rule of thumb for any application is to let the DB do the things it does well: filtering, sorting, and joining.
Separate the queries into their own functions or class methods:
$men = $foo->fetchMaleUsers();
$women = $foo->fetchFemaleUsers();
Update
I took Steven's PostgreSQL demonstration of a full table scan query performing twice as well as two separate indexed queries and mimicked it using MySQL (which is used in the actual question):
Schema
CREATE TABLE `gender_test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`gender` enum('male','female') NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=26017396 DEFAULT CHARSET=utf8
I changed the gender type from VARCHAR(20) to an ENUM, as that is more realistic for the purpose of this column. I also provided a primary key, as you would expect on a table, instead of an arbitrary DOUBLE value.
Unindexed Results
mysql> select sql_no_cache * from gender_test WHERE gender = 'male';
12995993 rows in set (31.72 sec)
mysql> select sql_no_cache * from gender_test WHERE gender = 'female';
13004007 rows in set (31.52 sec)
mysql> select sql_no_cache * from gender_test;
26000000 rows in set (32.95 sec)
I trust this needs no explanation.
Indexed Results
ALTER TABLE gender_test ADD INDEX (gender);
...
mysql> select sql_no_cache * from gender_test WHERE gender = 'male';
12995993 rows in set (15.97 sec)
mysql> select sql_no_cache * from gender_test WHERE gender = 'female';
13004007 rows in set (15.65 sec)
mysql> select sql_no_cache * from gender_test;
26000000 rows in set (27.80 sec)
The results shown here are radically different from Steven's data. The indexed queries perform almost twice as fast as the full table scan. This is from a properly indexed table using common sense column definitions. I don't know PostgreSQL at all, but there must be some significant misconfiguration in Steven's example to not show similar results.
Given PostgreSQL's reputation for doing things better than MySQL, or at least as well, I daresay that PostgreSQL would demonstrate similar performance if properly used.
Also note, on this same machine an overly simplified for loop doing 52 million comparisons takes an additional 7.3 seconds to execute.
<?php
$N = 52000000;
for($i = 0; $i < $N; $i++) {
if (true == true) {
}
}
I think it's rather obvious what is the better approach given this data.
Filtering a Result Set based on changed columns SQL
You can use LAG to get the previous row's value to compare to.
SELECT
ClaimId,
adjustmentVersion,
ServiceDateFrom,
ServiceDateTo,
ProcedureCode,
PlaceOfService
FROM (
SELECT *,
ServiceDateFrom_prev = LAG(rs.ServiceDateFrom) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion),
ServiceDateTo_prev = LAG(rs.ServiceDateTo ) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion),
ProcedureCode_prev = LAG(rs.ProcedureCode ) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion),
PlaceOfService_prev = LAG(rs.PlaceOfService ) OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion)
FROM #ResultSet_fields rs
) rs
WHERE (
@CompareFields LIKE '%ServiceDateFrom%' AND ServiceDateFrom <> ServiceDateFrom_prev
OR @CompareFields LIKE '%ServiceDateTo%' AND ServiceDateTo <> ServiceDateTo_prev
OR @CompareFields LIKE '%ProcedureCode%' AND ProcedureCode <> ProcedureCode_prev
OR @CompareFields LIKE '%PlaceOfService%' AND PlaceOfService <> PlaceOfService_prev
);
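For readers less familiar with window functions, the logic of the LAG-based filter can be sketched in plain Python. This is only an illustration of the comparison, not a replacement for the query: changed_rows is a hypothetical helper, rows are assumed to be dictionaries keyed by column name, and unlike SQL's <> it treats a None value as different from any other value:

```python
from itertools import groupby
from operator import itemgetter


def changed_rows(rows, compare_fields):
    """Return rows whose compared fields differ from the previous version of the same claim."""
    # Equivalent of PARTITION BY ClaimId ORDER BY adjustmentVersion.
    ordered = sorted(rows, key=itemgetter("ClaimId", "adjustmentVersion"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("ClaimId")):
        prev = None  # the first row of a claim has no previous row, like LAG returning NULL
        for row in group:
            if prev is not None and any(row[f] != prev[f] for f in compare_fields):
                result.append(row)
            prev = row
    return result


rows = [
    {"ClaimId": 1, "adjustmentVersion": 1, "ProcedureCode": "A", "PlaceOfService": "11"},
    {"ClaimId": 1, "adjustmentVersion": 2, "ProcedureCode": "A", "PlaceOfService": "12"},
    {"ClaimId": 1, "adjustmentVersion": 3, "ProcedureCode": "B", "PlaceOfService": "12"},
    {"ClaimId": 2, "adjustmentVersion": 1, "ProcedureCode": "C", "PlaceOfService": "11"},
]

by_code = changed_rows(rows, ["ProcedureCode"])
by_either = changed_rows(rows, ["ProcedureCode", "PlaceOfService"])
print([(r["ClaimId"], r["adjustmentVersion"]) for r in by_either])
```

As in the SQL version, the first version of each claim is never reported, because there is nothing earlier to compare it with.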
If you want to use indexes, or you have a lot of columns to compare, you can use dynamic SQL:
DECLARE @lagCols nvarchar(max), @whereFilters nvarchar(max);
SELECT
@lagCols = STRING_AGG(CAST(
' ' + QUOTENAME(c.name + '_chg') + ' = LAG(rs.' + QUOTENAME(c.name) + ') OVER (PARTITION BY rs.ClaimId ORDER BY rs.adjustmentVersion)
' AS nvarchar(max)), ',')
,@whereFilters = STRING_AGG(CAST(
QUOTENAME(c.name + '_chg') + ' <> ' + QUOTENAME(c.name)
AS nvarchar(max)), ' OR
')
FROM STRING_SPLIT(@CompareFields, ',') s
JOIN tempdb.sys.columns c ON c.name = TRIM(s.value) -- make sure to get the right database
WHERE c.object_id = OBJECT_ID('tempdb..#ResultSet_fields');
DECLARE @sql nvarchar(max) = '
SELECT
ClaimId,
adjustmentVersion,
ServiceDateFrom,
ServiceDateTo,
ProcedureCode,
PlaceOfService
FROM (
SELECT *,
' + @lagCols + '
FROM #ResultSet_fields rs
) rs
WHERE (
' + @whereFilters + '
);
';
PRINT @sql; -- for testing
EXEC sp_executesql @sql;
How to filter resultset rows by comparing malformed date string and today's date?
Here is the bad news: Your code is suffering from the ramifications of bad database design. If you are storing date values, never store them in anything other than a date-type column.
Here is the good news: Even if you don't repair your database structure (and you really, really should), you don't need any PHP to get your query working. In fact, you don't even need to employ a prepared statement because there are no variables to include.
Code:
SELECT discount,
description,
logouploader,
DATE_FORMAT(STR_TO_DATE(expirydate, '%Y,%m,%d'), '%Y-%m-%d') AS `reformatted`
FROM `table`
WHERE STR_TO_DATE(expirydate, '%Y,%m,%d') >= CURDATE()
ORDER BY expirydate
This will yield all rows of discounts that have not yet expired (ordered by date).
You may notice that:
- You don't need to reformat expirydate in the ORDER BY clause because it is accurately sorted as a string (provided the month and day are zero-padded).
- I could have just as easily called upon REPLACE() instead of STR_TO_DATE() to format your comma-separated date to hyphen-separated.
- DATE_FORMAT() will offer you great flexibility in formatting the date in your resultset, so that your PHP can do less processing.
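To sanity-check what the WHERE clause is doing, the same parse-and-compare logic can be sketched outside the database. This is only an illustration in Python, not a suggestion to move the filtering into application code. The parse_expiry and unexpired helpers and the sample rows are hypothetical, and the strptime format string happens to use the same %Y,%m,%d specifiers as the STR_TO_DATE() call:

```python
from datetime import date, datetime


def parse_expiry(text):
    """Parse a 'YYYY,MM,DD' string, mirroring STR_TO_DATE(expirydate, '%Y,%m,%d')."""
    return datetime.strptime(text, "%Y,%m,%d").date()


def unexpired(rows, today):
    """Keep rows expiring on or after `today`, ordered by expiry date."""
    dated = [(parse_expiry(row["expirydate"]), row) for row in rows]
    dated = [pair for pair in dated if pair[0] >= today]
    dated.sort(key=lambda pair: pair[0])
    return [row for _, row in dated]


# Hypothetical sample rows using the comma-separated format from the question.
rows = [
    {"discount": "10%", "expirydate": "2030,01,15"},
    {"discount": "25%", "expirydate": "2019,12,31"},
    {"discount": "5%", "expirydate": "2029,06,01"},
]

active = unexpired(rows, today=date(2024, 1, 1))
print([row["discount"] for row in active])
```

Passing today as an argument, rather than calling date.today() inside the helper, keeps the comparison easy to test against fixed dates.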