Python Script for Minifying CSS

Python script for minifying CSS?

This seemed like a good task to finally get me into Python, which I had been putting off for a while. I hereby present my first-ever Python script:

```python
import re
import sys

with open(sys.argv[1], 'r') as f:
    css = f.read()

# remove comments - this will break a lot of hacks :-P
css = re.sub(r'\s*/\*\s*\*/', "$$HACK1$$", css)  # preserve IE<6 comment hack
css = re.sub(r'/\*[\s\S]*?\*/', "", css)
css = css.replace("$$HACK1$$", '/**/')  # preserve IE<6 comment hack

# url() doesn't need quotes
css = re.sub(r'url\((["\'])([^)]*)\1\)', r'url(\2)', css)

# spaces may be safely collapsed as generated content will collapse them anyway
css = re.sub(r'\s+', ' ', css)

# shorten collapsible colors: #aabbcc to #abc
css = re.sub(r'#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3(\s|;)', r'#\1\2\3\4', css)

# fractional values can lose their leading zero: 0.5em to .5em
css = re.sub(r':\s*0(\.\d+([cm]m|e[mx]|in|p[ctx]))\s*;', r':\1;', css)

for rule in re.findall(r'([^{]+){([^}]*)}', css):

    # we don't need spaces around operators
    selectors = [re.sub(r'(?<=[\[\(>+=])\s+|\s+(?=[=~^$*|>+\]\)])', r'', selector.strip())
                 for selector in rule[0].split(',')]

    # order is important, but we still want to discard repetitions
    properties = {}
    porder = []
    for prop in re.findall('(.*?):(.*?)(;|$)', rule[1]):
        key = prop[0].strip().lower()
        if key not in porder:
            porder.append(key)
        properties[key] = prop[1].strip()

    # output rule if it contains any declarations
    if properties:
        print("%s{%s}" % (','.join(selectors),
                          ''.join(['%s:%s;' % (key, properties[key]) for key in porder])[:-1]))
```

I believe this works, and its output tests fine on recent Safari, Opera, and Firefox. It will break CSS hacks other than the underscore and /**/ hacks! Do not use a minifier if you have a lot of hacks going on (or put them in a separate file).

Any tips on my Python are appreciated. Please be gentle though, it's my first time. :-)
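To check what the regex passes in the script actually do, they can be wrapped in a function that takes and returns a string instead of reading sys.argv and printing. The sketch below is a hypothetical refactor (the name minify_css is not from the original script): it applies the same regexes in the same order, and it relies on Python 3.7+ dicts keeping insertion order to replace the separate porder list, since re-assigning an existing key updates its value but keeps its first position.

```python
import re


def minify_css(css: str) -> str:
    """Hypothetical wrapper around the script's regex passes; returns one rule per line."""
    # keep the IE<6 empty-comment hack, strip every other comment
    css = re.sub(r'\s*/\*\s*\*/', '$$HACK1$$', css)
    css = re.sub(r'/\*[\s\S]*?\*/', '', css)
    css = css.replace('$$HACK1$$', '/**/')
    # url("x") -> url(x)
    css = re.sub(r'url\((["\'])([^)]*)\1\)', r'url(\2)', css)
    # collapse every whitespace run to a single space
    css = re.sub(r'\s+', ' ', css)
    # #aabbcc -> #abc (lowercase hex only, as in the original)
    css = re.sub(r'#([0-9a-f])\1([0-9a-f])\2([0-9a-f])\3(\s|;)', r'#\1\2\3\4', css)
    # :0.5em; -> :.5em;
    css = re.sub(r':\s*0(\.\d+([cm]m|e[mx]|in|p[ctx]))\s*;', r':\1;', css)

    rules = []
    for selector_text, body in re.findall(r'([^{]+)\{([^}]*)\}', css):
        # remove spaces around selector operators
        selectors = [re.sub(r'(?<=[\[\(>+=])\s+|\s+(?=[=~^$*|>+\]\)])', '', s.strip())
                     for s in selector_text.split(',')]
        # later duplicates overwrite the value but keep the first position
        properties = {}
        for key, value, _ in re.findall(r'(.*?):(.*?)(;|$)', body):
            properties[key.strip().lower()] = value.strip()
        if properties:
            declarations = ';'.join('%s:%s' % kv for kv in properties.items())
            rules.append('%s{%s}' % (','.join(selectors), declarations))
    return '\n'.join(rules)
```

For example, passing a rule like "h1 , h2 { color: #ffffff; margin: 0.5em; color: #aabbcc; }" yields "h1,h2{color:#abc;margin:.5em}": the duplicate color keeps its first position but takes the last value, just as porder and properties do in the script.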

Minify all CSS and Javascript in a directory

As stated by @Squall,

rcssmin.cssmin() and rjsmin.jsmin() expect their first argument to be the CSS or JS code to minify, respectively, as a string. You have to open and read the CSS and JS files yourself.


```python
import io
import os

import rcssmin
import rjsmin

# dirname, filename and themePak (a zipfile.ZipFile) come from the surrounding loop
if filename.endswith(".css"):
    with open(os.path.join(dirname, filename), "r") as assetfile:
        # keep the newlines: stripping them can merge selectors (".a\n.b" becomes ".a.b")
        assetdata = assetfile.read()
    cssMinified = io.StringIO()
    cssMinified.write(rcssmin.cssmin(assetdata, keep_bang_comments=True))
    themePak.writestr(os.path.join(dirname, filename), cssMinified.getvalue())

elif filename.endswith(".js"):
    with open(os.path.join(dirname, filename), "r") as assetfile:
        # keep the newlines: a "//" comment would otherwise swallow the rest of the file
        assetdata = assetfile.read()
    jsMinified = io.StringIO()
    jsMinified.write(rjsmin.jsmin(assetdata, keep_bang_comments=True))
    themePak.writestr(os.path.join(dirname, filename), jsMinified.getvalue())
```

The changed if statements in the code above read each asset file into a string and then pass that string to the minifier.

I learned the hard way that you have to os.path.join() the directory and the filename; opening the bare filename only works if the file happens to be in the current working directory.

```python
with open(os.path.join(dirname, filename), "r") as assetfile:
    assetdata = assetfile.read()
```

Then minify assetdata and write the result out (in this case, to an in-memory object).
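The fragment above depends on dirname, filename and themePak coming from an enclosing loop. Here is a minimal self-contained sketch of that loop, assuming os.walk over a theme directory and a zipfile.ZipFile as the archive. The helper name minify_tree and the extension-to-function mapping are hypothetical, chosen so the walking and path handling can be tried without rcssmin or rjsmin installed.

```python
import os
import zipfile
from typing import Callable, Dict


def minify_tree(root: str, archive_path: str,
                minifiers: Dict[str, Callable[[str], str]]) -> int:
    """Minify every file under root whose extension has a minifier; return the count."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as pak:
        for dirname, _subdirs, filenames in os.walk(root):
            for filename in filenames:
                ext = os.path.splitext(filename)[1].lower()
                minify = minifiers.get(ext)
                if minify is None:
                    continue  # files without a minifier are left out of the archive
                # join directory and name, otherwise open() looks in the working directory
                path = os.path.join(dirname, filename)
                with open(path, "r", encoding="utf-8") as assetfile:
                    source = assetfile.read()  # keep newlines; the minifier handles them
                # store paths relative to root so the archive has no absolute paths
                pak.writestr(os.path.relpath(path, root), minify(source))
                count += 1
    return count
```

With the real minifiers, the mapping would be something like {".css": lambda s: rcssmin.cssmin(s, keep_bang_comments=True), ".js": lambda s: rjsmin.jsmin(s, keep_bang_comments=True)}. Storing entries relative to the root differs from the fragment above; use os.path.join(dirname, filename) as the archive name instead if you want that layout.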

Minify/compress javascript and css on deployment in webapp2?

You can write a simple script to do so.

```python
# -- update_batch.py --
# Usage: update_batch.py YOUR_APP_ID_HERE
import os
import sys


def main():
    if len(sys.argv) == 1:
        print("Usage: update_batch.py YOUR_APP_ID_HERE")
        return

    appId = sys.argv[1]
    print("appId %s" % appId)

    # Your script to minify JavaScript goes here, e.g.:
    # os.chdir(r".\template")
    # cmd = r'jscom.py ./js/new/xxx_plugin.js xxx_plugin.js %s.appspot.com' % appId
    # os.system(cmd)

    os.chdir("..")
    # Run appcfg.py to update the GAE server
    cmd = r'"C:\Program Files\Google\google_appengine\appcfg.py"'
    os.system(cmd + " update . -A %s" % appId)

    # os.system(cmd + " backends . update worker -A %s" % appId)


if __name__ == "__main__":
    main()
```
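One weakness of the script above is that os.system's return value is ignored, so the upload runs even if the minification step failed. A sketch of an alternative, assuming each step can be written as an argument list for subprocess.run: run the steps in order and stop at the first non-zero exit code. The helper name run_steps is hypothetical.

```python
import subprocess
from typing import List, Sequence


def run_steps(steps: Sequence[List[str]]) -> int:
    """Run each command in order; stop at the first failure and return its exit code."""
    for argv in steps:
        # an argument list needs no shell quoting, even for paths containing spaces
        result = subprocess.run(argv)
        if result.returncode != 0:
            return result.returncode  # later steps (e.g. the upload) are skipped
    return 0
```

For example, the first step could be the answer's commented-out command as a list, ["python", "jscom.py", "./js/new/xxx_plugin.js", "xxx_plugin.js", appId + ".appspot.com"], followed by the appcfg.py update command. The interpreter names are placeholders; the legacy appcfg.py has to be run with the Python 2 interpreter the App Engine SDK expects.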

Compress (minimize) HTML from Python

You can use htmlmin to minify your html:

```python
import htmlmin

html = """
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Bootstrap Case</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</head>
<body>
  <div class="container">
    <h2>Well</h2>
    <div class="well">Basic Well</div>
  </div>
</body>
</html>
"""

# html is already a str in Python 3, so no .decode("utf-8") is needed
minified = htmlmin.minify(html, remove_empty_space=True)
print(minified)
```
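htmlmin does its own parsing, so the following is not its algorithm. As a rough illustration of the kind of whitespace remove_empty_space targets, here is a naive stdlib-only sketch (the function name is hypothetical) that deletes whitespace-only runs between a closing ">" and the next "<":

```python
import re


def strip_intertag_whitespace(html: str) -> str:
    """Naive sketch: remove whitespace-only runs that sit between two tags."""
    # text inside elements ("Basic Well") is untouched; only ">   <" gaps shrink to "><"
    return re.sub(r'>\s+<', '><', html.strip())
```

Note that this regex also removes the space between inline elements such as "<b>a</b> <i>b</i>", which changes the rendered text, and it does not protect <pre> or <textarea> content. Those are the cases a dedicated minifier is built to handle (htmlmin has a keep_pre option, for example).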

Django: auto minifying css/js files before release

Have you tried django-compress?

See http://djangopackages.com/grids/g/asset-managers/ for a fairly complete list of available asset managers for Django...

If you are already using django-compress, you should look at upgrading to django-pipeline, a well-maintained fork with a lot of new features. I encourage everyone who is using django-compress to switch to django-pipeline instead (see the django-pipeline documentation).
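A minimal sketch of what django-pipeline settings can look like, based on the layout described in its documentation. The bundle name "site", the file paths, and the choice of the cssmin/jsmin compressors are assumptions, and setting names have changed between releases, so check the docs for the django-pipeline and Django versions you install:

```python
# settings.py (fragment): a sketch; bundle names and paths are hypothetical
INSTALLED_APPS = [
    "django.contrib.staticfiles",
    "pipeline",
]

# collectstatic builds and compresses the bundles through this storage backend
STATICFILES_STORAGE = "pipeline.storage.PipelineManifestStorage"

STATICFILES_FINDERS = (
    "django.contrib.staticfiles.finders.FileSystemFinder",
    "django.contrib.staticfiles.finders.AppDirectoriesFinder",
    "pipeline.finders.PipelineFinder",
)

PIPELINE = {
    "PIPELINE_ENABLED": True,  # compress even when DEBUG is on
    "STYLESHEETS": {
        "site": {
            "source_filenames": ("css/base.css", "css/layout.css"),
            "output_filename": "css/site.min.css",
        },
    },
    "JAVASCRIPT": {
        "site": {
            "source_filenames": ("js/app.js", "js/widgets.js"),
            "output_filename": "js/site.min.js",
        },
    },
    # pure-Python compressors; they need the cssmin and jsmin packages installed
    "CSS_COMPRESSOR": "pipeline.compressors.cssmin.CSSMinCompressor",
    "JS_COMPRESSOR": "pipeline.compressors.jsmin.JSMinCompressor",
}
```

Templates then load the bundles with {% load pipeline %} followed by {% stylesheet 'site' %} and {% javascript 'site' %}.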

Any recommendations for a CSS minifier?

The YUI Compressor is fantastic. It works on JavaScript and CSS. Check it out.
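YUI Compressor is a Java program run from the command line: java -jar with the jar file, --type css or js, and -o for the output file. A small sketch of calling it from Python, assuming Java is on the PATH; the jar file name and the helper names are hypothetical:

```python
import subprocess
from typing import List


def yui_command(jar_path: str, src: str, out: str) -> List[str]:
    """Build the argument list for one YUI Compressor run."""
    # pick the input type from the extension instead of relying on auto-detection
    kind = "css" if src.lower().endswith(".css") else "js"
    return ["java", "-jar", jar_path, "--type", kind, "-o", out, src]


def yui_compress(jar_path: str, src: str, out: str) -> None:
    # check=True raises CalledProcessError if the compressor reports an error
    subprocess.run(yui_command(jar_path, src, out), check=True)
```

With a downloaded jar, yui_compress("yuicompressor.jar", "site.css", "site.min.css") writes the minified stylesheet next to the source.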

Are there libraries for packing and minifying multiple CSS and JS files into one file each?

What you are looking for is a CSS and JavaScript pipeline. It is becoming standard for frameworks to provide this kind of tool; for instance, Rails 3.1 has its own asset pipeline built in.

Not only will it merge your CSS and JavaScript into a single file each, it will also compress them for a further performance boost.

Fortunately, Django also has a plugin for that:

https://github.com/cyberdelia/django-pipeline
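The merge half of such a pipeline is plain concatenation; the compression half is then a single minifier call on the result. A minimal sketch of the merge step, with a hypothetical helper name. For JavaScript it joins files with ";\n" so that a file ending without a semicolon cannot run into the next one: "var a = 1" followed by a file starting with "(function () {})()" would otherwise be parsed as a call to 1.

```python
from pathlib import Path
from typing import Iterable


def bundle(paths: Iterable[str], out_path: str, kind: str) -> str:
    """Concatenate source files in the given order and write the result to out_path."""
    # JS: a semicolon between files guards against missing trailing semicolons
    sep = ";\n" if kind == "js" else "\n"
    parts = [Path(p).read_text(encoding="utf-8") for p in paths]
    combined = sep.join(parts)
    Path(out_path).write_text(combined, encoding="utf-8")
    return combined
```

Order matters for both CSS (later rules win) and JavaScript (definitions must come before use), which is why the paths are taken as an ordered sequence rather than globbed.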

How can I scrape a website that does not show any HTML codes in the source using Python without Selenium

In the hope that your next question(s) will include a minimal reproducible example, here is one way to scrape that information: the data is loaded into the page dynamically, via XHR calls to an API. You can see this in the Network tab of your browser's Dev Tools.

```python
import pandas as pd
import requests
from tqdm import tqdm

headers = {
    'Origin': 'https://www.cea.gov.sg',
    'Content-Type': 'application/json;charset=UTF-8',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

s = requests.Session()
s.headers.update(headers)
url = 'https://www.cea.gov.sg/aceas/api/internet/profile/v2/public-register/filter'

frames = []
for x in tqdm(range(1, 10)):  # pages 1 to 9
    payload = {"page": x, "pageSize": 100, "sortAscFlag": "true",
               "registrationNumber": "R", "sort": "name", "profileType": 2}
    r = s.post(url, json=payload)
    r.raise_for_status()
    frames.append(pd.json_normalize(r.json()['data']))

# concatenate once at the end instead of growing a DataFrame inside the loop
big_df = pd.concat(frames, axis=0, ignore_index=True)
print(big_df)
```

This will display the DataFrame in the terminal:

```
id  name    businessName    licenseNumber   validityDateStart   validityDateEnd awards  disciplinaryActions registrationNumber  photoUrl    currentEa
0 0679f4a5-ca99-4c6c-a4c4-12528ece6294 'AFFAN BIN ASHAK HARI 'AFFAN A.H. L3002382K 2019-02-27T00:00:00+08:00 2022-12-31T23:59:59.99+08:00 None None R060832J None ERA REALTY NETWORK PTE LTD
1 c55b6be6-15fa-490c-a688-745a91839596 AARON BAN QI WEI AARON BAN L3002382K 2019-08-30T00:00:00+08:00 2022-12-31T23:59:59.99+08:00 None None R061593I None ERA REALTY NETWORK PTE LTD
2 6525dc35-5d0b-467f-8fc1-68894884e3fb AARON GOH JIN HAO None L3008022J 2019-09-16T00:00:00+08:00 2022-12-31T19:14:00+08:00 None None R052117I None PROPNEX REALTY PTE. LTD.
3 38d64cfc-5cb2-4027-9add-f01ee4ed8769 AARON HUAN SHEN LI AARON HUAN L3008022J 2019-01-01T00:00:00+08:00 2022-12-31T14:57:00+08:00 None None R041988I None PROPNEX REALTY PTE. LTD.
4 15aa903c-2402-4bed-87d2-ef6fce88a502 AARON LEONG JIA SHENG None L3008022J 2021-01-01T00:00:00+08:00 2022-12-31T23:59:59.99+08:00 None None R062835F None PROPNEX REALTY PTE. LTD.
... ... ... ... ... ... ... ... ... ... ... ...
895 d845c8a7-0713-40ba-b0d9-7f06933997ee ANG YAM NEE, AGNES JAEL JAEL ANG L3002382K 2017-08-28T00:00:00+08:00 2022-12-31T20:14:00+08:00 None None R058715C None ERA REALTY NETWORK PTE LTD
896 705a51a0-2024-4fc3-9a84-27530a579681 ANG YAN BRYAN ANG L3008022J 2018-02-27T00:00:00+08:00 2022-12-31T23:59:59.99+08:00 None None R009088G None PROPNEX REALTY PTE. LTD.
897 968a17a8-b0bb-455f-a95b-0281084d1da5 ANG YANG MING ANG YM L3008022J 2011-01-01T00:00:00+08:00 2022-12-31T20:24:00+08:00 None None R009471H None PROPNEX REALTY PTE. LTD.
898 98fbdf7a-107a-4980-a7b4-82560123769e ANG YAP CHOW Ang Yap Chow L3008899K 2021-04-30T00:00:00+08:00 2022-12-31T00:21:00+08:00 None None R063534D None HUTTONS ASIA PTE. LTD.
899 848efbcf-3050-4a1d-9394-bf4e8051f4a0 ANG YAP HWEE SUNNY L3010497H 2018-01-01T00:00:00+08:00 2022-12-31T11:44:00+08:00 None None R040155F None ASSET PROPERTY PRIVATE LIMITED
```

There are 341 pages and you can go through all of them; the example above pulls only the first 9 pages of data (range(1, 10) stops at 9, which matches the 900 rows shown).
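Rather than hard-coding the page range, the loop can keep requesting pages until one comes back empty. This is a sketch under an assumption that the answer above does not confirm: that a page past the end returns an empty "data" list. The fetch function is passed in so the paging logic can be tested on its own; fetch_all is a hypothetical name, and max_pages is only a safety limit.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int], List[Dict]],
              start: int = 1, max_pages: int = 1000) -> List[Dict]:
    """Collect records page by page until a page is empty (or max_pages is hit)."""
    records: List[Dict] = []
    for page in range(start, start + max_pages):
        rows = fetch_page(page)
        if not rows:  # assumption: an empty page means we ran past the last one
            break
        records.extend(rows)
    return records
```

With the session above, fetch_page would post the same payload with "page" set to its argument and return r.json()['data']; pd.json_normalize(fetch_all(fetch_page)) then builds the full DataFrame in one step.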

Relevant pandas documentation: https://pandas.pydata.org/pandas-docs/stable/index.html

Requests documentation: https://requests.readthedocs.io/en/latest/


