High-Performance Web Development (10)

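Preface:

     In the previous post a reader, skyaspnet, asked me how to compress HTML. At the time I recommended gzip, but thinking about it later, it would be no bad thing to compress every html and jsp (aspx) file down to one line before the application runs. We rarely enable gzip for HTML: pages today are mostly dynamic and are not served from the browser cache, so with gzip on, every request has to be compressed again, which costs server resources. gzip suits js and css better because those files are cached. To my mind the biggest advantage of compressing HTML this way is that it keeps paying off: write it once and every project can use it with no extra development work.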

 

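     In the post "Merging, Compressing, and Cache Management for JS and CSS" I mentioned a component I wrote that automatically merges and compresses JS and CSS and appends version numbers. This time I added HTML compression to that component. The flow is simple: when the application starts (contextInitialized or Application_Start), scan all html and jsp (aspx) files and compress them.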
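
The startup scan described above can be sketched roughly as follows. This is only an illustration, not the component's actual code: the class name StartupPageCompressor, the UTF-8 encoding, and the in-place rewrite are all assumptions, and the compressor is passed in as a plain function so the sketch stays self-contained. In a servlet application this would presumably be called from contextInitialized with the web root path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: compresses every page under a web root once, at startup. */
final class StartupPageCompressor {

    /** True for the page types the article mentions (html, jsp, aspx). */
    static boolean isPage(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".jsp") || name.endsWith(".aspx");
    }

    /** Rewrites each page under webRoot in place and returns the number of files rewritten. */
    static int compressAll(Path webRoot, UnaryOperator<String> compressor) throws IOException {
        List<Path> pages;
        try (Stream<Path> walk = Files.walk(webRoot)) {
            pages = walk.filter(Files::isRegularFile)
                        .filter(StartupPageCompressor::isPage)
                        .collect(Collectors.toList());
        }
        for (Path page : pages) {
            // Assumption: the pages are UTF-8 encoded.
            String original = new String(Files.readAllBytes(page), StandardCharsets.UTF_8);
            Files.write(page, compressor.apply(original).getBytes(StandardCharsets.UTF_8));
        }
        return pages.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: StartupPageCompressor <web-root>");
            return;
        }
        // Stand-in compressor that only removes whitespace between tags.
        UnaryOperator<String> collapse = s -> s.replaceAll(">\\s+<", "><").trim();
        System.out.println(compressAll(Paths.get(args[0]), collapse) + " pages compressed");
    }
}
```

Because the files are rewritten in place, a sketch like this should run against the deployed copy of the pages, never against the source tree.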

 

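Things to watch out for when compressing:

     The implementation mainly uses regular expressions to find and replace. When compressing HTML, pay attention to the following:

          1. The content of pre and textarea tags must keep its formatting and cannot be compressed.

          2. When removing HTML comments, some comments must stay, for example: <!--[if IE 6]> ..... <![endif]-->

          3. Take care when removing comments from inline JS, because the comment markers can appear inside strings, for example: var url = "http://www.cnblogs.com";    // the // inside the string is not a comment

              When removing JS line breaks, the next line cannot be appended directly; a space is needed. Consider the following code:

              else

                 return;

             Without the space this becomes elsereturn.

          4. jsp (aspx) files very likely embed server-side code with <% %>. That also needs separate handling; comments inside it are handled the same way as in JS.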
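
Points 2 and 3 are easy to reproduce in a few lines. The sketch below is an illustration only: the class CompressionPitfalls and its patterns are hypothetical and simpler than the ones in the source code that follows. It shows a comment pattern that skips conditional comments, a naive // stripper that truncates a URL inside a string, and the difference between joining lines with and without a space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo of the pitfalls listed above. */
final class CompressionPitfalls {
    // HTML comments, except those that start with "<!--[" (IE conditional comments).
    static final Pattern COMMENT = Pattern.compile("<!--(?!\\[).*?-->", Pattern.DOTALL);
    // A line break plus the indentation that follows it.
    static final Pattern LINE_BREAK = Pattern.compile("\\r?\\n\\s*");

    static String stripComments(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    /** Naive on purpose: treats every "//" as the start of a comment, even inside strings. */
    static String stripLineCommentsNaively(String js) {
        return js.replaceAll("//.*", "");
    }

    static String joinLines(String js, String separator) {
        return LINE_BREAK.matcher(js).replaceAll(Matcher.quoteReplacement(separator));
    }

    public static void main(String[] args) {
        System.out.println(stripComments("<!-- note --><p>x</p><!--[if IE 6]><b>old</b><![endif]-->"));
        System.out.println(stripLineCommentsNaively("var url = \"http://www.cnblogs.com\"; // home"));
        System.out.println(joinLines("else\n    return;", ""));  // broken: elsereturn;
        System.out.println(joinLines("else\n    return;", " ")); // fine: else return;
    }
}
```

The naive stripper cuts the statement off right after "http:", which is why the compressor has to mask comment markers inside string literals before it removes comments.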

 

Source code:

    Below is the Java source code (the original post also offered it as a download). It should be easy to follow, and easy to port to .NET:

import java.io.StringReader;
import java.io.StringWriter;
import java.util.*;
import java.util.regex.*;
/*******************************************
 * Compresses the code in jsp and html files by removing whitespace and line breaks
 * @author  bearrui(ak-47)
 * @version 0.1
 * @date     2010-5-13
 *******************************************/
public class HtmlCompressor {
    private static String tempPreBlock = "%%%HTMLCOMPRESS~PRE&&&";
    private static String tempTextAreaBlock = "%%%HTMLCOMPRESS~TEXTAREA&&&";
    private static String tempScriptBlock = "%%%HTMLCOMPRESS~SCRIPT&&&";
    private static String tempStyleBlock = "%%%HTMLCOMPRESS~STYLE&&&";
    private static String tempJspBlock = "%%%HTMLCOMPRESS~JSP&&&";
     
    private static Pattern commentPattern = Pattern.compile("<!--\\s*[^\\[].*?-->", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern itsPattern = Pattern.compile(">\\s+?<", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern prePattern = Pattern.compile("<pre[^>]*?>.*?</pre>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern taPattern = Pattern.compile("<textarea[^>]*?>.*?</textarea>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static Pattern jspPattern = Pattern.compile("<%([^-@][\\w\\W]*?)%>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // <script></script>
    private static Pattern scriptPattern = Pattern.compile("(?:<script\\s*>|<script type=['\"]text/javascript['\"]\\s*>)(.*?)</script>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // non-greedy, so that two <style> blocks on one page are not merged into a single match
    private static Pattern stylePattern = Pattern.compile("<style[^>()]*?>(.+?)</style>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // single-line comments
    private static Pattern signleCommentPattern = Pattern.compile("//.*");
    // string literals
    private static Pattern stringPattern = Pattern.compile("(\"[^\"\\n]*?\"|'[^'\\n]*?')");
    // trim whitespace and line breaks
    private static Pattern trimPattern = Pattern.compile("\\n\\s*",Pattern.MULTILINE);
    private static Pattern trimPattern2 = Pattern.compile("\\s*\\r",Pattern.MULTILINE);
    // multi-line comments
    private static Pattern multiCommentPattern = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL | Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static String tempSingleCommentBlock = "%%%HTMLCOMPRESS~SINGLECOMMENT&&&"; // placeholder for //
    private static String tempMulitCommentBlock1 = "%%%HTMLCOMPRESS~MULITCOMMENT1&&&"; // placeholder for /*
    private static String tempMulitCommentBlock2 = "%%%HTMLCOMPRESS~MULITCOMMENT2&&&"; // placeholder for */
     
     
    public static String compress(String html) throws Exception {
        if(html == null || html.length() == 0) {
            return html;
        }
         
        List<String> preBlocks = new ArrayList<String>();
        List<String> taBlocks = new ArrayList<String>();
        List<String> scriptBlocks = new ArrayList<String>();
        List<String> styleBlocks = new ArrayList<String>();
        List<String> jspBlocks = new ArrayList<String>();
         
        String result = html;
         
        //preserve inline java code
        Matcher jspMatcher = jspPattern.matcher(result);
        while(jspMatcher.find()) {
            jspBlocks.add(jspMatcher.group(0));
        }
        result = jspMatcher.replaceAll(tempJspBlock);
         
        //preserve PRE tags
        Matcher preMatcher = prePattern.matcher(result);
        while(preMatcher.find()) {
            preBlocks.add(preMatcher.group(0));
        }
        result = preMatcher.replaceAll(tempPreBlock);
         
        //preserve TEXTAREA tags
        Matcher taMatcher = taPattern.matcher(result);
        while(taMatcher.find()) {
            taBlocks.add(taMatcher.group(0));
        }
        result = taMatcher.replaceAll(tempTextAreaBlock);
         
        //preserve SCRIPT tags
        Matcher scriptMatcher = scriptPattern.matcher(result);
        while(scriptMatcher.find()) {
            scriptBlocks.add(scriptMatcher.group(0));
        }
        result = scriptMatcher.replaceAll(tempScriptBlock);
         
        // don't process inline css
        Matcher styleMatcher = stylePattern.matcher(result);
        while(styleMatcher.find()) {
            styleBlocks.add(styleMatcher.group(0));
        }
        result = styleMatcher.replaceAll(tempStyleBlock);
         
        //process pure html
        result = processHtml(result);
         
        //process preserved blocks
        result = processPreBlocks(result, preBlocks);
        result = processTextareaBlocks(result, taBlocks);
        result = processScriptBlocks(result, scriptBlocks);
        result = processStyleBlocks(result, styleBlocks);
        result = processJspBlocks(result, jspBlocks);
         
        preBlocks = taBlocks = scriptBlocks = styleBlocks = jspBlocks = null;
         
        return result.trim();
    }
     
    private static String processHtml(String html) {
        String result = html;
         
        //remove comments
//      if(removeComments) {
            result = commentPattern.matcher(result).replaceAll("");
//      }
         
        //remove inter-tag spaces
//      if(removeIntertagSpaces) {
            result = itsPattern.matcher(result).replaceAll("><");
//      }
         
        //remove multi whitespace characters
//      if(removeMultiSpaces) {
            result = result.replaceAll("\\s{2,}"," ");
//      }
                 
        return result;
    }
     
    private static String processJspBlocks(String html, List<String> blocks){
        String result = html;
        for(int i = 0; i < blocks.size(); i++) {
            blocks.set(i, compressJsp(blocks.get(i)));
        }
        //put preserved blocks back
        while(result.contains(tempJspBlock)) {
            result = result.replaceFirst(tempJspBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
         
        return result;
    }
    private static String processPreBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
         
        //put preserved blocks back
        while(result.contains(tempPreBlock)) {
            result = result.replaceFirst(tempPreBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
         
        return result;
    }
     
    private static String processTextareaBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
         
        //put preserved blocks back
        while(result.contains(tempTextAreaBlock)) {
            result = result.replaceFirst(tempTextAreaBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
         
        return result;
    }
     
    private static String processScriptBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
         
//      if(compressJavaScript) {
            for(int i = 0; i < blocks.size(); i++) {
                blocks.set(i, compressJavaScript(blocks.get(i)));
            }
//      }
         
        //put preserved blocks back
        while(result.contains(tempScriptBlock)) {
            result = result.replaceFirst(tempScriptBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
         
        return result;
    }
     
    private static String processStyleBlocks(String html, List<String> blocks) throws Exception {
        String result = html;
         
//      if(compressCss) {
            for(int i = 0; i < blocks.size(); i++) {
                blocks.set(i, compressCssStyles(blocks.get(i)));
            }
//      }
         
        //put preserved blocks back
        while(result.contains(tempStyleBlock)) {
            result = result.replaceFirst(tempStyleBlock, Matcher.quoteReplacement(blocks.remove(0)));
        }
         
        return result;
    }
     
    private static String compressJsp(String source)  {
        //check if block is not empty
        Matcher jspMatcher = jspPattern.matcher(source);
        if(jspMatcher.find()) {
            String result = compressJspJs(jspMatcher.group(1));
            return (new StringBuilder(source.substring(0, jspMatcher.start(1))).append(result).append(source.substring(jspMatcher.end(1)))).toString();
        } else {
            return source;
        }
    }  
    private static String compressJavaScript(String source)  {
        //check if block is not empty
        Matcher scriptMatcher = scriptPattern.matcher(source);
        if(scriptMatcher.find()) {
            String result = compressJspJs(scriptMatcher.group(1));
            return (new StringBuilder(source.substring(0, scriptMatcher.start(1))).append(result).append(source.substring(scriptMatcher.end(1)))).toString();
        } else {
            return source;
        }
    }
         
    private static String compressCssStyles(String source)  {
        //check if block is not empty
        Matcher styleMatcher = stylePattern.matcher(source);
        if(styleMatcher.find()) {
            // strip comments and line breaks
            String result= multiCommentPattern.matcher(styleMatcher.group(1)).replaceAll("");
            result = trimPattern.matcher(result).replaceAll("");
            result = trimPattern2.matcher(result).replaceAll("");
            return (new StringBuilder(source.substring(0, styleMatcher.start(1))).append(result).append(source.substring(styleMatcher.end(1)))).toString();
        } else {
            return source;
        }
    }
     
    private static String compressJspJs(String source){
        String result = source;
        // comment markers can appear inside string literals, so mask them there first
        Matcher stringMatcher = stringPattern.matcher(result);
        while(stringMatcher.find()){
            String tmpStr = stringMatcher.group(0);
             
            if(tmpStr.indexOf("//") != -1 || tmpStr.indexOf("/*") != -1 || tmpStr.indexOf("*/") != -1){
                String blockStr = tmpStr.replaceAll("//", tempSingleCommentBlock).replaceAll("/\\*", tempMulitCommentBlock1)
                                .replaceAll("\\*/", tempMulitCommentBlock2);
                result = result.replace(tmpStr, blockStr);
            }
        }
        // strip comments
        result = signleCommentPattern.matcher(result).replaceAll("");
        result = multiCommentPattern.matcher(result).replaceAll("");
        result = trimPattern2.matcher(result).replaceAll("");
        result = trimPattern.matcher(result).replaceAll(" ");
        // restore the masked comment markers inside strings
        result = result.replaceAll(tempSingleCommentBlock, "//").replaceAll(tempMulitCommentBlock1, "/*")
                .replaceAll(tempMulitCommentBlock2, "*/");
         
        return result;
    }
}
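
The class above repeats one pattern for every kind of block: pull the block out, leave a placeholder behind, compress what remains, then put the block back. Reduced to a single tag, the idea looks roughly like this. It is a simplified sketch, not a drop-in replacement: the class name PreserveAndRestore and the marker string are made up for the illustration, and only pre blocks are protected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, single-tag version of the preserve-and-restore pattern. */
final class PreserveAndRestore {
    private static final String PLACEHOLDER = "%%%DEMO~PRE&&&"; // made-up marker
    private static final Pattern PRE =
            Pattern.compile("<pre[^>]*>.*?</pre>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static String compress(String html) {
        // 1. Pull out every <pre> block and leave a placeholder behind.
        List<String> blocks = new ArrayList<>();
        Matcher m = PRE.matcher(html);
        while (m.find()) {
            blocks.add(m.group());
        }
        String result = m.replaceAll(PLACEHOLDER);

        // 2. Compress the rest: drop whitespace between tags, collapse runs of whitespace.
        result = result.replaceAll(">\\s+<", "><").replaceAll("\\s{2,}", " ");

        // 3. Put the untouched blocks back, in their original order.
        for (String block : blocks) {
            result = result.replaceFirst(Pattern.quote(PLACEHOLDER), Matcher.quoteReplacement(block));
        }
        return result.trim();
    }

    public static void main(String[] args) {
        System.out.println(compress("<div>\n  <pre>a\n   b</pre>\n  <p>x   y</p>\n</div>"));
    }
}
```

Matcher.quoteReplacement matters in step 3: without it, a $ or backslash inside a preserved block would be interpreted as a group reference or escape in the replacement string.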

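Usage notes

 

      With the method above in place, run the application again and view the source of any page: it is now a single line. Not bad, but there are still some issues to keep in mind:

           1. I originally wanted to call yuicompressor to compress the inline JS. Before compressing, yuicompressor parses the JS to check that it is valid, and our inline JS often contains server-side code, for example var now = <%=DateTime.now %>. Such code fails to parse, so yuicompressor cannot be used.

              In the end I had to write the JS compression myself. It is fairly crude, so one problem remains unsolved: if a developer leaves out the semicolon at the end of a JS statement, collapsing the script to one line is very likely to break it. Using this compressor therefore requires that every statement ends with a semicolon.

 

           2. Because all jsp (aspx) files are compressed at application startup, HTML generated dynamically at request time cannot be compressed.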
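
For the semicolon requirement in note 1, a rough pre-check can at least point at risky lines before the pages are compressed. For example, "var a = 1" followed on the next line by "var b = 2" becomes "var a = 1 var b = 2" once joined, which is a syntax error. The sketch below is a coarse heuristic and not part of the original component: the class MissingSemicolonCheck is hypothetical, and it reports false positives such as a bare else line, which joins safely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical heuristic: flags JS lines that may break when the script is joined into one line. */
final class MissingSemicolonCheck {

    /** Returns the 1-based numbers of non-empty lines that do not end with one of ; { } , ( */
    static List<Integer> suspiciousLines(String js) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = js.split("\\r?\\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank lines and whole-line comments are removed by the compressor anyway
            }
            char last = line.charAt(line.length() - 1);
            if (";{},(".indexOf(last) < 0) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String js = "var a = 1\nvar b = 2;\nif (a) {\n  b++\n}";
        System.out.println(suspiciousLines(js)); // lines 1 and 4 lack a semicolon
    }
}
```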


    See also: the High-Performance Web Development series

[Author]: BearRui (AK-47)
[Blog]: http://www.cnblogs.com/BearsTaR/