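In day-to-day Linux work you often need to group and count the repeated fields in a file or in command output. Group counting is also a common interview question.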
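The method is simple. The core command is: sort | uniq -c | sort -rn
uniq -c (or uniq --count) counts how many times each line is repeated.
sort -n compares strings of digits by their numeric value, and -r orders the result from largest to smallest.
Sample input (demo.txt):
    hello
    hi
    hello
    world
    world
    my
    word
    hi
    hello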
Reference answer
    sort demo.txt | uniq -c | sort -rn | head -3
The result is as follows:
    3 hello
    2 world
    2 hi
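The first sort in the pipeline matters because uniq only merges identical lines that are adjacent. The following is a minimal sketch of that difference, using the same words as demo.txt. The helper name count_lines is made up for illustration and is not a standard command.

```shell
#!/bin/sh
# Hypothetical helper (not a standard command): count duplicate lines on stdin,
# most frequent first, using the sort | uniq -c | sort -rn pattern.
count_lines() {
    # sed strips the left padding that uniq -c adds (its width varies by platform).
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn | sed 's/^ *//'
}

# Same words as the demo.txt sample, one per line.
words='hello
hi
hello
world
world
my
word
hi
hello'

# Without the first sort, uniq -c only counts runs of adjacent lines,
# so "hello" shows up three separate times, each with a count of 1.
unsorted=$(printf '%s\n' "$words" | uniq -c | sed 's/^ *//')

# With the first sort, all copies of a word are adjacent and counted together.
sorted=$(printf '%s\n' "$words" | count_lines)

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted" "$sorted"
```

With a real file the helper would be called as count_lines < demo.txt | head -3.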
Sample input (access.log):
    201.158.69.116 - - [03/Jan/2013:21:17:20 -0600] fwf[-] tip[-] 127.0.0.1:9000 0.007 0.007 MX pythontab.com GET /html/test.html HTTP/1.1 "200" 2426 "http://a.com" "es-ES,es;q=0.8" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11"
    187.171.69.177 - - [03/Jan/2013:21:17:20 -0600] fwf[-] tip[-] 127.0.0.1:9000 0.006 0.006 MX pythontab.com GET /html/test2.html HTTP/1.1 "200" 2426 "http://a.com" "es-ES,es;q=0.8" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11"
Reference answer (field 14 of each log line is the request path):
    awk '{print $14}' access.log | sort | uniq -c | sort -rn | head -10
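Or, sorting on the key field directly:
    sort -k 14,14 access.log | uniq -c | sort -rn | head -10 | awk '{print $1, $15}'
Note that uniq -c compares whole lines, so this variant only counts lines that are identical in every field. When requests for the same path differ in IP address or timestamp, use the awk version instead. Because uniq -c puts the count in front of each line, the path moves from field 14 to field 15.
sort options:
-k: the field (column) to sort on
-t: the field separator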
The result is as follows:
    1 /html/test2.html
    1 /html/test.html
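Counting by a single field can also be done in one pass with an awk associative array, which groups on that field alone regardless of what the rest of the line contains. This is a minimal sketch. The helper name count_field and the three-column demo input are made up for illustration and are not taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of one whitespace-separated field.
# $1 = field number (14 would be the request path in the sample log format).
count_field() {
    # n[value] holds the running count; END prints "count value" pairs.
    # The final sort orders by count descending, then by value for stable ties.
    awk -v col="$1" '{ n[$col]++ } END { for (k in n) print n[k], k }' |
        LC_ALL=C sort -k1,1nr -k2,2
}

# Made-up three-column input, only to show the grouping; not real log data.
demo=$(printf '%s\n' 'a x /p1' 'b y /p2' 'c z /p1' | count_field 3)
printf '%s\n' "$demo"
# prints:
# 2 /p1
# 1 /p2
```

For the access log this would be called as count_field 14 < access.log | head -10.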
Running netstat -nat | grep 80 shows output like this:
    tcp4 0 0 192.168.0.101.57581 80.254.145.118.80 SYN_SENT
    tcp4 0 0 192.168.0.101.57572 111.161.64.23.80 ESTABLISHED
    tcp4 0 0 192.168.0.101.57565 60.29.242.162.80 ESTABLISHED
    tcp4 0 0 192.168.0.101.57513 175.174.56.212.80 CLOSE_WAIT
    tcp6 0 0 fe80::18e3:52d8:.56850 fe80::1cc0:75be:.62835 ESTABLISHED
    tcp4 0 0 192.168.0.101.56178 175.174.56.212.80 CLOSE_WAIT
Reference answer
    netstat -nat | grep 80 | awk '{print $6}' | sort | uniq -c | sort -rn
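The result is as follows:
    27 ESTABLISHED
    10 LISTEN
    2 CLOSE_WAIT
    1 ce382f50fea83507
    1 ce382f50fea80df7
    1 SYN_SENT
    1
Because grep 80 matches any line that contains the string 80, not only port 80, the counts can include unrelated rows, such as the hexadecimal values and the empty sixth field above.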
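To count only connections whose remote port is exactly 80, the port can be tested in awk instead of with grep. This is a minimal sketch, assuming netstat -nat style rows in which field 5 is the remote address and field 6 is the state. The helper name states_for_port is made up for illustration. The sample rows are the six lines shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: count TCP states for rows whose remote port equals $1.
# Assumes field 5 = remote address and field 6 = state; the port is the part
# after the last "." (macOS/BSD style) or ":" (Linux style).
states_for_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ {
            n = split($5, parts, "[.:]")      # split the address on "." or ":"
            if (parts[n] == port) print $6    # the last piece is the port
        }' |
        LC_ALL=C sort | uniq -c | sed 's/^ *//' | LC_ALL=C sort -k1,1nr -k2,2
}

# The six sample rows from the netstat output above.
sample='tcp4 0 0 192.168.0.101.57581 80.254.145.118.80 SYN_SENT
tcp4 0 0 192.168.0.101.57572 111.161.64.23.80 ESTABLISHED
tcp4 0 0 192.168.0.101.57565 60.29.242.162.80 ESTABLISHED
tcp4 0 0 192.168.0.101.57513 175.174.56.212.80 CLOSE_WAIT
tcp6 0 0 fe80::18e3:52d8:.56850 fe80::1cc0:75be:.62835 ESTABLISHED
tcp4 0 0 192.168.0.101.56178 175.174.56.212.80 CLOSE_WAIT'

result=$(printf '%s\n' "$sample" | states_for_port 80)
printf '%s\n' "$result"
# prints:
# 2 CLOSE_WAIT
# 2 ESTABLISHED
# 1 SYN_SENT
```

The tcp6 row is skipped here because its remote port is 62835, even though the line contains "80" and would pass grep 80. On a live system the helper would be called as netstat -nat | states_for_port 80.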