[leetcode-shell]192-Word Frequency

Author: 编程我只用CPP
Published: February 21, 2020
Category: Linux Operations

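Source: LeetCode (力扣)

Link: https://leetcode-cn.com/problems/word-frequency

Copyright belongs to LeetCode (领扣网络). For commercial reprints, contact them for authorization; for non-commercial reprints, credit the source.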

1. Problem Description

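Write a bash script that counts how often each word appears in a text file, words.txt.

For simplicity, you may assume:

words.txt contains only lowercase letters and ' ' (spaces).
Each word consists of lowercase letters only.
Words are separated by one or more space characters.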

Example:

Suppose words.txt contains the following:

the day is sunny the the
the sunny is is

Your script should output the following (sorted by frequency in descending order):

the 4
is 3
sunny 2
day 1

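Notes:

Don't worry about how to order words with equal frequency; every word's frequency is guaranteed to be unique.
Can you do it in a single line using Unix pipes?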
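To try the solutions locally, one option is to recreate the sample file from the example. This is a minimal sketch assuming a POSIX shell with printf; note that it overwrites any existing words.txt in the current directory.

```shell
# Recreate the sample input from the example (overwrites ./words.txt)
printf '%s\n' 'the day is sunny the the' 'the sunny is is' > words.txt

# Print the file to confirm its contents
cat words.txt
```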

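2. Solutions

2.1 Using awk

Loop over every field of each line using the NF variable (the number of fields on the current line), counting each word in an associative array (awk's hash table). In the END block, print every key-value pair, then pipe the output to sort, which orders the lines numerically (-n) in reverse (-r) by the second column (-k 2), that is, by count in descending order.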

awk '{for (i = 1; i <= NF; i++) {m[$i]++;}} END {for (i in m) {print i, m[i]}}' words.txt | sort -nr -k 2
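For readability, here is the same awk program spread over several lines with comments. It is a sketch wrapped in a shell function named word_freq (a hypothetical name, not from the original) that reads the text from standard input instead of opening words.txt directly. It sorts with -k 2,2nr, which restricts the sort key to the second column only.

```shell
# word_freq is a hypothetical helper name used for illustration;
# it reads text on standard input and prints "word count" lines.
word_freq() {
    awk '
    {
        # NF is the number of fields on the current line;
        # $i is the i-th whitespace-separated field (one word).
        for (i = 1; i <= NF; i++) {
            count[$i]++
        }
    }
    END {
        # "for (w in count)" visits keys in an unspecified order,
        # so the result is sorted afterwards.
        for (w in count) {
            print w, count[w]
        }
    }' | sort -k 2,2nr   # numeric, descending, on column 2 (the count) only
}

# Example: word_freq < words.txt
```

Because awk's for-in loop gives no ordering guarantee, the final sort is what produces the descending-frequency output.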

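2.2 Using xargs

Use xargs with -n 1 to print each field on its own line. Then sort groups identical words together, uniq -c counts them, a second sort orders the lines by count in descending order, and awk swaps the columns into "word count" order: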

cat words.txt | xargs -n 1 | sort | uniq -c | sort -nr -k 1,1 | awk '{print $2" "$1}'

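The -c option of uniq prefixes each line with the number of consecutive times it occurs, which is why the words must be sorted before they reach uniq.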
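Here is the same pipeline split across lines with a comment on each stage. It is a sketch wrapped in a shell function named word_freq_xargs (a hypothetical name, not from the original) that reads standard input. For the sample text, the uniq -c stage emits lines such as "4 the" with some leading padding (its width depends on the uniq implementation), so the count is the first column.

```shell
# word_freq_xargs is a hypothetical helper name used for illustration;
# it reads text on standard input and prints "word count" lines.
word_freq_xargs() {
    xargs -n 1 |                  # print one word per line
        sort |                    # put identical words next to each other
        uniq -c |                 # collapse runs, prefixing each word with its count
        sort -k 1,1nr |           # order by the count column, largest first
        awk '{ print $2, $1 }'    # swap to "word count"
}

# Example: word_freq_xargs < words.txt
```

Sorting on -k 1,1nr keys on the count alone; the final awk step moves the word to the front to match the required output format.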

Last modified: February 21, 2020

© Reprinting permitted with proper attribution
