Zero's Blog - 代码匠

Converting timestamps to dates in R

2022-04-24T15:30:27+00:00

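If the timestamps (date-times) are all in UTC, you can use base R's as.Date function.

If the timestamps are in mixed or non-UTC time zones, as_date from the lubridate package returns the date in the corresponding time zone, though it is slightly slower than as.Date.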
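
The difference is easiest to see on a timestamp close to midnight. Below is a minimal sketch, assuming the timestamps are already POSIXct values; the sample instant and the Asia/Shanghai time zone are made up for illustration.

```r
library(lubridate)

# One instant, stored twice: once in UTC and once in Asia/Shanghai (UTC+8)
utc_time   <- as.POSIXct("2022-04-24 23:30:00", tz = "UTC")
local_time <- as.POSIXct("2022-04-25 07:30:00", tz = "Asia/Shanghai")

# Base R: ask for the calendar date in UTC explicitly
utc_date <- as.Date(utc_time, tz = "UTC")   # 2022-04-24

# lubridate: the date in the time zone carried by the object itself
local_date <- as_date(local_time)           # 2022-04-25

print(utc_date)
print(local_date)
```

Both values describe the same moment, but the calendar date differs because as_date follows the object's own time zone.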

Unnesting JSON columns in a tibble

2022-04-20T15:47:20+00:00

tidyr already provides the unnest() family of functions for expanding nested data structures. Combined with fromJSON(), they can expand JSON columns.

library(tidyverse)
library(jsonlite)

spread_all <- function(data, cols) {
  data %>%
    mutate(across(all_of(cols), function(x)
      map(x, fromJSON))) %>%
    unnest_wider(all_of(cols))
}

# Usage
df %>%
  spread_all(c("x", "y"))
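
To make the behaviour concrete, here is a small self-contained sketch. The helper is repeated so the block runs on its own, and the sample tibble with its column names (id, x, y) and JSON keys (a, b, c) is invented for illustration. It assumes a tidyr version in which unnest_wider() accepts several columns at once (1.2.0 or later, as far as I know).

```r
library(dplyr)
library(purrr)
library(tidyr)
library(jsonlite)

# Same helper as above, repeated so this sketch is runnable by itself
spread_all <- function(data, cols) {
  data %>%
    mutate(across(all_of(cols), function(x) map(x, fromJSON))) %>%
    unnest_wider(all_of(cols))
}

# Hypothetical input: two columns holding JSON objects as strings
df <- tibble(
  id = 1:2,
  x = c('{"a": 1, "b": "u"}', '{"a": 2, "b": "v"}'),
  y = c('{"c": true}', '{"c": false}')
)

# Each JSON key becomes its own column; x and y themselves disappear
out <- df %>%
  spread_all(c("x", "y"))

print(out)
```

If two JSON columns share a key, the resulting column names would collide; the names_sep argument of unnest_wider() is one way to keep them apart.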

To speed up JSON parsing, we can bring in the furrr package and parse in parallel.

library(furrr)
# make future_map parallel
plan(multisession)

spread_all <- function(data, cols) {
  data %>%
    mutate(across(all_of(cols), function(x)
      future_map(x, fromJSON))) %>%
    unnest_wider(all_of(cols))
}

R Markdown: caching results while skipping intermediate variables

2022-04-17T04:04:00+00:00

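When you want to analyze a subset of a very large dataset, and do not want to generate an intermediate CSV file just to render a report quickly, knitr's caching is what you need.

knitr's cache is lazy by default, so if later code only uses the subset, adding cache = TRUE to the chunk is enough to avoid loading the large dataset on every render. If the chunk's data does not depend on the output format, it is also worth setting cache.path = "cache/" so that the cache is shared when you switch output formats. The conditions under which the cache is refreshed can be controlled with cache.extra; see the rmarkdown-cookbook documentation for details.

However, with a very large dataset you may run into the error long vectors not supported yet. At that point knitr's default caching mechanism no longer meets the need and we want a more flexible, customizable cache. This can be done with xfun::cache_rds():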

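```{r res, cache.path = "cache/"}
# `files` and `y` are assumed to be defined in an earlier chunk
res <- xfun::cache_rds({
  cars <- read_csv(files)   # large intermediate dataset, never cached itself
  cars %>%
    filter(model == y)      # only this subset is cached
}, file = "res.rds", hash = list(files, y))
```

The first argument of xfun::cache_rds() is the expression to cache. After the first run completes, the result is cached and assigned to res; on the next run the value is loaded straight from the cache and assigned to res.

The file argument lets a non-knitr environment (for example, running the chunk in RStudio) reuse the same cache file as the knitr environment; the file name must match the chunk label. The dir argument must also agree between the two environments: under knitr it defaults to the value of cache.path, and outside knitr it defaults to cache/. This is an advantage unique to xfun::cache_rds(): it can reuse the knitr cache in RStudio's notebook environment.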

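The hash argument affects the cache file name, so the expression is recomputed as soon as the content of hash changes, which makes it the equivalent of cache.extra.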
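
The effect of hash can be checked outside knitr with a small sketch. The helper compute(), the runs counter, and the temporary cache directory are hypothetical names introduced only for this illustration; file, dir, and hash are arguments of xfun::cache_rds().

```r
library(xfun)

# Temporary cache directory; cache_rds() pastes dir and file together,
# so the trailing slash matters
cache_dir <- paste0(tempfile("cache_demo"), "/")

runs <- 0  # counts how often the cached expression is really evaluated

compute <- function(y) {
  cache_rds({
    runs <<- runs + 1
    y * 2
  }, file = "res.rds", dir = cache_dir, hash = list(y))
}

first  <- compute(1)  # evaluated and written to the cache
second <- compute(1)  # same hash: read back from the cache
third  <- compute(2)  # hash changed: evaluated again

print(runs)
```

Only the first and third calls should evaluate the expression; the second is served from the RDS file.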

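This way the large intermediate dataset cars is skipped and only the final result is cached, which is very convenient.

Update (2022-04-22): As of now, RStudio still does not support reusing the knitr cache in the R Notebook environment; see rstudio/rstudio#9291.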

Reading multiple same-format CSV files with the tidyverse

2022-04-17T03:44:00+00:00

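How do you read several CSV files of the same format into a single tibble? The obvious approach is to read each file with read_csv() and then combine them with reduce(rbind). That works, but it performs poorly on large datasets.

Since 2.0.0, readr natively supports reading multiple files at once: just pass a character vector of file names to the file argument. In my tests on large data this cut the read time noticeably compared with rbind. The feature is not described in the readr package documentation, so I took quite a few detours before discovering that readr supports multi-file reading out of the box.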
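
A minimal sketch of the readr call, assuming readr 2.0.0 or later. The two CSV files are written to a temporary directory first so the example runs by itself; their names and contents are invented.

```r
library(readr)

# Create two small CSV files with identical columns
csv_dir <- tempfile("csv_demo")
dir.create(csv_dir)
write_csv(data.frame(id = 1:2, model = c("a", "b")), file.path(csv_dir, "part1.csv"))
write_csv(data.frame(id = 3:4, model = c("c", "d")), file.path(csv_dir, "part2.csv"))

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Pass the whole character vector to `file`; the rows of all files
# come back in one tibble
cars <- read_csv(files, show_col_types = FALSE)

print(cars)
```

The files are read in the order they appear in the vector, so sorting files beforehand controls the row order of the result.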

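vroom has also supported this natively since 1.0.0, with the same usage. The difference is that vroom loads character data in the files lazily, which can improve performance in some scenarios. Whether it actually helps depends on how the data is used, so measure it with system.time({}).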

Microbenchmark notes

2022-04-03T05:47:00+00:00

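Background

Fluent Bit has code that converts a time to a floating-point number (in seconds) and compares it with 0.0 to check whether the time is still the initial value set by flb_time_zero. Integer arithmetic should be faster in this case, but before submitting a PR it is better to let the data speak, so let's benchmark it first.

Environment setup

This time I used the benchmark library maintained by Google, with Conan managing dependencies and CMake doing the build. Install the tools first: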

sudo port install pipx cmake
pipx install conan

Then prepare conanfile.txt and CMakeLists.txt; my benchmarks repository can serve as a reference.

Writing the benchmark

static void BM_Nano(benchmark::State& state) {
    struct flb_time out_time = {0};
    flb_time_zero(&out_time);
    bool flag;
    for (auto _ : state) {
        flag = flb_time_to_nanosec(&out_time) == 0L;
    }
}

Any code placed inside the for loop is measured automatically. But the results that came out looked a bit odd:

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
BM_Nano         0.249 ns        0.249 ns   1000000000
BM_Sec          0.248 ns        0.248 ns   1000000000

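Across several runs the timings stayed about the same. Are modern CPUs really that good at floating point? Something felt off, so I added a BM_None benchmark for comparison, which just assigns true without doing any computation: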

for (auto _ : state) {
    flag = true;
}

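Its result was the same as the other two, so the numbers were definitely wrong.

Finding the cause

I opened Compiler Explorer and added the benchmark library. One look at the compiled assembly made it clear.

The entire for loop had been optimized away...

Solution

Searching online for how to stop the compiler from optimizing away part of the code, I found that the simplest and relatively portable approach is the volatile keyword, so let's try adding volatile to the flag variable.

The assembly got longer, but still no floating-point operations were generated. Apparently that was not volatile enough, so let's add it to the function argument as well:

Finally the function being benchmarked shows up highlighted! Time to run the benchmark again: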

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
BM_None         0.242 ns        0.242 ns   1000000000
BM_Nano         0.372 ns        0.371 ns   1000000000
BM_Sec           1.11 ns         1.11 ns    623846997

As expected, floating-point arithmetic is indeed slower, so I submitted PR #4535 and called it a day.

Google "Where to Break" Style Guide

2021-05-05T11:27:00+00:00

Java

https://google.github.io/styleguide/javaguide.html#s4.5.1-line-wrapping-where-to-break

When a line is broken at a non-assignment operator the break comes before the symbol. (Note that this is not the same practice used in Google style for other languages, such as C++ and JavaScript.)
This also applies to the following "operator-like" symbols:
the dot separator (.)
the two colons of a method reference (::)
an ampersand in a type bound (<T extends Foo & Bar>)
a pipe in a catch block (catch (FooException | BarException e)).
When a line is broken at an assignment operator the break typically comes after the symbol, but either way is acceptable.
This also applies to the "assignment-operator-like" colon in an enhanced for ("foreach") statement.
A method or constructor name stays attached to the open parenthesis (() that follows it.
A comma (,) stays attached to the token that precedes it.
A line is never broken adjacent to the arrow in a lambda, except that a break may come immediately after the arrow if the body of the lambda consists of a single unbraced expression. Examples:

MyLambda lambda =
    (String label, Long value, Object obj) -> {
        ...
    };

Predicate<String> predicate = str ->
    longExpressionInvolving(str);

Kotlin

https://developer.android.com/kotlin/style-guide#where_to_break

The prime directive of line-wrapping is: prefer to break at a higher syntactic level. Also:
When a line is broken at an operator or infix function name, the break comes after the operator or infix function name.
When a line is broken at the following "operator-like" symbols, the break comes before the symbol:
The dot separator (., ?.).
The two colons of a member reference (::).
A method or constructor name stays attached to the open parenthesis (() that follows it.
A comma (,) stays attached to the token that precedes it.
A lambda arrow (->) stays attached to the argument list that precedes it.

C++

https://google.github.io/styleguide/cppguide.html#Boolean_Expressions

When you have a boolean expression that is longer than the standard line length, be consistent in how you break up the lines.

if (this_one_thing > this_other_thing &&
    a_third_thing == a_fourth_thing &&
    yet_another && last_one) {
  ...
}

Note that when the code wraps in this example, both of the && logical AND operators are at the end of the line. This is more common in Google code, though wrapping all operators at the beginning of the line is also allowed. Feel free to insert extra parentheses judiciously because they can be very helpful in increasing readability when used appropriately, but be careful about overuse. Also note that you should always use the punctuation operators, such as && and ~, rather than the word operators, such as and and compl.

Canvas.clipPath crash

2021-05-05T08:44:00+00:00

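On emulators running old Android versions, calling Canvas.clipPath() may crash the app. This happens because the emulator enables hardware acceleration for Canvas at an API level that is too low to support the call¹. Forcing hardware acceleration off at the affected API levels fixes it.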

E/AndroidRuntime: FATAL EXCEPTION: main
    java.lang.UnsupportedOperationException
        at android.view.GLES20Canvas.clipPath(GLES20Canvas.java:408)

If your application is affected by any of these missing features or limitations, you can turn off hardware acceleration for just the affected portion of your application by calling setLayerType(View.LAYER_TYPE_SOFTWARE, null). This way, you can still take advantage of hardware acceleration everywhere else. See Control hardware acceleration for more information on how to enable and disable hardware acceleration at different levels in your application.

References

https://stackoverflow.com/questions/8771219/android-4-0-compatibility-issues-with-canvas-clippath

¹ https://developer.android.com/guide/topics/graphics/hardware-accel#unsupported

Desugaring and Multidex

2021-05-05T06:44:00+00:00

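While writing an app for old devices I hit a problem: using Java 8 features in the overridden Application class fails at runtime with a class-not-found error, while using them anywhere else works fine.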

E/AndroidRuntime: FATAL EXCEPTION: main
    java.lang.NoClassDefFoundError: $r8$backportedMethods$utility$Objects$2$equals
        at MyApplication.onCreate(MyApplication.java:xx)

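After some digging it turned out that multidex was to blame. Running attachBaseContext() requires the MyApplication class to be loaded first, and at that point multidex has not been installed yet, so MyApplication must not reference classes that are absent from the primary DEX file.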

public class MyApplication extends SomeOtherApplication {
  @Override
  protected void attachBaseContext(Context base) {
    super.attachBaseContext(base);
    MultiDex.install(this);
  }
}

References

https://developer.android.com/studio/write/java8-support#library-desugaring
https://developer.android.com/studio/build/multidex

Code highlighting in the default Typecho theme

2021-05-05T05:36:00+00:00

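First pick the configuration you want on the Prism download page and copy the files into the theme directory. Then add the following two lines to the theme's header.php.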