by shigemk2

For now, this blog covers technical topics only.

APFS: filename character encoding

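Starting with macOS High Sierra, macOS uses APFS as its file system (on High Sierra itself, SSD startup volumes are converted to APFS during the upgrade).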

What's New in Apple File System - WWDC 2017 - Videos - Apple Developer

Frequently Asked Questions

There's no telling how long the FAQ page will stay up, so I'm quoting it here.

How does Apple File System handle filenames?

APFS accepts only valid UTF-8 encoded filenames for creation, and preserves both case and normalization of the filename on disk in all variants. APFS, like HFS+, is case-sensitive on iOS and is available in case-sensitive and case-insensitive variants on macOS, with case-insensitive being the default.

In macOS High Sierra, APFS is normalization-insensitive in both the case-insensitive and case-sensitive variants, using a hash-based native normalization scheme. In iOS 11, APFS is normalization-insensitive as well, using either a native normalization scheme (erase restores only) or runtime normalization scheme (upgrades from previous versions). Runtime normalization will also be available in iOS 10.3.3 and macOS Sierra 10.12.6. Being normalization-insensitive ensures that normalization variants of a filename cannot be created in the same directory, and that a filename can be found with any of its normalization variants. This means that you don’t need to do any additional work to ensure correct normalization behavior in these versions of macOS and iOS.

Some differences between how APFS and HFS+ handle filenames include the following:

APFS implements normalization and case insensitivity according to the Unicode 9.0 standard; this enables APFS to support a wider range of languages for these features than HFS+, which is based on Unicode 3.2.
APFS preserves the normalization of the filename and uses hashes of the normalized form of the filename to provide normalization insensitivity, whereas HFS+ stores the normalized form of the filename on disk to provide normalization insensitivity.
Calling readdir(2) on a directory in APFS returns filenames in hash order, whereas HFS+ returns filenames in lexicographical order.
While both filesystems expect filenames to be encoded in UTF-8, APFS stores filenames on disk in UTF-8 encoding, whereas HFS stores filenames on disk in UTF-16 encoding.
APFS doesn’t allow files to be created with filenames that contain unassigned codepoints in the Unicode 9.0 standard, whereas HFS+ does.
In iOS 10.3 and in the case-sensitive variant of the developer preview of APFS in macOS Sierra, APFS is normalization-sensitive. For these versions, developers should be aware of behavior differences between normalization sensitivity and insensitivity that may arise when a device upgrades macOS or iOS and migrates the filesystem from HFS+ to APFS. For example, attempting to create a file using one normalization behavior and then opening that file using another normalization behavior may result in ENOENT, or “File Not Found” errors. Additionally, storing filenames externally, such as in the defaults database, Core Data, or iCloud storage may cause problems if the normalization scheme of the filename being stored is different from what exists on disk.

To avoid introducing bugs in your code with mismatched Unicode normalization (for iOS 10.3.0, 10.3.1 and 10.3.2) in filenames, do the following:

Use high-level Foundation APIs such as NSFileManager and NSURL when interacting with the filesystem.
Use the fileSystemRepresentation property of NSURL objects when creating and opening files with lower-level filesystem APIs such as POSIX open(2), or when storing filenames externally from the filesystem.
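To make the hash-based normalization insensitivity in the FAQ a bit more concrete, here is a toy model in Python. It is a sketch under stated assumptions, not APFS's actual algorithm: the class name ToyDirectory, the use of NFD plus str.casefold() as the "normalized form", and SHA-256 as the hash are all hypothetical choices for illustration. It reproduces the behaviors the FAQ lists: the name is stored exactly as given, a second normalization variant of the same name can't be created, lookup works with either variant, and listing follows hash order rather than lexicographic order.

```python
from __future__ import annotations

import hashlib
import unicodedata


class ToyDirectory:
    """Toy model of a normalization-insensitive directory (hypothetical).

    Not APFS's on-disk format: NFD + casefold() as the normalized form and
    SHA-256 as the hash are illustrative assumptions.
    """

    def __init__(self, case_insensitive: bool = True) -> None:
        self.case_insensitive = case_insensitive
        # hash of the normalized name -> the name exactly as it was created
        self.entries: dict[str, str] = {}

    def _key(self, name: str) -> str:
        normalized = unicodedata.normalize("NFD", name)
        if self.case_insensitive:
            # canonical caseless form: NFD(casefold(NFD(name)))
            normalized = unicodedata.normalize("NFD", normalized.casefold())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def create(self, name: str) -> None:
        key = self._key(name)
        if key in self.entries:
            # another normalization (or case) variant already exists
            raise FileExistsError(name)
        self.entries[key] = name  # store the original form unchanged

    def lookup(self, name: str) -> str | None:
        # every variant hashes to the same key, so any form finds the file
        return self.entries.get(self._key(name))

    def listdir(self) -> list[str]:
        # entries come back in hash order, not lexicographic order
        return [self.entries[key] for key in sorted(self.entries)]


if __name__ == "__main__":
    d = ToyDirectory()
    nfc = unicodedata.normalize("NFC", "かばん.csv")
    nfd = unicodedata.normalize("NFD", "かばん.csv")
    d.create(nfd)
    print(d.lookup(nfc) == nfd)  # found via the NFC spelling, NFD form kept
    try:
        d.create(nfc)
    except FileExistsError:
        print("NFC variant refused: same name after normalization")
```

Keeping the original string and only hashing the normalized form is what lets the name's normalization be "preserved" while variants still match; per the FAQ, HFS+ instead stores the normalized form itself.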

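Reading the FAQ together with the WWDC 2017 slides, my understanding is that APFS stores filenames as UTF-8 without normalizing them (whatever normalization form the name was created with is kept as-is). Also, if you create a file whose name contains Japanese on macOS High Sierra and then try to view it on Windows, the name comes out garbled (mojibake).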
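The "not normalized" point can be checked with Python's standard unicodedata module: "かばん.csv" in NFC (ば as the single code point U+3070) and in NFD (は U+306F followed by the combining dakuten U+3099) look identical but have different UTF-8 bytes, and APFS keeps whichever one it was given. The last lines show one possible cause of the Windows mojibake, under an assumption the post does not confirm: a Windows tool that reads the raw UTF-8 bytes as CP932, the Windows Japanese code page.

```python
import unicodedata

name = "かばん.csv"
nfc = unicodedata.normalize("NFC", name)  # ば as one code point (U+3070)
nfd = unicodedata.normalize("NFD", name)  # は (U+306F) + combining dakuten (U+3099)

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# Same visible name, two different byte sequences.
print(nfc == nfd)          # False
print(nfc_bytes.hex(" "))  # e3 81 8b e3 81 b0 e3 82 93 2e 63 73 76
print(nfd_bytes.hex(" "))  # e3 81 8b e3 81 af e3 82 99 e3 82 93 2e 63 73 76

# Assumption: some tool on Windows decodes the raw UTF-8 bytes as CP932.
# errors="replace" substitutes U+FFFD where CP932 cannot decode a sequence.
garbled = nfc_bytes.decode("cp932", errors="replace")
print(garbled)
```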

I haven't been able to follow the source code all the way through, but trying to create a file whose name is a Shift_JIS byte string fails:

$ touch $(echo "かばん" | nkf -Ls).csv # fails
touch: cannot touch ''$'\202\251\202\316\202\361''.csv': Illegal byte sequence
$ touch $(echo "かばん").csv # succeeds
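The bytes in the touch error message are octal escapes. This Python sketch checks that they are exactly the Shift_JIS encoding of "かばん" and that those bytes are not valid UTF-8, which fits the FAQ's statement that APFS only accepts valid UTF-8 filenames. The helpers is_valid_utf8 and try_create are hypothetical names for illustration; errno.EILSEQ is the error code behind the "Illegal byte sequence" message.

```python
import errno
import os

name = "かばん"
sjis = name.encode("shift_jis")

# touch showed the same bytes as octal escapes: \202\251\202\316\202\361
from_error = bytes([0o202, 0o251, 0o202, 0o316, 0o202, 0o361])
print(sjis == from_error)  # True
print(sjis)                # b'\x82\xa9\x82\xce\x82\xf1'


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(sjis))                  # False: 0x82 cannot start a UTF-8 sequence
print(is_valid_utf8(name.encode("utf-8")))  # True


def try_create(raw_name: bytes) -> str:
    """Try to create a file named by raw bytes (hypothetical helper).

    Creates the file in the current directory if the file system accepts the name.
    """
    try:
        fd = os.open(raw_name, os.O_CREAT | os.O_WRONLY, 0o644)
    except OSError as exc:
        if exc.errno == errno.EILSEQ:
            return "rejected with EILSEQ (Illegal byte sequence)"
        raise
    os.close(fd)
    return "created"
```

On an APFS volume, try_create(sjis) would be expected to return the EILSEQ message, matching the touch failure above, while try_create(name.encode("utf-8")) creates the file. A file system that accepts arbitrary bytes, such as a default Linux ext4 volume, would create both.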

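Renaming existing files to Shift_JIS names with convmv also looks as if it succeeds at first, but the filenames end up unchanged. One caveat: convmv only prints what it would do unless --notest is passed, and the "No changes to your files done" message below is that dry-run output, so this run by itself doesn't show whether APFS would accept the new names.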

$ convmv -t shift_jis -f utf-8 *
....
No changes to your files done. Would have converted X files in 0 seconds.
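Here is a dry-run sketch that shows what a UTF-8 to Shift_JIS rename would produce without touching any files. plan_renames is a hypothetical helper, not convmv's implementation: it encodes each name both ways, skips names whose bytes don't change (plain ASCII), and flags target names that are not valid UTF-8, which, going by the FAQ's "valid UTF-8 only" rule, APFS would presumably refuse.

```python
from __future__ import annotations


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def plan_renames(
    names: list[str], src: str = "utf-8", dst: str = "shift_jis"
) -> list[tuple[bytes, bytes, bool]]:
    """Dry run (hypothetical helper): report renames without performing them.

    Returns (old_bytes, new_bytes, new_is_valid_utf8) for every name whose
    encoded bytes would change. name.encode(dst) raises UnicodeEncodeError
    for characters that Shift_JIS cannot represent.
    """
    plan = []
    for name in names:
        old = name.encode(src)
        new = name.encode(dst)
        if old == new:
            continue  # e.g. plain ASCII names: identical bytes in both encodings
        plan.append((old, new, is_valid_utf8(new)))
    return plan


if __name__ == "__main__":
    for old, new, ok in plan_renames(["かばん.csv", "readme.txt"]):
        verdict = "valid UTF-8" if ok else "not valid UTF-8, APFS would presumably refuse it"
        print(f"{old!r} -> {new!r}: {verdict}")
```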

Also, downloading a file with a Shift_JIS filename from a server and then opening it in Finder doesn't change the character encoding of the filename either.