Whenever I need to check whether a file contains invalid UTF-8 characters, I use isutf8:
isutf8 file.txt
(On Ubuntu, isutf8 comes from the `moreutils` package, so you need to install that first.)
Then, to remove the invalid characters, use iconv:
iconv -f utf-8 -t utf-8 -c nonutf-8.txt > utf8.txt
-c drops invalid characters from the output
-f is the 'from' encoding (UTF-8)
-t is the 'to' encoding (UTF-8)
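If you do this often, the check and the cleanup can be wrapped into one small shell function. This is only a rough sketch: `clean_utf8` is a name I made up, and it uses iconv itself for the check instead of isutf8, so it works without `moreutils`. It relies on iconv exiting with a non-zero status when it meets a byte sequence that is not valid UTF-8.

```shell
# Hypothetical helper (not part of iconv or moreutils).
# Usage: clean_utf8 INPUT OUTPUT
clean_utf8() {
    in=$1
    out=$2

    # Bail out early if the input cannot be read.
    [ -r "$in" ] || return 2

    # Without -c, iconv stops with an error at the first invalid sequence,
    # so a successful run means the file is already valid UTF-8.
    if iconv -f utf-8 -t utf-8 "$in" > /dev/null 2>&1; then
        cp -- "$in" "$out"
    else
        # -c drops the invalid sequences. Some iconv builds still report a
        # non-zero status after dropping characters, so the status is
        # deliberately ignored here.
        iconv -f utf-8 -t utf-8 -c "$in" > "$out" || true
    fi
}
```

For example, `clean_utf8 nonutf-8.txt utf8.txt` copies the file unchanged if it is already valid, and otherwise writes a cleaned copy to utf8.txt.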