Thursday, December 4, 2014

Quick tip: Remove invalid UTF-8 characters in a file

Whenever I need to check whether a file contains invalid UTF-8 characters, I run:

isutf8 file.txt

(On Ubuntu, isutf8 comes with the `moreutils` package, so you need to install that first.)

Then, to remove the invalid characters, use iconv:

iconv -f utf-8 -t utf-8 -c nonutf-8.txt > utf8.txt

-f is the 'from' encoding (utf-8)
-t is the 'to' encoding (utf-8)
-c omits invalid characters from the output
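
If you do this often, the check and the cleanup can be wrapped in two small shell functions. This is only a rough sketch: the function names `is_valid_utf8` and `strip_invalid_utf8` are made up for illustration, and the check uses iconv itself (which fails on invalid input when run without -c) instead of isutf8, so it also works where `moreutils` is not installed. It assumes your iconv may exit with a non-zero status even with -c when it had to drop something, which is why that status is ignored.

```shell
#!/bin/sh
# Hypothetical helper names; only the iconv invocation comes from the tip above.

# Exit status 0 if the file is valid UTF-8, non-zero otherwise.
# Without -c, iconv stops with an error at the first invalid sequence.
is_valid_utf8() {
    iconv -f utf-8 -t utf-8 "$1" > /dev/null 2>&1
}

# Write a cleaned copy of file $1 to $2, dropping invalid byte sequences.
strip_invalid_utf8() {
    # Fail early on a missing or unreadable input file.
    [ -r "$1" ] || return 1
    # Assumption: iconv may report failure even though -c produced usable
    # output, so its exit status is deliberately ignored here.
    iconv -f utf-8 -t utf-8 -c "$1" > "$2" 2>/dev/null || true
}
```

With these loaded in your shell, `is_valid_utf8 nonutf-8.txt || strip_invalid_utf8 nonutf-8.txt utf8.txt` only writes utf8.txt when the input actually contains invalid sequences.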
