How to convert from character positions to byte positions in UTF-8

2018-06-28 01:46:40

I have a UTF-8 encoded text file. I can read it character by character. Each character is encoded as either one byte or several bytes. How can I tell when a single byte was read and when more than one byte was read?

Count the bytes yourself while reading the characters.

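For each character c, taken as a Unicode code point:

if (c < 128)            // U+0000..U+007F
  bytesCount += 1;
else if (c < 2048)      // U+0080..U+07FF
  bytesCount += 2;
else if (c < 65536)     // U+0800..U+FFFF
  bytesCount += 3;
else                    // U+10000..U+10FFFF
  bytesCount += 4;

The value of bytesCount just before adding is the byte position where c starts. The fourth case matters: characters above U+FFFF take 4 bytes. If your language hands you UTF-16 code units (for example a Java or C# char), such a character arrives as a surrogate pair; combine the pair into one code point first, otherwise each half falls into the 3-byte branch and you count 6 bytes instead of 4.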
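A minimal Java sketch of the counting approach, assuming the file is valid UTF-8 and is read through a java.io.Reader (with malformed input the decoded characters no longer line up with the file bytes). Because a Reader yields UTF-16 chars, a character above U+FFFF shows up as a high surrogate followed by a low surrogate; the sketch counts 4 bytes at the high surrogate and skips the low one. The class and method names (Utf8Offsets, utf8Length, byteOffsets) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Offsets {

    // UTF-8 length of a code point that fits in one UTF-16 char (U+0000..U+FFFF).
    static int utf8Length(char c) {
        if (c < 0x80) return 1;   // U+0000..U+007F
        if (c < 0x800) return 2;  // U+0080..U+07FF
        return 3;                 // U+0800..U+FFFF
    }

    // offsets.get(i) is the byte position where the i-th character (code point) starts.
    // The last element is the total byte length, so there is one more entry than characters.
    static List<Long> byteOffsets(Reader reader) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long bytesCount = 0;
        int c;
        while ((c = reader.read()) != -1) {
            char ch = (char) c;
            if (Character.isLowSurrogate(ch)) {
                // Second half of a surrogate pair: its bytes were counted with the high surrogate.
                continue;
            }
            offsets.add(bytesCount);
            if (Character.isHighSurrogate(ch)) {
                bytesCount += 4;  // the pair encodes a code point >= U+10000
            } else {
                bytesCount += utf8Length(ch);
            }
        }
        offsets.add(bytesCount);
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // "a", U+00E9, U+20AC, U+1F600 (as a surrogate pair), "b"
        String text = "a\u00e9\u20ac\uD83D\uDE00b";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(byteOffsets(reader));
    }
}
```

Running main prints [0, 1, 3, 6, 10, 11]: the five characters start at bytes 0, 1, 3, 6 and 10, and the text is 11 bytes long. To read a real file, pass an InputStreamReader over a FileInputStream with StandardCharsets.UTF_8 instead of the in-memory bytes.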

See also the encoding definition in the Wikipedia article on UTF-8.
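If you read raw bytes instead of decoded characters, the UTF-8 encoding table answers the question directly: the lead byte of each sequence says how many bytes the character occupies (0xxxxxxx: 1, 110xxxxx: 2, 1110xxxx: 3, 11110xxx: 4), and continuation bytes have the form 10xxxxxx. A minimal Java sketch, assuming the input is valid UTF-8 (it does not reject overlong or out-of-range sequences); the names Utf8LeadByte and sequenceLength are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LeadByte {

    // Sequence length announced by a UTF-8 lead byte.
    // Returns 0 for a continuation byte (10xxxxxx) or a byte that cannot start a sequence.
    static int sequenceLength(byte b) {
        int u = b & 0xFF;                    // treat the byte as unsigned
        if (u < 0x80) return 1;              // 0xxxxxxx
        if ((u & 0xE0) == 0xC0) return 2;    // 110xxxxx
        if ((u & 0xF0) == 0xE0) return 3;    // 1110xxxx
        if ((u & 0xF8) == 0xF0) return 4;    // 11110xxx
        return 0;
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u00e9\u20ac\uD83D\uDE00b".getBytes(StandardCharsets.UTF_8);
        int charIndex = 0;
        // Jump from lead byte to lead byte, reporting where each character starts.
        for (int pos = 0; pos < bytes.length; ) {
            int len = sequenceLength(bytes[pos]);
            if (len == 0) {
                throw new IllegalArgumentException("not a lead byte at position " + pos);
            }
            System.out.println("char " + charIndex + " starts at byte " + pos + " (" + len + " bytes)");
            pos += len;
            charIndex++;
        }
    }
}
```

This gives the same positions as counting decoded characters, but works on the bytes themselves, so it needs no Reader.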
