I need to convert a string from Windows-1251 to UTF-8.
I tried to do this with iconv, but all I get is something like this:
пїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅ пїЅпїЅпїЅпїЅпїЅпїЅпїЅпїЅ
var iconv = new Iconv('windows-1251', 'utf-8')
title = iconv.convert(title).toString('utf-8')
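The пїЅ pattern is what the three UTF-8 bytes of U+FFFD (the replacement character) look like when they are read as Windows-1251. That suggests the original bytes were already decoded as UTF-8 before they reached iconv. A minimal sketch that reproduces the effect with Node built-ins only; it assumes a Node.js build with full ICU (so that TextDecoder accepts the windows-1251 label), and the sample word is illustrative:

```javascript
// "Привет" encoded as Windows-1251 (illustrative sample bytes).
const win1251Bytes = Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);

// Mistake: treating the bytes as UTF-8. Every invalid byte becomes U+FFFD.
const damaged = win1251Bytes.toString('utf8');

// The damaged string's UTF-8 bytes (EF BF BD per U+FFFD), read as Windows-1251.
const garbled = new TextDecoder('windows-1251').decode(Buffer.from(damaged, 'utf8'));

// Correct path: decode the raw bytes as Windows-1251 straight away.
const fixed = new TextDecoder('windows-1251').decode(win1251Bytes);

console.log(garbled);
console.log(fixed);
```

If this is the cause, the fix is to keep the data as raw bytes (a Buffer) until the conversion step, which is what the answers below do.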
Pang
asked Jan 1, 2012 at 13:52
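Here is a working solution to your problem. You have to request the body as a binary string and wrap it in a Buffer before converting.
const request = require('request');
const Iconv = require('iconv').Iconv;
request({
  uri: website_url,
  method: 'GET',
  encoding: 'binary'
}, function (error, response, body) {
  if (error) return console.error(error);
  // Rebuild the raw bytes from the binary string, then convert them.
  const raw = Buffer.from(body, 'binary');
  const conv = new Iconv('windows-1251', 'utf8');
  const text = conv.convert(raw).toString();
});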
Ahmet Şimşek
answered Jan 29, 2012 at 0:20
Alex Kolarski
If you’re reading from a file, you could use something like this:
const iconv = require('iconv-lite');
const fs = require('fs');
// No encoding argument, so data arrives as a raw Buffer.
fs.readFile('filename.xml', null, (err, data) => {
  if (err) {
    console.log(err);
    return;
  }
  // Decode the Windows-1251 bytes to a string, then encode that string as UTF-8.
  const encodedData = iconv.encode(iconv.decode(data, 'win1251'), 'utf8');
  fs.writeFile('result_filename.xml', encodedData, () => { });
});
answered Jul 14, 2021 at 18:27
I use Node version 16 and the code below works fine. You don’t need to create a Buffer yourself (Node prints warnings for that); fs.readFile already returns one. You need to install the iconv package first.
const fs = require('fs');
const Iconv = require('iconv').Iconv;
fs.readFile('printed_document.txt', function (err, data) {
  if (err) {
    return console.log(err);
  }
  console.log(new Iconv('windows-1251', 'utf-8').convert(data).toString());
});
answered Oct 13, 2022 at 13:44
Orlov Const
iconv-lite: Pure JS character encoding conversion
- No need for native code compilation. Quick to install, works on Windows, Web, and in sandboxed environments.
- Used in popular projects like Express.js (body_parser), Grunt, Nodemailer, Yeoman and others.
- Faster than node-iconv (see below for performance comparison).
- Intuitive encode/decode API, including Streaming support.
- In-browser usage via browserify or webpack (~180kb gzip compressed with Buffer shim included).
- Typescript type definition file included.
- React Native is supported (need to install stream module to enable Streaming API).
- License: MIT.
Usage
Basic API
var iconv = require('iconv-lite');
// Convert from an encoded buffer to a js string.
str = iconv.decode(Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]), 'win1251');
// Convert from a js string to an encoded buffer.
buf = iconv.encode("Sample input string", 'win1251');
// Check if encoding is supported
iconv.encodingExists("us-ascii")
Streaming API
// Decode stream (from binary data stream to js strings)
http.createServer(function(req, res) {
  var converterStream = iconv.decodeStream('win1251');
  req.pipe(converterStream);
  converterStream.on('data', function(str) {
    console.log(str); // Do something with decoded strings, chunk-by-chunk.
  });
});
// Convert encoding streaming example
fs.createReadStream('file-in-win1251.txt')
  .pipe(iconv.decodeStream('win1251'))
  .pipe(iconv.encodeStream('ucs2'))
  .pipe(fs.createWriteStream('file-in-ucs2.txt'));
// Sugar: all encode/decode streams have .collect(cb) method to accumulate data.
http.createServer(function(req, res) {
  req.pipe(iconv.decodeStream('win1251')).collect(function(err, body) {
    assert(typeof body == 'string');
    console.log(body); // full request body string
  });
});
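Chunk-by-chunk decoding needs a stateful decoder because a multibyte character can be split across two chunks. The sketch below illustrates the problem with Node's built-in TextDecoder in streaming mode rather than with iconv-lite's decodeStream; the chunk split is contrived for the example:

```javascript
const bytes = Buffer.from('Привет', 'utf8'); // 12 bytes, 2 per letter

// Split in the middle of the second letter to simulate an unlucky chunk boundary.
const chunks = [bytes.subarray(0, 3), bytes.subarray(3)];

// Naive: decode each chunk independently; the split letter is lost.
const naive = chunks.map((c) => c.toString('utf8')).join('');

// Streaming: the decoder keeps the incomplete byte sequence between calls.
const decoder = new TextDecoder('utf-8');
let streamed = '';
for (const chunk of chunks) {
  streamed += decoder.decode(chunk, { stream: true });
}
streamed += decoder.decode(); // flush anything still buffered

console.log(naive);
console.log(streamed);
```

Single-byte encodings such as win1251 cannot be split this way, but a stream decoder is still the safer default when the input encoding may vary.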
Supported encodings
- All node.js native encodings: utf8, ucs2 / utf16-le, ascii, binary, base64, hex.
- Additional unicode encodings: utf16, utf16-be, utf-7, utf-7-imap, utf32, utf32-le, and utf32-be.
- All widespread singlebyte encodings: Windows 125x family, ISO-8859 family, IBM/DOS codepages, Macintosh family, KOI8 family, all others supported by iconv library. Aliases like ‘latin1’, ‘us-ascii’ also supported.
- All widespread multibyte encodings: CP932, CP936, CP949, CP950, GB2312, GBK, GB18030, Big5, Shift_JIS, EUC-JP.
See all supported encodings on wiki.
Most singlebyte encodings are generated automatically from node-iconv. Thank you Ben Noordhuis and libiconv authors!
Multibyte encodings are generated from Unicode.org mappings and WHATWG Encoding Standard mappings. Thank you, respective authors!
Encoding/decoding speed
Comparison with node-iconv module (1000x256kb, on MacBook Pro, Core i5/2.6 GHz, Node v0.12.0).
Note: your results may vary, so please always check on your hardware.
operation iconv@2.1.4 iconv-lite@0.4.7
----------------------------------------------------------
encode('win1251') ~96 Mb/s ~320 Mb/s
decode('win1251') ~95 Mb/s ~246 Mb/s
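Since results vary by machine, a rough throughput check is easy to run locally. The sketch below times the built-in TextDecoder (full ICU assumed) as a stand-in for whichever library is being measured; the chunk size mirrors the 256kb used above, while the iteration count is an arbitrary choice:

```javascript
// Rough decode throughput check; numbers vary with machine and Node version.
const decoder = new TextDecoder('windows-1251');
const chunk = Buffer.alloc(256 * 1024, 0xe0); // 256 KB filled with the byte for "а"
const iterations = 50;

const start = process.hrtime.bigint();
let totalChars = 0;
for (let i = 0; i < iterations; i++) {
  totalChars += decoder.decode(chunk).length;
}
const seconds = Number(process.hrtime.bigint() - start) / 1e9;

// Megabytes processed divided by elapsed time.
const mbPerSecond = (chunk.length * iterations) / (1024 * 1024) / seconds;
console.log(`decode('win1251'): ~${mbPerSecond.toFixed(0)} MB/s`);
```

To compare libraries, swap the decoder.decode call for the call under test and keep everything else identical.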
BOM handling
- Decoding: BOM is stripped by default, unless overridden by passing stripBOM: false in options (e.g. iconv.decode(buf, enc, {stripBOM: false})). A callback might also be given as a stripBOM parameter; it’ll be called if a BOM character was actually found.
- If you want to detect UTF-8 BOM when decoding other encodings, use node-autodetect-decoder-stream module.
- Encoding: No BOM added, unless overridden by addBOM: true option.
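The same two defaults (strip on decode, no BOM on encode) can be observed with Node built-ins. This sketch uses TextDecoder's ignoreBOM option, which plays a role similar to stripBOM: false but is not iconv-lite's API:

```javascript
// UTF-8 BOM (EF BB BF) followed by "hi".
const withBom = Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]);

// Default: the BOM is dropped while decoding.
const stripped = new TextDecoder('utf-8').decode(withBom);

// ignoreBOM: true keeps U+FEFF as the first character of the result.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);

// Encoding never adds a BOM on its own; prepend the bytes explicitly if needed.
const encodedWithBom = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from('hi', 'utf8'),
]);

console.log(stripped.length, kept.length, encodedWithBom.length);
```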
UTF-16 Encodings
This library supports UTF-16LE, UTF-16BE and UTF-16 encodings. The first two are straightforward, but UTF-16 tries to be smart about endianness in the following ways:
- Decoding: uses BOM and ‘spaces heuristic’ to determine input endianness. Default is UTF-16LE, but can be overridden with defaultEncoding: 'utf-16be' option. Strips BOM unless stripBOM: false.
- Encoding: uses UTF-16LE and writes BOM by default. Use addBOM: false to override.
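The BOM part of that decoding logic can be sketched as follows. This is a simplified illustration, not iconv-lite's implementation: it omits the ‘spaces heuristic’ entirely, and detectUtf16 and decodeUtf16 are hypothetical helper names:

```javascript
// Hypothetical helper: pick an endianness from the BOM, else fall back to a default.
function detectUtf16(buf, defaultEncoding = 'utf-16le') {
  if (buf.length >= 2) {
    if (buf[0] === 0xff && buf[1] === 0xfe) return { encoding: 'utf-16le', bomLength: 2 };
    if (buf[0] === 0xfe && buf[1] === 0xff) return { encoding: 'utf-16be', bomLength: 2 };
  }
  return { encoding: defaultEncoding, bomLength: 0 };
}

// Hypothetical helper: strip the BOM and decode. Input length must be even.
function decodeUtf16(buf, defaultEncoding) {
  const { encoding, bomLength } = detectUtf16(buf, defaultEncoding);
  const body = Buffer.from(buf.subarray(bomLength)); // copy, so swap16 leaves the input intact
  if (encoding === 'utf-16be') body.swap16(); // Node decodes only little-endian natively
  return body.toString('utf16le');
}

const le = Buffer.from([0xff, 0xfe, 0x68, 0x00, 0x69, 0x00]);
const be = Buffer.from([0xfe, 0xff, 0x00, 0x68, 0x00, 0x69]);
console.log(decodeUtf16(le), decodeUtf16(be));
```

Without a BOM the helper simply trusts the default, which is why a content heuristic is useful for real input.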
UTF-32 Encodings
This library supports UTF-32LE, UTF-32BE and UTF-32 encodings. Like the UTF-16 encoding above, UTF-32 defaults to UTF-32LE, but uses BOM and ‘spaces heuristics’ to determine input endianness.
- The default of UTF-32LE can be overridden with the defaultEncoding: 'utf-32be' option. Strips BOM unless stripBOM: false.
- Encoding: uses UTF-32LE and writes BOM by default. Use addBOM: false to override. (defaultEncoding: 'utf-32be' can also be used here to change encoding.)
Other notes
When decoding, be sure to supply a Buffer to decode() method, otherwise bad things usually happen.
Untranslatable characters are set to � or ?. No transliteration is currently supported.
Node versions 0.10.31 and 0.11.13 are buggy, don’t use them (see #65, #77).
Testing
$ git clone git@github.com:ashtuchkin/iconv-lite.git
$ cd iconv-lite
$ npm install
$ npm test
$ # To view performance:
$ node test/performance.js
$ # To view test coverage:
$ npm run coverage
$ open coverage/lcov-report/index.html
Halacky, 10.05.2020, 10:04
Hi everyone.
And in the reverse direction, like this:
As you might have guessed, nothing works. What is the right way to implement this task?
shvyrevvg, 10.05.2020, 12:20
Halacky, decode first, then encode.
Halacky, 10.05.2020, 14:27
Quote: "Halacky, decode first, then encode."
And how do I get buf?
shvyrevvg, 10.05.2020, 14:33
Quote: "And how do I get buf?"
Halacky, something like this:
Halacky, 10.05.2020, 14:58
Quote: "Halacky, something like this"
Unfortunately, that did not solve the problem.
And when I call the reverse method, which by my logic should have displayed the Russian characters correctly, the result is not what was intended.
shvyrevvg, 10.05.2020, 15:07
Halacky, look: you start with a string in some encoding (a set of bytes), so you first need to decode the bytes and then encode the resulting string in the new encoding.
But since in Node we work with strings in utf-8, you can simply decode.
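The decode-then-encode order described above can be sketched with Node built-ins alone. This is not iconv-lite's implementation: it assumes a Node.js build with full ICU, builds the reverse table by brute force, and decodeWin1251, encodeWin1251 and charToByte are hypothetical names:

```javascript
// Assumes full ICU so that the 'windows-1251' label is accepted.
const win1251 = new TextDecoder('windows-1251');

// Reverse table built by decoding every possible byte once: character -> byte.
const charToByte = new Map();
for (let b = 0; b < 256; b++) {
  charToByte.set(win1251.decode(Uint8Array.of(b)), b);
}

// Step 1 (decode): Windows-1251 bytes -> JS string.
function decodeWin1251(buf) {
  return win1251.decode(buf);
}

// Step 2 (encode): JS string -> Windows-1251 bytes; unmappable characters become '?'.
function encodeWin1251(str) {
  return Buffer.from(Array.from(str, (ch) => charToByte.get(ch) ?? 0x3f));
}

const source = Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]); // "Привет"
const text = decodeWin1251(source);           // bytes -> string
const utf8 = Buffer.from(text, 'utf8');       // string -> UTF-8 bytes
const roundTrip = encodeWin1251(text);        // string -> Windows-1251 bytes
console.log(text, utf8.length, roundTrip.equals(source));
```

Decoding always starts from bytes and encoding always starts from a string; calling a decoder on text that is already a string is the usual source of garbled output.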
Halacky, 11.05.2020, 05:33
Quote: "look: you start with a string in some encoding (a set of bytes), so you first need to decode the bytes and then encode the resulting string in the new encoding."
New day, same old problems.
Calling either of these methods gives the same wrong result. The Russian characters are displayed incorrectly, although both encodings "work" with Russian characters. Maybe some of the bytes are lost during encoding/decoding. Hm.
shvyrevvg, 11.05.2020, 06:37
Halacky, where does text come from? What does it look like if you print it without any conversion (post the garbled characters)?
Halacky, 11.05.2020, 07:25
When the message is entered, I don't touch the encoding at all.
shvyrevvg, 11.05.2020, 07:44
Halacky, so your text is in utf-8 to begin with, and in ConvertItem you are trying to decode it as win1251.
Halacky, 11.05.2020, 08:18
I receive the text in utf-8 originally; I need to convert that same text to win1251.
shvyrevvg, 11.05.2020, 08:39
Halacky, I see, you are passing strings back and forth. This needs some thought; part of the data may get lost.
Halacky, 11.05.2020, 08:43
In the iconv-lite repository I found a mention that problems can arise when converting from one encoding to another.
shvyrevvg, 11.05.2020, 09:03
Halacky, yes, it turns out you cannot convert strings back and forth (which was to be expected). You need to keep the original string. That is, the first time the string is entered in utf-8, save it somewhere, and convert only that string to the required encoding.
Halacky, 11.05.2020, 09:11
Well, that's the assignment we were given, and the best part is that the handout they sent us is about how to lay out a website.
And here is the reverse:
shvyrevvg, 11.05.2020, 09:18
Quote: "I made a script for encoding/decoding from a txt document and everything works"
Halacky, that's exactly the point: with files everything works fine, because there are no extra conversions. Maybe you are supposed to convert test files rather than text in a field?
Halacky, 11.05.2020, 09:21
Here, look. The full assignment.
Features
- Pure javascript. Doesn’t need native code compilation.
- Easy API.
- Works on Windows and in sandboxed environments like Cloud9.
- Encoding is much faster than node-iconv (see below for performance comparison).
Usage
var iconv = require('iconv-lite');
// Convert from an encoded buffer to string.
str = iconv.decode(buf, 'win1251');
// Convert from string to an encoded buffer.
buf = iconv.encode("Sample input string", 'win1251');
// Check if encoding is supported
iconv.encodingExists("us-ascii")
Supported encodings
- All node.js native encodings: ‘utf8’, ‘ucs2’, ‘ascii’, ‘binary’, ‘base64’
- All widespread single byte encodings: Windows 125x family, ISO-8859 family, IBM/DOS codepages, Macintosh family, KOI8 family. Aliases like ‘latin1’, ‘us-ascii’ also supported.
- Multibyte encodings: ‘gbk’, ‘gb2312’, ‘Big5’, ‘cp950’.
Others are easy to add, see the source. Please, participate.
Most encodings are generated from node-iconv. Thank you Ben Noordhuis and iconv authors!
Not supported yet: EUC family, Shift_JIS.
Encoding/decoding speed
Comparison with node-iconv module (1000x256kb, on Ubuntu 12.04, Core i5/2.5 GHz, Node v0.8.7).
Note: your results may vary, so please always check on your hardware.
operation iconv@1.2.4 iconv-lite@0.2.4
----------------------------------------------------------
encode('win1251') ~115 Mb/s ~230 Mb/s
decode('win1251') ~95 Mb/s ~130 Mb/s
Notes
When decoding, a ‘binary’-encoded string can be used as a source buffer.
Untranslatable characters are set to � or ?. No transliteration is currently supported, pull requests are welcome.
Testing
git clone git@github.com:ashtuchkin/iconv-lite.git
cd iconv-lite
npm install
npm test
# To view performance:
node test/performance.js
TODO
- Support streaming character conversion, something like util.pipe(req, iconv.fromEncodingStream('latin1')).
- Add more encodings.
- Add transliteration (best fit char).
- Add tests and correct support of variable-byte encodings (currently work is delegated to node).