ISO-8859-1 to Windows-1251 in Delphi

How can I do this conversion in Delphi XE?
I tried using libiconv2, but it didn’t work.

RRUZ

asked Mar 17, 2011 at 17:58

The best approach is Delphi’s built-in code page support for AnsiString:

type
  // ISO-8859-1 and Windows-1252 are NOT the same, but
  // are commonly interchanged when they should not be!
  Latin1String = type AnsiString(28591);
  Windows1252String = type AnsiString(1252);
  GreekString = type AnsiString(1253);

procedure DoIt;
var
  S1: Latin1String;
  S2: Windows1252String;
  S3: GreekString;
begin
  S1 := '...'; // auto-converts to ISO-8859-1
  S2 := S1; // auto-converts from ISO-8859-1 to Unicode to Windows-1252
  S3 := S1; // auto-converts from ISO-8859-1 to Unicode to Greek
end;
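
For the Windows-1251 target named in the question, the same mechanism might look like the sketch below. It assumes Delphi 2009 or later. The type name Win1251String and both function names are hypothetical, chosen only for this illustration. The second function covers the common case where Cyrillic bytes were only labelled as ISO-8859-1 and must not be converted at all.

```pascal
program Latin1ToWin1251Demo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  Latin1String  = type AnsiString(28591); // ISO-8859-1
  Win1251String = type AnsiString(1251);  // Windows-1251 (hypothetical name)

// Converts the characters: ISO-8859-1 -> Unicode -> Windows-1251.
// Characters that Windows-1251 cannot represent are lost in the process.
function Latin1ToWin1251(const S: Latin1String): Win1251String;
var
  U: UnicodeString;
begin
  U := UnicodeString(S);
  Result := Win1251String(U);
end;

// Keeps the bytes and changes only the code page tag. This fits the case
// where Cyrillic bytes were labelled as ISO-8859-1 by mistake.
function RelabelAsWin1251(const S: RawByteString): Win1251String;
var
  Tmp: RawByteString;
begin
  Tmp := S;
  SetCodePage(Tmp, 1251, False); // False: do not convert the bytes
  Result := Win1251String(Tmp);
end;

var
  Raw: RawByteString;
  Cyr: Win1251String;
  U: UnicodeString;
begin
  // Two bytes that spell "Пр" when read as Windows-1251
  SetLength(Raw, 2);
  Raw[1] := AnsiChar($CF);
  Raw[2] := AnsiChar($F0);
  Cyr := RelabelAsWin1251(Raw);
  U := UnicodeString(Cyr);
  // Print the Unicode code point of the first decoded character
  Writeln(IntToHex(Ord(U[1]), 4));
end.
```

Which function applies depends on the data: Latin1ToWin1251 suits text that really is Western European, while RelabelAsWin1251 suits Russian text that arrived with the wrong label.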

answered Mar 17, 2011 at 19:54

Remy Lebeau

Re-encoding text from one encoding to another

Falk0ner, Sun, 06/07/2008, 15:34.

Multilingual support, localization and re-encoding

Re-encoding

This algorithm re-encodes text.
It supports the Windows-1251, KOI8-R, ISO-8859-5 and DOS encodings.
An encoding is a table that states,
for example, that the character with code 160 is the Russian letter «а» and the character with code 150 is «Ц» (as in the DOS code page 866), and so on.
Encodings differ in the codes they assign to Russian letters
(the placement of English letters was agreed on long ago).
Different computers on the Internet use different encodings,
so Russian text is re-encoded many times as it travels across the Internet.
This algorithm re-encodes large volumes of data at high speed.

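// Written for pre-Unicode Delphi (Char = AnsiChar) with the source file saved
// in Windows-1251. TCode is assumed to be declared elsewhere as
// type TCode = (win, koi, iso, dos);
procedure TForm1.Button1Click(Sender: TObject);
var
  code1, code2: TCode;
  s: string;
  c: char;
  i: integer;
  chars: array [char] of char;
  str: array [TCode] of string;
begin
  case ComboBox1.ItemIndex of
    1: code1 := koi;
    2: code1 := iso;
    3: code1 := dos;
    else code1 := win;
  end;
  case ComboBox2.ItemIndex of
    1: code2 := koi;
    2: code2 := iso;
    3: code2 := dos;
    else code2 := win;
  end;
  s := Memo1.Text;
  // Per-encoding alphabet tables: index i is meant to denote the same letter
  // in every table. Some bytes of the dos literal did not survive as text
  // and appear as '?'.
  Str[win] := 'АаБбВвГгДдЕеЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯя';
  Str[koi] := 'юЮаАбБцЦдДеЕфФгГхХиИйЙкКлЛмМнНоОпПяЯрРсСтТуУжЖвВьЬыЫзЗшШэЭщЩчЧъЪ';
  Str[iso] := 'РрСсТтУуФфХхЦцЧчШшЩщЪъЫыЬьЭэЮюЯяа№бёвђгѓдєеѕжізїијйљкњлћмќн§оўпџ';
  Str[dos] := 'Ђ ЃЎ‚ўѓЈ„¤…Ґ†¦‡§?Ё‰©ЉЄ‹»Њ¬ЌЋ®ЏЇђа’б‘в»г»д•е–ж—з?и™йљк›лњмќнћоџп';
  // Start from the identity table, then remap the Russian letters
  for c := #0 to #255 do
    Chars[c] := c;
  for i := 1 to Length(Str[win]) do
    Chars[Str[code2][i]] := Str[code1][i];
  // Re-encode the text with one table lookup per character
  for i := 1 to Length(s) do
    s[i] := Chars[s[i]];
  Memo2.Text := s;
end;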
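
The 256-entry lookup table used above does not have to be typed in by hand. The sketch below is one possible way to generate it at run time, assuming Delphi 2009 or later, where SysUtils provides TEncoding. The names TByteMap, BuildByteMap and ApplyByteMap are hypothetical. The code page numbers are the standard Windows identifiers: 1251 for Windows-1251, 20866 for KOI8-R, 28595 for ISO-8859-5 and 866 for DOS.

```pascal
program ByteMapDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TByteMap = array[Byte] of Byte;

// Fills Map so that Map[b] is byte b of SrcCodePage re-encoded in DstCodePage.
procedure BuildByteMap(SrcCodePage, DstCodePage: Integer; out Map: TByteMap);
var
  SrcEnc, DstEnc: TEncoding;
  One, Converted: TBytes;
  B: Integer;
begin
  SrcEnc := TEncoding.GetEncoding(SrcCodePage);
  try
    DstEnc := TEncoding.GetEncoding(DstCodePage);
    try
      SetLength(One, 1);
      for B := 0 to 255 do
      begin
        One[0] := B;
        Converted := TEncoding.Convert(SrcEnc, DstEnc, One);
        if Length(Converted) = 1 then
          Map[B] := Converted[0]
        else
          Map[B] := Ord('?'); // no single-byte equivalent
      end;
    finally
      DstEnc.Free;
    end;
  finally
    SrcEnc.Free;
  end;
end;

// Re-encodes the buffer in place with one table lookup per byte.
procedure ApplyByteMap(const Map: TByteMap; var Data: TBytes);
var
  I: Integer;
begin
  for I := 0 to High(Data) do
    Data[I] := Map[Data[I]];
end;

var
  Map: TByteMap;
  Buf: TBytes;
begin
  BuildByteMap(1251, 20866, Map);      // Windows-1251 -> KOI8-R
  Buf := TBytes.Create($CF, $F0, $E8); // "При" in Windows-1251
  ApplyByteMap(Map, Buf);
  // KOI8-R encodes the same letters as F0 D2 C9
  Writeln(Format('%.2x %.2x %.2x', [Buf[0], Buf[1], Buf[2]]));
end.
```

Building the table costs 256 small conversions once. After that, each byte of the text is re-encoded with a single array access, which is the property the original algorithm relies on for speed.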

Taken from http://blackman.wp-club.net/

unit ConvertEncodingUnit;
interface
type // Type of the re-encoding matrices
TCodeMatrix = array[1..255] of char;
{******************************************************************************
ANSI, KOI8-R, KOI8-U, OEM/DOS, ISO
This version provides 6 re-encoding matrices (of type TCodeMatrix):
1. cmAnsiToKoi8R: re-encodes a string from ANSI to KOI8-R
2. cmAnsiToKoi8U: re-encodes a string from ANSI to KOI8-U
3. cmKoi8RToAnsi: re-encodes a string from KOI8-R to ANSI
4. cmKoi8UToAnsi: re-encodes a string from KOI8-U to ANSI
5. cmOemDosToAnsi: re-encodes a string from OEM/DOS to ANSI
6. cmIsoToAnsi: re-encodes a string from ISO to ANSI
******************************************************************************}
function ConvertEncoding(sIn: string; sCoding: string): string;
const // Re-encoding matrices
FirstCodes =
#1#2#3#4#5#6#7#8#9#10#11#12#13#14#15#16#17#18#19#20#21#22#23#24#25#26#27#28+
#29#30#31' !"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^' +
'_`abcdefghijklmnopqrstuvwxyz{|}~';
cmAnsiToKoi8R: TCodeMatrix = FirstCodes // ver 1.0, ©VEG, 31.10.2003
+ ‘ЂЃ‚ѓ„…†‡?‰Љ‹ЊЌЋЏђ‘’“”•–—?™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬®Ї°±Ііґµ¶·Ј—є»јЅѕїбвчздецъй’
+ ‘клмнопртуфхжигюыэящшьасБВЧЗДЕЦЪЙКЛМНОПРТУФХЖИГЮЫЭЯЩШЬАС’;
cmAnsiToKoi8U: TCodeMatrix = FirstCodes // ver 0.8, ©VEG, 31.10.2003
+ ‘ЂЃ‚ѓ„…†‡?‰Љ‹ЊЌЋЏђ‘’“”•–—?™љ›њќћџ ЎўЈ¤Ґ¦§Ё©Є«¬®Ї°±Ііґµ¶·Ј—є»јЅѕїбвчздецъй’
+ ‘клмнопртуфхжигюыэящшьасБВЧЗДЕЦЪЙКЛМНОПРТУФХЖИГЮЫЭЯЩШЬАС’;
cmKoi8RToAnsi: TCodeMatrix = FirstCodes // ver 1.0, ©VEG, 31.10.2003
+ ‘-¦-¬L-++T++—¦¦—?¦•v??? ?°?·?=¦-ёгг¬¬¬LLL—¦¦¦¦Ё¦¦TTT¦¦¦+++©юабцдефгх’
+ ‘ийклмнопярстужвьызшэщчъЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧЪ’;
cmKoi8UToAnsi: TCodeMatrix = FirstCodes // ver 1.0, ©VEG, 31.10.2003
+ ‘-¦-¬L-++T++—¦¦—?¦•v??? ?°?·?=¦-ёєгії¬LLL-ґў¦¦¦¦ЁЄ¦ІЇT¦¦¦+ҐЎ©юабцдефгх’
+ ‘ийклмнопярстужвьызшэщчъЮАБЦДЕФГХИЙКЛМНОПЯРСТУЖВЬЫЗШЭЩЧЪ’;
cmOemDosToAnsi: TCodeMatrix = FirstCodes // ver 1.0, ©VEG, 31.10.2003
+ ‘АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмноп—¦+¦¦¬¬¦¦¬—¬L+T+-+¦¦L’
+ ‘г¦T¦=+¦¦TTLL-г++—-¦¦-рстуфхцчшщъыьэюяЁёЄєЇїЎў°•·v№¤¦ ‘;
cmIsoToAnsi: TCodeMatrix = FirstCodes // ver 1.0, ©VEG, 31.10.2003
+ ‘???????????????????????????????? ЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШ’
+ ‘ЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюя№ёђѓєѕіїјљњћќ§ўџ’;
implementation
function ConvertEncoding(sIn: string; sCoding: string): string;
// sIn: the string to re-encode
// sCoding: the re-encoding matrix
// Result: the re-encoded string
// Note: the matrices start at index 1, so a #0 character in sIn has no entry
var
  iFtd: integer;
begin
  Result := '';
  for iFtd := 1 to length(sIn) do
    result := result + sCoding[ord(sIn[iFtd])];
end; // ver 1.0, (C)Vrublevsky Evgeny Gennadyevich (BELARUS/SLUTSK), 31.10.2003
{******************************************************************************}
end.

Author: RoboSol
Taken from http://forum.sources.ru

How can a message body be re-encoded from Windows-1251 to KOI8-R for sending by e-mail?

// Written for pre-Unicode Delphi (Char = AnsiChar) with the source file saved
// in Windows-1251. DelSpace is a helper from the original project that is not
// shown here; TNMSMTP is the NetMasters SMTP component.
const
  Koi: array[0..66] of Char = ('T', 'Ё', 'ё', 'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ж',
    'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р',
    'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ',
    'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д',
    'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о',
    'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш',
    'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я');
  Win: array[0..66] of Char = ('ё', 'Ё', 'T', 'ю', 'а', 'б', 'ц', 'д', 'е', 'ф',
    'г', 'х', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п',
    'я', 'р', 'с', 'т', 'у', 'ж', 'в', 'ь', 'ы', 'з',
    'ш', 'э', 'щ', 'ч', 'ъ', 'Ю', 'А', 'Б', 'Ц', 'Д',
    'Е', 'Ф', 'Г', 'Х', 'И', 'Й', 'К', 'Л', 'М', 'Н',
    'О', 'П', 'Я', 'Р', 'С', 'Т', 'У', 'Ж', 'В', 'Ь',
    'Ы', 'З', 'Ш', 'Э', 'Щ', 'Ч', 'Ъ');

function WinToKoi(Str: String): String;
var
  i, j, Index: Integer;
begin
  Result := '';
  for i := 1 to Length(Str) do
  begin
    // Look the character up in the Win table
    Index := -1;
    for j := Low(Win) to High(Win) do
      if Win[j] = Str[i] then
      begin
        Index := j;
        Break;
      end;
    // Characters outside the table are copied unchanged
    if Index = -1 then Result := Result + Str[i]
    else Result := Result + Koi[Index];
  end;
end;

function KoiToWin(Str: String): String;
var
  i, j, Index: Integer;
begin
  Result := '';
  for i := 1 to Length(Str) do
  begin
    // Look the character up in the Koi table
    Index := -1;
    for j := Low(Win) to High(Win) do
      if Koi[j] = Str[i] then
      begin
        Index := j;
        Break;
      end;
    // Characters outside the table are copied unchanged
    if Index = -1 then Result := Result + Str[i]
    else Result := Result + Win[Index];
  end;
end;

procedure SendFileOnSMTP(Host: String;
  Port: Integer;
  Subject,
  FromAddress, ToAddress,
  Body,
  FileName: String);
var
  NMSMTP: TNMSMTP;
begin
  if DelSpace(ToAddress) = '' then Exit;
  if ToAddress[1] = '' then Exit;
  if (DelSpace(FileName) <> '') and not FileExists(FileName) then
    raise Exception.Create('SendFileOnSMTP: file not exist: ' + FileName);
  NMSMTP := TNMSMTP.Create(nil);
  try
    NMSMTP.Host := Host;
    NMSMTP.Port := Port;
    NMSMTP.Charset := 'koi8-r';
    NMSMTP.PostMessage.FromAddress := FromAddress;
    NMSMTP.PostMessage.ToAddress.Text := ToAddress;
    NMSMTP.PostMessage.Attachments.Text := FileName;
    NMSMTP.PostMessage.Subject := Subject;
    NMSMTP.PostMessage.Date := DateTimeToStr(Now);
    NMSMTP.UserID := 'netmaster';
    // The body is re-encoded to match the declared koi8-r charset
    NMSMTP.PostMessage.Body.Text := WinToKoi(Body);
    NMSMTP.FinalHeader.Clear;
    NMSMTP.TimeOut := 5000;
    NMSMTP.Connect;
    NMSMTP.SendMail;
    NMSMTP.Disconnect;
  finally
    NMSMTP.Free;
  end;
end;
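
In Unicode versions of Delphi (2009 and later), string holds UTF-16 text, so a character table like the one above no longer applies directly. A minimal sketch of producing the KOI8-R bytes of a message body there, assuming the RTL's TEncoding class, is shown below. The function names are hypothetical, and 20866 is the Windows code page identifier for KOI8-R.

```pascal
program Koi8BodyDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  CP_KOI8_R = 20866; // Windows code page identifier for KOI8-R

// Encodes a Unicode string as KOI8-R bytes.
function EncodeAsKoi8R(const S: string): TBytes;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(CP_KOI8_R);
  try
    Result := Enc.GetBytes(S);
  finally
    Enc.Free; // encodings returned by GetEncoding are owned by the caller
  end;
end;

// Decodes KOI8-R bytes back into a Unicode string.
function DecodeKoi8R(const Bytes: TBytes): string;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(CP_KOI8_R);
  try
    Result := Enc.GetString(Bytes);
  finally
    Enc.Free;
  end;
end;

var
  Body: string;
  Raw: TBytes;
begin
  // "Привет", written as code points so the source file encoding does not matter
  Body := #$041F#$0440#$0438#$0432#$0435#$0442;
  Raw := EncodeAsKoi8R(Body);
  Writeln(Length(Raw));             // 6: KOI8-R uses one byte per letter
  Writeln(DecodeKoi8R(Raw) = Body); // the round trip preserves the text
end.
```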

Taken from http://blackman.wp-club.net/

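Re-encoding text between DOS, Windows and KOI8

// Written for pre-Unicode Delphi, where PChar is PAnsiChar.
procedure WinToDos;
var
  Src, Str: PChar;
begin
  Src := Memo1.Lines.GetText; // Take the text from the TMemo as a PChar (a newly allocated copy)
  Str := StrAlloc(StrLen(Src) + 1); // CharToOem writes into a buffer the caller allocates
  try
    CharToOem(Src, Str); // API function that converts the text
    Memo2.Lines.Text := StrPas(Str); // Write the result back
  finally
    StrDispose(Str);
    StrDispose(Src);
  end;
end;

procedure DosToWin;
var
  Src, Str: PChar;
begin
  Src := Memo1.Lines.GetText; // Take the text from the TMemo as a PChar (a newly allocated copy)
  Str := StrAlloc(StrLen(Src) + 1); // OemToChar writes into a buffer the caller allocates
  try
    OemToChar(Src, Str); // API function that converts the text
    Memo2.Lines.Text := StrPas(Str); // Write the result back
  finally
    StrDispose(Str);
    StrDispose(Src);
  end;
end;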

var
  // Maps the upper half of KOI8 (bytes 128..255) to the DOS "alternative" encoding
  koi8toalt : array [0..127] of char = (
    Chr($c4), Chr($b3), Chr($da), Chr($bf),
    Chr($c0), Chr($d9), Chr($c3), Chr($b4),
    Chr($c2), Chr($c1), Chr($c5), Chr($df),
    Chr($dc), Chr($db), Chr($dd), Chr($de),
    Chr($b0), Chr($b1), Chr($b2), Chr($f4),
    Chr($fe), Chr($f9), Chr($fb), Chr($f7),
    Chr($f3), Chr($f2), Chr($ff), Chr($f5),
    Chr($f8), Chr($fd), Chr($fa), Chr($f6),
    Chr($cd), Chr($ba), Chr($d5), Chr($f1),
    Chr($d6), Chr($c9), Chr($b8), Chr($b7),
    Chr($bb), Chr($d4), Chr($d3), Chr($c8),
    Chr($be), Chr($bd), Chr($bc), Chr($c6),
    Chr($c7), Chr($cc), Chr($b5), Chr($f0),
    Chr($b6), Chr($b9), Chr($d1), Chr($d2),
    Chr($cb), Chr($cf), Chr($d0), Chr($ca),
    Chr($d8), Chr($d7), Chr($ce), Chr($fc),
    Chr($ee), Chr($a0), Chr($a1), Chr($e6),
    Chr($a4), Chr($a5), Chr($e4), Chr($a3),
    Chr($e5), Chr($a8), Chr($a9), Chr($aa),
    Chr($ab), Chr($ac), Chr($ad), Chr($ae),
    Chr($af), Chr($ef), Chr($e0), Chr($e1),
    Chr($e2), Chr($e3), Chr($a6), Chr($a2),
    Chr($ec), Chr($eb), Chr($a7), Chr($e8),
    Chr($ed), Chr($e9), Chr($e7), Chr($ea),
    Chr($9e), Chr($80), Chr($81), Chr($96),
    Chr($84), Chr($85), Chr($94), Chr($83),
    Chr($95), Chr($88), Chr($89), Chr($8a),
    Chr($8b), Chr($8c), Chr($8d), Chr($8e),
    Chr($8f), Chr($9f), Chr($90), Chr($91),
    Chr($92), Chr($93), Chr($86), Chr($82),
    Chr($9c), Chr($9b), Chr($87), Chr($98),
    Chr($9d), Chr($99), Chr($97), Chr($9a));

function Koi8toWin(const Data: PChar; DataLen: Integer): PChar;
var
  PCh: PChar;
  i: Integer;
begin
  // Step 1: re-encode KOI8 to the DOS encoding in place
  PCh := Data;
  for i := 1 to DataLen do
  begin
    if Ord(Pch^) > 127 then
      Pch^ := koi8toalt[Ord(Pch^) - 128];
    Inc(PCh);
  end;
  // Step 2: let the Windows API convert DOS (OEM) to Windows (ANSI)
  PCh := Data;
  OemToCharBuff(PCh, PCh, DWORD(DataLen));
  Result := Data;
end;

http://delphiworld.narod.ru/
DelphiWorld 6.0

Re-encoding text from Win1251 to KOI8-R and back

type
  TConvertChars = array [#128..#255] of char;

const
  Win_KoiChars: TConvertChars = (
    #128,#129,#130,#131,#132,#133,#134,#135,#136,#137,#060,#139,#140,#141,#142,#143,
    #144,#145,#146,#147,#148,#169,#150,#151,#152,#153,#154,#062,#176,#157,#183,#159,
    #160,#246,#247,#074,#164,#231,#166,#167,#179,#169,#180,#060,#172,#173,#174,#183,
    #156,#177,#073,#105,#199,#181,#182,#158,#163,#191,#164,#062,#106,#189,#190,#167,
    #225,#226,#247,#231,#228,#229,#246,#250,#233,#234,#235,#236,#237,#238,#239,#240,
    #242,#243,#244,#245,#230,#232,#227,#254,#251,#253,#154,#249,#248,#252,#224,#241,
    #193,#194,#215,#199,#196,#197,#214,#218,#201,#202,#203,#204,#205,#206,#207,#208,
    #210,#211,#212,#213,#198,#200,#195,#222,#219,#221,#223,#217,#216,#220,#192,#209);

  Koi_WinChars: TConvertChars = (
    #128,#129,#130,#131,#132,#133,#134,#135,#136,#137,#138,#139,#140,#141,#142,#143,
    #144,#145,#146,#147,#148,#149,#150,#151,#152,#153,#218,#155,#176,#157,#183,#159,
    #160,#161,#162,#184,#186,#165,#166,#191,#168,#169,#170,#171,#172,#173,#174,#175,
    #156,#177,#178,#168,#170,#181,#182,#175,#184,#185,#186,#187,#188,#189,#190,#185,
    #254,#224,#225,#246,#228,#229,#244,#227,#245,#232,#233,#234,#235,#236,#237,#238,
    #239,#255,#240,#241,#242,#243,#230,#226,#252,#251,#231,#248,#253,#249,#247,#250,
    #222,#192,#193,#214,#196,#197,#212,#195,#213,#200,#201,#202,#203,#204,#205,#206,
    #207,#223,#208,#209,#210,#211,#198,#194,#220,#219,#199,#216,#221,#217,#215,#218);

// Only characters above #127 are remapped; ASCII is copied unchanged.
// The reverse direction works the same way with Koi_WinChars.
function Win_KoiConvert(const St: string): string;
var
  i: integer;
begin
  Result := St;
  for i := 1 to Length(St) do
    if St[i] > #127 then
      Result[i] := Win_KoiChars[St[i]];
end;

http://delphiworld.narod.ru/
DelphiWorld 6.0

UniConv

UniConv is a fast, compact, universal library for converting and comparing text and changing its case in accordance with the latest Unicode Consortium standards. Its functionality closely resembles that of ICU, libiconv and the Windows kernel functions, which are the de facto standards on popular operating systems. There are several reasons for designing and using UniConv:

None of those libraries supports the full list of byte order marks (BOM)
None of them supports the full list of encodings defined by the XML and HTML standards
There is no universal «best-fit» behavior for single-byte character sets: conversion results differ not only between libraries but also between code pages within the same library
There are no functions for comparing strings in different encodings «on the fly» (e.g. UTF-16 with UTF-8, or Windows-1251 with Windows-1252)
The library interfaces are poorly suited to sequential processing of large text files
The libraries are designed for universality rather than maximum performance
Identical transformations are not guaranteed: for example, CFStringUppercase, u_strToUpper and CharUpperBuffW process some characters differently, and even CharUpperBuffW may produce different results on Windows XP and Windows 10

Examples of using the library can be found in the demonstration projects: Demo.zip

Supported encodings

UniConv supports 50 encodings:

12 Unicode encodings: UTF-8, UTF-16(LE) ~ UCS2, UTF-16BE, UTF-32(LE) = UCS4, UTF-32BE, UCS4 unusual octet order 2143, UCS4 unusual octet order 3412, UTF-1, UTF-7, UTF-EBCDIC, SCSU, BOCU-1
10 ANSI code pages (may be returned by Windows.GetACP): CP874, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, CP1258
4 other multi-byte encodings that may be the default on POSIX systems: shift_jis, gb2312, ks_c_5601-1987, big5
23 single-byte and multi-byte encodings that can also be declared as «encoding» in XML/HTML: ibm866, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-10, iso-8859-13, iso-8859-14, iso-8859-15, iso-8859-16, koi8-r, koi8-u, macintosh, x-mac-cyrillic, x-user-defined, gb18030, hz-gb-2312, euc-jp, iso-2022-jp, euc-kr
Raw data

Conversion context

The main library type is TUniConvContext. It converts text from one encoding to another and, if needed, changes the case «on the fly». Encodings are identified by code page number. Because some encodings have no assigned code page number, the library defines several ‘fake’ code pages (e.g. for UTF-1 and UCS-2143). TUniConvContext is an object type, so it needs no constructor or destructor: declare it as an ordinary variable and call the methods you need.

TUniConvContext is initialized with the Init method, which takes the code pages and the case mode as parameters. An alternative Init takes byte order marks (TBOM), which is convenient for reading and writing text files. In addition, initializing with TBOM analyzes far fewer possible encodings, so the output binary will be approximately 50 KB smaller. If the conversion is between UTF-8, UTF-16 and a single-byte character set, you can initialize with methods such as InitUTF16FromSBCS or InitUTF8FromSBCS.

To perform a conversion, assign the Source, SourceSize, Destination and DestinationSize fields and call the Convert function. After the conversion, the SourceRead and DestinationWritten fields are filled in. For convenience, there are two more overloads of Convert that assign the necessary fields automatically.

TUniConvContext supports sequential processing of large files with small memory buffers. Sometimes the converted characters do not fit in the Destination buffer or, conversely, the Source buffer ends in the middle of a character. In these cases TUniConvContext holds the latest stable state, and the integer returned by Convert tells you how the conversion went. Zero means the conversion succeeded. A positive value means the Destination buffer is too small. A negative value means the Source buffer is too small to read the character at its end. Some encodings (e.g. UTF-7, BOCU-1, iso-2022-jp) use «state», which matters when text is converted in parts. You can call ResetState if you need to start the conversion again. The ModeFinalize property (True by default) matters for the encodings that use «state», because at the end of the conversion a few extra bytes are written to Destination. Remember to set ModeFinalize to False if the Source data is not yet finished. When ModeFinalize = True and the conversion succeeds, ResetState is called automatically.

In some cases (e.g. when generating XML, HTML or JSON) you need to determine whether a character can be written in the destination encoding. One of the Convertible functions can help here.

type
  // case sensitivity
  TCharCase = (ccOriginal, ccLower, ccUpper);

  // byte order mark
  TBOM = (bomNone, bomUTF8, bomUTF16, bomUTF16BE, bomUTF32, bomUTF32BE, bomUCS2143, bomUCS3412, bomUTF1, bomUTF7, bomUTFEBCDIC, bomSCSU, bomBOCU1, bomGB18030);

var
  // automatically defined default code page
  CODEPAGE_DEFAULT: Word;

const
  // non-defined (fake) code page identifiers
  CODEPAGE_UCS2143 = 12002;
  CODEPAGE_UCS3412 = 12003;
  CODEPAGE_UTF1 = 65002;
  CODEPAGE_UTFEBCDIC = 65003;
  CODEPAGE_SCSU = 65004;
  CODEPAGE_BOCU1 = 65005;
  CODEPAGE_USERDEFINED = $fffd;
  CODEPAGE_RAWDATA = $ffff;
  
type  
  TUniConvContext = object
  public
    // "constructors"
    procedure Init(const ADestinationCodePage, ASourceCodePage: Word; const ACharCase: TCharCase); 
    procedure Init(const ADestinationBOM, ASourceBOM: TBOM; const SBCSCodePage: Word; const ACharCase: TCharCase); 

    // context properties
    property DestinationCodePage: Word read
    property SourceCodePage: Word read
    property CharCase: TCharCase read
    property ModeFinalize: Boolean read/write
    procedure ResetState;

    // character convertibility
    function Convertible(const C: UCS4Char): Boolean;
    function Convertible(const C: UnicodeChar): Boolean;
    
    // conversion parameters
    property Destination: Pointer read/write
    property DestinationSize: NativeUInt read/write
    property Source: Pointer read/write
    property SourceSize: NativeUInt read/write
    
    // conversion
    function Convert: NativeInt;     
    function Convert(const ADestination: Pointer;
                     const ADestinationSize: NativeUInt;
                     const ASource: Pointer;
                     const ASourceSize: NativeUInt): NativeInt;
    function Convert(const ADestination: Pointer;
                     const ADestinationSize: NativeUInt;
                     const ASource: Pointer;
                     const ASourceSize: NativeUInt;
                     out ADestinationWritten: NativeUInt;
                     out ASourceRead: NativeUInt): NativeInt; 
                     
    // "out" information
    property DestinationWritten: NativeUInt read
    property SourceRead: NativeUInt read
  end;
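
A one-shot conversion with the declarations listed above might look like the following sketch. It relies only on the Init and Convert signatures shown in this listing. The unit name UniConv in the uses clause, the use of 65001 as the UTF-8 code page identifier, and the function name Win1251BytesToUtf8 are assumptions made for illustration and are not confirmed by the text.

```pascal
uses
  SysUtils,
  UniConv; // assumed unit name

// Converts a Windows-1251 buffer to UTF-8 in a single Convert call.
function Win1251BytesToUtf8(const Src: TBytes): TBytes;
var
  Context: TUniConvContext;
  Written, Consumed: NativeUInt;
begin
  Result := nil;
  if Length(Src) = 0 then
    Exit;
  // Every Windows-1251 character fits in at most 3 UTF-8 bytes
  SetLength(Result, Length(Src) * 3);
  // Destination code page first, then source; 65001 is assumed to denote UTF-8
  Context.Init(65001, 1251, ccOriginal);
  // Zero means success; positive or negative values signal a buffer problem
  if Context.Convert(Pointer(Result), Length(Result),
                     Pointer(Src), Length(Src),
                     Written, Consumed) <> 0 then
    raise Exception.Create('Conversion did not complete');
  SetLength(Result, Written);
end;
```

For large files the same context would be reused across calls, with ModeFinalize set to False until the last chunk, as described above.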

Lookup tables

Maximum performance is one of the key priorities of the UniConv library, which is why it makes heavy use of two primitives: hash tables and lookup tables. Some of them can be used directly in your own algorithms. The most striking example is the UNICONV_CHARCASE lookup, which changes the case of a UnicodeChar with a simple table access: for example, UNICONV_CHARCASE.LOWER['U'] = 'u' and UNICONV_CHARCASE.UPPER['n'] = 'N'. Another lookup table is UNICONV_UTF8CHAR_SIZE. UTF-8 is designed so that the first byte determines the length of the character. Lengths from 1 to 6 are allowed by the encoding scheme, but the Unicode Consortium has restricted the range of characters so that only lengths from 1 to 4 are relevant. The first-byte values 128..191, 254 and 255 are not provided for by UTF-8, so their «length» in UNICONV_UTF8CHAR_SIZE is zero.
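
The first-byte rule can be illustrated without the library. The sketch below is a standalone function, not the library's UNICONV_UTF8CHAR_SIZE table, and the name Utf8SequenceLength is hypothetical. Unlike the table described above, it reports only the lengths 1 to 4 and returns zero for every other first byte.

```pascal
program Utf8LeadByteDemo;
{$APPTYPE CONSOLE}

// Returns the length of a UTF-8 sequence given its first byte,
// or 0 if the byte cannot start a sequence of 1 to 4 bytes.
function Utf8SequenceLength(LeadByte: Byte): Integer;
begin
  case LeadByte of
    $00..$7F: Result := 1; // 0xxxxxxx: ASCII
    $C0..$DF: Result := 2; // 110xxxxx
    $E0..$EF: Result := 3; // 1110xxxx
    $F0..$F7: Result := 4; // 11110xxx
  else
    Result := 0;           // continuation bytes $80..$BF and $F8..$FF
  end;
end;

begin
  Writeln(Utf8SequenceLength($41)); // 1: the letter 'A'
  Writeln(Utf8SequenceLength($D0)); // 2: first byte of a Cyrillic letter
  Writeln(Utf8SequenceLength($E2)); // 3
  Writeln(Utf8SequenceLength($80)); // 0: a continuation byte
end.
```

A 256-entry table filled from this function once gives the same answers with a single array access, which is the trade-off the library makes.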

UniConv pays special attention to single-byte character set (SBCS) encodings, which in Delphi correspond to the AnsiChar and AnsiString types. Each supported SBCS has a TUniConvSBCS instance holding several lookup tables designed for fast character conversion. LowerCase and UpperCase change character case (AnsiChar -> AnsiChar). UCS2, LowerCaseUCS2 and UpperCaseUCS2 convert AnsiChar -> UnicodeChar. UTF8, LowerCaseUTF8 and UpperCaseUTF8 convert AnsiChar -> UTF8Char (stored in a Cardinal); the length of the resulting character is from 1 to 3 and is stored in the high byte (Cardinal shr 24). To convert UnicodeChar -> AnsiChar (best-fit), use the VALUES lookup table. To convert from one SBCS to another (AnsiChar -> AnsiChar), use FromSBCS.

A TUniConvSBCS can be found by code page with the UniConvSBCS and UniConvSBCSIndex functions. If the SBCS is not found, the default value is returned (Raw data, code page $FFFF). To check whether a code page is a supported SBCS, use UniConvIsSBCS.

type
  TUniConvSBCS = object
  public
    // information
    property Index: Word read
    property CodePage: Word read

    // lower/upper single-byte tables
    property LowerCase: PUniConvSS
    property UpperCase: PUniConvSS

    // basic unicode tables
    property UCS2: PUniConvUS read
    property UTF8: PUniConvMS read
    property VALUES: PUniConvSBCSValues read

    // lower/upper unicode tables
    property LowerCaseUCS2: PUniConvUS read
    property UpperCaseUCS2: PUniConvUS read
    property LowerCaseUTF8: PUniConvMS read
    property UpperCaseUTF8: PUniConvMS read

    // single-byte lookup from another encoding
    function FromSBCS(const Source: PUniConvSBCS; const CharCase: TCharCase): PUniConvSS;  
  end;
  
var
  DEFAULT_UNICONV_SBCS: PUniConvSBCS;
  DEFAULT_UNICONV_SBCS_INDEX: NativeUInt;
  UNICONV_SUPPORTED_SBCS: array[0..28] of TUniConvSBCS;
  
  function UniConvIsSBCS(const CodePage: Word): Boolean;
  function UniConvSBCS(const CodePage: Word): PUniConvSBCS;
  function UniConvSBCSIndex(const CodePage: Word): NativeUInt;

Compiler independent char/string types

UniConv pays special attention to the UTF-8, UTF-16 and SBCS (Ansi) encodings, since they are used most often. Delphi has several standard types for working with them, but on mobile platforms (NEXTGEN compilers) there is only one string type, UnicodeString. For ease of cross-platform programming, the library declares types such as AnsiChar, AnsiString, UTF8String, RawByteString, WideString and ShortString. Be careful when using them: on mobile platforms they are emulated with static/dynamic arrays, character indexing can start from zero, and a character constant can be of an ordinal type.

String types conversion

The library provides a large number of functions for changing letter case and for converting strings between UTF-8, UTF-16 and SBCS (Ansi). Although each operation exists both as a procedure and as a function, the function form is not recommended in performance-critical code, because the Delphi compiler generates rather inefficient code for functions that return a string type.

Also be careful with the AnsiString type. If the code page differs from the default (e.g. AnsiString(1253)), use an explicit cast to AnsiString when calling the conversion functions (e.g. utf16_from_sbcs(Result, AnsiString(MyGreekString));). Otherwise the Delphi compiler automatically converts AnsiString(1253) to AnsiString, which costs both data and performance. For the same reason, try to avoid conversions where an AnsiString is returned as a function result.

  // examples
  procedure utf16_from_utf8(var Dest: UnicodeString; const Src: UTF8String);
  function utf16_from_utf8(const Src: UTF8String): UnicodeString;
  procedure sbcs_from_utf16_upper(var Dest: AnsiString; const Src: UnicodeString; const CodePage: Word = 0);
  function sbcs_from_utf16_upper(const Src: UnicodeString; const CodePage: Word = 0): AnsiString;  
  procedure utf8_from_sbcs_lower(var Dest: UTF8String; const Src: AnsiString);
  function utf8_from_sbcs_lower(const Src: AnsiString): UTF8String;
  procedure utf16_from_utf16_upper(var Dest: UnicodeString; const Src: UnicodeString);
  function utf16_from_utf16_upper(const Src: UnicodeString): UnicodeString;

String types comparison

For the UTF-8, UTF-16 and SBCS (Ansi) encodings, UniConv contains many functions that compare strings with each other without first converting them to a common encoding. The comparison functions are divided into equal and compare variants, each in a case-sensitive and an ignorecase form. If you only need to test two strings for equality, use an equal function, as it is faster than compare. If the comparison has to be case insensitive, use an ignorecase function. UniConv can compare SBCS (Ansi) strings in different encodings. However, if you are sure that both strings use the same encoding, the samesbcs functions are recommended.

For AnsiString types with a non-default code page (e.g. AnsiString(1253)), use an explicit cast to AnsiString when calling a comparison function (e.g. utf8_compare_sbcs_ignorecase(MyUTF8String, AnsiString(MyGreekString));).

  // examples
  function utf16_equal_utf8(const S1: UnicodeString; const S2: UTF8String): Boolean;
  function utf16_equal_utf8_ignorecase(const S1: UnicodeString; const S2: UTF8String): Boolean;
  function utf8_compare_sbcs(const S1: UTF8String; const S2: AnsiString): NativeInt;
  function utf8_compare_sbcs_ignorecase(const S1: UTF8String; const S2: AnsiString): NativeInt;  
  function sbcs_equal_samesbcs(const S1: AnsiString; const S2: AnsiString): Boolean;
  function sbcs_compare_samesbcs_ignorecase(const S1: AnsiString; const S2: AnsiString): NativeInt;

Inna_Z

(2007-06-01 12:29)
[0]

I tried sending mail with the standard components
TIdSMTP and TIdMessage, but ran into encoding problems.

Does anyone know some good components,
or can you explain how to make these work correctly?

P.S. I read the sent mail in Outlook.

TIdMessagePart.ContentType

Savek

(2007-06-01 12:43)
[2]

I solved the encoding problems like this:
IdMessage.From.Name:=EncodeHeader(eFName.Text,C,'Q',h,'windows-1251'); IdMessage.Subject:=EncodeHeader(eTema.Text,C,'Q',h,'windows-1251'); // and so on

TIdMessagePart.CharSet?

Savek

(2007-06-01 12:44)
[4]

Inna_Z

(2007-06-01 12:44)
[5]

Thanks.
I figured it out.

In Outlook I had to select the encoding
Cyrillic Windows

It turns out everything was simple.

Inna_Z

(2007-06-01 12:49)
[6]

What is the EncodeHeader function?
Where does it come from?

> [6] Inna_Z (01.06.07 12:49)

On my machine it lives here:
D:\Delphi7\Indy10\Protocols\IdCoderHeader.pas

Yours is probably somewhere around there too.

> It turns out everything was simple

It is not that simple and, more importantly, it is not right.
To avoid problems reading the messages, the message itself has to declare the encodings of both the subject and the body.

> In Outlook I had to select the encoding
> Cyrillic Windows

Outlook should detect the encoding by itself. For that, the message has to state it.
Are you sending the message as plain text or with attachments?

Inna_Z

(2007-06-01 18:39)
[10]

In short, I need the message text in Russian, the subject (which, by the way, is already fine) and attached files.

For now I send it like this:

// Configure SMTP
SMTP.Host := FHost;
SMTP.Port := FPort;

// Build the message
MailMessage.From.Address := 'Inna@asbase.ntu-kpi.kiev.ua';
MailMessage.Recipients.EMailAddresses := FEmail;
MailMessage.ContentType := 'plain/text';
MailMessage.Subject := 'Проверка связи';
MailMessage.Body.Text := ReportText;

// Send the mail
try
  try
    SMTP.Connect(1000);
    SMTP.SendMsg(MailMessage);
  except

  end;
finally
  if SMTP.Connected then SMTP.Disconnect;
end;

How do I tell it the correct encoding?
There is a CharSet property;
Delphi lists these values for it:

US-ASCII
ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
ISO-8859-10

Can other values be set?
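
As a hedged illustration of the advice given in this thread, the sketch below declares the charset on the message itself. It assumes Indy's TIdMessage with its ContentType and CharSet properties. The procedure name is hypothetical. CharSet is a string property, so a value such as windows-1251 or koi8-r can be assigned even though the design-time list does not offer it. Note that the registered MIME type for plain text is text/plain.

```pascal
uses
  IdMessage;

// Fills in a plain-text message and declares its body charset, so that
// the mail client does not have to guess the encoding.
procedure PrepareRussianTextMessage(Msg: TIdMessage;
  const AFrom, ATo, ASubject, ABody: string);
begin
  Msg.Clear;
  Msg.From.Address := AFrom;
  Msg.Recipients.EMailAddresses := ATo;
  // Set the content type first, then the charset that applies to it
  Msg.ContentType := 'text/plain';
  Msg.CharSet := 'windows-1251';
  Msg.Subject := ASubject;
  Msg.Body.Text := ABody;
end;
```

The message prepared this way is then sent with the same SMTP component as in the post above.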

Inna_Z

(2007-06-01 18:40)
[11]

For now I am sending without attachments.

Here is working code taken from somewhere on the forum:
uses Mapi, IniFiles, CommonUnit, TuneUnit, MsgFormUnit, HDDInfo;
const
FileName = «MailAddresses.dat»;
Section = «Main»;
Value = «ItemIndex»;
TextFileName = «Распечатка.txt»;

// Requires Windows, SysUtils, Classes, Forms and Mapi in the uses clause.
function SendEMail(Handle: THandle; Mail: TStrings): Cardinal;
type
  TAttachAccessArray = array [0..0] of TMapiFileDesc;
  PAttachAccessArray = ^TAttachAccessArray;
var
  MapiMessage: TMapiMessage;
  Receip: TMapiRecipDesc;
  Attachments: PAttachAccessArray;
  AttachCount: Integer;
  i1: Integer;
  FileName: string;
  dwRet: Cardinal;
  MAPI_Session: Cardinal;
  WndList: Pointer;
begin
  // Log on to the default MAPI profile, showing the logon dialog if needed.
  dwRet := MapiLogon(Handle,
    PChar(''),
    PChar(''),
    MAPI_LOGON_UI or MAPI_NEW_SESSION,
    0, @MAPI_Session);

  if (dwRet <> SUCCESS_SUCCESS) then
  begin
    // Return the logon error; the original left Result undefined here.
    Result := dwRet;
    MessageBox(Handle,
      PChar('Error while trying to send email'),
      PChar('Error'),
      MB_ICONERROR or MB_OK);
  end
  else
  begin
    FillChar(MapiMessage, SizeOf(MapiMessage), #0);
    Attachments := nil;
    FillChar(Receip, SizeOf(Receip), #0);

    // Single recipient, taken from the 'to' key.
    if Mail.Values['to'] <> '' then
    begin
      Receip.ulReserved := 0;
      Receip.ulRecipClass := MAPI_TO;
      Receip.lpszName := StrNew(PChar(Mail.Values['to']));
      Receip.lpszAddress := StrNew(PChar('SMTP:' + Mail.Values['to']));
      Receip.ulEIDSize := 0;
      MapiMessage.nRecipCount := 1;
      MapiMessage.lpRecips := @Receip;
    end;

    // Count the keys attachment0, attachment1, ... up to the first gap.
    AttachCount := 0;

    for i1 := 0 to MaxInt do
    begin
      if Mail.Values['attachment' + IntToStr(i1)] = '' then
        break;
      Inc(AttachCount);
    end;

    if AttachCount > 0 then
    begin
      GetMem(Attachments, SizeOf(TMapiFileDesc) * AttachCount);

      // Indexing past [0..0] relies on range checking being off ({$R-}).
      for i1 := 0 to AttachCount - 1 do
      begin
        FileName := Mail.Values['attachment' + IntToStr(i1)];
        Attachments[i1].ulReserved := 0;
        Attachments[i1].flFlags := 0;
        Attachments[i1].nPosition := ULONG($FFFFFFFF);
        Attachments[i1].lpszPathName := StrNew(PChar(FileName));
        Attachments[i1].lpszFileName :=
          StrNew(PChar(ExtractFileName(FileName)));
        Attachments[i1].lpFileType := nil;
      end;
      MapiMessage.nFileCount := AttachCount;
      MapiMessage.lpFiles := @Attachments^;
    end;

    if Mail.Values['subject'] <> '' then
      MapiMessage.lpszSubject := StrNew(PChar(Mail.Values['subject']));
    if Mail.Values['body'] <> '' then
      MapiMessage.lpszNoteText := StrNew(PChar(Mail.Values['body']));

    // Disable the application's windows while the MAPI dialog is shown.
    WndList := DisableTaskWindows(0);
    try
      Result := MapiSendMail(MAPI_Session, Handle,
        MapiMessage, MAPI_DIALOG, 0);
    finally
      EnableTaskWindows(WndList);
    end;

    // Release every string allocated with StrNew, then the array itself.
    for i1 := 0 to AttachCount - 1 do
    begin
      StrDispose(Attachments[i1].lpszPathName);
      StrDispose(Attachments[i1].lpszFileName);
    end;
    if Assigned(Attachments) then
      FreeMem(Attachments); // the original leaked this block

    if Assigned(MapiMessage.lpszSubject) then
      StrDispose(MapiMessage.lpszSubject);
    if Assigned(MapiMessage.lpszNoteText) then
      StrDispose(MapiMessage.lpszNoteText);
    if Assigned(Receip.lpszAddress) then
      StrDispose(Receip.lpszAddress);
    if Assigned(Receip.lpszName) then
      StrDispose(Receip.lpszName);
    MapiLogOff(MAPI_Session, Handle, 0, 0);
  end;
end;
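
SendEMail reads its input from the keys of a TStrings list ('to', 'subject', 'body', 'attachment0', 'attachment1', and so on). A minimal calling sketch follows. It assumes SendEMail is declared in the same unit and that Classes, Forms, Dialogs and Mapi are in the uses clause. The procedure name SendReportByMail, the address, and the file path are placeholders, not part of the original code.

```pascal
// Hypothetical caller of SendEMail; all values below are placeholders.
procedure SendReportByMail;
var
  Mail: TStringList;
begin
  Mail := TStringList.Create;
  try
    // SendEMail looks these keys up through Mail.Values[...].
    Mail.Values['to'] := 'user@example.com';
    Mail.Values['subject'] := 'Report';
    Mail.Values['body'] := 'See the attached file.';
    // Attachments are numbered from 0; scanning stops at the first
    // missing number, so do not leave gaps.
    Mail.Values['attachment0'] := 'C:\Temp\report.txt';

    if SendEMail(Application.Handle, Mail) <> SUCCESS_SUCCESS then
      ShowMessage('The message was not sent.');
  finally
    Mail.Free;
  end;
end;
```

Because the function passes MAPI_DIALOG, the mail client opens its compose window and the user still has to confirm sending.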


Delphi 7, Indy 10: encoding in IdHTTP1.Post and IdHTTP1.Get

After installing the latest (tenth) version of Indy, I ran into encoding problems. When you send a GET or POST request to a site, the response contains question marks (?????) in place of the Russian characters.

After a long search of the internet for a solution, I realized I would have to work it out myself. It was clear that the answer lay in the Indy sources, which surely contained something like "if the character is unknown, replace it with '?'". I started searching the sources for the question mark, and the result was not long in coming:

if UInt16(P^) > $007F then begin
  ABytes^ := Byte(Ord('?'));
end else begin
  ABytes^ := Byte(P^);
end;

To work with sites encoded in UTF-8, change the statement ABytes^ := Byte(Ord('?')); to ABytes^ := Byte(P^)+176;. Why +176? P^ is a 16-bit Unicode character, and the Russian letters А..я occupy U+0410..U+044F, so the low byte of each letter is exactly 176 ($B0) below its Windows-1251 code ($C0..$FF). Adding that offset back restores the original Russian text.
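
The arithmetic behind the offset can be checked with a small console program. This is only a sketch of the patched statement, not Indy code: the function name PatchedByte is made up for the illustration, and the wrap to one byte with `and $FF` is an assumption about what happens when range checking is off.

```pascal
program Offset176Demo;
{$APPTYPE CONSOLE}

// Mimics ABytes^ := Byte(P^)+176: take the low byte of the 16-bit
// character, add 176 and wrap the sum to a single byte.
// PatchedByte is a hypothetical name used only for this demo.
function PatchedByte(Ch: WideChar): Byte;
begin
  Result := (Lo(Word(Ch)) + 176) and $FF;
end;

begin
  // U+0410 (А): $10 + $B0 = $C0, the Windows-1251 code of А.
  Writeln(PatchedByte(WideChar($0410))); // 192
  // U+044F (я): $4F + $B0 = $FF, the Windows-1251 code of я.
  Writeln(PatchedByte(WideChar($044F))); // 255
  // U+0451 (ё): $51 + $B0 = $101, which wraps to $01.
  // Windows-1251 stores ё at $B8, so this letter is not restored.
  Writeln(PatchedByte(WideChar($0451))); // 1
end.
```

The offset therefore covers only the contiguous range А..я. The letters Ё and ё, and any non-Russian character above $7F, come out wrong.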

To work with sites encoded in windows-1251, or with no declared encoding at all, change the statement ABytes^ := Byte(Ord('?')); to ABytes^ := Byte(P^);. In this case, however, the result of the request must additionally be passed through UTF8Decode();. For example: Memo1.Text := UTF8Decode(IdHTTP1.Get('www.site_windows-1251.ru'));.

That raises the question of how and where to make these changes. It is simple. Open your project in Delphi and go to the "uses" clause. Find the word "IdHTTP" there and click it while holding down Ctrl. In the window that opens, again scroll down to the "uses" clause, find the word "IdGlobal", and Ctrl+click it as well. In the window that opens, find the code shown above and make the necessary changes. Then save and run your project. Check the result: Russian letters are now recognized!
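
For comparison, here is a sketch of an approach that leaves the Indy sources untouched: download the response as raw bytes and decode it yourself. It assumes Delphi 7 (where string is a byte string, so TStringStream.DataString returns the bytes unchanged) and a server that really answers in UTF-8. The function name FetchUtf8Page is hypothetical. TIdHTTP.Get with a TStream argument and UTF8Decode are standard Indy and RTL calls.

```pascal
uses
  Classes, IdHTTP;

// Hypothetical helper: fetches a UTF-8 page without relying on Indy's
// own string conversion. Written for Delphi 7 (ANSI strings).
function FetchUtf8Page(const AUrl: string): WideString;
var
  Http: TIdHTTP;
  Raw: TStringStream;
begin
  Http := TIdHTTP.Create(nil);
  Raw := TStringStream.Create('');
  try
    // The stream overload stores the body byte for byte.
    Http.Get(AUrl, Raw);
    // Decode the UTF-8 bytes into a WideString.
    Result := UTF8Decode(Raw.DataString);
  finally
    Raw.Free;
    Http.Free;
  end;
end;
```

Assigning the result to Memo1.Text then converts it to the system ANSI code page, which is Windows-1251 on a Russian Windows installation.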

The second part of the question is how to pass Russian text in parameters through IdHTTP1.Post(); using TIdMultiPartFormDataStream.AddFormField();. I found only a partial solution. Nothing changes in the program, and on the server side the PHP code calls quoted_printable_decode($_POST['ru_text']);, which yields clean Russian text.
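
The server-side decoding is needed because the form field arrives in quoted-printable transfer encoding. A client-side alternative can be sketched under the assumption of a recent Indy 10 build, in which AddFormField accepts a charset argument and returns a TIdFormDataField with a ContentTransfer property. Older Indy 10 builds declare AddFormField differently, so check your copy of IdMultipartFormData.pas first. The helper name PostRussianField is hypothetical.

```pascal
uses
  Classes, IdHTTP, IdMultipartFormData;

// Hypothetical helper: posts one text field as 8-bit UTF-8 instead of
// quoted-printable. Assumes a recent Indy 10 build (see the note above).
function PostRussianField(const AUrl, AText: string): string;
var
  Http: TIdHTTP;
  Form: TIdMultiPartFormDataStream;
begin
  Http := TIdHTTP.Create(nil);
  Form := TIdMultiPartFormDataStream.Create;
  try
    // 'ru_text' matches the field name read by the PHP script.
    Form.AddFormField('ru_text', AText, 'utf-8').ContentTransfer := '8bit';
    Result := Http.Post(AUrl, Form);
  finally
    Form.Free;
    Http.Free;
  end;
end;
```

With this variant the PHP script would read $_POST['ru_text'] directly as UTF-8, without quoted_printable_decode.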

Finally, how do you pass a Russian name for a file uploaded through IdHTTP1.Post(); using TIdMultiPartFormDataStream.AddFile();? Here too I found only a partial solution, but it was enough for me. First, the server side still has to call quoted_printable_decode($_FILES['upload_ru_file']['name']);. In addition, the Indy sources have to be changed.

Open your project in Delphi and go to the "uses" clause. Find the word "IdHTTP" there and click it while holding down Ctrl. In the window that opens, scroll down to the "uses" clause, then keep scrolling to the second "uses" clause. Find the word "IdGlobalProtocols" there and Ctrl+click it. In that window, search for the line LANG_RUSSIAN: Result := idcs_KOI8_R; and change it to LANG_RUSSIAN: Result := idcs_ISO_8859_1;. Then save and run your project. Delphi and PHP working together now produce correct Russian file names.

I hope this article helps you solve your encoding problem. I do not know how these changes to the Indy sources will affect other functions later on, but in my particular project they helped. Use this article at your own risk.


UniConv

Supported encodings

Conversion context

Lookup tables

Compiler independent char/string types

String types conversion

String types comparison
