Php mb convert encoding windows 1251 utf 8 - Доктор Windows

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP

mb_convert_encoding — Преобразует строку из одной кодировки символов в другую

Описание

Список параметров

string

Строка (string) или массив (array), для преобразования.

to_encoding

Требуемая кодировка результата.

from_encoding

Текущая кодировка, используемая для интерпретации строки string.
Несколько кодировок могут быть указаны в виде массива (array) или в виде строки через запятую,
в этом случае правильная кодировка будет определена по тому же алгоритму,
что и в функции mb_detect_encoding().

Если параметр from_encoding равен null или не указан,
то будет использоваться mbstring.internal_encoding setting,
если она установлена, иначе кодировка по умолчанию.

Допустимые значения to_encoding и from_encoding
указаны на странице поддерживаемые кодировки.

Возвращаемые значения

Преобразованная строка (string) или массив (array) или false в случае возникновения ошибки.

Ошибки

Начиная с PHP 8.0.0, если значение to_encoding или
from_encoding является недопустимой кодировкой, выбрасывается ValueError.
До PHP 8.0.0 вместо этого выдавалась ошибка уровня E_WARNING.

Список изменений

Версия	Описание
8.0.0	mb_convert_encoding() теперь выбрасывает ValueError, если передана недопустимая кодировка в `to_encoding`.
8.0.0	mb_convert_encoding() теперь выбрасывает ValueError, если передана недопустимая кодировка в `from_encoding`.
8.0.0	Теперь `from_encoding` может быть `null`.
7.2.0	Функция теперь также принимает массив (array) в `string`. Ранее поддерживались только строки (string).

Примеры

Пример #1 Пример использования mb_convert_encoding()

<?php /* Преобразует строку в кодировку SJIS */ $str = mb_convert_encoding($str, "SJIS");/* Преобразует из EUC-JP в UTF-7 */ $str = mb_convert_encoding($str, "UTF-7", "EUC-JP");/* Автоматически определяется кодировка среди JIS, eucjp-win, sjis-win, затем преобразуется в UCS-2LE */ $str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");/* Если mbstring.language равен "Japanese", "auto" используется для обозначения "ASCII,JIS,UTF-8,EUC-JP,SJIS" */ $str = mb_convert_encoding($str, "EUC-JP", "auto"); ?>

Смотрите также

mb_detect_order() — Установка/получение списка кодировок для механизмов определения кодировки
UConverter::transcode() — Преобразует строку из одной кодировки символов в другую
iconv() — Преобразует строку из одной кодировки символов в другую

josip at cubrad dot com ¶

9 years ago

For my last project I needed to convert several CSV files from Windows-1250 to UTF-8, and after several days of searching around I found a function that is partially solved my problem, but it still has not transformed all the characters. So I made this:


function w1250_to_utf8($text) {
    // map based on:
    // http://konfiguracja.c0.pl/iso02vscp1250en.html
    // http://konfiguracja.c0.pl/webpl/index_en.html#examp
    // http://www.htmlentities.com/html/entities/
    $map = array(
        chr(0x8A) => chr(0xA9),
        chr(0x8C) => chr(0xA6),
        chr(0x8D) => chr(0xAB),
        chr(0x8E) => chr(0xAE),
        chr(0x8F) => chr(0xAC),
        chr(0x9C) => chr(0xB6),
        chr(0x9D) => chr(0xBB),
        chr(0xA1) => chr(0xB7),
        chr(0xA5) => chr(0xA1),
        chr(0xBC) => chr(0xA5),
        chr(0x9F) => chr(0xBC),
        chr(0xB9) => chr(0xB1),
        chr(0x9A) => chr(0xB9),
        chr(0xBE) => chr(0xB5),
        chr(0x9E) => chr(0xBE),
        chr(0x80) => '&euro;',
        chr(0x82) => '&sbquo;',
        chr(0x84) => '&bdquo;',
        chr(0x85) => '&hellip;',
        chr(0x86) => '&dagger;',
        chr(0x87) => '&Dagger;',
        chr(0x89) => '&permil;',
        chr(0x8B) => '&lsaquo;',
        chr(0x91) => '&lsquo;',
        chr(0x92) => '&rsquo;',
        chr(0x93) => '&ldquo;',
        chr(0x94) => '&rdquo;',
        chr(0x95) => '&bull;',
        chr(0x96) => '&ndash;',
        chr(0x97) => '&mdash;',
        chr(0x99) => '&trade;',
        chr(0x9B) => '&rsquo;',
        chr(0xA6) => '&brvbar;',
        chr(0xA9) => '&copy;',
        chr(0xAB) => '&laquo;',
        chr(0xAE) => '&reg;',
        chr(0xB1) => '&plusmn;',
        chr(0xB5) => '&micro;',
        chr(0xB6) => '&para;',
        chr(0xB7) => '&middot;',
        chr(0xBB) => '&raquo;',
    );
    return html_entity_decode(mb_convert_encoding(strtr($text, $map), 'UTF-8', 'ISO-8859-2'), ENT_QUOTES, 'UTF-8');
}

regrunge at hotmail dot it ¶

12 years ago

I've been trying to find the charset of a norwegian (with a lot of ø, æ, å) txt file written on a Mac, i've found it in this way:



<?php


$text = "A strange string to pass, maybe with some ø, æ, å characters.";

foreach(
mb_list_encodings() as $chr){


        echo mb_convert_encoding($text, 'UTF-8', $chr)." : ".$chr."<br>";   


 }


?>





The line that looks good, gives you the encoding it was written in.




  


                       
  





Hope can help someone

volker at machon dot biz ¶

15 years ago

Hey guys. For everybody who's looking for a function that is converting an iso-string to utf8 or an utf8-string to iso, here's your solution:


public function encodeToUtf8($string) {
     return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}



  


                       
  




public function encodeToIso($string) {
     return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
For me these functions are working fine. Give it a try

francois at bonzon point com ¶

14 years ago

aaron, to discard unsupported characters instead of printing a ?, you might as well simply set the configuration directive:





  


                       
  




mbstring.substitute_character = "none"
in your php.ini. Be sure to include the quotes around none. Or at run-time with
<?php
ini_set('mbstring.substitute_character', "none");
?>

eion at bigfoot dot com ¶

16 years ago

many people below talk about using <?php mb_convert_encode($s,'HTML-ENTITIES','UTF-8'); ?> to convert non-ascii code into html-readable stuff. Due to my webserver being out of my control, I was unable to set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would convert it to nasty latin1 automatically and not leave it in it's beautiful UTF-8 glory.



So [insert korean characters here] turned into ?????.


I found myself needing to pass by reference (which of course is deprecated/nonexistent in recent versions of PHP)


so instead of


<?php


    mb_convert_encode(&$s,'HTML-ENTITIES','UTF-8');


?>


which worked perfectly until I upgraded, so I had to use


<?php


    call_user_func_array('mb_convert_encoding', array(&$s,'HTML-ENTITIES','UTF-8'));


?>





Hope it helps someone else out

aaron at aarongough dot com ¶

14 years ago

My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)


Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding. 
<?php
function convert_to ( $source, $target_encoding )
    {
    // detect the character encoding of the incoming file
    $encoding = mb_detect_encoding( $source, "auto" );// escape all of the question marks so we can remove artifacts from
    // the unicode conversion process
    $target = str_replace( "?", "[question_mark]", $source );// convert the string to the target encoding
    $target = mb_convert_encoding( $target, $target_encoding, $encoding);// remove any question marks that have been introduced because of illegal characters
    $target = str_replace( "?", "", $target );// replace the token string "[question_mark]" with the symbol "?"
    $target = str_replace( "[question_mark]", "?", $target );
       return 
$target;
    }
?>

Hope this helps someone! (Admins should feel free to delete my previous, incorrect, post for clarity)
-A

vasiliauskas dot agnius at gmail dot com ¶

4 years ago

When you need to convert from HTML-ENTITIES, but your UTF-8 string is partially broken (not all chars in UTF-8) - in this case passing string to mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES'); - corrupts chars in string even more. In this case you need to replace html entities gradually to preserve character good encoding. I wrote such closure for this job : <?php $decode_entities = function($string) { preg_match_all("/&#?w+;/", $string, $entities, PREG_SET_ORDER); $entities = array_unique(array_column($entities, 0)); foreach ($entities as $entity) { $decoded = mb_convert_encoding($entity, 'UTF-8', 'HTML-ENTITIES'); $string = str_replace($entity, $decoded, $string); } return $string; }; ?>

Stephan van der Feest ¶

17 years ago

To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:





  


                       
  




function htmltoflash($htmlstr)
{
  return str_replace("&lt;br /&gt;","n",
    str_replace("<","&lt;",
      str_replace(">","&gt;",
        mb_convert_encoding(html_entity_decode($htmlstr),
        "UTF-8","ISO-8859-1"))));
}

Daniel Trebbien ¶

13 years ago

Note that `mb_convert_encoding($val, 'HTML-ENTITIES')` does not escape ''', '"', '<', '>', or '&'.

Julian Egelstaff ¶

2 months ago

If you have what looks like ISO-8859-1, but it includes "smart quotes" courtesy of Microsoft software, or people cutting and pasting content from Microsoft software, then what you're actually dealing with is probably Windows-1252. Try this:


<?php
$cleanText = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
?>

The annoying part is that the auto detection (ie: the mb_detect_encoding function) will often think Windows-1252 is ISO-8859-1. Close, but no cigar. This is critical if you're then trying to do unserialize on the resulting text, because the byte count of the string needs to be perfect.

Rainer Perske ¶

5 months ago

Text-encoding HTML-ENTITIES will be deprecated as of PHP 8.2.


To convert all non-ASCII characters into entities (to produce pure 7-bit HTML output), I was using:
<?php
echo mb_convert_encoding( htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' ), 'HTML-ENTITIES', 'UTF-8' );
?>

I can get the identical result with:
<?php
echo mb_encode_numericentity( htmlentities( $text, ENT_QUOTES, 'UTF-8' ), [0x80, 0x10FFFF, 0, ~0], 'UTF-8' );
?>

The output contains well-known named entities for some often used characters and numeric entities for the rest.

bmxmale at qwerty dot re ¶

11 months ago

/** * Convert Windows-1250 to UTF-8 * Based on https://www.php.net/manual/en/function.mb-convert-encoding.php#112547 */ class TextConverter { private const ENCODING_TO = 'UTF-8'; private const ENCODING_FROM = 'ISO-8859-2';


    private array $mapChrChr = [
        0x8A => 0xA9,
        0x8C => 0xA6,
        0x8D => 0xAB,
        0x8E => 0xAE,
        0x8F => 0xAC,
        0x9C => 0xB6,
        0x9D => 0xBB,
        0xA1 => 0xB7,
        0xA5 => 0xA1,
        0xBC => 0xA5,
        0x9F => 0xBC,
        0xB9 => 0xB1,
        0x9A => 0xB9,
        0xBE => 0xB5,
        0x9E => 0xBE
    ];
    private array $mapChrString = [
        0x80 => '&euro;',
        0x82 => '&sbquo;',
        0x84 => '&bdquo;',
        0x85 => '&hellip;',
        0x86 => '&dagger;',
        0x87 => '&Dagger;',
        0x89 => '&permil;',
        0x8B => '&lsaquo;',
        0x91 => '&lsquo;',
        0x92 => '&rsquo;',
        0x93 => '&ldquo;',
        0x94 => '&rdquo;',
        0x95 => '&bull;',
        0x96 => '&ndash;',
        0x97 => '&mdash;',
        0x99 => '&trade;',
        0x9B => '&rsquo;',
        0xA6 => '&brvbar;',
        0xA9 => '&copy;',
        0xAB => '&laquo;',
        0xAE => '&reg;',
        0xB1 => '&plusmn;',
        0xB5 => '&micro;',
        0xB6 => '&para;',
        0xB7 => '&middot;',
        0xBB => '&raquo;'
    ];
    /**
     * @param $text
     * @return string
     */
    public function execute($text): string
    {
        $map = $this->prepareMap();
        return html_entity_decode(
            mb_convert_encoding(strtr($text, $map), self::ENCODING_TO, self::ENCODING_FROM),
            ENT_QUOTES,
            self::ENCODING_TO
        );
    }



  


                       
  




    /**
     * @return array
     */
    private function prepareMap(): array
    {
        $maps[] = $this->arrayMapAssoc(function ($k, $v) {
            return [chr($k), chr($v)];
        }, $this->mapChrChr);
        $maps[] = $this->arrayMapAssoc(function ($k, $v) {
            return [chr($k), $v];
        }, $this->mapChrString);
        return array_merge([], ...$maps);
    }
    /**
     * @param callable $function
     * @param array $array
     * @return array
     */
    private function arrayMapAssoc(callable $function, array $array): array
    {
        return array_column(
            array_map(
                $function,
                array_keys($array),
                $array
            ),
            1,
            0
        );
    }
}

urko at wegetit dot eu ¶

10 years ago

If you are trying to generate a CSV (with extended chars) to be opened at Exel for Mac, the only that worked for me was: <?php mb_convert_encoding( $CSV, 'Windows-1252', 'UTF-8'); ?> I also tried this:



<?php


//Separado OK, chars MAL


iconv('MACINTOSH', 'UTF8', $CSV);


//Separado MAL, chars OK


chr(255).chr(254).mb_convert_encoding( $CSV, 'UCS-2LE', 'UTF-8');


?>





But the first one didn't show extended chars correctly, and the second one, did't separe fields correctly

me at gsnedders dot com ¶

13 years ago

It appears that when dealing with an unknown "from encoding" the function will both throw an E_WARNING and proceed to convert the string from ISO-8859-1 to the "to encoding".

chzhang at gmail dot com ¶

14 years ago

instead of ini_set(), you can try this


mb_substitute_character("none");

katzlbtjunk at hotmail dot com ¶

15 years ago

Clean a string for use as filename by simply replacing all unwanted characters with underscore (ASCII converts to 7bit). It removes slightly more chars than necessary. Hope its useful.


$fileName = 'Test:!"$%&/()=ÖÄÜöäü<<';
echo strtr(mb_convert_encoding($fileName,'ASCII'), 
    ' ,;:?*#!§$%&/(){}<>=`´|\'"', 
    '____________________________');

mac.com@nemo ¶

16 years ago

For those wanting to convert from $set to MacRoman, use iconv():


<?php
$string 
= iconv('UTF-8', 'macintosh', $string);?>

('macintosh' is the IANA name for the MacRoman character set.)

Tom Class ¶

17 years ago

Why did you use the php html encode functions? mbstring has it's own Encoding which is (as far as I tested it) much more usefull:


HTML-ENTITIES
Example:
$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");

nicole ¶

7 years ago

// convert UTF8 to DOS = CP850 // // $utf8_text=UTF8-Formatted text; // $dos=CP850-Formatted text;


// have fun
$dos = mb_convert_encoding($utf8_text, "CP850", mb_detect_encoding($utf8_text, "UTF-8, CP850, ISO-8859-15", true));

Daniel ¶

7 years ago

If you are attempting to convert "UTF-8" text to "ISO-8859-1" and the result is always returning in "ASCII", place the following line of code before the mb_convert_encoding:


mb_detect_order(array('UTF-8', 'ISO-8859-1'));
It is necessary to force a specific search order for the conversion to work

lanka at eurocom dot od dot ua ¶

19 years ago

Another sample of recoding without MultiByte enabling. (Russian koi->win, if input in win-encoding already, function recode() returns unchanged string)


<?php
  // 0 - win
  // 1 - koi
  function detect_encoding($str) {
    $win = 0;
    $koi = 0;
    for(
$i=0; $i<strlen($str); $i++) {
      if( ord($str[$i]) >224 && ord($str[$i]) < 255) $win++;
      if( ord($str[$i]) >192 && ord($str[$i]) < 223) $koi++;
    }
    if( 
$win < $koi ) {
      return 1;
    } else return 0;
  }
// recodes koi to win
  function koi_to_win($string) {$kw = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 254, 224, 225, 246, 228, 229, 244, 227, 245, 232, 233, 234, 235, 236, 237, 238, 239, 255, 240, 241, 242, 243, 230, 226, 252, 251, 231, 248, 253, 249, 247, 250, 222, 192, 193, 214, 196, 197, 212, 195, 213, 200, 201, 202, 203, 204, 205, 206, 207, 223, 208, 209, 210, 211, 198, 194, 220, 219, 199, 216, 221, 217, 215, 218);
    $wk = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 225, 226, 247, 231, 228, 229, 246, 250, 233, 234, 235, 236, 237, 238, 239, 240, 242,  243, 244, 245, 230, 232, 227, 254, 251, 253, 255, 249, 248, 252, 224, 241, 193, 194, 215, 199, 196, 197, 214, 218, 201, 202, 203, 204, 205, 206, 207, 208, 210, 211, 212, 213, 198, 200, 195, 222, 219, 221, 223, 217, 216, 220, 192, 209);$end = strlen($string);
    $pos = 0;
    do {
      $c = ord($string[$pos]);
      if ($c>128) {
        $string[$pos] = chr($kw[$c-128]);
      }
    } while (++
$pos < $end);
    return 
$string;
  }
  function 
recode($str) {$enc = detect_encoding($str);
    if ($enc==1) {
      $str = koi_to_win($str);
    }
    return 
$str;
  }
?>

nospam at nihonbunka dot com ¶

14 years ago

rodrigo at bb2 dot co dot jp wrote that inconv works better than mb_convert_encoding, I find that when converting from uft8 to shift_jis $conv_str = mb_convert_encoding($str,$toCS,$fromCS); works while $conv_str = iconv($fromCS,$toCS.'//IGNORE',$str); removes tildes from $str.

David Hull ¶

16 years ago

As an alternative to Johannes's suggestion for converting strings from other character sets to a 7bit representation while not just deleting latin diacritics, you might try this:


<?php
$text = iconv($from_enc, 'US-ASCII//TRANSLIT', $text);
?>

The only disadvantage is that it does not convert "ä" to "ae", but it handles punctuation and other special characters better.
-- 
David

aofg ¶

15 years ago

When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead of them. Kishu-Izon (platform dependent) characters are converted correctly with the encoding, as same as with eucJP-win or with SJIS-win.

jamespilcher1 — hotmail ¶

19 years ago

be careful when converting from iso-8859-1 to utf-8.


even if you explicitly specify the character encoding of a page as iso-8859-1(via headers and strict xml defs), windows 2000 will ignore that and interpret it as whatever character set it has natively installed. 
for example, i wrote char #128 into a page, with char encoding iso-8859-1, and it displayed in internet explorer (& mozilla) as a euro symbol.
it should have displayed a box, denoting that char #128 is undefined in iso-8859-1. The problem was it was displaying in "Windows: western europe" (my native character set).
this led to confusion when i tried to convert this euro to UTF-8 via mb_convert_encoding()  
IE displays UTF-8 correctly- and because PHP correctly converted #128 into a box in UTF-8, IE would show a box.
so all i saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.

gullevek at gullevek dot org ¶

12 years ago

If you want to convert japanese to ISO-2022-JP it is highly recommended to use ISO-2022-JP-MS as the target encoding instead. This includes the extended character set and avoids ? in the text. For example the often used "1 in a circle" ① will be correctly converted then.

StigC ¶

14 years ago

For the php-noobs (like me) - working with flash and php.


Here's a simple snippet of code that worked great for me, getting php to show special Danish characters, from a Flash email form:
<?php
// Name Escape
$escName = mb_convert_encoding($_POST["Name"], "ISO-8859-1", "UTF-8");// message escape
$escMessage = mb_convert_encoding($_POST["Message"], "ISO-8859-1", "UTF-8");// Headers.. and so on...
?>

rodrigo at bb2 dot co dot jp ¶

15 years ago

For those who can´t use mb_convert_encoding() to convert from one charset to another as a metter of lower version of php, try iconv().


I had this problem converting to japanese charset:
$txt=mb_convert_encoding($txt,'SJIS',$this->encode);
And I could fix it by using this:
$txt = iconv('UTF-8', 'SJIS', $txt);
Maybe it´s helpfull for someone else! ;)

phpdoc at jeudi dot de ¶

16 years ago

I'd like to share some code to convert latin diacritics to their traditional 7bit representation, like, for example,



- &agrave;,&ccedil;,&eacute;,&icirc;,... to a,c,e,i,...


- &szlig; to ss


- &auml;,&Auml;,... to ae,Ae,...


- &euml;,... to e,...


(mb_convert &quot;7bit&quot; would simply delete any offending characters).


I might have missed on your country's typographic


conventions--correct me then.


&lt;?php


/**


 * @args string $text line of encoded text


 *       string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)


 *


 * @returns 7bit representation


 */


function to7bit($text,$from_enc) {


    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);


    $text = preg_replace(


        array('/&szlig;/','/&amp;(..)lig;/',


             '/&amp;([aouAOU])uml;/','/&amp;(.)[^;]*;/'),


        array('ss',&quot;$1&quot;,&quot;$1&quot;.'e',&quot;$1&quot;),


        $text);


    return $text;


}  


?&gt;


Enjoy :-)


Johannes


==


[EDIT BY danbrown AT php DOT net: Author provided the following update on 27-FEB-2012.]


==


An addendum to my &quot;to7bit&quot; function referenced below in the notes.


The function is supposed to solve the problem that some languages require a different 7bit rendering of special (umlauted) characters for sorting or other applications. For example, the German &szlig; ligature is usually written &quot;ss&quot; in 7bit context. Dutch &yuml; is typically rendered &quot;ij&quot; (not &quot;y&quot;).


The original function works well with word (alphabet) character entities and I've seen it used in many places. But non-word entities cause funny results:


E.g., &quot;&copy;&quot; is rendered as &quot;c&quot;, &quot;&shy;&quot; as &quot;s&quot; and &quot;&amp;rquo;&quot; as &quot;r&quot;.


The following version fixes this by converting non-alphanumeric characters (also chains thereof) to '_'.


&lt;?php


/**


 * @args string $text line of encoded text


 *       string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)


 *


 * @returns 7bit representation


 */


function to7bit($text,$from_enc) {


    $text = preg_replace(/W+/,'_',$text);


    $text = mb_convert_encoding($text,'HTML-ENTITIES',$from_enc);


    $text = preg_replace(


        array('/&szlig;/','/&amp;(..)lig;/',


             '/&amp;([aouAOU])uml;/','/&yuml;/','/&amp;(.)[^;]*;/'),


        array('ss',&quot;$1&quot;,&quot;$1&quot;.'e','ij',&quot;$1&quot;),


        $text);


    return $text;


} 


?&gt;


Enjoy again,


Johannes

qdb at kukmara dot ru ¶

11 years ago

mb_substr and probably several other functions works faster in ucs-2 than in utf-8. and utf-16 works slower than utf-8. here is test, ucs-2 is near 50 times faster than utf-8, and utf-16 is near 6 times slower than utf-8 here: <?php header('Content-Type: text/html; charset=utf-8'); mb_internal_encoding('utf-8');$s='укгезәөшөхзәхөшк2049һһлдябчсячмииюсит.июбҗрарэ' .'лдэфвәәуүйәуйүәу034928348539857әшаыдларорашһһрлоавы'; $s.=$s; $s.=$s; $s.=$s; $s.=$s; $s.=$s; $s.=$s; $s.=$s;$t1=microtime(true); $i=0; while($i<mb_strlen($s)){ $a=mb_substr($s,$i,2); $i+=2; if($i==10)echo$a.'. '; //echo$a.'. '; } echo$i.'. '; echo(microtime(true)-$t1);


echo
'<br>';
$s=mb_convert_encoding($s,'UCS-2','utf8');
mb_internal_encoding('UCS-2');
$t1=microtime(true);
$i=0;
while($i<mb_strlen($s)){
    $a=mb_substr($s,$i,2);
    $i+=2;
    if($i==10)echo mb_convert_encoding($a,'utf8','ucs2').'. ';
    //echo$a.'. ';
}
echo$i.'. ';
echo(microtime(true)-$t1);
echo

'<br>'; $s=mb_convert_encoding($s,'utf-16','ucs-2'); mb_internal_encoding('utf-16'); $t1=microtime(true); $i=0; while($i<mb_strlen($s)){ $a=mb_substr($s,$i,2); $i+=2; if($i==10)echo mb_convert_encoding($a,'utf8','utf-16').'. '; //echo$a.'. '; } echo$i.'. '; echo(microtime(true)-$t1);?> output: өх. 12416. 1.71738100052 өх. 12416. 0.0211279392242 өх. 12416. 11.2330229282

DanielAbbey at Hotmail dot co dot uk ¶

8 years ago

When using the Windows Notepad text editor, it is important to note that when you select 'Save As' there is an Encoding selection dropdown. The default encoding is set to ANSI, with the other two options being Unicode and UTF-8. Since most text on the web is in UTF-8 format it could prove vital to save the .txt file with this encoding, since this function does not work on ANSI-encoded text.

Stephan van der Feest ¶

17 years ago

Here's a tip for anyone using Flash and PHP for storing HTML output submitted from a Flash text field in a database or whatever.


Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entity characters:
function utf8html($utf8str)
{
  return htmlentities(mb_convert_encoding($utf8str,"ISO-8859-1","UTF-8"));
}

Edward ¶

14 years ago

If mb_convert_encoding doesn't work for you, and iconv gives you a headache, you might be interested in this free class I found. It can convert almost any charset to almost any other charset. I think it's wonderful and I wish I had found it earlier. It would have saved me tons of headache.


I use it as a fail-safe, in case mb_convert_encoding is not installed. Download it from http://mikolajj.republika.pl/
This is not my own library, so technically it's not spamming, right? ;)
Hope this helps.

jackycms at outlook dot com ¶

8 years ago

// mb_convert_encoding($input,'UTF-8','windows-874'); error : Illegal character encoding specified // so convert Thai to UTF-8 is better use iconv instead


<?php
iconv
("windows-874","UTF-8",$input);?>

mightye at gmail dot com ¶

15 years ago

To petruzanauticoyahoo?com!ar


If you don't specify a source encoding, then it assumes the internal (default) encoding.  ñ is a multi-byte character whose bytes in your configuration default (often iso-8859-1) would actually mean Ã±.  mb_convert_encoding() is upgrading those characters to their multi-byte equivalents within UTF-8.
Try this instead:
<?php
print mb_convert_encoding( "ñ", "UTF-8", "UTF-8" );
?>
Of course this function does no work (for the most part - it can actually be used to strip characters which are not valid for UTF-8).

Источник

Проблема кодировок часто возникает при написании парсеров, чтении данных из xml и CSV файлов. Ниже представлены способы эту проблему решить.

windows-1251 в UTF-8

$text = iconv('windows-1251//IGNORE', 'UTF-8//IGNORE', $text);
echo $text;

PHP

$text = mb_convert_encoding($text, 'UTF-8', 'windows-1251');
echo $text;

PHP

UTF-8 в windows-1251

$text = iconv('utf-8//IGNORE', 'windows-1251//IGNORE', $text);
echo $text;

PHP

$text = mb_convert_encoding($text, 'windows-1251', 'utf-8');
echo $text;

PHP

Когда ни что не помогает

$text = iconv('utf-8//IGNORE', 'cp1252//IGNORE', $text);
$text = iconv('cp1251//IGNORE', 'utf-8//IGNORE', $text);
echo $text;

PHP

Иногда доходит до бреда, но работает:

$text = iconv('utf-8//IGNORE', 'windows-1251//IGNORE', $text);
$text = iconv('windows-1251//IGNORE', 'utf-8//IGNORE', $text);
echo $text;

PHP

File_get_contents / CURL

Бывают случаи когда file_get_contents() или CURL возвращают иероглифы (ÐÐ»Ð¼Ð°Ð·Ð½ÑÐµ Ð±Ð¾ÑÑ) – причина тут не в кодировке, а в отсутствии BOM-метки.

$text = file_get_contents('https://example.com');
$text = "xEFxBBxBF" .  $text;
echo $text;

PHP

Ещё бывают случаи, когда file_get_contents() возвращает текст в виде:

�mw�Ƒ0��&IkAI��f��j4/{�</�&�h�� ({�񌝷o��:/��<g��g��(�=�9�Paɭ

Это сжатый текст в GZIP, т.к. функция не отправляет правильные заголовки. Решение проблемы через CURL:

function getcontents($url){
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
	curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
	curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
	$output = curl_exec($ch);
	curl_close($ch);
	return $output;
}

echo getcontents('https://example.com');

PHP

12.01.2017, обновлено 02.11.2021

Другие публикации

Отправка e-mail в кодировке UTF-8 с вложенными файлами и возможные проблемы.

JSON (JavaScript Object Notation) – текстовый формат обмена данными, основанный на JavaScript, который представляет собой набор пар {ключ: значение}. Значение может быть массивом, числом, строкой и…

Описание значений глобального массива $_SERVER с примерами.

Так как Instagram и Fasebook ограничили доступ к API, а фото с открытого аккаунта всё же нужно периодически получать и…

В статье представлены различные PHP-расширения для чтения файлов XLS, XLSX, описаны их плюсы и минусы, а также примеры…

Примеры как зарегистрировать бота в Телеграм, описание и взаимодействие с основными методами API.

Источник

mb_convert_encoding

(PHP 4 >= 4.0.6, PHP 5, PHP 7)

mb_convert_encoding — Преобразует кодировку символов

Описание

string mb_convert_encoding
( string $str
, string $to_encoding
[, mixed $from_encoding = mb_internal_encoding()
] )

Преобразует символы строки string str
в кодировку to_encoding.
Также можно указать необязательный параметр from_encoding.

Список параметров

str

Строка (string), которая преобразуется.

to_encoding

Кодировка, в которую будет преобразована строка str.

from_encoding

Параметр для указания исходной кодировки строки. Это может быть
массив (array), или строка со списком кодировок через запятую.
Если параметр from_encoding не указан, то
кодировка определяется автоматически.

Смотри поддерживаемые
кодировки.

Возвращаемые значения

Преобразованная строка.

Примеры

Пример #1 Пример использования mb_convert_encoding()

<?php /* Преобразует строку в кодировку SJIS */ $str = mb_convert_encoding($str, "SJIS");/* Преобразует из EUC-JP в UTF-7 */ $str = mb_convert_encoding($str, "UTF-7", "EUC-JP");/* Автоматически определяется кодировка среди JIS, eucjp-win, sjis-win, затем преобразуется в UCS-2LE */ $str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");/* "auto" используется для обозначения "ASCII,JIS,UTF-8,EUC-JP,SJIS" */ $str = mb_convert_encoding($str, "EUC-JP", "auto"); ?>

Смотрите также

mb_detect_order() — Установка/получение списка кодировок для механизмов определения кодировки

Вернуться к: Функции для работы с Многобайтными строками

Источник

Содержание

Заметки Лёвика
web программирование, администрирование и всякая всячина, которая может оказаться полезной
Если не работает iconv
iconv array для массива
mb_convert_encoding
Описание
Список параметров
Возвращаемые значения
Ошибки
Список изменений
Примеры
Смотрите также
User Contributed Notes 32 notes
iconv
Описание
Список параметров
Возвращаемые значения
Примеры
User Contributed Notes 39 notes
Технарь
Блог о программировании и околопрограммерских штуках.
Конвертация строки из utf-8 в win-1251 на PHP
Конвертация строки из utf-8 в win-1251 на PHP : 5 комментариев
utf8_encode
Описание
Список параметров
Возвращаемые значения
Список изменений
Смотрите также
User Contributed Notes 23 notes

Заметки Лёвика

web программирование, администрирование и всякая всячина, которая может оказаться полезной

При помощи функции php iconv (строго говоря, это не совсем функция PHP, она использует стороннюю библиотеку (есть iconv.dll и php_iconv.dll или iconv.so), которой может не быть на хостинге) легко преобразовать кодировку (например, из windows-1251 в utf-8 и наоборот:

Если не работает iconv

Т.е. чтобы преобразовать текст из кодировки windows-1251 в UTF-8 следует выполнить:
mb_convert_encoding($s,»UTF-8″,»windows-1251″);

iconv array для массива

Метки: iconv

Опубликовано Пятница, Октябрь 21, 2011 в 15:02 в следующих категориях: Без рубрики. Вы можете подписаться на комментарии к этому сообщению через RSS 2.0. Вы можете оставить комментарий. Пинг отключен.

Автор будет признателен, если Вы поделитесь ссылкой на статью, которая Вам помогла:
BB-код (для вставки на форум)

html-код (для вставки в ЖЖ, WP, blogger и на страницы сайта)

ссылка (для отправки по почте)

Как быть с запросом select к базе mssql не понимает кирилицу
“select
[Название]
,[номер]
, [Removed]
from imdb.dbo. Оконечное оборудование “;

Следует привести столбцы (или всю базу данных сразу) к соответствующему сравнению (кодировке)
ALTER DATABASE COLLATE Cyrillic_General_CI_AS

Или использовать Nvarchar

declare @test TABLE
(
Col1 varchar(40),
Col2 varchar(40),
Col3 nvarchar(40),
Col4 nvarchar(40)
)
INSERT INTO @test VALUES
(‘иытание’,N’иытание’,’иытание’,N’иытание’)
SELECT * FROM @test

Если изменяю версию php 5.6 то не перекодируется. Не подскажете?

Источник

mb_convert_encoding

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP

mb_convert_encoding — Преобразует кодировку символов

Описание

Список параметров

Строка ( string ) или массив ( array ), для преобразования.

Параметр для указания исходной кодировки строки. Это может быть массив ( array ), или строка со списком кодировок через запятую. Если параметр from_encoding не указан, то кодировка определяется автоматически.

Возвращаемые значения

Преобразованная строка ( string ) или массив ( array ) или false в случае возникновения ошибки.

Ошибки

Список изменений

Примеры

Пример #1 Пример использования mb_convert_encoding()

Смотрите также

User Contributed Notes 32 notes

For my last project I needed to convert several CSV files from Windows-1250 to UTF-8, and after several days of searching around I found a function that is partially solved my problem, but it still has not transformed all the characters. So I made this:

I’ve been trying to find the charset of a norwegian (with a lot of ø, æ, å) txt file written on a Mac, i’ve found it in this way:

= «A strange string to pass, maybe with some ø, æ, å characters.» ;

Hope can help someone

in your php.ini. Be sure to include the quotes around none. Or at run-time with

Hey guys. For everybody who’s looking for a function that is converting an iso-string to utf8 or an utf8-string to iso, here’s your solution:

public function encodeToUtf8($string) <
return mb_convert_encoding($string, «UTF-8», mb_detect_encoding($string, «UTF-8, ISO-8859-1, ISO-8859-15», true));
>

public function encodeToIso($string) <
return mb_convert_encoding($string, «ISO-8859-1», mb_detect_encoding($string, «UTF-8, ISO-8859-1, ISO-8859-15», true));
>

For me these functions are working fine. Give it a try

My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)

Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding.

Another sample of recoding without MultiByte enabling.
(Russian koi->win, if input in win-encoding already, function recode() returns unchanged string)

Источник

iconv

(PHP 4 >= 4.0.5, PHP 5, PHP 7, PHP

iconv — Преобразование строки в требуемую кодировку

Описание

Список параметров

Кодировка входной строки.

Требуемая на выходе кодировка.

Строка, которую необходимо преобразовать.

Возвращаемые значения

Возвращает преобразованную строку или false в случае возникновения ошибки.

Примеры

Пример #1 Пример использования iconv()

Результатом выполнения данного примера будет что-то подобное:

User Contributed Notes 39 notes

The «//ignore» option doesn’t work with recent versions of the iconv library. So if you’re having trouble with that option, you aren’t alone.

That means you can’t currently use this function to filter invalid characters. Instead it silently fails and returns an empty string (or you’ll get a notice but only if you have E_NOTICE enabled).

[UPDATE 15-JUN-2012]
Here’s a workaround.

ini_set(‘mbstring.substitute_character’, «none»);
$text= mb_convert_encoding($text, ‘UTF-8’, ‘UTF-8’);

That will strip invalid characters from UTF-8 strings (so that you can insert it into a database, etc.). Instead of «none» you can also use the value 32 if you want it to insert spaces in place of the invalid characters.

Interestingly, setting different target locales results in different, yet appropriate, transliterations. For example:

//some German
$utf8_sentence = ‘Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz’ ;

to test different combinations of convertions between charsets (when we don’t know the source charset and what is the convenient destination charset) this is an example :

Like many other people, I have encountered massive problems when using iconv() to convert between encodings (from UTF-8 to ISO-8859-15 in my case), especially on large strings.

The main problem here is that when your string contains illegal UTF-8 characters, there is no really straight forward way to handle those. iconv() simply (and silently!) terminates the string when encountering the problematic characters (also if using //IGNORE), returning a clipped string. The

workaround suggested here and elsewhere will also break when encountering illegal characters, at least dropping a useful note («htmlentities(): Invalid multibyte sequence in argument in. «)

I have found a lot of hints, suggestions and alternative methods (it’s scary and in my opinion no good sign how many ways PHP natively provides to convert the encoding of strings), but none of them really worked, except for this one:

If you are getting question-marks in your iconv output when transliterating, be sure to ‘setlocale’ to something your system supports.

Some PHP CMS’s will default setlocale to ‘C’, this can be a problem.

use the «locale» command to find out a list..

For those who have troubles in displaying UCS-2 data on browser, here’s a simple function that convert ucs2 to html unicode entities :

Here is how to convert UCS-2 numbers to UTF-8 numbers in hex:

echo strtoupper ( ucs2toutf8 ( «06450631062D0020» ));

Input:
06450631062D
Output:
D985D8B1D8AD

If you want to convert to a Unicode encoding without the byte order mark (BOM), add the endianness to the encoding, e.g. instead of «UTF-16» which will add a BOM to the start of the string, use «UTF-16BE» which will convert the string without adding a BOM.

I just found out today that the Windows and *NIX versions of PHP use different iconv libraries and are not very consistent with each other.

Here is a repost of my earlier code that now works on more systems. It converts as much as possible and replaces the rest with question marks:

I use this function that does’nt need any extension :

I have not tested it extensively, hope it may help.

I have used iconv to convert from cp1251 into UTF-8. I spent a day to investigate why a string with Russian capital ‘Р’ (sounds similar to ‘r’) at the end cannot be inserted into a database.

The problem is not in iconv. But ‘Р’ in cp1251 is chr(208) and ‘Р’ in UTF-8 is chr(208).chr(106). chr(106) is one of the space symbol which match ‘s’ in regex. So, it can be taken by a greedy ‘+’ or ‘*’ operator. In that case, you loose ‘Р’ in your string.

For example, ‘ГР ‘ (Russian, UTF-8). Function preg_match. Regex is ‘(.+?)[s]*’. Then ‘(.+?)’ matches ‘Г’.chr(208) and ‘[s]*’ matches chr(106).’ ‘.

Although, it is not a bug of iconv, but it looks like it very much. That’s why I put this comment here.

Here is an example how to convert windows-1251 (windows) or cp1251(Linux/Unix) encoded string to UTF-8 encoding.

Источник

Технарь

Блог о программировании и околопрограммерских штуках.

Конвертация строки из utf-8 в win-1251 на PHP

Для конвертации на php строки из utf-8 в windows-1251 и наоборот, можно использовать следующую функцию:

если необходимо обратное действие, то:

Описание функции iconv:
string iconv ( string from_kodirovka, string to_kodirovka, string str )
Производит преобразование кодировки символов строки str из начальной кодировки from_kodirovka в конечную to_kodirovka. Возвращает строку в новой кодировке, или FALSE в случае ошибки.

Если добавить //TRANSLIT к параметру out_charset будет включена транслитеризация. Это означает, что вслучае, когда символа нет в конечной кодировке, он заменяется одним или несколькими аналогами. Если добавить //IGNORE, то символы, которых нет в конечной кодировке, будут опущены. Иначе, будет возвращена строка str, обрезанная до первого недопустимого символа.

В случае, если ваш хостинг не поддерживает iconv, для конвертации из utf-8 в win-1251 и наоборот можно использовать следующие функции:

Конвертация строки из utf-8 в win-1251 на PHP : 5 комментариев

Ой, большое спасибо! Были большие проблемы с кодировками при использовании аякса, с Вашей функцией все встало нормально

ну просто восхитительные функции — поставил и забыл про конвертацию)

почему-то после конвертации из utf8 в win1251 вместо букв вопросительные знаки:
.

а у меня не работает 🙁

ругается на вот эту строку:

Источник

utf8_encode

(PHP 4, PHP 5, PHP 7, PHP

utf8_encode — Кодирует строку ISO-8859-1 в кодировке UTF-8

Описание

Эта функция конвертирует строку string из кодировки ISO-8859-1 в UTF-8

Список параметров

Возвращаемые значения

Список изменений

Версия	Описание
7.2.0	Функция была перенесена в ядро PHP, таким образом отменив требование модуля XML для использования этой функции.

Смотрите также

User Contributed Notes 23 notes

Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be «iso88591_to_utf8». If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.

If you need to convert text from any encoding to any other encoding, look at iconv() instead.

Here’s some code that addresses the issue that Steven describes in the previous comment;

/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
the UTF-8 encoding of the non-control characters that Windows-1252 places
at the equivalent code points. */

Walk through nested arrays/objects and utf8 encode all strings.

If you need a function which converts a string array into a utf8 encoded string array then this function might be useful for you:

My version of utf8_encode_deep,
In case you need one that returns a value without changing the original.

I tried a lot of things, but this seems to be the final fail save method to convert any string to proper UTF-8.

If your string to be converted to utf-8 is something other than iso-8859-1 (such as iso-8859-2 (Polish/Croatian)), you should use recode_string() or iconv() instead rather than trying to devise complex str_replace statements.

If you are looking for a function to replace special characters with the hex-utf-8 value (e.g. für Webservice-Security/WSS4J compliancy) you might use this:

$textstart = «Größe»;
$utf8 =»;
$max = strlen($txt);

I was searching for a function similar to Javascript’s unescape(). In most cases it is OK to use url_decode() function but not if you’ve got UTF characters in the strings. They are converted into %uXXXX entities that url_decode() cannot handle.
I googled the net and found a function which actualy converts these entities into HTML entities (&#xxx;) that your browser can show correctly. If you’re OK with that, the function can be found here: http://pure-essence.net/stuff/code/utf8RawUrlDecode.phps

But it was not OK with me because I needed a string in my charset to make some comparations and other stuff. So I have modified the above function and in conjuction with code2utf() function mentioned in some other note here, I have managed to achieve my goal:

// Validate Unicode UTF-8 Version 4
// This function takes as reference the table 3.6 found at http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
// It also flags overlong bytes as error

This function may be useful do encode array keys and values [and checks first to see if it’s already in UTF format]:

[NOTE BY danbrown AT php DOT net: Original function written by (cmyk777 AT gmail DOT com) on 28-JAN-09.]

Avoiding use of preg_match to detect if utf8_encode is needed:

I recommend using this alternative for every language:

Don’t forget to set all your pages to «utf-8» encoding, otherwise just use HTML entities.

This function I use convert Thai font (iso-8859-11) to UTF-8. For my case, It work properly. Please try to use this function if you have a problem to convert charset iso-8859-11 to UTF-8.

$iso8859_11 = array(
«xa1» => «xe0xb8x81»,
«xa2» => «xe0xb8x82»,
«xa3» => «xe0xb8x83»,
«xa4» => «xe0xb8x84»,
«xa5» => «xe0xb8x85»,
«xa6» => «xe0xb8x86»,
«xa7» => «xe0xb8x87»,
«xa8» => «xe0xb8x88»,
«xa9» => «xe0xb8x89»,
«xaa» => «xe0xb8x8a»,
«xab» => «xe0xb8x8b»,
«xac» => «xe0xb8x8c»,
«xad» => «xe0xb8x8d»,
«xae» => «xe0xb8x8e»,
«xaf» => «xe0xb8x8f»,
«xb0» => «xe0xb8x90»,
«xb1» => «xe0xb8x91»,
«xb2» => «xe0xb8x92»,
«xb3» => «xe0xb8x93»,
«xb4» => «xe0xb8x94»,
«xb5» => «xe0xb8x95»,
«xb6» => «xe0xb8x96»,
«xb7» => «xe0xb8x97»,
«xb8» => «xe0xb8x98»,
«xb9» => «xe0xb8x99»,
«xba» => «xe0xb8x9a»,
«xbb» => «xe0xb8x9b»,
«xbc» => «xe0xb8x9c»,
«xbd» => «xe0xb8x9d»,
«xbe» => «xe0xb8x9e»,
«xbf» => «xe0xb8x9f»,
«xc0» => «xe0xb8xa0»,
«xc1» => «xe0xb8xa1»,
«xc2» => «xe0xb8xa2»,
«xc3» => «xe0xb8xa3»,
«xc4» => «xe0xb8xa4»,
«xc5» => «xe0xb8xa5»,
«xc6» => «xe0xb8xa6»,
«xc7» => «xe0xb8xa7»,
«xc8» => «xe0xb8xa8»,
«xc9» => «xe0xb8xa9»,
«xca» => «xe0xb8xaa»,
«xcb» => «xe0xb8xab»,
«xcc» => «xe0xb8xac»,
«xcd» => «xe0xb8xad»,
«xce» => «xe0xb8xae»,
«xcf» => «xe0xb8xaf»,
«xd0» => «xe0xb8xb0»,
«xd1» => «xe0xb8xb1»,
«xd2» => «xe0xb8xb2»,
«xd3» => «xe0xb8xb3»,
«xd4» => «xe0xb8xb4»,
«xd5» => «xe0xb8xb5»,
«xd6» => «xe0xb8xb6»,
«xd7» => «xe0xb8xb7»,
«xd8» => «xe0xb8xb8»,
«xd9» => «xe0xb8xb9»,
«xda» => «xe0xb8xba»,
«xdf» => «xe0xb8xbf»,
«xe0» => «xe0xb9x80»,
«xe1» => «xe0xb9x81»,
«xe2» => «xe0xb9x82»,
«xe3» => «xe0xb9x83»,
«xe4» => «xe0xb9x84»,
«xe5» => «xe0xb9x85»,
«xe6» => «xe0xb9x86»,
«xe7» => «xe0xb9x87»,
«xe8» => «xe0xb9x88»,
«xe9» => «xe0xb9x89»,
«xea» => «xe0xb9x8a»,
«xeb» => «xe0xb9x8b»,
«xec» => «xe0xb9x8c»,
«xed» => «xe0xb9x8d»,
«xee» => «xe0xb9x8e»,
«xef» => «xe0xb9x8f»,
«xf0» => «xe0xb9x90»,
«xf1» => «xe0xb9x91»,
«xf2» => «xe0xb9x92»,
«xf3» => «xe0xb9x93»,
«xf4» => «xe0xb9x94»,
«xf5» => «xe0xb9x95»,
«xf6» => «xe0xb9x96»,
«xf7» => «xe0xb9x97»,
«xf8» => «xe0xb9x98»,
«xf9» => «xe0xb9x99»,
«xfa» => «xe0xb9x9a»,
«xfb» => «xe0xb9x9b»
);

// Reads a file story.txt ascii (as typed on keyboard)
// converts it to Georgian character using utf8 encoding
// if I am correct(?) just as it should be when typed on Georgian computer
// it outputs it as an html file
//
// http://www.comweb.nl/keys_to_georgian.html
// http://www.comweb.nl/keys_to_georgian.php
// http://www.comweb.nl/story.txt

keys to unicode code

// this meta tag is needed

// note the sylfean font seems to be standard installed on Windows XP
// It supports Georgian

Re the previous post about converting GB2312 code to Unicode code which displayed the following function:

In the original function, the first latin chacter was dropped and it was not converting the first non-latin character after the latin text (everything was shifted one character too far to the right). Reversing those two lines makes it work correctly in every example I have tried.

Also, the source of the gb2312.txt file needed for this to work has changed. You can find it a couple places:

Someday they might be hardcoded into PHP.
*/

The following Perl regular expression tests if a string is well-formed Unicode UTF-8 (Broken up after each | since long lines are not permitted here. Please join as a single line, no spaces, before use.):

Источник

mb_convert_encoding — Convert a string from one character encoding to another.

Use the strval() Function You can simply use type casting or the strval() function to convert an integer to a string in PHP.

UTF-8 (UCS Transformation Format is the World Wide Web’s most common character encoding. Each character is represented by one to four bytes. UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character.

Choose a different encoding for one document Click Options in the lower-left corner of the window. Click the Plain Text Encoding pop-up menu and choose an encoding. If you don’t see the encoding you want, choose Customize Encodings List, then select the encodings to include. Click Open.

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP

mb_convert_encoding — Convert a string from one character encoding to another

Description

mb_convert_encoding(array|string $string, string $to_encoding, array|string|null $from_encoding = null): array|string|false

Converts string from from_encoding, or the current internal encoding, to to_encoding. If string is an array, all its string values will be converted recursively.

Parameters

string

The string or array to be converted.

to_encoding

The desired encoding of the result.

from_encoding

The current encoding used to interpret string. Multiple encodings may be specified as an array or comma separated list, in which case the correct encoding will be guessed using the same algorithm as mb_detect_encoding().

If from_encoding is null or not specified, the mbstring.internal_encoding setting will be used if set, otherwise the default_charset setting.

See supported encodings for valid values of to_encoding and from_encoding.

Return Values

The encoded string or array on success, or false on failure.

Errors/Exceptions

As of PHP 8.0.0, a ValueError is thrown if the value of to_encoding or from_encoding is an invalid encoding. Prior to PHP 8.0.0, a E_WARNING was emitted instead.

Changelog

Version	Description
8.0.0	mb_convert_encoding() will now throw a ValueError when `to_encoding` is passed an invalid encoding.
8.0.0	mb_convert_encoding() will now throw a ValueError when `from_encoding` is passed an invalid encoding.
8.0.0	`from_encoding` is nullable now.
7.2.0	This function now also accepts an array as `string`. Formerly, only strings have been supported.

Examples

Example #1 mb_convert_encoding() example

<?php

$str = mb_convert_encoding($str, "SJIS");


$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");


$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");


$str = mb_convert_encoding($str, "EUC-JP", "auto");
?>

Описание

Список параметров

Возвращаемые значения

Ошибки

Список изменений

Примеры

Смотрите также

windows-1251 в UTF-8

UTF-8 в windows-1251

Когда ни что не помогает

File_get_contents / CURL

Другие публикации

mb_convert_encoding

Описание

Список параметров

Возвращаемые значения

Примеры

Смотрите также

Заметки Лёвика

web программирование, администрирование и всякая всячина, которая может оказаться полезной

Если не работает iconv

iconv array для массива

mb_convert_encoding

Описание

Список параметров

Возвращаемые значения

Ошибки

Список изменений

Примеры

Смотрите также

User Contributed Notes 32 notes

iconv

Описание

Список параметров

Возвращаемые значения

Примеры

User Contributed Notes 39 notes

Технарь

Блог о программировании и околопрограммерских штуках.

Конвертация строки из utf-8 в win-1251 на PHP

Конвертация строки из utf-8 в win-1251 на PHP : 5 комментариев

utf8_encode

Описание

Список параметров

Возвращаемые значения

Список изменений

Смотрите также

User Contributed Notes 23 notes

Description

Parameters

Return Values

Errors/Exceptions

Changelog

Examples

See Also

Вот еще несколько интересных статей: