Unsupported Screen Size: The viewport size is too small for the theme to render properly.

Data Encoding & Synch

  • This topic is empty.
Viewing 10 reply threads
  • Author
    Posts
    • #9888
      Kochiro
      Member

      Hello,

      I'm running SQLyog Enterprise version 5.02. I have two mysql databases running version 4.1.12 on the source and version 4.1.11 on the target. When I run the synchronization any characters such as trade marks, copyrights or any of MS Word's proprietary ASCII character extensions such as the extended dash are output as strange characters that are rendered as question marks by a web browser running against the target db.

      I honestly don't know what format the source table data is in, I'm assuming that it is in the default format for mysql 4.1.12 as the person who created the tables said that they didn't change any default options when creating the tables. If I look at the “Advanced Properties” of the tables in the GUI I can see that the default collation is “latin1_swedish_ci” so that hasn't been changed. The tables are also of type MyISAM.

      From what I've read on mysql.com the default character set should be latin1. However, if I do a manual mysqldump and specify either utf8 or latin1 as the encoding the special characters still come through incorrect. Those special characters are rendered properly by a web browser running against the source db. It is only the export/synchronization that causes the characters to be converted to something that isn't understood.

      So my questions are:

      1. How can I verify exactly what character set the source db is using?

      2. What do I need to do to export those characters properly so that they can be imported exactly the same way (through both a manual mysqldump and through the gui in SQLyog).

      Thanks,

      Jay

    • #22460
      peterlaursen
      Participant

      1)

      To see all character set-realatet information execute

      Code:
      USE some_database;
      SHOW variables LIKE '%character%';

      2)

      Both SQLyog export and mysqldump (recnet versions of both) encode the saved file as UFT8 as default no matter how it is stored. I do not think you need to do anything!

      3)

      As far a SQLyog data sync goes it is not possible to sync data in latin1 and utf8. Numbers and ASCII-characters will sync but any row with national, special characters will fail, as they are encoded using 1 byte in latin1 and 2 bytes in utf8. It is not possible for us to convert before executeing concat_ws(). We do SET NAMES but it only affects the client and the client-server communication.

      I think we should CHECK for it however!

      This “It is only the export/synchronization that causes the characters to be converted to something that isn't understood.” .. could be wrong! I think your script may do one error when it saves the the 'reverse error' when reading.

      roadmaps for exploring into this:

      1)

      Try creating a very small table with the problem. And copy the .MYD, MYI and .frm files in here (zip into one!). VERY SMALL I wrote. 3 columns – 2 rows for instance. Dont forget any special characters that you use!

      2)

      With SQLyog 5.2 beta3 (5.19 wont work here!) back up using 'scheduled backup powertool' with UTF8 option unchecked and inspect data in a HEX-editor. HEX-editor will show every character/byte-value as it is stored.

    • #22461
      Kochiro
      Member

      Thanks Peter, I think I understand a bit more now.

      When I ran the show variables command here is what was returned:

      Code:
      +————————–+—————————-+
      | Variable_name | Value |
      +————————–+—————————-+
      | character_set_client | latin1 |
      | character_set_connection | latin1 |
      | character_set_database | latin1 |
      | character_set_results | latin1 |
      | character_set_server | latin1 |
      | character_set_system | utf8 |
      | character_sets_dir | /usr/share/mysql/charsets/ |
      +————————–+—————————-+

      Here's a specific example, if you still need an example table let me know.

      This is what is displayed in the web browser when running against the source database:

      Code:
      How’s it going?

      Note that the apostrophe is the special character, instead of a standard one it is one of those special MS Word ones that have the thicker “bulb” on the top portion of the symbol.

      In the database that is getting stored as (copied directly out of SQLyog):

      Code:
      How’s it going?

      I don't fully understand where or how this is happening. My thought was that it was due to the fact that the default encoding of php (utf-8 I believe) understands the MS Word character and interprets it properly. But when it is saved to the database, mysql doesn't understand it and stores it as the strange characters. I can live with that as I am the only one who will be viewing the data in the database directly.

      The issue that I run into is that when I export that data it exports just as it is stored. But when the browser looks at the same record on the target database it just shows the garbage instead of converting it into an apostrophe like it does on the source database.

      I hope that makes some sense and helps. It might not really be an export issue but I don't understand why the same data in both databases doesn't display the same way when viewed with a web browser.

      When this is done manually with a mysqldump is there a change that I can make to the mysql import statement that will make the data display the same way?

      When this is done with a scheduled synch in SQLyog is there a change that I can make to make the data display the same way?

      Thanks,

      Jay

    • #22462
      peterlaursen
      Participant

      It will not work with communication in a browser here. The webserver/browser does encoding of their own!

      zip everything and attach that you want to communicate! I need to investigate every single character in a HEX-editor!

      But I think that 'latin1' as charset is a bad choice here. I would try using utf8 and disable this PHP feature if possible! Or maybe even better save as VARBINARY/BINARY/BLOB and not VARCHAR/CHAR/TEXT (and so simply store as BYTES, not CHARACTERS). It likely works OK with PHP but as soon as you use other clients ….

      I'll concentrate on the sync issue as it is 'our business'.

      You write: “The issue that I run into is that when I export that data it exports just as it is stored. But when the browser looks at the same record on the target database it just shows the garbage instead of converting it into an apostrophe like it does on the source database.”

      I do not think it saves as stored! SQLyog export tool does utf8-encoding! But if you import it should converted back.

      First:

      =====

      are you

      1) dumping/restoring

      2) COPYing

      3) or SYNCing?

      With DATA SYNC there should be no issue/difference if the same charset is used on source and target!

      Second:

      ======

      Is the source and target database on the same server? if not are you then sure that webserver/PHP configuration is identical?

    • #22463
      Kochiro
      Member
      peterlaursen wrote on Sep 19 2006, 03:55 PM:
      First:

      =====

      are you

      1) dumping/restoring

      2) COPYing

      3) or SYNCing?

      With DATA SYNC there should be no issue/difference if the same charset is used on source and target!

      Second:

      ======

      Is the source and target database on the same server? if not are you then sure that webserver/PHP configuration is identical?

      Peter,

      Thanks again for the quick response. To answer your questions, I am running a scheduled synch. The source and target databases are on different servers. I believe that the general server and php configurations are the same. As I said originally the source database is 0.0.1 version higher than the target. I ran the show variables on the target and source again and they are identical.

      I will check into all the things you mentioned and reply back with anything that I find.

      Thanks,

      Jay

    • #22464
      peterlaursen
      Participant

      I simply think that your special apostrophe gets stored/encoded as HEX(E2 80 99). That displays as '’' in latin1. And your source PHP understands to decode this – but not on the target.

      Now .. you could try to change the connection string in the PHP script on source to point to the target (if you can connect). What does it now display as?

      Try

      Code:
      SELECT HEX(yourcolumn) from yourtable where into outfile ;

      zip the file and attach it here and lets us see the HEX values of how that apostrophe is stored!

      (don't forget to quote the file name and escape ''s in file names like 'c:\test.txt')

      or just

      Code:
      SELECT HEX(yourcolumn) from yourtable where ;

      to display the HEX-values on the screen

    • #22465
      peterlaursen
      Participant

      BTW: you should consider upgrading SQLyog. At least when 5.2 gets FINAL.

      hello ..

      I also want to see

      Code:
      SHOW variables LIKE '%character%';

      on BOTH hosts! the one you showed me was that source or target?

    • #22466
      Kochiro
      Member

      Peter,

      The “show variables” output was the same on both hosts so the one I showed you does represent both.

      I think that you are right about this but it is going to take me some time to test all the options you have given me. I also just had something else dumped in my lap that has a higher priority.

      I will go through all of your suggestions and reply back with what I find.

      Thanks again,

      Jay

    • #22467
      Kochiro
      Member

      Well…it's been quite a while since I have had time to visit this issue. I'm still trying to understand some aspects of what is setup where and why certain things happen. I apologize if this isn't the best place to ask, but the people here seem to be the most helpful. If there is a more appropriate mysql/php forum where I could post this, please feel free to direct me there.

      Here is an example similar to what Peter was requesting in the last post.

      This is what is output on the frontend of the source database and what was entered into the database by a user:

      Marchés

      This is what I see in SQLYog when viewing the same record:

      Marchés

      This is the hex output of the same record given by the query “select hex(field) from table where condition”

      4D61726368C3A973

      What I am trying to understand is this:

      In the source database the character “é” is stored as “é” but yet still output as “é”. When that data is pushed to the target server (again nearly identical setups) the character “é” is actually displayed as “é” and not “é”.

      I don't understand what is changing during the process of moving the data from one server to another.

      Thank you,

      Jay

    • #22468
      peterlaursen
      Participant

      “In the source database the character “é” is stored as “é” but yet still output as “é”.”

      How can you tell that it is stored like that and it is not only a client displaying like that?

      I think there might be two errors here.  “é” might not be encoded correctly when sent to the server (and thus not stored correctly), but the error is 'reversed' when you retrive data again.  But of course data shold be stored correctly.  Because if it is not it might work with one client but not others (including website scripts).

      Basically please consider upgrading to SQLyog 6.0.  We cannot be responsible for any 'client side encoding' done by PHP.  We will not support issues with 5.02 anymore eitehr.  You can try 6.0 RC (even ENTERPRISE) for free. Just install to the same system as 5.02.

      If you still use MySQL 4.1.12 you should also consider upgrading the server!

      I think that this post is now so old and the involved software versions are so old too, that I must request a fresh report with recent software version!

    • #22469
      Kochiro
      Member

      Actually, please disregard this. I finally found the issue after searching all day and compiling a few different resources together from more google page results than I care to admit, as well as some posts from this site.

      What I found was that there was a slightly different order to the charsets in the apache configurations which was causing the differences. Also, when doing manual mysqldump exports I had to specify the default charset as latin1, even though that's not what I get when doing the tests on the database to see what the charsets are. I'm not sure of the reason for that, but as long as I can work around it, I really don't care at the moment.

      Thank you for reading and helping.

      Jay

Viewing 10 reply threads
  • You must be logged in to reply to this topic.