Help With Where During Sync

This topic is empty.

Viewing 13 reply threads

Author

Posts
- May 25, 2011 at 7:53 am #32286
  
  peterlaursen
  Participant
  
  I wonder if you have an index on this DATETIME column? If not a lot of full table scans could happen.
  
  Anyway it is a little difficult for us to understand. Maybe you could post the SHOW CREATE TABLE output for this table as well as the exact WHERE.clauses you use. Next we can frame queries that you can profile/EXPLAIN.
- May 25, 2011 at 2:00 pm #32287
  
  Rich
  Member
  
  'peterlaursen' wrote:
  
  I wonder if you have an index on this DATETIME column? If not a lot of full table scans could happen.
  
  Anyway it is a little difficult for us to understand. Maybe you could post the SHOW CREATE TABLE output for this table as well as the exact WHERE.clauses you use. Next we can frame queries that you can profile/EXPLAIN.
  
  I meant to make clear that a primary key is made up of the 'Datatime' (which is timestamp) and 'Tagname' (which is varchar) columns. I am under the impression that they are then indexed.
  
  The SHOW CREATE TABLE has the following result:
  
  CREATE TABLE `tag_data15mins` (
  
  `TAGNAME` varchar(255) NOT NULL,
  
  `DATATIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
  `VALUE` float DEFAULT NULL,
  
  `EXPIRY` timestamp NULL DEFAULT NULL,
  
  `LASTFETCHED` timestamp NULL DEFAULT NULL,
  
  `ERROR_DESC` varchar(512) DEFAULT NULL,
  
  `QUALITY` float DEFAULT NULL,
  
  `WWRESOLUTION` float DEFAULT NULL,
  
  `WWRETRIEVALMODE` varchar(255) DEFAULT NULL,
  
  PRIMARY KEY (`TAGNAME`,`DATATIME`),
  
  CONSTRAINT `TAG_DATA15MINS_FK` FOREIGN KEY (`TAGNAME`) REFERENCES `tagnames` (`TAGNAME`) ON DELETE CASCADE
  
  ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  
  I tried the the following query in sqlYog, which would be ok for my sync job (but extended the time as mentioned in original post), however I would really like to always be looking back a full 24 hours:
  
  SELECT * , TO_DAYS(NOW()) – TO_DAYS(datatime) AS diff FROM tag_data15mins
  
  WHERE tagname LIKE 'WBSF_INCIN1_BREACHOUTLET'
  
  AND TO_DAYS(NOW()) – TO_DAYS(datatime) < 1
  
  ORDER BY datatime DESC
- May 25, 2011 at 3:48 pm #32288
  
  peterlaursen
  Participant
  
  I am not sure if this index will work with such where clause. I think it will not. The DATETIME column is not first/leftmost in the index.
  
  Please refer http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html “MySQL cannot use an index if the columns do not form a leftmost prefix of the index.”
  
  Could you please try to add an index on that column alone? I know it could take some time to build the index.
  
  Also try to EXPLAIN a query using a WHERE clause on the column. Is the index used?
- May 25, 2011 at 9:48 pm #32289
  
  Rich
  Member
  
  'peterlaursen' wrote:
  
  I am not sure if this index will work with such where clause. I think it will not. The DATETIME column is not first/leftmost in the index.
  
  Please refer http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html “MySQL cannot use an index if the columns do not form a leftmost prefix of the index.”
  
  Could you please try to add an index on that column alone? I know it could take some time to build the index.
  
  Also try to EXPLAIN a query using a WHERE clause on the column. Is the index used?
  
  OK, the following shows the index present (i.e., dateTimeNdx):
  
  CREATE TABLE `tag_data5mins` (
  
  `TAGNAME` varchar(255) NOT NULL,
  
  `DATATIME` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  
  `VALUE` float DEFAULT NULL,
  
  `EXPIRY` timestamp NULL DEFAULT NULL,
  
  `LASTFETCHED` timestamp NULL DEFAULT NULL,
  
  `ERROR_DESC` varchar(512) DEFAULT NULL,
  
  `QUALITY` float DEFAULT NULL,
  
  `WWRESOLUTION` float DEFAULT NULL,
  
  `WWRETRIEVALMODE` varchar(255) DEFAULT NULL,
  
  PRIMARY KEY (`TAGNAME`,`DATATIME`),
  
  KEY `dateTimeNdx` (`DATATIME`),
  
  CONSTRAINT `TAG_DATA5MINS_FK` FOREIGN KEY (`TAGNAME`) REFERENCES `tagnames` (`TAGNAME`) ON DELETE CASCADE
  
  ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  
  and in the 'possible_keys' column of the EXPLAIN result, the following
  
  PRIMARY,dateTimeNdx
  
  indicates that yes, it is “possibly” being used
- May 31, 2011 at 2:50 pm #32290
  
  Rich
  Member
  
  I have placed an index on the datatime column in both source and target database, yet still the sync time for a table with 2251881 rows is 595s using the WHERE clause when only 68 rows are different.
  
  A sync without using the WHERE clause is 153s.
  
  Is there anything else you can help me do to diagnose and figure out a way to speed this up by limiting the rows?
  
  Thanks
  
  Rich
- May 31, 2011 at 3:06 pm #32291
  
  peterlaursen
  Participant
  
  We will reply wiht priority tomorrow.
  
  Did you ever tell your server version? I think I remember some bugs in some MySQL version with indexes on DATETIME columns.
- June 1, 2011 at 8:10 am #32292
  
  vishal.pr
  Member
  
  Hi Rich,
  
  Please tell what all columns from the table `tag_data5mins` is used in the Columns to Compare option. Also the exact where clause you supplied in the SQL Where (nothing more, nothing less). Once you provide these details we can frame a query that gets the rows to sync and you can execute the same in your environment to see how much time it takes. We can then help you to optimize the WHERE clause used.
  
  Regards,
  
  Vishal PR
- June 1, 2011 at 12:45 pm #32293
  
  Rich
  Member
  
  'vishal.pr' wrote:
  
  Hi Rich,
  
  Please tell what all columns from the table `tag_data5mins` is used in the Columns to Compare option.
  
  Also the exact where clause you supplied in the SQL Where (nothing more, nothing less).
  
  TO_DAYS(NOW()) – TO_DAYS(datatime) <= 1
  
  Once you provide these details we can frame a query that gets the rows to sync and you can execute the same in your environment to see how much time it takes. We can then help you to optimize the WHERE clause used.
  
  Regards,
  
  Vishal PR
  
  Hi,
  
  Only two columns: Datatime and Tagname
  
  Where Clause: TO_DAYS(NOW()) – TO_DAYS(datatime) <= 1
  
  Also, the mySQL version is 5.1 on both servers
- June 1, 2011 at 2:17 pm #32294
  
  peterlaursen
  Participant
  
  Even if it may not be relevant please provide the *full* version (not 5.1 but 5.1.57 or whatever).
  
  But now I understand (I should before) that the WHERE uses a calculation. I think this is the problem. Because the result of the calculation is not stored and thus not indexed either. Some databases have 'virtual columns' (that can be indexed) for handling this. The standard MySQL server does not (but latest MariaDB has).
  
  Anyway I'll let Vishal and our testers comment too.
- June 1, 2011 at 2:32 pm #32295
  
  Rich
  Member
  
  'peterlaursen' wrote:
  
  Even if it may not be relevant please provide the *full* version (not 5.1 but 5.1.57 or whatever).
  
  But now I understand (I should before) that the WHERE uses a calculation. I think this is the problem. Because the result of the calculation is not stored and thus not indexed either. Some databases have 'virtual columns' (that can be indexed) for handling this. The standard MySQL server does not (but latest MariaDB has).
  
  Anyway I'll let Vishal and our testers comment too.
  
  5.1.43
- June 1, 2011 at 3:05 pm #32296
  
  vishal.pr
  Member
  
  Can you execute the following query in your environment and see how much time it is taking?
  
  Code:
  
  SELECT
  `TAGNAME`,
  `DATATIME`,
  `VALUE`,
  `EXPIRY`,
  `LASTFETCHED`,
  `ERROR_DESC`,
  `QUALITY`,
  `WWRESOLUTION`,
  `WWRETRIEVALMODE`,
  MD5(
  CONCAT_WS('#', `TAGNAME`, `DATATIME`)
  ) AS __crc
  FROM
  `tag_data5mins`
  WHERE (1 = 1)
  AND (TO_DAYS(NOW()) – TO_DAYS(DATATIME) <= 1)
  ORDER BY `TAGNAME` ASC,
  `DATATIME` ASC;
  
  **Query formatted using SQLyog SQL Formatter.
  
  Regards,
  
  Vishal PR
- June 1, 2011 at 5:24 pm #32297
  
  Rich
  Member
  
  Hi Vishal,
  
  It takes ~ 3.78s consistently
  
  Rich
- June 2, 2011 at 11:43 am #32298
  
  peterlaursen
  Participant
  
  OK .. I think we can conclude here.
  
  1)
  
  The query that Vishal asked you to execute is a simplified query that Data Sync executes. It may be executed multiple times. It will calculate such checksums on CHUNKS of rows. Let us say CHUNK = 2000 rows. If The first CHUNK reveals no difference of data (as per the checksum) we will proceed to next chunk. So it you have 200,000 rows in table it will be executed 100 times even if data are completely identical. And if each execution take 4 seconds it wil be 400 seconds in total to get the information from MySQL. Moreover: if the calculation in a CHUNK reveals differences of data the query will be executed again in a recursive manner for smaller CHUNKs in order to identify exactly what rows are different on source and target. So in conclusion: how many times such query is executed depends on 'how much different' source and target are and how the differences are distributed in the table.
  
  2)
  
  The execution time for the sample query show that it is not optimized well by the MySQL server. Because it uses a calculation in the WHERE clause, the result of the calculation is not stored and thus not indexed. Even if EXPLAIN tells that an index is used I believe it will only be for one/first operand of the calculation. I think taht for each (indexed) 1st operand it will still do a full table scan for comparing with the second operand. If you have SQLyog ULTIMATE profiling the query will reveal more information (in particular check the increase of 'full table scans' for the sample query).
  
  3)
  
  Option for improvement (as I can see) include:
  
  * If WHERE is less efficient than NOWHERE, don't use WHERE.
  
  * explore the COLUMNS option in Data Sync. The COLUMNS option defines on what columns checksums shoule be calculated (and NOT what columns are sync'ed. All columns are always synced). Specifying only a few and small columns necessary to ensure uniqueness will improve performance of checksums calculation.
  
  * Tuning of global server parameters could improve a lot. Some buffer settings etc. in MySQL are ridculously low with default parameters and it may result in swapping.
  
  * If you have access (and can manage) to rewrite the application you may store (and index) the TIMEDIFF in the table. If you want to avoid redundant information you may drop one of the existing columns.
  
  * Consider replacing MySQL 5.1 with MariaDB 5.2 (it is a drop-in replacement), define and index a 'virtual column' on the TIMEDIFF. However I cannot promise how mcuh it would help in this scenario. You will have to try for yourself in your environment.
  
  * Consider Replication rather than Data Sync for this.
- June 2, 2011 at 7:48 pm #32299
  
  Rich
  Member
  
  'peterlaursen' wrote:
  
  OK .. I think we can conclude here.
  
  1)
  
  The query that Vishal asked you to execute is a simplified query that Data Sync executes. It may be executed multiple times. It will calculate such checksums on CHUNKS of rows. Let us say CHUNK = 2000 rows. If The first CHUNK reveals no difference of data (as per the checksum) we will proceed to next chunk. So it you have 200,000 rows in table it will be executed 100 times even if data are completely identical. And if each execution take 4 seconds it wil be 400 seconds in total to get the information from MySQL. Moreover: if the calculation in a CHUNK reveals differences of data the query will be executed again in a recursive manner for smaller CHUNKs in order to identify exactly what rows are different on source and target. So in conclusion: how many times such query is executed depends on 'how much different' source and target are and how the differences are distributed in the table.
  
  I guess I am naive in how I thought this would be implemented. I would have expected the sync tool to get an initial result set by executing the query with the where condition, then do the sync'ing operations based strictly on that result set, ignoring all rows not meeting the condition.
  
  'peterlaursen' wrote:
  
  2)
  
  The execution time for the sample query show that it is not optimized well by the MySQL server. Because it uses a calculation in the WHERE clause, the result of the calculation is not stored and thus not indexed. Even if EXPLAIN tells that an index is used I believe it will only be for one/first operand of the calculation. I think taht for each (indexed) 1st operand it will still do a full table scan for comparing with the second operand. If you have SQLyog ULTIMATE profiling the query will reveal more information (in particular check the increase of 'full table scans' for the sample query).
  
  3)
  
  Option for improvement (as I can see) include:
  
  * If WHERE is less efficient than NOWHERE, don't use WHERE.
  
  * explore the COLUMNS option in Data Sync. The COLUMNS option defines on what columns checksums shoule be calculated (and NOT what columns are sync'ed. All columns are always synced). Specifying only a few and small columns necessary to ensure uniqueness will improve performance of checksums calculation.
  
  * Tuning of global server parameters could improve a lot. Some buffer settings etc. in MySQL are ridculously low with default parameters and it may result in swapping.
  
  * If you have access (and can manage) to rewrite the application you may store (and index) the TIMEDIFF in the table. If you want to avoid redundant information you may drop one of the existing columns.
  
  * Consider replacing MySQL 5.1 with MariaDB 5.2 (it is a drop-in replacement), define and index a 'virtual column' on the TIMEDIFF. However I cannot promise how mcuh it would help in this scenario. You will have to try for yourself in your environment.
  
  * Consider Replication rather than Data Sync for this.
  
  I use only the key columns (2) for compare.
  
  I'm not really disposed to tune the server params although I guess that's how people eventually get educated.
  
  I would like to use Replication but this operation has to be done over an ssh port with the slave outside of the secure network such that it cannot be pointed to the master. If I could find a source of info that would let me replicate such that the master points to the slaves rather than vice versa, I would do that (if can do over ssh)
  
  Thanks for putting significant effort into helping investigate this.
  
  Rich
Author

Posts

Viewing 13 reply threads

You must be logged in to reply to this topic.

Search Forums

Recent Topics

Recent Replies

Help With Where During Sync

Search Forums

Recent Topics

Recent Replies

Tags