[bug] Upgraded: No More Auto-reconnect

Viewing 14 reply threads

Author

Posts
- August 31, 2007 at 7:19 am #10516
  
  Marjolein Katsma
  Member
  
  I just upgraded SQLyog Enterprise from version 5.22 to 6.05.
  
  I'm connecting to my new host using the SSH tunnel feature.
  
  Now, while with 5.22 I could leave SQLyog open overnight and still do any queries the next day (apparently doing a transparent auto-reconnect), with 6.05 I get “MySQL server has gone away” after a while. Reconnect won't work either: it tells me the port is already in use…
  
  Note: my SSH server does not time out, so that's not the problem. MySQL server version is 5.0.45 (in case it matters).
  
  It would be great if SQLyog could do auto-reconnect again, even over an SSH tunnel!
  
  Regards,
- August 31, 2007 at 9:28 am #24803
  
  peterlaursen
  Participant
  
  There is no intentional change between the two versions in this respect. Only reconnection in Structure Sync is known to be broken in 6.0 to 6.05 (fixed in 6.06).
  
  However we did rebuild the PLINK in 6.0 (technically it is plink.exe and not sqlyog.exe that loses connection).
  
  It will take a little time to research this. I will PM you a download link to version 5.32 ENTERPRISE with registration details. Please install this one (to another folder), open a connection with it, let it stay open overnight, and report how it behaves.
- August 31, 2007 at 12:27 pm #24804
  
  Marjolein Katsma
  Member
  
  peterlaursen wrote on Aug 31 2007, 11:28 AM:
  
  There is no intentional change between the two versions in this respect. Only reconnection in Structure Sync is known to be broken in 6.0 to 6.05 (fixed in 6.06).
  
  However we did rebuild the PLINK in 6.0 (technically it is plink.exe and not sqlyog.exe that loses connection).
  
  It will take a little time to research this. I will PM you a download link to version 5.32 ENTERPRISE with registration details. Please install this one (to another folder), open a connection with it, let it stay open overnight, and report how it behaves.
  
  Downloaded, installed (indeed it didn't ask me for a registration code but I have 5.22 still installed as well), and set up with another connection to my host. I'll now let it sit idle for a while, at least until tomorrow morning (if nothing happens that forces me to reboot, that is)…
  
  Interestingly though, although the connection dialog in 5.32 looks (almost) like that in 6.05 but different from 5.22, a version compare shows the plink.exe bundled with all three versions is exactly the same, and they are binary identical, too. There is a difference with 5.12 (which I still have installed) though, but I never used it with that version: I just never had a need for tunneling until now, so I used it a bit in 5.22 and then upgraded to 6.05.
  
  Anyway, I'll report back what happens, if anything, or if nothing. 😉
- September 1, 2007 at 7:11 pm #24805
  
  Marjolein Katsma
  Member
  
  OK, here's a result:
  
  I have had version 5.32 running since yesterday afternoon (my post was just after I started it up). Late this morning I tried something, and there was still a connection, with no auto-reconnect either: I checked the history tab, since in a few other cases I'd noted much longer response times, which I suspect indicated a transparent auto-reconnect.
  
  Meanwhile I had also 6.05 running all the time, using it on and off. Off actually since late morning, I was doing other things on the new server.
  
  Just now I went to have a look at something – and 5.32 gave me a “MySQL server has gone away” – twice. So I went to my instance of 6.05 – and had exactly the same result. (Both versions are using a binary-identical version of plink.exe.)
  
  Then I went to look at my server, and find that both SSH and MySQL are, in fact, running (I'd been using SSH intensively during most of the interval).
  
  Now, I can do a little digging to see if I can find if in the meantime either of the two services had stopped and auto-restarted. (Will report back if I find anything – I'm not familiar yet with where which logs are to be found so that may take me a while.)
  
  Still, even if one of the two had stopped and restarted, when both are running shouldn't SQLyog (plink.exe) be able to auto-reconnect when an event happens that requires it to go to the server (like clicking on another database in the object browser)? A click is all I did in both cases.
  
  And if it's not a matter of a server “going away” and coming back, is there some sort of built-in time-out in plink.exe?
- September 1, 2007 at 9:53 pm #24806
  Marjolein Katsma
  Member
  Marjolein Katsma wrote on Sep 1 2007, 09:11 PM:
  
  Now, I can do a little digging to see if I can find if in the meantime either of the two services had stopped and auto-restarted. (Will report back if I find anything – I'm not familiar yet with where which logs are to be found so that may take me a while.)
  
  Update – I found all the relevant log files:
  - boot.log – nothing related to either sshd or mysqld for the relevant period of time
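  - messages – the only relevant entry is a close of sshd session 4012 @ 21:06, which session was started yesterday @ 8:55: that was me killing the session in SQLyog *after* MySQL had “gone away”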
  - mysql's .err – shows mysqld has been running continuously since 2007-08-29 @ 22:55
  So… neither mysqld nor sshd actually stopped on the server side during the relevant period; nothing on the server side (that I can find) that could cause this – and yet both clients (with the same plink.exe) report that “MySQL server has gone away”. Looks to me like plink.exe may be timing out after a (longish) period of idleness. But why can't it reconnect even then?
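One way to tell a vanished tunnel apart from a vanished MySQL server is to probe the tunnel's local port directly. The following is only a minimal sketch: the function name `tunnel_port_is_open` and the idea of probing from a separate script are assumptions made for illustration, not something SQLyog is known to do.

```python
import socket


def tunnel_port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing is listening on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A refused connection on the local forwarded port would suggest that the tunnel process itself is gone. A successful connect only shows that something is listening locally; it says nothing about the state of the remote end.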
- September 2, 2007 at 4:18 pm #24807
  c64audio
  Member
  Marjolein Katsma wrote on Sep 1 2007, 10:53 PM:
  
  Update – I found all the relevant log files:
  
  boot.log – nothing related to either sshd or mysqld for the relevant period of time
  
  messages – only relevant is a close of sshd session 4012 @ 21:06, which session was started yesterday @ 8:55: that was me killing the session in SQLyog *after* MySQL had “gone away”
  
  mysql's .err – shows mysqld has been running continuously since 2007-08-29 @ 22:55
  
  So… neither mysqld nor sshd actually stopped on the server side during the relevant period; nothing on the server side (that I can find) that could cause this – and yet both clients (with the same plink.exe) report that “MySQL server has gone away”. Looks to me like plink.exe may be timing out after a (longish) period of idleness. But why can't it reconnect even then?
  I've been complaining about “Gone Away” messages in another thread, mostly concerned with its behaviour over SSH tunnelling. I got no satisfactory answer then, after they'd fixed structure sync.
  
  Chris
- September 2, 2007 at 5:34 pm #24808
  
  Marjolein Katsma
  Member
  
  c64audio wrote on Sep 2 2007, 06:18 PM:
  
  I've been complaining about “Gone Away” messages in another thread, mostly concerned with its behaviour over SSH tunnelling. I got no satisfactory answer then, after they'd fixed structure sync.
  
  Hi Chris,
  
  I've been poking around the forum, of course, and noted the structure sync problem, but I understood that was either fixed or at least logged as a bug to be fixed.
  
  The “Gone away” happens for me after I've been doing “nothing” for a long time (many hours, but it's hard to give a limit) – when using SSH tunneling. Are you saying you're seeing the same thing, or that it happened only with structure sync for you? Has your SQLyog maybe been sitting idle before (attempting to) start a structure sync?
  
  I'm wondering if we could be seeing symptoms of the same underlying problem.
- September 2, 2007 at 8:23 pm #24809
  
  peterlaursen
  Participant
  
  let me explain:
  
  When we originally introduced reconnection it was only implemented in the 'main' GUI. For structure sync, duplicate/copy operations and batch jobs it was added later. Structure sync was the last to be added, in 5.22, and programmatically it was the most difficult one to solve.
  
  It was temporarily disabled again in internal builds 'on the way to 6.0', and re-enabling it was simply forgotten by mistake. Only when we had 2 complaints almost simultaneously (the one by Chris and one in the ticket system) was it sorted out what had happened, and it was fixed in 6.06.
  
  Versions 6.0 to 6.05 will not reconnect automatically in structure sync no matter the connection method. This bug is NOT SSH-related!
  
  It now looks like some issue with SSH and reconnection (not in Structure Sync but everywhere!) has occurred. We asked Marjolein to compare different versions because we judged that she had a setup fit for such a test, just as we have had a similar test running at the office since Friday. Whether this is an issue introduced in 6.0 or in late 5.x versions is something I hope Marjolein's information, together with similar information from our own test, will reveal.
  
  When Chris got no satisfactory answer, it was simply because we could not reproduce it. Of course SQLyog will stop trying to reconnect if it has tried with no success (but actually I am not quite sure if it tries 1 or 2 times – I remember we discussed it but I am not sure about the conclusion).
  
  The question is whether there is a consistently reproducible difference between 6.x and various 5.x versions. Marjolein seems to have clear indications of this, and this we take seriously!
- September 3, 2007 at 7:33 am #24810
  
  Marjolein Katsma
  Member
  
  peterlaursen wrote on Sep 2 2007, 10:23 PM:
  
  We asked Marjolein to compare different versions because we judged that she had a setup fit for such a test, just as we have had a similar test running at the office since Friday. Whether this is an issue introduced in 6.0 or in late 5.x versions is something I hope Marjolein's information, together with similar information from our own test, will reveal.
  
  (…)
  
  The question is whether there is a consistently reproducible difference between 6.x and various 5.x versions. Marjolein seems to have clear indications of this, and this we take seriously!
  
  By now I have seen similar behavior in 5.32 and 6.05 – both exhibit the “Gone away” problem. But I worked only for a bit with 5.22 combined with tunneling, so now I'm not so sure that version really is different; the problem may simply not have manifested itself, and it's hard to reproduce on purpose… Those three versions have an identical plink.exe though, so if it has anything to do with plink, it's likely they behave the same and I just didn't see it with 5.22.
  
  The difficulty is determining what is triggering the behavior, especially when it doesn't seem like it's anything server-side: you can only be sure it happens when it happens, but when it doesn't happen you cannot be sure it couldn't happen, it just happened not to happen. 😉
  
  But I still have 5.12 installed as well (I'm weird that way ;)) which has a different version of plink.exe so I've now started that up, to see what happens – or does not happen…
- September 3, 2007 at 6:03 pm #24811
  
  Marjolein Katsma
  Member
  
  Marjolein Katsma wrote on Sep 3 2007, 08:33 AM:
  
  I still have 5.12 installed as well (I'm weird that way ;)) which has a different version of plink.exe so I've now started that up, to see what happens – or does not happen…
  
  Well, letting both versions (connected to the same server) sit idle for 9 or 10 hours and then trying a few clicks didn't show anything exciting – certainly no “Gone away” dialog. This is hard to reproduce – if it even has to do with sitting idle at all…
  
  So, I tried something different. After the idle period I performed a few actions in parallel on both clients, then deliberately stopped and started the MySQL server on the host, and then performed a few actions in parallel again. Still no “gone away” from either client version, but I do see an interesting difference between the two: it looks like v5.12 does a deliberate reconnect while v6.05 doesn't (unless I'm interpreting the results incorrectly).
  
  Here are the results:
  
  v. 5.12 history
  
  Code:
  
  (after being idle since 09:16:20 today)
  /*[18:54:04][ 50 ms]*/ Set names 'utf8'
  /*[18:54:04][ 370 ms]*/ set sql_mode=''
  /*[18:54:06][2153 ms]*/ show full fields from `travelblog`.`itinerary`
  /*[18:54:06][ 20 ms]*/ show keys from `travelblog`.`itinerary`
  /*[18:54:06][ 60 ms]*/ select * from `travelblog`.`itinerary` limit 0, 50
  /*[19:18:18][ 20 ms]*/ show full fields from `travelblog`.`link`
  /*[19:18:18][ 10 ms]*/ show keys from `travelblog`.`link`
  /*[19:18:18][ 30 ms]*/ select * from `travelblog`.`link` limit 0, 50
  /*[19:25:53][ 10 ms]*/ Set names 'utf8'
  /*[19:25:53][ 10 ms]*/ set sql_mode=''
  /*[19:25:53][ 311 ms]*/ show full fields from `travelblog`.`news`
  /*[19:25:53][ 20 ms]*/ show keys from `travelblog`.`news`
  /*[19:25:53][ 50 ms]*/ select * from `travelblog`.`news` limit 0, 50
  
  v. 6.05 history
  
  Code:
  
  (last evening)
  /*[21:39:05][ 10 ms]*/ Set names 'utf8'
  /*[21:39:05][ 20 ms]*/ set sql_mode=''
  …
  (after being idle since 08:10:22 today)
  /*[18:54:25][ 51 ms]*/ show full fields from `travelblog`.`itinerary`
  /*[18:54:25][ 20 ms]*/ show keys from `travelblog`.`itinerary`
  /*[18:54:25][ 10 ms]*/ show create table `travelblog`.`itinerary`
  /*[18:54:35][ 30 ms]*/ show full fields from `travelblog`.`itinerary`
  /*[18:54:35][ 20 ms]*/ show keys from `travelblog`.`itinerary`
  /*[18:54:35][ 70 ms]*/ select * from `travelblog`.`itinerary` limit 0, 100
  /*[19:19:11][ 20 ms]*/ show full fields from `travelblog`.`link`
  /*[19:19:11][ 20 ms]*/ show keys from `travelblog`.`link`
  /*[19:19:11][ 20 ms]*/ select * from `travelblog`.`link` limit 0, 100
  /*[19:26:06][ 71 ms]*/ show full fields from `travelblog`.`news`
  /*[19:26:06][ 10 ms]*/ show keys from `travelblog`.`news`
  /*[19:26:06][ 20 ms]*/ show create table `travelblog`.`news`
  
  mysql log .err
  
  Code:
  
  070903 19:25:14 [Note] /usr/sbin/mysqld: Normal shutdown
  
  070903 19:25:16 InnoDB: Starting shutdown…
  070903 19:25:17 InnoDB: Shutdown completed; log sequence number 0 43634
  070903 19:25:17 [Note] /usr/sbin/mysqld: Shutdown complete
  
  070903 19:25:17 mysqld ended
  
  070903 19:25:25 mysqld started
  070903 19:25:28 [Warning] Asked for 196608 thread stack, but got 126976
  070903 19:25:28 InnoDB: Started; log sequence number 0 43634
  070903 19:25:28 [Note] /usr/sbin/mysqld: ready for connections.
  Version: '5.0.45-community' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Edition (GPL)
  
  Another thing I could try is deliberately stopping/starting sshd on the server. I'll do that when I have a moment.
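The comparison above rests on reading the history tab by eye: a repeated `Set names` statement is taken as the sign of a fresh session. A small script could make that check repeatable. This is a sketch only; it assumes the history lines keep the `/*[time][ms]*/ statement` layout shown in the excerpts, and it assumes that `Set names` really is issued once per new connection (which is an inference from the logs, not documented behaviour). The function names are made up for illustration.

```python
import re

# Matches lines such as: /*[18:54:04][ 50 ms]*/ Set names 'utf8'
HISTORY_LINE = re.compile(r"/\*\[(\d{2}:\d{2}:\d{2})\]\[\s*(\d+) ms\]\*/\s*(.*)")


def parse_history(lines):
    """Yield (time, milliseconds, statement) for each recognisable history line."""
    for line in lines:
        match = HISTORY_LINE.search(line)
        if match:
            yield match.group(1), int(match.group(2)), match.group(3).strip()


def session_starts(lines):
    """Return the times at which a 'set names' statement appears."""
    return [
        timestamp
        for timestamp, _ms, statement in parse_history(lines)
        if statement.lower().startswith("set names")
    ]


sample = [
    "/*[18:54:04][ 50 ms]*/ Set names 'utf8'",
    "/*[18:54:04][ 370 ms]*/ set sql_mode=''",
    "/*[19:18:18][ 20 ms]*/ show full fields from `travelblog`.`link`",
    "/*[19:25:53][ 10 ms]*/ Set names 'utf8'",
]
print(session_starts(sample))  # ['18:54:04', '19:25:53']
```

Run against the two excerpts, this would flag 18:54:04 and 19:25:53 for v5.12 and only 21:39:05 (the previous evening) for v6.05, which matches the reading that 5.12 opened a new session after the MySQL restart while 6.05 did not.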
- September 3, 2007 at 8:13 pm #24812
  
  peterlaursen
  Participant
  
  Just FYI: we cannot consistently reproduce any different behaviour with the different versions in our own test either.
  
  Currently reconnection is attempted only once. We discussed:
  
  1) providing a configuration option for user to set the # of reconnects
  
  2) if the first reconnect is not successful we could wait/sleep() for some time (say 200 ms or 500 ms) before trying again. The philosophy is that it will not make much sense to try again only a few milliseconds after failure. The remote host or some network gear (routers, switches, transmitters) in the connection chain might need a little time …
  
  Also reconnection with SSH tunnel assumes that PLINK is alive. We do not re-instantiate PLINK. But I also do not think it is the problem that PLINK dies (except if it is explicitly killed by user, of course!).
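The two ideas discussed above (a configurable number of reconnects and a pause between attempts) can be sketched in a few lines. This is not SQLyog's code: `reconnect_with_retry`, its parameters, and the `connect` callable are hypothetical names chosen for illustration, and the 0.5 second default simply echoes the 500 ms figure mentioned above.

```python
import time


def reconnect_with_retry(connect, attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Call connect() up to `attempts` times, pausing between failed tries.

    Returns whatever connect() returns on success and re-raises the last
    error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as error:  # ConnectionError is a subclass of OSError
            last_error = error
            if attempt < attempts:
                # Give the remote host or network gear a moment to recover.
                sleep(delay_seconds)
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch easy to test without real delays. Note that retrying alone would not help if the tunnel process is no longer running, since every attempt would then fail the same way.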
- September 4, 2007 at 6:45 am #24813
  Marjolein Katsma
  Member
  peterlaursen wrote on Sep 3 2007, 09:13 PM:
  
  Just FYI: we cannot consistently reproduce any different behaviour with the different versions in our own test either.
  
  Currently reconnection is attempted only once. We discussed:
  
  1) providing a configuration option for user to set the # of reconnects
  
  2) if the first reconnect is not successful we could wait/sleep() for some time (say 200 ms or 500 ms) before trying again. The philosophy is that it will not make much sense to try again only a few milliseconds after failure. The remote host or some network gear (routers, switches, transmitters) in the connection chain might need a little time …
  
  Both of those would be good to have. Still, they'd mostly address symptoms, not the actual cause of the “gone away” problem – whatever that cause is.
  
  peterlaursen wrote:
  
  Also reconnection with SSH tunnel assumes that PLINK is alive. We do not re-instantiate PLINK. But I also do not think it is the problem that PLINK dies (except if it is explicitly killed by user, of course!).
  
  Well, things are getting quite interesting here…
  
  I remembered one of the many little tools I have installed on my machine: Process explorer (from Sysinternals). It's like Task manager on steroids, and one of the useful things it does is show how processes are related, by showing “parent” and “child” processes in a tree. For each, it can show an enormous amount of detail, like event, key, port, process, thread, etc. The tree is in the top pane, and either dlls or handles are in the lower pane. In this case, the handles view is the most useful.
  
  So yesterday I fired it up, and left that running as well as my two instances of SQLyog. What immediately struck me was that both had started two instances of plink.exe (matching the two consecutive popups of the “gone away” dialog, if it happens?). But what was more striking was that in the detailed list both had a number of [####] (with #### the original PID) mentioned as “process”, but my 6.05 instance (which also had been running longer) had many more of those than 5.12. I looked a few times at it yesterday, but didn't notice any change.
  
  Now, I just had another look. 5.12 looks pretty much the same, with its detailed process list still showing two [####] entries and two plink.exe, just like I saw all the time yesterday. But now 6.05 is different: it has nothing but [####] entries listed (20!) and no plink.exe. I definitely did not kill those processes!
  
  Now I can make a prediction: if I go to 5.12 and click on a new table, it will just show it to me; if I do the same in 6.05, it will give me the dreaded “gone away” dialog, twice: once for each plink.exe that's gone missing. So let's try… (I'm actually writing this before trying!)
  
  v. 5.12:
  - one click on new table while history tab is in view – nothing happens
  - switch to data tab … long wait … and I'm getting the “gone away” dialog – once
  - the data tab is empty
  - looking at Process explorer: what has actually gone away is one of the two plink.exe processes that still were there just before!
  - going back at the history tab, I see three new statements:
    
    Code:
    
    /*[07:50:16][ 50 ms]*/ show full fields from `travelblog`.`itinerary`
  - switching back to the data tab, I get another wait… and the “gone away” dialog again
  - looking at Process explorer again… the second copy of plink.exe is still there (Task manager confirms this); in the detailed list, the original plink.exe item (recognizable by the PID which I noted) has now turned into a [####] entry as well (3 of those now)
  v.6.05:
  - one click on new table while history tab is in view … long wait … and I'm getting the “gone away” dialog – once
  - switching to the data tab, I get another wait… and the “gone away” dialog again
  - the history tab now shows one new statement for each attempt:
    
    Code:
    
    /*[07:59:14][ 131 ms]*/ show full fields from `travelblog`.`itinerary`
  - Process explorer still shows 20 [####] entries
  So… my prediction was close, if not exact.
  
  What's important to note, though, is that plink.exe can “go away”! Your assumption that it doesn't clearly isn't correct – and I definitely did not kill those processes. That leaves three possibilities:
  - SQLyog is somehow killing them itself
  - plink.exe decides to go away all by itself ('nothing to do here…')
  - the operating system is killing them
  I think the last is actually the most likely, and there may be a connection with “idle time” here as well; plink.exe going away by itself might fit, too – but remember I'm looking at two different versions of plink.exe now. I should add that I've had my machine pretty busy overnight (and right now) – it might be a case of Windows itself deciding to kill a child process when it's doing nothing anyway; but I'm not sure Windows is supposed to be doing something like that. Maybe plink.exe is simply crashing in some circumstances.
  
  Whatever the cause, the assumption that plink.exe is always there is clearly false.
  
  That should give you guys something to investigate. 😉
  
  (If you don't have it yet, get Process explorer – it can show a ton of useful information.)
  
  Meanwhile, I'll let both instances of SQLyog run, but in each close the connection and then re-start it. I should see two fresh copies of plink.exe for each, right?
  
  Wrong! For v.6.05 I see two (both in the tree and in the detailed list), but for 5.12 I see three in the tree but four in the detailed list; a little later there are only three in the detailed list as well – as if there was a fourth instance only briefly but it has already gone away! :huh: And 5.12 has two new [####] entries in the process handles list while the two for the previous two copies of plink.exe are still there (apparently they are not closed when closing a connection!). 6.05 has one new [####] entry – as if there briefly was a third copy of plink.exe that has gone missing already…
  
  Task manager confirms: three of the older, two of the newer version of plink.exe running (easily recognized even there since they have different sizes resulting in different memory footprints).
  
  Plenty of food for thought here, I think.
- September 4, 2007 at 9:00 am #24814
  
  Marjolein Katsma
  Member
  
  At 10:37 I noted that one of v.5.12's three plinks has already gone away…
- September 4, 2007 at 1:18 pm #24815
  
  peterlaursen
  Participant
  
  OK .. it really looks like we should consider re-initiating PLINK too.
  
  For most of this week we will be busy 'merging' code from 4 contributors into the first 6.1 build. After that we will schedule this!
- September 4, 2007 at 2:44 pm #24816
  
  Marjolein Katsma
  Member
  
  peterlaursen wrote on Sep 4 2007, 03:18 PM:
  
  OK .. it really looks like we should consider re-initiating PLINK too.
  
  For most of this week we will be busy 'merging' code from 4 contributors into the first 6.1 build. After that we will schedule this!
  
  Great! 😀
  
  That said, it would also be a good idea, I think, to investigate what makes plink “go away” (or crash?). (As a programmer, I wouldn't be happy to know it sometimes “goes away” and not know why!) But it certainly would be more robust to re-initiate plink.exe anyway, since at the very least the user may (intentionally or not) have killed the process.
  
  Meanwhile, by now I have a gut feeling that plink “going away” (or crashing) is not dependent on idle time but somehow on how busy the system is. Maybe in such circumstances Windows takes away resources from plink that it assumes it still has, and it crashes as a result. Or something along those lines. My machine was really busy this morning with lots of software loaded and big backup jobs running, sometimes concurrently, with CPU maxing out at 100% for long periods.
  
  Another thing to consider would be to close all handles used by a connection when that connection is closed. Same for threads (I see lots of those in Process explorer). I also see a large number of Sections \BaseNamedObjects\PlinkSharedErrorMsg in 6.05 (but not in 5.12 which apparently uses a different mechanism to communicate with its plink.exe) – I cannot imagine that 17 of those are really needed with just two plink.exe processes doing nothing?
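The two suggestions in this thread (re-initiate the tunnel helper when it has gone away, and release the handles held on it when a connection closes) can be illustrated with a small supervisor. This is a sketch under stated assumptions, not how SQLyog manages plink.exe: the class `TunnelProcess` and its methods are hypothetical, and `STAND_IN_COMMAND` is a harmless placeholder process used instead of a real tunnel command line.

```python
import subprocess
import sys

# Placeholder helper process for illustration only; a real caller would
# pass its own tunnel command line instead.
STAND_IN_COMMAND = [sys.executable, "-c", "import time; time.sleep(30)"]


class TunnelProcess:
    """Owns one helper process and restarts it if it has exited."""

    def __init__(self, command):
        self.command = list(command)
        self.process = None
        self.restarts = 0

    def ensure_running(self):
        """Start the helper if it was never started or has since exited.

        Returns True if a new process was started, False if the existing
        one is still alive.
        """
        if self.process is not None and self.process.poll() is None:
            return False  # still alive, nothing to do
        if self.process is not None:
            self.process.wait()  # reap the dead child so its handle is released
            self.restarts += 1
        self.process = subprocess.Popen(self.command)
        return True

    def close(self):
        """Terminate the helper and drop the reference held on it."""
        if self.process is not None:
            if self.process.poll() is None:
                self.process.terminate()
            self.process.wait()
            self.process = None
```

Calling `ensure_running()` just before each reconnect attempt would cover both the case where the helper crashed and the case where the user killed it, without needing to know why it went away.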