Connection reset by peer in Linux #513
Comments
I forgot to mention that the following command works fine on the same Linux machine:
|
I can see where the problem is happening, if not what. When you create the exec you get back a stream, and when you call In my experience doing a similar task, reading docker events off the stream, I found that A. the docker API will leave streams open and ready to send more even when there is no more to send, and B. the apache http library that docker-client uses to read the stream does not handle that situation gracefully. I am not certain that's the same problem you’re having. I don't know why the connection would be reset during that process. I can't give you any real advice or direction. Can you reproduce the problem reliably? Is it just with your own images, or can you run some command in busybox or whatever that you can get to trigger the same error? Which CI service are you using? |
Hi @johnflavin, Thanks for the quick response. My CI service is Jenkins. But even running the tests manually on that machine (Debian Wheezy) gives the same results. I tried to get rid of all the readFully() calls and made sure that I close all the LogStreams in my code, but the issue persists. Moreover, the error happens almost immediately. I will try to find a way to reproduce the problem in a form I can share with you. Nevertheless, I am starting to think that the main issue is that I am not using your library for the right purpose. Do you have any suggestions for how to run integration tests with Docker in an isolated and multi-platform way? We are considering running the tests inside one of the containers as an alternative. Thanks |
Are you able to run your I can't say whether or not you’re using this library for the "right" purpose. I don't know specifically what you’re trying to accomplish. But I can tell you that the purpose of this library is very simple: to let you communicate with your docker server from within java code (or scala, or groovy, etc.). If you wanted to, you could eschew the use of |
I've got the error with
The only potential issue is that I have to terminate the commands, since both kafka-console-producer and kafka-console-consumer will never finish. I've just noticed that another test that I have with Cassandra is working fine (I use docker-client to run some CQL statements). So definitely it has to be an issue with Kafka. I will try to investigate further and let you know if I manage to solve the issues. |
For what it's worth, I'm seeing the same "java.io.IOException: Connection reset by peer" error. It seems intermittent, though, happening roughly one time in three. I can reproduce it with the following client application:
Here's the stack trace I see after running it 3-4 times:
Docker Version: 1.12.5, build 8eab29e Is there anything else you think I should try? |
I'm facing a similar issue; it sounds like a race condition or something. I was having the issue on an Ubuntu server, and once I enabled docker daemon debug mode it stopped for the time being. At least when it reappears I hope to have more data to trace the issue further. On my local machine with Docker for Mac, I was not able to reproduce the issue. I am running Docker Version: 1.13.0, with docker client library 7.0.2. |
I'm also experiencing the same issue, but it happens more than half the time I run it. Docker Version: 1.12.6 |
I'm experiencing this issue very frequently as well (more often than not). Docker Version: 1.13.0 |
Same problem for me (version 1.12.3) Like johnflavin said in his first post, it's due to the stream from Any solution to parse the logs ? |
I also have the same problem but I didn't know there was an issue created about it (just found out since now I added myself as a watcher). What I did as a workaround was to wrap the It does not solve the issue, but maybe it helps you to move on while a solution is found. |
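The wrapped call is not shown in the comment above, so the following is only a minimal sketch of one possible stopgap: a bounded retry around any call that fails with an IOException somewhere in its cause chain. The class and method names (RetryOnIoError, call, hasIoCause) are hypothetical and not part of docker-client; like the workaround described above, this does not address the root cause.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of docker-client): bounded retry on I/O failures. */
final class RetryOnIoError {
    private RetryOnIoError() {}

    /** Returns true if the throwable or any of its causes is an IOException. */
    static boolean hasIoCause(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the action up to maxAttempts times. Only failures that carry an
     * IOException (for example "Connection reset by peer") are retried;
     * anything else is rethrown immediately.
     */
    static <T> T call(Callable<T> action, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (!hasIoCause(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Short pause before the next attempt.
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

The action passed in would be whatever exec-and-read sequence is failing in your code. Retrying is only safe if the command being exec'd can be run more than once without side effects.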
I've taken another look at this thread, and I think I see two different but related issues:
*I did want to try one possible solution to the problem that reading from a still-open-but-no-data connection will block: check the input stream's |
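The name of the check is missing from the comment above. As a sketch only, assuming the idea is something like the JDK's InputStream.available(), a reader could consume just the bytes that are already buffered instead of blocking on a connection that is open but silent. NonBlockingDrain and drainAvailable are hypothetical names, and this works on a plain java.io.InputStream, not on docker-client's LogStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: read only what can be read without blocking. */
final class NonBlockingDrain {
    private NonBlockingDrain() {}

    static byte[] drainAvailable(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int ready;
        // available() is an estimate of bytes readable without blocking.
        while ((ready = in.available()) > 0) {
            int n = in.read(buf, 0, Math.min(ready, buf.length));
            if (n < 0) {
                break; // end of stream
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

One caveat with this approach: available() returning 0 does not distinguish "no data yet" from "no more data ever", so a caller would still need its own rule for when to stop polling.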
@johnflavin It seems that the LogStream class doesn't have access to the input stream's I use a Cassandra container for my tests and I execute some statements in it, so I want to check the log for errors and crash when it fails, so I don't mind having a |
I'm not sure if it would work for everyone in this thread, but I suspect that some of the reasons that people are using |
As of Docker
with version |
@GameScripting What command are you trying to I'm digging into your comment to try to answer some questions:
|
@rinscy Sorry, meant to respond to this a while ago. The call to |
@johnflavin Yes, I just noticed that. |
I'm not sure if anyone else has tried this, but I noticed that this issue seems to revolve around using the UNIX socket that the Docker daemon sets up by default. I was wondering why I wasn't running into the same issue, so I tried to replicate it on my end. In my environment and use case, I need to communicate with local and remote Docker hosts. As such, I've configured my daemons to listen on TCP port 2375 (no TLS). Using the code snippet provided by @jalbr74, I replaced the string literal for the URL so that it uses the TCP port on the box. The result is that the provided snippet works. To double check, I used the original snippet with the UNIX socket, and received the same failures as others here. Including my snippet:
EDIT: For purposes of clarity, I'm using the 8.1.1 JAR with Docker 1.12.6 and 1.13.1, both locally and remotely, on Fedora 25 Server, Ubuntu 16.04 Server, and openSUSE LEAP 42.2. |
@ArcticPheenix Ok, it sounds like you’re able to reliably reproduce the error, and when you switch out the default The test you have provided is a good start, but that by itself doesn't let me reproduce the behavior. We already have a test that creates a simple exec like this: DefaultDockerClientTest#testExec. Plus, your test code doesn't appear to be complete. If that test is supposed to work, it looks like you have to start a container named I think what I want to do is figure out if we can reproduce this error by testing on different operating systems. You mentioned several: Fedora 25 Server, Ubuntu 16.04 Server, and openSUSE LEAP 42.2. But I can't tell exactly which were working, and whether they worked all the time or whether they failed using the socket and worked using tcp. So could you give more detailed information of all the configurations you have tried, socket vs. tcp, server OS, what works, what doesn't, etc.? What other variables have been mentioned in the thread?
If anyone else wants to report in, please provide the following:
Now that I've typed this all out, it makes me strongly suspect that this is an issue in docker's API and not in docker-client. Still, if someone can help track it down and reproduce it, that will make it all the easier to report to docker. |
@johnflavin I'm about to head out to lunch, but I'll get you the info you need. The root of the issue may or may not be twofold. Since I can reliably use these calls on versions of Docker that others are having issues with, that would seem to suggest that the issue(s) may be centered around UNIX sockets, and possibly the updated Docker API. Give me a bit to get some more data about versions, platforms, etc. I can also test on Mac, if you want another data point. I work with @jalbr74, and only started looking at this project as of yesterday. |
@johnflavin I wanted to apologize for the delay. Things at work have become quite busy. I'll paste my code snippet for a test class that allows me to reliably reproduce the issue on Fedora 25 Server with Docker 1.12.6, 1.13.1, and the new 17.03.0-ce. API versions range from 1.24 (Docker 1.12.6) through 1.26 (Docker 17.03.0-ce). In each instance, if I use the UNIX socket, I will get the issue. If I connect via TCP, I won't. Doing a cursory glance at the code, I don't think the issue lies with the |
@johnflavin I don't know if it will be useful but for me, switch from the default connection |
I agree with @ArcticPheenix; it does not look to me like the source of this problem is inside any docker-client code. Here are my thoughts:
|
Exactly. I now know for sure that the problem is not in docker-client code (I forgot to mention that in my previous message). |
I am getting a similar issue. In my case the following exception occurs:
My Java class is: public class RunRubyDocker {
} Docker version: Client: Server: Please help. I get this error roughly every third time I execute my code. |
@johnflavin I am not able to fix this issue. Please advise. |
@ajmalrehman Can you give some more information by answering the questions at the bottom of this comment above? |
I've created a repro program and put it here: https://github.com/dan-v/spotify-docker-client-issue-513. In order to reproduce, I have found that it must be run on the following
Here is an example of the issue
The same program but run against TCP instead of unix socket
I also tested out the docker-java library which doesn't have this issue.
I then enable debug mode for docker daemon and looked at what each library was doing in terms of API calls. The spotify library was making calls a little differently. Spotify library
Docker-java library
I attempted to break it down into curl commands to see if I could reproduce it at this level - but it doesn't appear to work.
I found a way for me to work around the issue for now by adding DockerClient.ExecStartParameter.DETACH to the execStart call, waiting for exec to finish, and then looking at exit code. I don't need the output anyways for my use case.
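The "wait for the exec to finish, then look at the exit code" part of this workaround can be sketched as a small polling loop. This is a generic, JDK-only illustration, not the commenter's code: ExecWaiter and awaitExit are hypothetical names, and the two suppliers stand in for whatever reports the exec's running flag and exit code (with docker-client that would presumably come from inspecting the exec, which is an assumption here and is not shown).

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

/** Hypothetical helper: poll until a detached exec finishes, then return its exit code. */
final class ExecWaiter {
    private ExecWaiter() {}

    static int awaitExit(BooleanSupplier stillRunning, IntSupplier exitCode,
                         long pollMillis, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (stillRunning.getAsBoolean()) {
            // Give up rather than poll forever if the command never exits.
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                        "exec still running after " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
        // Only read the exit code once the exec reports that it has stopped.
        return exitCode.getAsInt();
    }
}
```

The timeout matters for commands that never terminate on their own, such as the Kafka console tools mentioned earlier in the thread.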
|
@johnflavin - let me know if there is any additional debugging I can help with on this issue. |
This is great! Thanks. One additional piece of information I would like to see is logs of the timing. Could you make a tweak to your reproduction that logs the times of each operation? I have a suspicion (I won't call it a hypothesis, because it is based on very little) that the cause of the problem is making too many exec calls too fast. It seems that everything that can cause the issue to disappear is something that slows down some part of the process:
It is such a shame that this issue doesn't occur when docker debug logging is on. I reeeeeally want to see what is going on inside docker when this error happens. I will try this out myself when I get some time. Thanks again, this looks really helpful. |
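For anyone who wants to add the timing logs requested here to their own reproduction, a minimal sketch follows. TimedStep and run are hypothetical names, and the output format is arbitrary; the idea is just to print a wall-clock start time and the elapsed time around each operation.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical helper: log when a step started and how long it took. */
final class TimedStep {
    private TimedStep() {}

    static <T> T run(String label, Supplier<T> step) {
        Instant start = Instant.now();   // wall-clock timestamp for the log line
        long t0 = System.nanoTime();     // monotonic clock for the duration
        try {
            return step.get();
        } finally {
            long elapsedMs = Duration.ofNanos(System.nanoTime() - t0).toMillis();
            // Logged even when the step throws, so failed calls are timed too.
            System.out.printf("%s %s took %d ms%n", start, label, elapsedMs);
        }
    }
}
```

Each call in the reproduction (create the exec, start it, read the output) would be wrapped in its own run(...) so the gaps between operations show up in the log. Calls that throw checked exceptions would need to be wrapped or the helper adapted.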
I added some timestamps. It doesn't appear that the speed at which the exec calls are executed matters. I increased the sleep command to 20 seconds and it still reproduces.
|
I also tried to look at the traffic being sent on the unix socket using socat. Amazingly, as soon as I run with this setup I'm again unable to reproduce the issue. Set up a "mitm" socket. Here you will see all the traffic happening.
Tell our program to use /tmp/fake socket for docker
|
After playing around with this a fair bit - my suspicion (also based on very little) is that this is an issue with this client library (or one of the libraries it depends on (e.g. jersey)). Using socat I was able to send the exact byte for byte repro data to the docker unix socket, and I did not see any errors or issues in the response that would indicate an issue with the docker daemon. Anyways, I have a workaround - but let me know if I can help in any way - at this point I'm just very curious to know the root cause. |
oops posting to right place. @mattnworb - I'm not using multiple threads in my test program. |
I am able to reproduce the issue with the example usage that is provided in the readme. What is the status on this? |
I have tried and failed to reproduce the issue, so I haven't been able to dig any deeper into the cause. The only thing I can do right now is to point you to the workarounds that others have mentioned above. I hope one or more can work for your use case.
|
I've reproduced this bug as well with the snippet from @ArcticPheenix . I've only been able to confirm a lot of the same things mentioned by others, and possibly a simpler way to work around the issue entirely on the client-side (without modifying the daemon itself, or changing to a tcp endpoint).
The only useful thing is that the issue seems to go away for me when passing the ExecCreateParam.attachStdin() parameter to the execCreate() call in addition to the stdout/stderr options. My guess is that when STDIN is specified, the daemon drops any responsibility it may have with respect to that particular connection, and so it never sends the RST packet. |
I ran into this issue and unfortunately switched to another library, docker-java. Here is a sample of the code I use:
|
@framebassman
Is your Docker client connecting over the unix socket or local TCP?
With docker-driver, this issue only occurred over the unix socket.
Regards
|
@saidbouras This issue reproduces with the docker client only on my Ubuntu machine, which uses the unix socket |
@framebassman |
@saidbouras Yep, all works fine with code above |
Used this example as inspiration: spotify/docker-client#513 (comment) However, the clone-and-push.sh test was failing. It's unnecessary so might as well remove it due to some other possible defect. It's not great to remove a test like this (without finding out why it's failing) but that test is really not necessary right now.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Description
I am trying to run commands in the docker containers by using the docker-client library in my integration tests.
This approach works fine on my local machine (Mac OS X with Docker for Mac). When I try to do the same on the CI machine (Debian Wheezy with kernel v3.16.7), most of the commands fail, saying that the socket connection has been reset by peer.
Do you have any idea why this is happening?
How to reproduce
My code looks something like this (Scala):
What do you expect
I would expect to be able to run commands in the docker containers from my Scala code in a similar way as I do with the docker exec command line.
What happened instead
A few commands are run successfully and then I start getting the error of connection reset by peer.
Software:
docker version: 1.12.1
Full backtrace