After running for a long time, an IOC using StreamDevice and ASYN exhibits significant deviations between the message transmission intervals and the set scan period. #102
Comments
Do you get any error messages from your IOC about scan processing overruns?
I cannot find any.
Sorry, I forgot to upload the files mentioned above. They are now attached: the db, protocol, and cmd files are in ZipFiles.zip. The IOC looks normal and reports no errors, as follows.
I notice all your inputs use "I/O Intr". Does this device send its data automatically, independent of your sendMsg?
Yes, once it starts, the device automatically sends status messages to the IOC at 100 ms intervals. No request is needed.
You can download these files and test them on your computer. Looking forward to your reply.
How did you generate the communications trace? I see you set the asynTrace settings, but that does not look like the output of asynTrace, which should have "write" and "read" messages.
I used a network debugging assistant to simulate communication between an X-ray source and the IOC, so the messages look unchanged. Yes, I have set up asynTrace, and the output is saved in a txt file, which I will upload. The file is somewhat large.
A buffer that accumulates input or output data over time would explain the degradation, if more and more old data is searched (maybe for terminators or 0 bytes) or copied (when the buffer is growing).
Can you run "top" to look at the resources used by the IOC process and see if CPU or memory use is growing when the time between writes gets long?
Keep observing your IOC process. You may also look at /proc/[PID of your IOC]/statm regularly (every hour or so) and see if any of the numbers keep growing.
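For anyone who wants to automate that hourly check, here is a minimal sketch in Python. It assumes a Linux host; the function names (parse_statm, growth, watch) and the default interval are made up for illustration and are not part of EPICS, asyn, or StreamDevice. The field order follows the proc(5) man page, and all values are page counts.

```python
import time

# Field names for /proc/[pid]/statm, in order (see proc(5)); values are pages.
STATM_FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(line):
    """Turn one line of /proc/[pid]/statm into a dict of page counts."""
    values = [int(v) for v in line.split()]
    return dict(zip(STATM_FIELDS, values))


def growth(first, last):
    """Return the per-field difference (last - first) between two samples."""
    return {name: last[name] - first[name] for name in first}


def watch(pid, interval_s=3600, samples=24):
    """Print one sample per interval together with its growth since the start.

    interval_s and samples are arbitrary defaults chosen for this sketch.
    """
    path = f"/proc/{pid}/statm"
    first = None
    for _ in range(samples):
        with open(path) as f:
            sample = parse_statm(f.read())
        if first is None:
            first = sample
        print(time.strftime("%Y-%m-%d %H:%M:%S"), sample, growth(first, sample))
        time.sleep(interval_s)
```

Calling watch(pid) with the PID of the IOC process prints one line per hour. A field whose growth keeps rising over many samples would point to an accumulating buffer; flat numbers would argue against it.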
OK. Is the PID of my IOC the PID of st.cmd?
If you have started the IOC as 'st.cmd', then it is that one (PID 2072). If you have started the IOC as '[programname] st.cmd', then look for the programname. In your case probably "XraySource", according to what I see in st.cmd.
I started this IOC by typing './st.cmd' in a terminal, and I cannot find a process named 'XraySource' or 'Xray'..., so the PID of my IOC should be 2072. This is the output of "ps aux | grep 'Xray'*".
This is the output of "cat /proc/2072/statm".
@1458861693 I don't think it is a problem with asyn because others would have reported this problem. The thing that you are doing that is less typical is having I/O Intr scanned input records and periodic output records. I suspect that could be causing an issue in StreamDevice. I don't think you really need to run the network monitoring. The asyn trace output should be sufficient. You can send that output to a file.
None of the memory numbers is going up. Thus my hypothesis about a buffer accumulating more and more data to process seems to be wrong.
@MarkRivers, I will stop running the network monitoring, use only asyn trace, and send the output to a file. Please wait for my feedback.
@dirk-zimoch @MarkRivers According to the StreamDevice documentation, input records scanned with I/O Intr should be normal. But for the output record, if I want to send a message to a port periodically and the interval must be very short, how should I handle this?
What you are doing should work fine. It seems like there may be a problem we were not aware of.
@MarkRivers @dirk-zimoch And the IOC Shell also displayed the following warning messages.
Reviewing the trace files during these 20 hours, the data sent by the IOC was mostly normal; although there were slight fluctuations in the intervals between data transmissions, there were basically no instances of several-second-long gaps.
Meanwhile, I am also running another IOC as mentioned above, without the network monitor, using only the IOC to send data with the scan period set to 0.1 seconds. After 5 hours, based on the timestamps of the data sent from the IOC, the intervals have remained largely consistent with the scan period. I have also attached the asyn trace file. I will keep this IOC running indefinitely and will report back here if there are any issues.
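Checking a large trace file for long gaps by eye is tedious, so here is a small Python sketch that does it. It rests on two assumptions about the trace format that should be verified against the actual file: each line starts with a timestamp of the form "YYYY/MM/DD HH:MM:SS.mmm", and lines for outgoing data contain the word "write". The function names are made up for illustration.

```python
import re
from datetime import datetime

# ASSUMPTION: trace lines begin with "YYYY/MM/DD HH:MM:SS.mmm".
STAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+)")


def write_times(lines):
    """Collect the timestamps of all lines that look like write events."""
    times = []
    for line in lines:
        match = STAMP.match(line)
        # ASSUMPTION: outgoing messages are marked with the word "write".
        if match and " write " in line:
            times.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f"))
    return times


def long_gaps(times, limit_s):
    """Return (start_time, gap_seconds) for every gap longer than limit_s."""
    gaps = []
    for earlier, later in zip(times, times[1:]):
        gap = (later - earlier).total_seconds()
        if gap > limit_s:
            gaps.append((earlier, gap))
    return gaps
```

Passing the lines of the trace file to write_times and then calling long_gaps with a limit of, say, 0.3 lists every interval well above the 100 ms scan period, while a limit of 3 lists only gaps that would trip the high-voltage shutdown.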
I find it a bit strange that the problems start so suddenly and then affect all the scan threads, even up to "2 second". It is as if the whole IOC is suddenly frozen for at least 24 seconds (12 overruns in a row of the 2 second scan). The sending record is scanned with ".1 second" and the actual I/O is done in the asyn thread "L0", so I would have expected only those threads to be affected. Problems with CSS on the same host point to a global problem. I also noticed that you have 518% CPU load (5 of 8 cores?) for the "node" program (probably Node.js), plus some percentage for python3 and java. Could it be that whatever your JS program is doing kills the host performance? Is your CentOS8 system a real machine or a virtual one? If it is virtual, you may need to tune your VM settings. Is your machine accessing NFS mounts? If something is wrong with the NFS server, several processes may be affected, in particular if the machine is virtual with an NFS root file system. If the IOC running on the raspberrypi is more stable than the one running on your CentOS8, then the problem is most probably systemic to your CentOS8 machine and not related to the IOC.
This command will show the CPU time of each thread. It can be useful for finer scale information on what is using CPU. top -H
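Per-thread CPU time can also be read directly from /proc on Linux, which is handy for logging it over many hours. The following Python sketch is one way to do that; the function names are made up for illustration, and the field positions (utime and stime as fields 14 and 15 of the stat line) are taken from the proc(5) man page.

```python
import os


def parse_thread_stat(text):
    """Return (name, utime_ticks, stime_ticks) from one /proc/.../stat line."""
    # The thread name sits in parentheses and may itself contain spaces or
    # parentheses, so split around the first "(" and the last ")".
    name = text[text.index("(") + 1 : text.rindex(")")]
    rest = text[text.rindex(")") + 1 :].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return name, int(rest[11]), int(rest[12])


def thread_cpu_seconds(pid):
    """Map each thread of a process to its accumulated CPU time in seconds."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    result = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        with open(f"/proc/{pid}/task/{tid}/stat") as f:
            name, utime, stime = parse_thread_stat(f.read())
        # Include the thread ID because several threads may share a name.
        result[f"{name}[{tid}]"] = (utime + stime) / ticks_per_s
    return result
```

Calling thread_cpu_seconds(pid) twice, some minutes apart, and comparing the two results shows which threads of the IOC consumed CPU in between.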
Sorry for the late reply; it was the Chinese Spring Festival recently. My CentOS 8 is running on a real machine with no NFS mounts. My Control System Studio makes extensive use of Jython and Python scripts. Could this affect the operation of the IOC? The IOC on the RaspberryPi may be more stable than the one on CentOS8 because only one IOC runs on the RaspberryPi, whereas multiple IOCs and Control System Studio run on CentOS8. Is it recommended to use another Linux distribution such as Rocky Linux 8 instead of CentOS 8? That is currently my plan.
No, CentOS 8 should work fine, even with several IOCs. You should run "top" to see what the load on the system is. How much free CPU and memory capacity is there? Mark
I have developed an EPICS IOC to control an X-ray source. The IOC needs to continuously send messages to the X-ray source controller at a 100ms interval to maintain the connection with the X-ray source. These messages serve as heartbeat packets. If the time interval between two consecutive heartbeat packets exceeds 3 seconds, the high voltage of the X-ray source will be automatically set to 0kV.
Additionally, upon power-up, the X-ray source sends unsolicited operational status data to the IOC at a 100ms interval. This unsolicited data provides continuous updates on the working status of the X-ray source without requiring any explicit requests from the IOC.
I have developed an IOC using StreamDevice and ASYN, which successfully receives and parses the status data from the X-ray source. Now, I can monitor the status of the X-ray source in real-time. However, there are some issues with sending heartbeat packets to maintain the connection with the X-ray source.
In the db file, I set SCAN to ".1 second", which means the record is processed every 100 ms. As follows:
Initially, the IOC runs well, and the message transmission intervals are around 100 ms. As shown in the figure below, data starting with "5A" is sent by the IOC, and data starting with "A5" is sent by the X-ray source.
However, after running for approximately 2 to 3 hours, the message transmission intervals become unstable: sometimes 100 ms, sometimes 400 ms, and sometimes even 2000 ms. The following figure shows the IOC after running for 5 hours.
I am not sure if the message transmission intervals will continue to increase as the IOC keeps running, but I cannot allow this interval to exceed 3 seconds because the X-ray source controller will automatically shut down the high voltage.
I am not sure what is causing this issue. Currently, I suspect that there might be a problem with the ASYN configuration, but after reviewing the documentation, I still haven't found the root cause. I need help to resolve this issue.
The versions of ASYN and StreamDevice I am using are asyn-R4-44 and StreamDevice-master, and the EPICS version is base-7.0.8.
The db, protocol files and st.cmd file are attached.
Thanks very much.