Intermittent error with webservice client

We are seeing an intermittent issue on some of our production servers. By intermittent I mean it currently affects less than 1% of our running jobs and shows up on only 2 of our ~20 servers (at least, those are the only ones where we've noticed it).

Our setup is this: we have a custom piece of software which is a bastardized mix of old VB6 and C#/.NET code. The program is a web scraping engine for our own in-house scripts. It is executed across a server park where each server runs 50-150 instances at a time, each with an individual script.

What happens is that some time after initial loading, the program in question attempts to contact a web service to get a collection of settings. Once in a while, we get this problem:

System.IO.FileNotFoundException: Could not find file 'C:\Documents and Settings\ccrun\Local Settings\Temp\driumfrd.dll'.
File name: 'C:\Documents and Settings\ccrun\Local Settings\Temp\driumfrd.dll'
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)     
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)     
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)     
at Microsoft.CSharp.CSharpCodeGenerator.FromFileBatch(CompilerParameters options, String[] fileNames)     
at Microsoft.CSharp.CSharpCodeGenerator.FromSourceBatch(CompilerParameters options, String[] sources)     
at Microsoft.CSharp.CSharpCodeGenerator.System.CodeDom.Compiler.ICodeCompiler.CompileAssemblyFromSourceBatch(CompilerParameters options, String[] sources)     
    ...

Our logging limit is hit after this. The .dll name is different at every execution. This is 2 layers of indirection away from the VB6 code, so I'm fairly certain this is a purely C# issue. What I've been able to find on Google so far is that this is related to the dynamic compilation of the web service client code. Where my google-fu stops short is in finding out why we don't get this error all the time. Permissions can't be wrong, since not all jobs are failing. The exact same job will complete without any errors when restarted on the very same server.
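
Since the same job succeeds when restarted, one possible stopgap is to retry the web service call when this specific exception escapes. The following is only a sketch under that assumption: `TransientRetry` and its parameters are hypothetical names, not part of the code base described here, and a retry hides the root cause instead of fixing it.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper for illustration; not part of the original program.
public static class TransientRetry
{
    // Runs 'action' and retries when a FileNotFoundException escapes,
    // up to 'maxAttempts' attempts in total, sleeping 'delay' in between.
    public static T Invoke<T>(Func<T> action, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (FileNotFoundException)
            {
                // Out of attempts: rethrow and keep the original stack trace.
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(delay);
            }
        }
    }
}
```

A call site would wrap the proxy construction and the web method call together in the delegate, since the stack trace suggests the compile step happens while the client is being set up. Any other exception type passes through untouched.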

The only indicator we've been able to discern is that jobs usually fail in clusters: most, but not all, of the jobs started at the same time on the same server will fail. Other than that, we don't really have anything good to go by here.

Best link I've found so far is this: http://social.msdn.microsoft.com/Forums/en-US/asmxandxml/thread/d7ea81e7-8fea-4056-ad21-f2fee1887bcc

Edit: This is very odd. After some additional investigation I noticed that the error messages in our logs had the wrong error code.

public void entry_function()
{
    try
    {
        // do stuff...
        main_function();
    }
    catch (Exception exp)
    {
        // General error
        _log.EventID = 57051;
        _log.WriteToErrorLog(Log.Level.ERROR, "Unhandled exception", exp);
    }
}

public void main_function()
{
    // do more stuff...
    helperfunction();
}

public helperfunction()
{
    try
    {
        switch()
        {
            ...
            case WebServices.WSMarkAsInvalid:
            {
                // Info logger
                _log.EventID = 57114;
                _log.WriteToInfoLog(Log.Level.INFO, "Call WSMarkAsInvalid start");

                new WSSystem.WSSystem().WSSystemMarkAsInvalid((string)parameters[0], (string)parameters[1], (int)parameters[2]);

                // Info logger
                _log.EventID = 57115;
                _log.WriteToInfoLog(Log.Level.INFO, "Call WSMarkAsInvalid end");

                return null;
            }
        }                           
    }
    catch(Exception exp)
    {   
        _log.EventID = 57120;
        _log.WriteToErrorLog(Log.Level.WARN, "Error communicating with webservice", exp);
    }
}

Ignoring the obvious pseudocode bits, I'm seeing 4 cases where a 57114 is followed by a 57120 warning, and 39 cases where 57114 is followed by 57051!

I'm totally at a loss here. As far as I can tell, the inner try/catch isn't getting hit, despite matching any Exception.
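
For reference, one language-level way an outer handler can fire even though an inner `catch (Exception)` matches everything is when code inside the inner catch block itself throws. Whether that is what happens here is not known; the sketch below only demonstrates the mechanism. `NestedCatchDemo` is a hypothetical name, and the event IDs are reused from the pseudocode above purely as labels.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical demo class; it simulates the logging sequence, nothing more.
public static class NestedCatchDemo
{
    // Returns the sequence of event IDs that would have been logged.
    public static List<int> Run(bool handlerThrows)
    {
        var events = new List<int>();
        try
        {
            try
            {
                events.Add(57114); // "call start"
                throw new FileNotFoundException("simulated web service failure");
            }
            catch (Exception)
            {
                // If the handler fails before logging, 57120 is never recorded
                // and the new exception propagates to the outer handler.
                if (handlerThrows)
                    throw new InvalidOperationException("simulated failure inside the handler");
                events.Add(57120);
            }
        }
        catch (Exception)
        {
            events.Add(57051); // "unhandled exception"
        }
        return events;
    }
}
```

`Run(false)` yields 57114 then 57120, while `Run(true)` yields 57114 then 57051. Logging the full exception text, including any inner exception, in the outer handler would show which frame actually threw.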


My initial guess, based on the stack trace you provided, is that the temp folder is filling to capacity, so the file never gets written there, and that is why you are seeing the IO error. You may need to check whether your application is generating too many temp files and work out a mechanism for purging them. But of course, it is early and I may be totally wrong :)
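
A purge mechanism along those lines could look roughly like the sketch below. It assumes that files older than some age threshold are safe to delete, which would need checking against what the scraper instances actually keep in the temp folder. `TempCleaner` and `PurgeOldFiles` are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper; the age threshold and the target folder are assumptions.
public static class TempCleaner
{
    // Deletes files directly inside 'directory' whose last write time is
    // older than 'maxAge'. Returns the number of files deleted.
    public static int PurgeOldFiles(string directory, TimeSpan maxAge)
    {
        DateTime cutoff = DateTime.UtcNow - maxAge;
        int deleted = 0;

        foreach (string path in Directory.GetFiles(directory))
        {
            try
            {
                if (File.GetLastWriteTimeUtc(path) < cutoff)
                {
                    File.Delete(path);
                    deleted++;
                }
            }
            catch (IOException)
            {
                // File is in use by another instance; skip it.
            }
            catch (UnauthorizedAccessException)
            {
                // Read-only or otherwise protected; skip it.
            }
        }
        return deleted;
    }
}
```

It could be run from a scheduled task against `Path.GetTempPath()` with a threshold comfortably longer than the longest job, so that files belonging to running instances are left alone.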


Our final solution was to move away from Webservices completely and instead query the databases directly through SQL. Not the most elegant solution, but better than having critical executions fail on a daily basis in a totally unpredictable manner.
