SERVICE Implementation?

I was writing an asynchronous logging framework in which multiple threads dump data. I started playing around with Boost.Asio because it offers some easy ways to enforce serialization and ordering. Since I am a beginner, I started my design with a thread-safe circular bounded_buffer (actually a vector underneath), built on boost::mutex and boost::condition_variable.

I wrote a small benchmark to measure the performance. The benchmark is a single thread logging a million messages (pushing them into the buffer), while my worker thread grabs the messages from the queue and logs them to a file, the console, or a list of loggers. (PS: The mutex and CV were used correctly, and only pointers to messages were being moved around, so from that perspective everything was fine and efficient.)
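For readers who want something concrete, the baseline design described above might look roughly like the sketch below. It is not the original code: it uses std::mutex and std::condition_variable as stand-ins for the Boost types, stores std::string messages instead of pointers, and the names BoundedBuffer and run_logging_demo are made up for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Fixed-capacity ring buffer over a std::vector, guarded by one mutex and
// two condition variables. Hypothetical sketch, not the asker's code.
template <typename T>
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity)
        : slots_(capacity), head_(0), tail_(0), count_(0) {
        assert(capacity > 0);
    }

    // Blocks the producer while the buffer is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return count_ < slots_.size(); });
        slots_[tail_] = std::move(item);
        tail_ = (tail_ + 1) % slots_.size();
        ++count_;
        lock.unlock();
        not_empty_.notify_one();
    }

    // Blocks the consumer while the buffer is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return count_ > 0; });
        T item = std::move(slots_[head_]);
        head_ = (head_ + 1) % slots_.size();
        --count_;
        lock.unlock();
        not_full_.notify_one();
        return item;
    }

private:
    std::vector<T> slots_;
    std::size_t head_, tail_, count_;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
};

// One producer logs `n` messages; one worker drains them in order.
std::vector<std::string> run_logging_demo(int n, std::size_t capacity) {
    BoundedBuffer<std::string> buffer(capacity);
    std::vector<std::string> sink;  // stands in for the file/console loggers
    std::thread worker([&] {
        for (int i = 0; i < n; ++i) sink.push_back(buffer.pop());
    });
    for (int i = 0; i < n; ++i) buffer.push("msg " + std::to_string(i));
    worker.join();
    return sink;
}
```

Note that in a design like this the producer has to wait whenever the buffer is full, so the chosen capacity is one of the things worth checking when benchmarking.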

When I changed my implementation to use boost::asio::io_service instead, with a single thread executing run(), the performance really improved. (It actually scaled really well as the number of logged messages increased, whereas performance degraded in my initial simple model.)

Here are a few questions that I would like to clear up.

  • Why the performance improvement? I thought that boost::asio::io_service internally has a thread-safe queue for handlers, so what makes it so much more efficient than my own simple thread-safe queue design? Please note that my design was well reviewed and had no faults as such (the skeleton code was based on proven examples). Could someone shed more light on the internal details of how io_service implements this?

  • The second interesting observation was that on increasing the number of threads, the performance of my initial implementation improved, but at the cost of losing serialization/ordering, whereas performance degraded (very slightly) with boost::asio. I think that is because my handlers were doing very simple tasks and the context-switching overhead dominated; I will try more complex tasks and post my observations later.

  • I would really like to know whether boost::asio is meant only for I/O and network operations, or whether using it to run concurrent (parallel) tasks through a thread pool is a good design approach. Is the io_service object meant to be used only with I/O objects (as written in the documentation)? I found it a really interesting way of solving concurrent tasks (not just I/O or networking related) in a serialized way, sometimes enforcing ordering using strands. I am new to Boost, and really curious why the basic model didn't perform/scale as well as the boost::asio version.

Results (in both cases I had just 1 worker thread):

  • 1,000 tasks: 10 microseconds/task in both cases
  • 10,000 tasks: 80 microseconds (bounded buffer), 10 microseconds (boost::asio)
  • 100,000 tasks: 250 microseconds (bounded buffer), 10 microseconds (boost::asio)

It would be interesting to know how Boost solves the thread-safety problem in the io_service queue for handlers (I always thought that at some level of the implementation they also have to use locks and CVs).


    I'm afraid I can't help much with (1), but with respect to the other two questions:

    (2) I have found that there are some overheads in the boost::asio architecture that are non-deterministic, i.e. the delay between data coming in (or being sent to an IO service object) and the response can vary from virtually instant up to the order of hundreds of milliseconds. I have attempted to quantify this as part of another problem I was trying to solve with respect to logging and timestamping RS232 data, but haven't gotten any conclusive results or found ways to stabilise the latency. I would not be surprised at all to find that similar issues exist with the context-switching component.

    (3) As far as using boost::asio for tasks other than asynchronous I/O goes, it is now my standard tool for the majority of asynchronous operations. I use boost::asio timers all the time for asynchronous processes, and to generate timeouts for other tasks. The ability to add multiple worker threads to the pool means that you can scale the solution well for other asynchronous high-load tasks as well. My simplest and favourite class I have written in the last year is a tiny little worker thread class for boost::asio IO services (apologies if there are any typos, this is a transcription from memory rather than a cut & paste):

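    #include <boost/asio.hpp>
    #include <boost/atomic.hpp>
    #include <boost/bind.hpp>
    #include <boost/thread.hpp>

    class AsioWorker
    {
    public:
      explicit AsioWorker(boost::asio::io_service * service):
      m_terminate(false), m_ioService(service), m_serviceThread(NULL)
      {
        // Start the thread last, once every other member is initialised
        m_serviceThread = new boost::thread( boost::bind( &AsioWorker::Run, this ) );
      }
      void Run( void )
      {
        while(!m_terminate)
        {
          m_ioService->poll_one(); // run at most one ready handler, never block
          mySleep(5); // My own macro for cross-platform millisecond sleep
        }
      }
      ~AsioWorker( void )
      {
        m_terminate = true;
        m_serviceThread->join();
        delete m_serviceThread;
      }
    private:
      boost::atomic<bool> m_terminate; // set by the owner, read by the worker thread
      boost::asio::io_service *m_ioService;
      boost::thread *m_serviceThread;
    };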
    

    This class is a great little toy: just add new ones as needed, and delete some when you're done with them. Stick a std::vector<AsioWorker*> m_workerPool into a device class that uses boost::asio and you can wrap the thread-pool management even further. I've always been tempted to write an intelligent pool auto-manager that grows the thread pool as appropriate based on timing, but I haven't had a project where it was necessary yet.

    With respect to satisfying your curiosity on thread safety, it is possible to dig into the guts of boost to find out exactly how they do what they're doing. Personally I have always taken most of the boost stuff at face value and assumed from past experience that it's pretty well-optimised under the hood.


    I've also found boost::asio to be excellent infrastructure for a general multi-core processing engine. I measured its performance on a fine-grained task with lots of synchronization and found that it outperformed a "classic" implementation I wrote using C++11 threads and condition variables.

    It also outperformed TBB, but not by as much. I dug into their code to try to find the "secret". The only thing I can see is that their queue is a classic linked list, not an STL container.
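    To make the "classic linked list" remark concrete, here is a minimal sketch of an intrusive FIFO queue, where each queued operation carries its own next pointer. The names Operation and IntrusiveQueue are made up for illustration; this is not Asio's actual code, and it is single-threaded, so a real scheduler would still need a lock or some other synchronization around it.

```cpp
#include <cassert>

// A queued operation carries its own `next` link, so enqueueing it does not
// allocate a separate container node. Hypothetical names throughout.
struct Operation {
    Operation* next = nullptr;
    int id = 0;
};

class IntrusiveQueue {
public:
    bool empty() const { return head_ == nullptr; }

    // O(1): link the operation behind the current tail.
    void push(Operation* op) {
        assert(op != nullptr);
        op->next = nullptr;
        if (tail_) {
            tail_->next = op;
        } else {
            head_ = op;
        }
        tail_ = op;
    }

    // O(1): unlink and return the oldest operation, or nullptr if empty.
    Operation* pop() {
        Operation* op = head_;
        if (!op) return nullptr;
        head_ = op->next;
        if (!head_) tail_ = nullptr;
        op->next = nullptr;
        return op;
    }

private:
    Operation* head_ = nullptr;
    Operation* tail_ = nullptr;
};
```

    The point of the design is that push and pop only swap a couple of pointers, which keeps the time spent holding any surrounding lock short. Whether that alone accounts for the measured difference is not something this sketch can show.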

    For all that, I'm not sure how well asio would scale on a massively threaded architecture like the Xeon Phi. The two things that seem to be missing are:

  • a priority queue and
  • a work-stealing queue.

    I suspect that adding these features would bring it down to the TBB performance level.
