Technical Support: Execution speed

    #2018
    Unknown
    Member

    Hi,
    Is there a way to improve the execution speed of the fuzzylite library? I can't see the bottlenecks of this library because symbols are not loaded from the PDB file in debug mode.
    A simple solution is to decrease the resolution of the defuzzifier.
    I have commented out the FL_DBG instructions in the fuzzylite source code; is that the right way to disable debugging output?
    Is there a way to avoid the dynamic allocation in the modify function of the Consequent class, to increase execution speed?
    Activated* term = new Activated(_conclusions.at(i)->term, activationDegree, activation);

    #2019

    Hi,

    thank you for your post. I am not familiar with loading symbols from a PDB file in debug mode, but you can always recompile fuzzylite. If you do, please let me know if I can add something to the CMake configuration to make this easier. Also, let me know of any bottlenecks you may find.

    I think removing some dynamic_cast<> (or replacing them with static_cast<>) can improve performance, though I am not sure by how much. For example, if you are not going to use rule chaining, you could remove the dynamic_cast<OutputVariable*> in Antecedent::activationDegree(). Alternatively, you could add a method to Variable to determine whether it is an input or an output, hence replacing the dynamic_cast<>s with a check like type() == OutputVariable followed by a static_cast. In addition, you could get rid of the other dynamic_cast<> in Antecedent::activationDegree() by adding a boolean method to Proposition. I will review these cases in the version currently in progress, but it would be very helpful to know if you find significant performance improvements in doing so.
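
    For concreteness, a minimal sketch of that idea is below (untested; Variable::Type and type() are only illustrative names, they are not part of the current API):

    #include <cstdio>

    // Hypothetical sketch: a type tag on Variable replaces dynamic_cast with a
    // cheap comparison followed by a static_cast.
    class Variable {
    public:
        enum Type { Input, Output };
        virtual ~Variable() {}
        virtual Type type() const = 0;
    };

    class InputVariable : public Variable {
    public:
        Type type() const { return Input; }
    };

    class OutputVariable : public Variable {
    public:
        Type type() const { return Output; }
        double defaultValue() const { return 0.0; } // placeholder member
    };

    double inspect(const Variable* variable) {
        // instead of: const OutputVariable* out = dynamic_cast<const OutputVariable*>(variable);
        if (variable->type() == Variable::Output) {
            const OutputVariable* out = static_cast<const OutputVariable*>(variable);
            return out->defaultValue();
        }
        return 0.0;
    }

    int main() {
        OutputVariable output;
        std::printf("%f\n", inspect(&output));
        return 0;
    }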

    Also, could you please post an example of your controller? Maybe I could suggest something on its design.

    Cheers.

    #2020
    Unknown
    Member

    Hi,

    I have loaded symbols from the PDB files in Visual Studio 2013 to do a performance analysis with CPU sampling.
    See below for the most time-consuming functions:

    Function name	Inclusive samples	Exclusive samples	Inclusive samples %	Exclusive samples %
    std::_Lockit::_Lockit	                7 216	7 216	3,07	3,07
    operator delete	                        7 145	7 145	3,04	3,04
    std::_Lockit::~_Lockit	                7 121	7 121	3,03	3,03
    operator new	                        5 456	5 456	2,32	2,32
    fl::Antecedent::activationDegree	78 861	5 094	33,56	2,17
    _RTC_CheckStackVars	                3 684	3 684	1,57	1,57
    _RTC_CheckEsp	                        3 654	3 654	1,56	1,56
    std::_Iterator_base12::_Orphan_me	3 216	3 216	1,37	1,37
    fl::Operation::isEq	                7 520	3 131	3,20	1,33
    __RTDynamicCast	                        3 105	3 105	1,32	1,32
    fl::Accumulated::membership	        19 031	2 979	8,10	1,27
    free	                                2 970	2 970	1,26	1,26
    fl::Ramp::membership	                13 160	2 931	5,60	1,25
    std::_Iterator_base12::_Adopt	        11 232	2 875	4,78	1,22
    _BitBlt@36	                        2 814	2 814	1,20	1,20
    fl::Operation::isNaN<double>	        2 700	2 700	1,15	1,15
    fl::Activated::membership	        12 797	2 531	5,45	1,08
    std::_Iterator_base12::operator=	11 659	1 616	4,96	0,69
    std::_Iterator_base12::~_Iterator_base12	11 895	1 460	5,06	0,62
    _VEC_memset	                        1 226	1 226	0,52	0,52
    std::_Iterator_base12::_Iterator_base12	12 937	1 199	5,51	0,51
    std::_Container_base12::_Orphan_all	1 127	1 127	0,48	0,48
    fl::Centroid::defuzzify	                20 968	954	8,92	0,41
    std::vector<fl::Activated *,std::allocator<fl::Activated *> >::size	827	827	0,35	0,35
    std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >	14 616	730	6,22	0,31
    std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >::~_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >	14 033	669	5,97	0,28
    std::reverse_iterator<std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > > >::reverse_iterator<std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > > >	16 290	596	6,93	0,25
    std::_Iterator012<std::random_access_iterator_tag,fl::Hedge *,int,fl::Hedge * const *,fl::Hedge * const &,std::_Iterator_base12>::_Iterator012<std::random_access_iterator_tag,fl::Hedge *,int,fl::Hedge * const *,fl::Hedge * const &,std::_Iterator_base12>	13 700	586	5,83	0,25
    fl::Operation::isLt	2 929	574	1,25	0,24

    I have deleted the lines related to other libraries.
    Dynamic allocation is very time-consuming.
    I have replaced dynamic_cast with static_cast for Variable, but there is no significant improvement.

    In the Antecedent::activationDegree function, the bottlenecks are at these lines:

    28.9%: return conjunction->compute(
                        this->activationDegree(conjunction, disjunction, fuzzyOperator->left),
                        this->activationDegree(conjunction, disjunction, fuzzyOperator->right));
    10.4%: if (fuzzyOperator->name == Rule::andKeyword()) {
    
    6.9%: for (std::vector<Hedge*>::const_reverse_iterator rit = proposition->hedges.rbegin();
    8.9%:                    rit != proposition->hedges.rend(); ++rit) {
    
    2.1%: result = proposition->term->membership(inputVariable->getInputValue());

    The FFLL library is basic but fast; there are some explanations here:

    I need a fast library because I am optimizing a parameter, so there are a lot of calls to the Engine::process function.
    See my controller below:

    engine = new fl::Engine("fuzzy_engine");
    	
    osc = new fl::InputVariable("OSC", -300, 300);
    osc->setInputValue(0);
    osc->setEnabled(true);
    osc->addTerm(new fl::Ramp("UP", -134, -300));
    osc->addTerm(new fl::Ramp("DOWN", 134, 300));
    osc->addTerm(new fl::Ramp("EXIT_UP", -100, -300));
    osc->addTerm(new fl::Ramp("EXIT_DOWN", 100, 300));
    engine->addInputVariable(osc);
    	
    dosc = new fl::InputVariable();
    dosc->setName("DOSC");
    dosc->setRange(-300, 300);
    dosc->setEnabled(true);
    dosc->addTerm(new fl::Ramp("DOWN", 0,  -300));
    dosc->addTerm(new fl::Ramp("UP", 0, 300));
    engine->addInputVariable(dosc );
    
    dvar2 = new fl::InputVariable("DVAR2", -100, 100);
    dvar2->setInputValue(0);
    dvar2->addTerm(new fl::Ramp("DOWN", -4.3, -100));
    dvar2->addTerm(new fl::Ramp("UP", 4.3, 100));
    engine->addInputVariable(dvar2);
    
    dvar3 = new fl::InputVariable();
    dvar3->setName("DVAR3");
    dvar3->setRange(-5, 5);
    dvar3->setEnabled(true);
    dvar3->addTerm(new fl::Ramp("DOWN", -0.8, -5));
    dvar3->addTerm(new fl::Ramp("UP", 0.8, 5));
    engine->addInputVariable(dvar3 );
    
    dvar12 = new fl::InputVariable();
    dvar12->setName("DVAR12");
    dvar12->setRange(-10, 10);
    dvar12->setEnabled(true);
    dvar12->addTerm(new fl::Ramp("DOWN", 0, -10));
    dvar12->addTerm(new fl::Ramp("UP", 0, 10));
    engine->addInputVariable(dvar12);
    
    dvar1a = new fl::InputVariable();
    dvar1a->setName("DVAR1A");
    dvar1a->setRange(-30, 30);
    dvar1a->setEnabled(true);
    dvar1a->addTerm(new fl::Ramp("DOWN", -2, -10));
    dvar1a->addTerm(new fl::Ramp("UP", 2, 10));
    dvar1a->addTerm(new fl::Ramp("EXIT_DOWN", -17, -30));
    dvar1a->addTerm(new fl::Ramp("EXIT_UP", 17, 30));
    engine->addInputVariable(dvar1a );
    
    dvarb1 = new fl::InputVariable();
    dvarb1->setName("DVARB1");
    dvarb1->setRange(-30, 30);
    dvarb1->setEnabled(true);
    dvarb1->addTerm(new fl::Ramp("UP", -2, -10));
    dvarb1->addTerm(new fl::Ramp("DOWN", 2, 10));
    dvarb1->addTerm(new fl::Ramp("EXIT_UP", -17, -30));
    dvarb1->addTerm(new fl::Ramp("EXIT_DOWN", 17, 30));
    engine->addInputVariable(dvarb1 );
    
    entry = new fl::OutputVariable("ENTRY", -1, 1);
    entry->setDefaultValue(0.0);
    entry->setEnabled(true);
    entry->fuzzyOutput()->setAccumulation(new fl::Maximum);
    entry->setDefuzzifier(new fl::Centroid(200));
    entry->addTerm(new fl::Ramp("ENTRY1", 0, -1));
    entry->addTerm(new fl::Ramp("ENTRY2", 0, 1));
    engine->addOutputVariable(entry);
    
    exit = new fl::OutputVariable("EXIT", -1, 1);
    exit->setDefaultValue(0.0);
    exit->setEnabled(true);
    exit->fuzzyOutput()->setAccumulation(new fl::Maximum);
    exit->setDefuzzifier(new fl::Centroid(200));
    exit->addTerm(new fl::Ramp("EXIT2", 0, -1));
    exit->addTerm(new fl::Ramp("EXIT1", 0, 1));
    engine->addOutputVariable(exit);
    
    ruleblock = new fl::RuleBlock("rules");
    ruleblock->setEnabled(true);
    ruleblock->setConjunction(new fl::Minimum());
    ruleblock->setDisjunction(new fl::Maximum);
    ruleblock->setActivation(new fl::Minimum);
    ruleblock->addRule(fl::Rule::parse("if OSC is DOWN and DOSC is DOWN and DVAR2 is DOWN and DVAR3 is DOWN and DVARB1 is DOWN then ENTRY is ENTRY2", engine));
    ruleblock->addRule(fl::Rule::parse("if OSC is UP and DOSC is UP and DVAR2 is UP and DVAR3 is UP and DVAR1A is UP then ENTRY is ENTRY1", engine));
    ruleblock->addRule(fl::Rule::parse("if OSC is EXIT_DOWN and DOSC is DOWN and DVARB1 is EXIT_DOWN then EXIT is EXIT1", engine));
    ruleblock->addRule(fl::Rule::parse("if OSC is EXIT_UP and DOSC is UP and DVAR1A is EXIT_UP then EXIT is EXIT2", engine));
    
    engine->addRuleBlock(ruleblock);

    Thanks a lot for your help!

    Cheers

    #2021

    Hi,

    thank you for your performance check. I will check this in detail for the next version. However, I can see that the problem is in Antecedent::activationDegree(), where there are a few dynamic_cast<> that could be removed by adding the necessary methods to check the type of the Expression. I am not sure which dynamic_cast you changed to a static_cast, and this should be done very carefully. If you are interested in performing some changes that I have in mind, and then measuring the performance, I could provide more details; let me know. The changes I am thinking of involve removing the dynamic_cast from Antecedent::activationDegree(), and they should improve the performance.

    Cheers.

    #2022

    Hi,

    I have removed the dynamic_cast I suggested, and got an average performance improvement of 7%. I then followed your suggestions and changed the Accumulated::vector<Activated*> to Accumulated::vector<Activated>, achieving a further 10% improvement. Overall, the changes I am testing have improved the average performance by 15%. I will continue reviewing the performance, and I expect to push the changes later today to the master branch (v6.0).
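
    Roughly, the nature of that change is the following; the sketch below uses a simplified stand-in for the real class, and is only meant to show why storing the activated terms by value removes the per-rule heap allocation that Consequent::modify otherwise performs with new Activated(...).

    #include <vector>

    // Simplified stand-in for fl::Activated; only the allocation pattern matters here.
    struct Activated {
        double degree;
        explicit Activated(double aDegree) : degree(aDegree) {}
    };

    int main() {
        // before: one `new` per activated term, plus a matching delete when clearing
        std::vector<Activated*> byPointer;
        byPointer.push_back(new Activated(0.5));
        delete byPointer.front();

        // after: the vector owns the objects; no explicit new/delete per term
        std::vector<Activated> byValue;
        byValue.push_back(Activated(0.5));
        return 0;
    }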

    Thanks for your help.

    Cheers.

    #2024
    Unknown
    Member

    Hi,

    I have only replaced dynamic_cast with static_cast for the Variable class. Yes, I can make some changes and run the performance analysis.
    Thanks

    Cheers

    #2025

    Hi,

    please check the following performance improvements: https://github.com/fuzzylite/fuzzylite/commit/2661878364685e17a2fc286e41b6d647066722b2.

    However, bear in mind that I need to undo many of the changes performed in commit https://github.com/fuzzylite/fuzzylite/commit/33119032d12e3337cbe8efa984086ce7379f1081, where I chose some methods over direct access to properties, which has had a significant impact on performance. I am still working on this.

    Also, I am making these changes available for version 6.0, not for 5.x.

    #2026

    Oh, and I forgot to mention: a quick way to significantly improve performance would be to recompile fuzzylite with the definition -DFL_USE_FLOAT=ON, as it will convert every scalar value from double to float. I will measure its performance later.
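
    For reference, the effect of that definition is roughly the following conditional typedef in the library headers (approximate sketch; the exact code may differ):

    #include <iostream>

    namespace fl {
    #ifdef FL_USE_FLOAT
        typedef float scalar;   // every fl::scalar becomes a 4-byte float
    #else
        typedef double scalar;  // default: 8-byte double
    #endif
    }

    int main() {
        std::cout << sizeof(fl::scalar) << std::endl; // 4 with FL_USE_FLOAT, 8 otherwise
        return 0;
    }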

    #2028
    Unknown
    Member

    Hi
    Thanks for your help. The changes have improved the performance by 6%, and by 11% with -DFL_USE_FLOAT=ON.
    See below for the new performance report:

    Function name	Inclusive samples	Exclusive samples	Inclusive samples %	Exclusive samples %
    std::_Lockit::_Lockit	164 400	164 400	3,86	3,86
    std::_Lockit::~_Lockit	154 973	154 973	3,64	3,64
    fl::Antecedent::activationDegree	1 827 003	142 519	42,92	3,35
    std::_Iterator_base12::_Adopt	258 770	71 718	6,08	1,68
    std::_Iterator_base12::_Orphan_me	69 654	69 654	1,64	1,64
    fl::Aggregated::membership	370 004	57 360	8,69	1,35
    fl::Operation::isEq	199 016	51 960	4,68	1,22
    fl::Operation::isNaN<float>	47 240	47 240	1,11	1,11
    fl::Ramp::membership	292 940	43 361	6,88	1,02
    fl::Activated::membership	237 920	42 495	5,59	1,00
    std::_Iterator_base12::operator=	251 390	36 560	5,91	0,86
    std::vector<fl::Activated,std::allocator<fl::Activated> >::size	30 772	30 772	0,72	0,72
    std::_Container_base12::_Orphan_all	29 263	29 263	0,69	0,69
    std::_Iterator_base12::~_Iterator_base12	260 649	29 018	6,12	0,68
    std::_Iterator_base12::_Iterator_base12	280 520	26 367	6,59	0,62
    std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<fl::Hedge *> > >	318 347	16 826	7,48	0,40
    

    In the Antecedent::activationDegree function, the bottlenecks are at these lines:

    37.6%: return conjunction->compute(
                        this->activationDegree(conjunction, disjunction, fuzzyOperator->left),
                        this->activationDegree(conjunction, disjunction, fuzzyOperator->right));

    14.0%: if (fuzzyOperator->name == Rule::andKeyword()) {

    8.6%: for (std::vector<Hedge*>::const_reverse_iterator rit = proposition->hedges.rbegin();
    11.3%:                    rit != proposition->hedges.rend(); ++rit) {

    3.3%: result = proposition->term->membership(inputVariable->getInputValue());

    Cheers

    #2029

    Hi,

    thanks for your feedback. The remaining ways to further improve the performance of activationDegree are as follows.

    (1) Change if (fuzzyOperator->name == Rule::andKeyword()) { to if (fuzzyOperator->name == "and") {. After the performance studies I have performed, I was surprised to see a performance impact when using methods instead of class properties. For example, see this commit, where I replaced the call to getHeight() with direct access to Term::_height and improved performance. However, I will not trade the Rule::andKeyword() method for performance, given the flexibility it provides to rename the and keyword.

    (2) If you are not using Hedges, you could enclose the for-loop over the hedges

        for (std::vector<Hedge*>::const_reverse_iterator rit = proposition->hedges.rbegin();
                rit != proposition->hedges.rend(); ++rit) {

    in an if statement that checks whether there are any hedges. This could improve performance a bit more. See the latest commit 1759d9159d9a046f3eff9855399f5da2ca5d0ff2.

    (3) If you check the commit I mentioned earlier (namely 053052a850a81971e207d84ef88573a1c9543aea), you could improve performance by declaring a macro FL_IS_NAN(x) instead of calling the method Op::isNaN(x). This will slightly improve performance, too. All three ideas are sketched below.
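
    The following is an untested sketch; Proposition and Hedge are simplified stand-ins, and FL_IS_NAN only illustrates the kind of macro I have in mind:

    #include <string>
    #include <vector>

    // Simplified stand-ins, only to illustrate suggestions (1) to (3).
    struct Hedge {
        double hedge(double x) const { return x; }
    };

    struct Proposition {
        std::vector<Hedge*> hedges;
    };

    // (3) a macro instead of a call to Op::isNaN(x)
    #define FL_IS_NAN(x) ((x) != (x))

    double activationDegree(const Proposition* proposition, double result, const std::string& operatorName) {
        // (1) comparing against the literal "and" avoids the call to Rule::andKeyword(),
        //     at the cost of losing the ability to rename the keyword
        if (operatorName == "and") {
            // conjunction handling would go here
        }

        // (2) skip constructing the reverse iterators entirely when there are no hedges
        if (!proposition->hedges.empty()) {
            for (std::vector<Hedge*>::const_reverse_iterator rit = proposition->hedges.rbegin();
                    rit != proposition->hedges.rend(); ++rit) {
                result = (*rit)->hedge(result);
            }
        }
        return FL_IS_NAN(result) ? 0.0 : result;
    }

    int main() {
        Proposition proposition;
        return static_cast<int>(activationDegree(&proposition, 1.0, "and"));
    }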

    Lastly, I am not sure how you are measuring the performance of your controller, but you could also check the code in Console::benchmarkExamples. Basically, measure the average of ten runs, exporting your controller with FldExporter at a resolution chosen based on the number of input variables and the amount of time you are willing to wait. Compiling with FL_CPP11=ON in v5.1 (from the release branch), from the console I just run fuzzylite benchmarks path/to/examples 10, and it measures the performance over ten runs of almost every example included in the examples/original folder.

    Thanks again for your feedback, and let me know if you find other ways to further improve performance.

    Cheers.
