Tuesday, October 9, 2012

Batoo JPA - The new JPA Implementation that runs over 15 times faster...

This post is by Hasan Ceylan, an Open Source software enthusiast from Istanbul.

I loved JPA 1.0 back in the early 2000s. I started using it together with EJB 3.0 even before the stable releases. I loved it so much that I contributed bits and pieces to the JBoss 3.x implementations.

Those were the days when our company was still fairly small. Creating new features and applications took priority over performance, because we had a lot of ideas and needed to develop and market them as fast as we could. We no longer needed to write tedious and error-prone XML descriptions for the data model and deployment descriptors, nor did we need to use the curse called “XDoclet”.

Meanwhile our company grew steadily, and our web site became the top portal in the country for live events and ticketing. Now we had performance problems! Although the company grew considerably, the economics of the industry meant that we did not make a lot of money. The challenge was that we were a ticketing company. Every e-commerce business has high and low seasons, but ticketing has low seasons and high hours. While you sell an average of x tickets an hour, when a blockbuster event goes on sale demand suddenly becomes thousands of times x for an hour. Welcome to hell!

We worked day and night to tweak and enhance the application, using whatever was available to keep it up on a big day. To be frank, there was always a bigger event capable of bringing the site down no matter how hard we tried.

The dream was over. I came to realize that developing applications on top of frameworks calls for a bit of “be careful!” along with the “fun”.

I Kept Learning

I loved programming, I loved Java, I loved open source. I developed almost every possible type of application on every platform I could. For the rest, I went in and discovered stuff. I learned a lot from the masters thanks to open source. In contrast to most, I read articles and code written by great programmers like Linus Torvalds, Gavin King, Ed Merks and so many others.

With the experience I gathered, I quit the ticketing company I loved and became a software consultant. This opened a new era for me, with many different industries and platforms to work on.

In each project I became the performance police of the application.

I am now the performance freak!

I Took The Red Pill!

One day I asked myself: could JPA be faster? If yes, how fast could it be? I spent about two weeks creating an entity manager that persisted and loaded entities. Then I ran it and compared the results to those of Hibernate. The results were not really promising: I was only about 50% faster than Hibernate in persisting and finding entities. I spent another week tweaking the loops, caching metamodel chunks, changing access to classes from interfaces to abstract classes, replacing lists with arrays and so many other things. Suddenly I had a prototype that was 50+ times faster than Hibernate!

Development of Batoo JPA

I was astonished by how drastically performance went up just by paying attention to performance-centric coding. By then I was using VisualVM to measure the time spent in the JPA layer. I got down and wrote a self-profiling tool that measured the CPU resources spent in the JPA layer, and started implementing every aspect of the JPA 2.0 specification. After each iteration I reran the benchmark, and when performance dropped considerably I went back to the changes and inspected the new code line by line - the profiling tool I created reported the performance hit of every line of the JPA stack.

It took about 6 months to implement the specification as a whole. On top of it, I introduced a Maven plugin to perform bytecode instrumentation at build time and a complementary Eclipse plugin to allow the use of instrumentation in the Eclipse IDE.

After a gestation of 6 months, Batoo JPA was born in August 2012. It measured over 15 times faster than Hibernate.

Benchmark

As stated earlier, a benchmark was introduced to measure every micro development iteration of Batoo JPA. This benchmark was not created to showcase the areas where Batoo JPA is fast so that others would believe in it; it was created to put together the most common domain model and persistence operations that exist in almost every JPA application - so that I knew how fast Batoo JPA was.

Performance Metrics

The scenario is:

  • A Person object
    • With phonenumbers - PhoneNumber object
    • With addresses - Address object
      • That point to country - Country Object
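
As a rough illustration, the domain model above could look like the following plain Java classes. This is a sketch, not the benchmark's actual source: the class layout and every field name are assumptions, and the JPA annotations that a real mapping would carry (such as `@Entity`, `@OneToMany` and `@ManyToOne`) appear only in comments so that the snippet compiles without a JPA provider on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class DomainModelSketch {

    // In a real mapping each of these classes would be annotated with @Entity.
    public static class Country {
        public final String name; // hypothetical field
        public Country(String name) { this.name = name; }
    }

    public static class PhoneNumber {
        public final String number; // hypothetical field
        public PhoneNumber(String number) { this.number = number; }
    }

    public static class Address {
        public final String city;     // hypothetical field
        public final Country country; // would be a @ManyToOne association
        public Address(String city, Country country) {
            this.city = city;
            this.country = country;
        }
    }

    public static class Person {
        public final String name; // hypothetical field
        public final List<PhoneNumber> phones = new ArrayList<>(); // would be @OneToMany
        public final List<Address> addresses = new ArrayList<>();  // would be @OneToMany
        public Person(String name) { this.name = name; }
    }

    // Builds one person shaped like the benchmark's persist test describes:
    // two phone numbers and two addresses, each address pointing to a country.
    public static Person samplePerson(String name, Country country) {
        Person p = new Person(name);
        p.phones.add(new PhoneNumber("111 1111"));
        p.phones.add(new PhoneNumber("222 2222"));
        p.addresses.add(new Address("Istanbul", country));
        p.addresses.add(new Address("Ankara", country));
        return p;
    }

    public static void main(String[] args) {
        Person p = samplePerson("Jane Doe", new Country("Turkey"));
        System.out.println(p.name + ": " + p.phones.size() + " phones, "
                + p.addresses.size() + " addresses");
    }
}
```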


Common life-cycle tasks have been introduced:

  • Persist 100K person objects with two phone numbers and two addresses in lots of 10 per session
  • Locate and load 250K person objects in lots of 10 per session
  • Remove 5K person objects in lots of 5 per session
  • Update 100K person objects in lots of 100
  • Query person objects 25K times using the object-oriented Criteria API
  • Query person objects 25K times using JPQL - Java Persistence Query Language, an SQL-like query language


For the sake of simplicity, the benchmark was run on top of in-memory embedded Derby with the profiler slicing the times spent at the

  • Unit Test Layer
  • JPA Layer
  • Derby Layer


The times spent at the Unit Test Layer are omitted from the results because they are irrelevant.

Results

The times given in the below tables are in milliseconds spent in the JPA layer while running the benchmark scenario. The same tests are run for Batoo and Hibernate JPA in different runs to isolate boot, memory, cache, garbage collection etc. effects.

The tables below show

  • the total time spent at Derby Layer as DB Operation Total
  • the type of the test as Test
  • the times for each test at Derby Layer as DB Operation
  • the times for each test at JPA Layer as Core Operation
  • the total time spent at JPA Layer as Core Operation Total
  • the total time spent at both JPA and Derby Layers as Operation Total




Below are the ratios of CPU resources spent by Hibernate and Batoo JPA. It is assumed that an application generates, on average, 1 save, 5 locate, 2 remove and 3 update operations plus 5 + 5, a total of ten, queries. Although these numbers depend heavily on the nature of the application, some sort of assumption is needed to make an overall speed comparison.
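
To make the weighting concrete, here is a minimal sketch of how such a mix could be turned into a single ratio. The article does not spell out the formula behind its figure, so this is only one plausible reading: each test's JPA-layer time is divided by its operation count to get a per-operation cost, the costs are weighted by the mix, and the two weighted sums are divided. The JPA-layer times are taken from the run a reader reports in the comments below; the operation counts are assumed to match the task list above, and all class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkloadMixSketch {

    /** One benchmark test: operation count, JPA-layer milliseconds per provider, and mix weight. */
    public static final class TestResult {
        public final long operations;
        public final long batooJpaMillis;
        public final long hibernateJpaMillis;
        public final int weight;

        public TestResult(long operations, long batooJpaMillis, long hibernateJpaMillis, int weight) {
            this.operations = operations;
            this.batooJpaMillis = batooJpaMillis;
            this.hibernateJpaMillis = hibernateJpaMillis;
            this.weight = weight;
        }
    }

    /** JPA-layer times from the reader's run; counts and weights follow the article's lists (assumed). */
    public static Map<String, TestResult> results() {
        Map<String, TestResult> r = new LinkedHashMap<>();
        r.put("Persist",  new TestResult(100_000, 1303, 16997, 1));
        r.put("Find",     new TestResult(250_000, 2077, 37068, 5));
        r.put("Remove",   new TestResult(5_000,    150,  2210, 2));
        r.put("Update",   new TestResult(100_000,  358,  5248, 3));
        r.put("Criteria", new TestResult(25_000,   283,  4592, 5));
        r.put("Jpql",     new TestResult(25_000,   404,  3822, 5));
        return r;
    }

    /** Weighted per-operation JPA cost of Hibernate divided by that of Batoo JPA. */
    public static double weightedRatio(Map<String, TestResult> results) {
        double batoo = 0.0;
        double hibernate = 0.0;
        for (TestResult t : results.values()) {
            batoo += t.weight * (double) t.batooJpaMillis / t.operations;
            hibernate += t.weight * (double) t.hibernateJpaMillis / t.operations;
        }
        return hibernate / batoo;
    }

    public static void main(String[] args) {
        System.out.printf("Hibernate / Batoo JPA-layer cost for the assumed mix: %.2fx%n",
                weightedRatio(results()));
    }
}
```

Changing the weights or the per-test counts changes the result noticeably, which is why the mix has to be stated alongside any single headline number.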


Given the scenario above, Batoo JPA measures over 15 times faster than Hibernate - the leading JPA implementation.

As you may have noticed, Batoo JPA not only performs insanely fast at the JPA Layer, it also employs a number of optimizations to relieve pressure on the database. This is why Batoo JPA measures half the time at the DB Layer in comparison to Hibernate.

Interpretation of Results

We do appreciate that JPA is not the only part of an application. But we do believe that the current JPA implementations consume quite a bit of your server budget. Given that a typical application cluster spends about 20% to 40% of its CPU resources on the persistence layer, Batoo JPA will be able to bring your cluster down to half its size, allowing you to save a lot on licensing, administration and hardware, as well as leaving room to scale up even for non-cluster-friendly applications - in my experience I have seen applications running on 96-core Solaris systems simply because they are not scalable.

Conclusion

We have managed to create a JPA product that allows you to enjoy the great features of JPA technology without requiring you to compromise on performance!

On top of that, Batoo JPA is developed using the Apache coding standards and has valuable documentation within the code. The project codebase is released under the LGPL license; there is absolutely no closed-source part, and we envision that it will stay that way forever.

As stated earlier, it also has complementary Maven and Eclipse plugins to provide instrumentation for the build and development phases.

Batoo JPA hardly deviates from the specification at all, making it easy for existing JPA applications to be migrated to Batoo JPA and requiring no additional learning phase to start using it.

Last but not least, Batoo JPA saves you time not only when you run your application, but also when you deploy it. Batoo JPA employs parallel deployment managers to handle deployment in parallel. Considering that a developer deploys the application well over 10 times a day, if not 100, during development, with a moderately large domain model this may add up to quite a bit of developer time. Although we haven’t made a concrete benchmark of deployment speed, we know that Batoo JPA deploys about 3 to 4 times faster than Hibernate.


Reader Comments (39)

Very interesting topic. I'm certainly going to evaluate this product.

Why is it free, BTW? Hib is free because it helps to promote JBoss. I would, though, be somewhat hesitant to rely on a fully free JPA implementation, knowing how much effort it takes to keep it up-to-date and bug-free. I would sleep better if I knew the developers were well paid for it and spent their whole working day (not moonlighting) making the product better.

October 9, 2012 | Unregistered Commenter Vlad

Dear Vlad,

Thank you for your comment.

I have made my life and career thanks to open source software. And I am very confident that enterprises will pay for Batoo JPA support once the product is out of incubation. That is why it is open source and free. It will be the first of a series of high-performance implementations that we are planning. I am sure that staying OSS and free will pay off in the long term.

October 9, 2012 | Unregistered Commenter Hasan Ceylan

Dear Vlad, thank you for your comments.

I have earned my living and built my whole career on Open Source. I learnt a lot, and it is time to give back.

On the other hand, I am confident that my product will generate the required revenue in the long run. In addition, we are planning to release a few other technologies along with JPA, and their focus will again be on performance.

October 10, 2012 | Unregistered Commenter Hasan Ceylan

How does one actually run the benchmark? The README mentions eclipse .launch files. Is that the only way?

October 10, 2012 | Unregistered Commenter Steve Ebersole

"In contrast to most, I read articles and code written by great programmers like Linus Torvalds, Gavin King, Ed Merks and so many others."

^ In addition to my skepticism of the 15x claim, you and your project lost all credibility there.

October 10, 2012 | Unregistered Commenter anon

Dear @Steve, thank you for your interest. Currently the only way to run the benchmark is running it in Eclipse. This is because the benchmark was originally created for internal use. I will create a README in benchmark how to use it. It is not shoot and get the results doesn't much to run either.

@Anon, if you like you can mail me your contact details and we can run the benchmark together. To be frank, I do not get what is wrong with reading and learning from code written by great developers. After all, there is no need to explain anything: it is open source and it has the benchmark. Although we created the benchmark to be unbiased, solely to measure Batoo against Hibernate and others, the benchmark is also open source, so you can check out the code and run the test for yourself.

October 10, 2012 | Unregistered Commenter Hasan Ceylan

Not following what you are saying. You will add README about how to run benchmarks outside Eclipse? There is already a README that references running them in Eclipse...

Also, the last sentence is not clear to me.

I wanted to run it because you left off very pertinent Hibernate configuration that would make Hibernate run very much faster here. So I want to try myself and see the numbers.

October 10, 2012 | Unregistered Commenter Steve Ebersole

@Steve,

Benchmark has been converted to perform the tests during maven build.
Also it now has a SUMMARY mode that is currently a static boolean and by default it is true.
you may find the test output in /benchmark/target/surefire-reports/org.batoo.jpa.benchmark.BenchmarkTest-output.txt

October 11, 2012 | Unregistered Commenter Hasan Ceylan

Congrats for releasing another JPA implementation.
However I'd have some questions:
- why is your benchmark mainly targeting batch processing operations like massive delete/insert ? ORMs are supposed to allow one application to load a relatively complex object graph, execute complex OO queries and your benchmark is not targeting this at all.
- still on the batch processing topic. It seems you are not applying good practices related to JDBC batching, are you ?
- last but not least, you are using local in memory derby instance, did you try on a remote MySQL instance? My point is that the _relative_ time spent on JPA internals should not be significant compared to network serialization/deserialization, remote DB access, db index traversal.

October 11, 2012 | Unregistered Commenter Anthony Patricio

Vlad, Hibernate is not free because it "helps promote JBoss". Hibernate was around and already largely popular before any association with JBoss (the Hibernate project is over 10 years old).

Hibernate is free because we believe in Open Source as the best model for developing kick-ass software.

October 11, 2012 | Unregistered Commenter Steve Ebersole

Nice Job. [=

Good to know about others JPA implementations.

October 12, 2012 | Unregistered Commenter Hebert

Dear Anthony,

> Congrats for releasing another JPA implementation.
Thanks a lot. It will mean nothing without fellow community members like yourself. ;)

>However I'd have some questions:
>- why is your benchmark mainly targeting batch processing operations like massive delete/insert ? ORMs are supposed to allow one application to load a relatively complex object graph, execute complex OO queries and your benchmark is not targeting this at all.

On top of that, the FIND operation also navigates over the lazy collection of phones to simulate graph navigation.

I have done similar END-TO-END tests on real applications of JSF, web service, RESTful service etc. natures. Although it is hard to measure and compare the JPA layer alone, Batoo JPA accounted for about 1.5 - 5 times overall speed improvement in those trials; again, these numbers are end-to-end, including the HTTP and JSF / WS / RESTful layers. As for the latter part, I think the model in the benchmark has moderate, although not extreme, complexity. Having said that, I would be happy to benchmark with real-world examples, but I would need support from community members.

> - still on the batch processing topic. It seems you are not applying good practices related to JDBC batching, are you ?

The benchmark is indeed not focused on batch processing. In contrast to the jpab.org benchmark, which operates on hundreds or even thousands of entities, this benchmark persists 10 persons, removes 5 persons and updates 1 person in a single transaction. Therefore I do not consider this 'batch'. Having said that, I specifically chose not to work with a single person in all operations, to see how the prioritization in Hibernate and Batoo compares (see next part). Could it be that you missed the transaction size and saw the total operation sizes? Given that some JPA developers start (EJB) transactions pervasively, finding the changed entities is also an important task for a JPA implementation.

On batch processing, I considered that option. However, to maintain referential integrity, the order of the statements is critical. On top of that (excluding the DB payload, of course), Batoo JPA is fast enough that batching the operations would not add a significant performance gain, while the batching would add overhead to applications that never do batch operations at all. AFAIK, none of the JPA frameworks use batches.

> - last but not least, you are using local in memory derby instance, did you try on a remote MySQL instance? My point is that the _relative_ time spent on JPA internals should not be significant compared to network serialization/deserialization, remote DB access, db index traversal.

You have a point in serialization overhead. I have done the same test with MySQL and the performance was still above 12x.

Hope that I covered all questions and I appreciate the time you have taken to analyze and come up with useful questions / suggestions.

Regards,
Hasan Ceylan

October 13, 2012 | Unregistered Commenter Hasan Ceylan

Ok, so I started looking into the benchmark a little bit because some people asked me to. In my humble opinion, as Anthony points out, this benchmark falls into the bucket of essentially testing batch-oriented processes. That is not to say the benchmark has no merit at all. Batch processes are a part of many applications using Hibernate and JPA. And clearly they should perform well (enough).

In general I need to point out what seems completely illogical in how you came to this "15 times" figure. On initial checkout, here were the results of my run:


=============================================================
Prvdr | Total Time | JPA Time | DB Time | Name Of The Test
____________________________________________________________

BATOO | 0000005563 | 0000000283 | 0000005280 | Criteria Test
BATOO | 0000038794 | 0000002077 | 0000036716 | Find Test
BATOO | 0000005429 | 0000000404 | 0000005024 | Jpql Test
BATOO | 0000036877 | 0000001303 | 0000035574 | Persist Test
BATOO | 0000002760 | 0000000150 | 0000002610 | Remove Test
BATOO | 0000011874 | 0000000358 | 0000011515 | Update Test
_____________________________________________________________

HBRNT | 0000011677 | 0000004592 | 0000007085 | Criteria Test
HBRNT | 0000083768 | 0000037068 | 0000046699 | Find Test
HBRNT | 0000010603 | 0000003822 | 0000006781 | Jpql Test
HBRNT | 0000064042 | 0000016997 | 0000047045 | Persist Test
HBRNT | 0000005678 | 0000002210 | 0000003467 | Remove Test
HBRNT | 0000016982 | 0000005248 | 0000011734 | Update Test

=============================================================

So taking "persist", Batoo took a total of 36,877 milliseconds. Even if I take the times from Hibernate based on your unconfigured set up, Hibernate took a total of 64,042. That's not even 2 times slower. Yet somehow you make the leap from those numbers to Batoo being 1397.62% faster?!

Again, given the "batchy" nature of the tests and given the "persist" numbers, I had a pretty strong suspicion that Hibernate had not been configured to leverage JDBC batching. And sure enough, it was not. So I enabled that using 2 specific settings ('hibernate.jdbc.batch_size' and 'hibernate.order_inserts') and added another ('hibernate.id.new_generator_mappings') that can generally have a good performance impact, though really it's just good form and our documentation even states that it should be the proper set up for non-legacy apps. That very first change brought the Hibernate time from my clean check-out time of 64,042 down to 37,956, compared to the Batoo time of 36,877. Obviously, these numbers are very comparable. I stopped there, because to be quite honest, given what I know of other providers (and I apologize, but I did not have time to look through Batoo code to verify this), Hibernate will always be at somewhat of a disadvantage because of its use of a "central persistence context", whereas other providers leverage bytecode enhancement and "distributing" persistence context state into the enhanced entities. Each approach has pros/cons. Given that this is a use case almost tailor-made to illustrate Hibernate's shortcoming in terms of this central persistence context, coupled with such a minor overall variance in the numbers, I decided not to go any further here. Far cry from 1397.62% or 15x.
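
For readers who want to try the same three settings, a minimal sketch follows. The property keys are the ones named in the comment above; the values are assumptions, since the comment does not state them (in particular, the batch size of 50 is purely illustrative). The class and method names are hypothetical.

```java
import java.util.Properties;
import java.util.TreeSet;

public class HibernateBatchSettingsSketch {

    /** Builds the three Hibernate settings named above; all values are illustrative assumptions. */
    public static Properties batchSettings() {
        Properties props = new Properties();
        // Group SQL statements into JDBC batches (the size is an assumed example value).
        props.setProperty("hibernate.jdbc.batch_size", "50");
        // Order inserts by entity type so that more statements can share a batch.
        props.setProperty("hibernate.order_inserts", "true");
        // Use the newer identifier generator mappings.
        props.setProperty("hibernate.id.new_generator_mappings", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchSettings();
        // Print in sorted key order so the output is stable.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

In a JPA application, a map like this can be passed as the second argument of `Persistence.createEntityManagerFactory(String, Map)`, or the same keys can be listed as `<property>` elements in `persistence.xml`.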

Next I decided to investigate the huge numbers that skewed the results. You know, the ones where Hibernate took 15 (or more) times longer to complete one of the benchmark tests. Yet oddly enough I could not find even one test metric where Hibernate took 15 times longer or more compared to Batoo. Interesting. In fact I could not even find one metric where Hibernate took even 2 times as long. Again, interesting. So how exactly do you arrive at this 15x figure?

Honestly, at this point I stopped looking at the benchmark. The numbers you are claiming simply do not match up with your benchmark's results. And for the one metric I did look into specifically, I essentially removed most of the variance between Hibernate and Batoo with 3 simple config values.

October 13, 2012 | Unregistered Commenter Steve Ebersole

Here you go...
I took the liberty of doing the divisions for you, Steve... To create the table below you may simply divide the Hibernate figures by the Batoo figures. But I think it is a good idea to provide the divisions in the benchmark output too.

           | Total Time | JPA Time | DB Time | Name Of The Test
Comparison |    209.90% | 1622.61% | 134.19% | Criteria Test
Comparison |    215.93% | 1784.69% | 127.19% | Find Test
Comparison |    195.30% |  946.04% | 134.97% | Jpql Test
Comparison |    173.66% | 1304.45% | 132.25% | Persist Test
Comparison |    205.72% | 1473.33% | 132.84% | Remove Test
Comparison |    143.02% | 1465.92% | 101.90% | Update Test
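
The division described above can be sketched in a few lines. The example reproduces the Criteria Test row from the Hibernate and Batoo timings posted earlier in the thread (total, JPA and DB times in milliseconds); the class and method names are hypothetical.

```java
import java.util.Locale;

public class ComparisonSketch {

    /** Returns the Hibernate time divided by the Batoo time as a percentage with two decimals. */
    public static String ratio(long hibernateMillis, long batooMillis) {
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * hibernateMillis / batooMillis);
    }

    public static void main(String[] args) {
        // Criteria Test: Hibernate 11677 / 4592 / 7085 ms, Batoo 5563 / 283 / 5280 ms.
        System.out.println("Comparison "
                + ratio(11677, 5563) + " "   // Total Time
                + ratio(4592, 283) + " "     // JPA Time
                + ratio(7085, 5280)          // DB Time
                + " Criteria Test");
        // prints "Comparison 209.90% 1622.61% 134.19% Criteria Test"
    }
}
```

The three columns make the disagreement in this thread visible: the JPA-only column is above 1600% for this test, while the total column, which includes database time, is roughly 210%.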

- "Based on your configuration setup" - Feel free to configure it the way you want.
- "given the "batchy" nature of the tests" - Feel free to modify the test the way you like; nevertheless, I would not consider persisting 5 entities in a transaction a batch.
- "Given that this is a use case almost tailor-made to illustrate Hibernate's shortcoming" - again, feel free to provide any benchmark you like. I would love to find out what aspects of Batoo JPA this case puts forward.
- "I decided not to go any further here. Far cry from 1397.62% or 15x." - Sorry, I am only interested in the resources spent by Batoo JPA vs Hibernate. :)
- "So how exactly do you arrive at this 15x figure" - Please see the division table above. My framework is JPA, and so is Hibernate. So the context we are looking at is the CPU time spent at the JPA Layer. However, if you are satisfied that Hibernate brings the whole application down to half of its performance end-to-end, good for you. I wasn't; that was where Batoo JPA started.
- "I essentially removed most of the variance between Hibernate and Batoo with 3 simple config values" - The benchmark is a simple Person / Phone / Address / Country domain model. Batoo is so fast that I did not need to create any deviation from the JPA Spec except for DDL, which has nothing to do with runtime performance. As I mentioned in the article, the benchmark was not created to put forward where Batoo JPA is strong, but to aid the development of Batoo JPA in creating a "High Performance JPA Implementation".
- "Honestly, at this point I stopped looking at the benchmark" - So would I... :)

Again I challenge Hibernate! You may create your own benchmark and I am sure Batoo JPA will win by far in every way.

Regards,
Hasan Ceylan

October 13, 2012 | Unregistered Commenter Hasan Ceylan

Wow, so wrong in so many ways. I am actually kind of speechless.

First, you cannot just look at "the resources spent by Batoo JPA vs Hibernate". Sorry, but the *only* number that would even arguably "count" here would be totals. Last time I checked, JPA is intended for dealing with back end storage. Saying that you effectively don't care about the time it takes to execute what Batoo/Hibernate deems is necessary SQL shows a real lack of understanding of performance testing, which is odd for a "performance consultant".

So, as I pointed out, the table was from my initial run immediately after check out. And as I also pointed out, I only really looked specifically at one metric, "persist", and gave you those numbers. But even this shows your interesting take on math. On initial checkout, the Hibernate run took 64,042; Batoo took 36,877. *Again* that's not 1397.62% (the number from your own "calculation"). That's not even anywhere close enough to 1397.62% to be explained by simple mathematical variance. In fact it's less than even 2x (200%). And you can try to twist my words all you want into taking that as me being "satisfied Hibernate brings the whole application down to half of its performance end-to-end". If this actually showed a real performance problem, I would in fact not be satisfied. But it is in fact not a real performance problem, as I even continued to show you with numbers.

Unfortunately I threw away the results from my earlier run that came up with the specific 37,956 number for the Hibernate persist run (you know, in the garbage where arguments like this really belong). So I had to run it again, with still just those 3 additional Hibernate settings, which gave:


=============================================================
Prvdr | Total Time | JPA Time | DB Time | Name Of The Test
_____________________________________________________________

HBRNT | 0000038323 | 0000006913 | 0000031410 | Persist Test
...
BATOO | 0000037447 | 0000001223 | 0000036223 | Persist Test

so following your directions for calculations would give:

Comparison 102.34% 565.24% 86.71% Persist Test

"Total Time"-wise, Batoo and Hibernate were essentially even. Batoo was faster in terms of "JPA Time", while Hibernate did better in terms "DB Time". I think I may be starting to see why you like to only consider "JPA Time" :)

"I did not need to create any deviation from the JPA Spec". Lol, yeah, it's amazingly difficult to be able to write a library without annoying, pesky little concerns such as backwards compatibility and legacy APIs and such. Great job :) But all that said, by sticking to the letter of the JPA spec, you also miss lots of features that users unfortunately need. Good luck with that.

And all your smirky comments aside, I really do wish you luck. More competition can only be a good thing. However, this will be my last comment here. I actually have work to do, you know, developing the actual best persistence library out there. Toodles.

October 13, 2012 | Unregistered Commenter Steve Ebersole

> Wow, so wrong in so many ways. I am actually kind of speechless.

I guessed so. :)

> First, you cannot just look at "the resources spent by Batoo JPA vs Hibernate". Sorry, but the *only* number that would even arguably "count" here would be totals. Last time I checked, JPA is intended for dealing with back end storage. Saying that you effectively don't care about the time it takes to execute what Batoo/Hibernate deems is necessary SQL shows a real lack of understanding of performance testing, which is odd for a "performance consultant".

First of all, I feel the frustration you have.

If you go back and read my article from the start, it specifically states that the faster times are at the JPA Layer. I will paste the related part here:

"Interpretation of Results
We do appreciate that JPA is not the only part of an application. But we do believe that the current JPA implementations consume quite a bit of your server budget. Given that a typical application cluster spends about 20% to 40% of its CPU resources on the persistence layer, Batoo JPA will be able to bring your cluster down to half its size, allowing you to save a lot on licensing, administration and hardware, as well as leaving room to scale up even for non-cluster-friendly applications - in my experience I have seen applications running on 96-core Solaris systems simply because they are not scalable."

> So, as I pointed out, the table was from my initial run immediately after check out. And as I also pointed out, I only really looked specifically at one metric, "persist", and gave you those numbers. But even this shows your interesting take on math. On initial checkout, the Hibernate run took 64,042; Batoo took 36,877. *Again* that's not 1397.62% (the number from your own "calculation"). That's not even anywhere close enough to 1397.62% to be explained by simple mathematical variance. In fact it's less than even 2x (200%). And you can try to twist my words all you want into taking that as me being "satisfied Hibernate brings the whole application down to half of its performance end-to-end". If this actually showed a real performance problem, I would in fact not be satisfied. But it is in fact not a real performance problem, as I even continued to show you with numbers.

It is very interesting that you only looked at 'persist', where the overall gain is the least. It is interesting for a person who advocates in the first paragraph how important overall performance is, and then only looks at one specific part, which happens to be the part where Hibernate is 'the least sluggish'.

> Unfortunately I threw away the results from my earlier run that came up with the specific 37,956 number for the Hibernate persist run (you know, in the garbage where arguments like this really belong). So I had to run it again, with still just those 3 additional Hibernate settings, which gave:

It is too bad that, while Batoo JPA has all its data and code publicly available, the Hibernate team makes tests, publishes the results publicly and then "throws away" the parameters that created the results. If you happen to find them again, do share the tests, because - in contrast to you - I would actually love to see the aspects where Batoo JPA cannot beat Hibernate, learn where it is slow, and go and fix it.


>=============================================================
>Prvdr | Total Time | JPA Time | DB Time | Name Of The Test
>_____________________________________________________________

>HBRNT | 0000038323 | 0000006913 | 0000031410 | Persist Test
...
>BATOO | 0000037447 | 0000001223 | 0000036223 | Persist Test

I find it amusing that you 'keep throwing away' stuff, yet you kept the original results that proved how slow Hibernate was and did not even hesitate to publish them here. Since you do not publish the 'settings' you applied to Hibernate, I am also curious about the other tests, but obviously they became even slower, so that you could put forward only 'persist'. A wise decision, then, to cut out the parts that you do not like.


> so following your directions for calculations would give:

> Comparison 102.34% 565.24% 86.71% Persist Test

I am thankful to you for pointing out that, if conditions are right, I can gain a lot by batching persist calls into batch insert statements, so Hibernate's last standing point would be invalidated. Having read this part, I now understand why you threw away all the results but kept the 'persist' test results. Glad that there is still room for improvement.

> "Total Time"-wise, Batoo and Hibernate were essentially even. Batoo was faster in terms of "JPA Time", while Hibernate did better in terms "DB Time". I think I may be starting to see why you like to only consider "JPA Time" :)

Yes you are right! :) I am on the other hand seeing why you publish only some numbers without the parameters and only for 'persist'.

> "I did not need to create any deviation from the JPA Spec". Lol, yeah, it's amazingly difficult to be able to write a library without annoying, pesky little concerns such as backwards compatibility and legacy APIs and such. Great job :) But all that said, by sticking to the letter of the JPA spec, you also miss lots of features that users unfortunately need. Good luck with that.

Thanks, I consider myself lucky with the path I took, because it paid off. Let me also reiterate that Batoo JPA is fast enough that it does not need tweaks by the lead developer (which apparently helped improve the persist operations but worsened the other operations), nor does it need to find ways to lock the customer in - I have confidence in my product.


> And all your smirky comments aside, I really do wish you luck. More competition can only be a good thing. However, this will be my last comment here. I actually have work to do, you know, developing the actual best persistence library out there. Toodles.

Thank you for the wishes. It is great that what you earlier thought was 'garbage' is now competition. Good progress in a matter of minutes; I cannot imagine what you will think about Batoo JPA in a week or so at this pace.

I think a project is successful only with its community and users. I did say that I learnt a lot from Hibernate, and I appreciate that the JPA standard and its market were created mostly - if not entirely - by the Hibernate Team. Yes, it is also a good idea to go back to work since, while you already have years-old bugs sitting and waiting to be taken care of, you now face another threat.

As for the 'actual best persistence library out there', let's give it some time and let the community decide...!

Regards,
Hasan Ceylan

October 13, 2012 | Unregistered CommenterHasan Ceylan

Hi again Hasan,
in order to be transparent, I must say I'm a hibernate guy, involved in the project from 2002 to 2005. Author of 2 books related to Hibernate first, then JPA.
As I said, respect for your work; creating a fresh new JPA implementation is hard. Your different approach is interesting and I was really enthusiastic about digging into your code *after* double checking your benchmark results.
To be honest, I knew from the beginning that something was weird in those results. Performance AND features are the 2 critical points for the hibernate community.

I started looking at the benchmark code and rewrote the following query
"select a from Person p inner join p.addresses a left join fetch a.country left join fetch a.person where p = :person"
to
"select a from Address a left join fetch a.country left join fetch a.person p where p = :person"
That shouldn't change the results much, but it's a more natural way to write the query.
Guess what? Batoo fails in running such a basic query.
If one tool does less, there are chances it goes faster.

So we have a first problem here. Batoo does not correctly handle some (many? who knows) kinds of queries.
Would that be sufficient to assert that Hibernate is 100x more JPA compliant than Batoo?
Still on the features topic, you mainly target the JPA specification. Developers used to JPA know the spec is a very good one, but you can't live with it alone. You need extra features quite soon. Hibernate offers tons of useful features that are not in the spec. For example, it allowed the configuration to be tuned quickly to get better performance results on your specific benchmark.
What are you going to say to users who need to map an existing exotic schema?

Next, when doing performance tuning, you start from the most consuming tier. So I set up a real local network with good bandwidth and ping. Now I have a representative setup. And it looks like most of the time is, as I expected, spent on network.
The second tier would be the database engine itself, its index traversal, ... something that cannot be tested using your benchmark. The schema is too simple, the row count too low.

Anyway, my new setup still shows that Batoo is way faster and ... that the JDBC? (DB time) part is faster when using Batoo. This is annoying, how could that be? What's your secret? Hibernate rarely hits the database for nothing and since we are supposed to be running on the same schema ... we have a problem here.
A pure JDBC implementation of your benchmark would be very interesting, because believe it or not, you CAN'T be 10x faster than plain JDBC. And the fact is that a regular hibernate user should not observe more than 5% overhead compared to plain JDBC.

Actually the truth is elsewhere: most likely, because your implementation is based on bytecode instrumentation, the time spent is not caught by your timers...

What if we simply add a timer in the root method (doTest(Type type)) ? Would you think that is a good idea? Here are the results:
BENCHMARK_LENGTH = 100;
Remote MySql instance
ping 0.5ms
BATOOTotal :1056864
ELINKTotal :1110915
HBRNTTotal :1032627
Surprise, Hibernate is even faster...
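
To put the three totals above side by side, here is a minimal sketch that expresses each one relative to the fastest. The class and method names are made up for illustration, and the unit of the totals is not stated in the comment, so only the ratios carry meaning.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares the wall-clock totals quoted in the comment.
class TotalsComparison {

    // Returns how much larger (in percent) `total` is than `baseline`.
    static double percentSlower(long total, long baseline) {
        return (total - baseline) * 100.0 / baseline;
    }

    public static void main(String[] args) {
        // Totals as posted above; the unit is not given there.
        Map<String, Long> totals = new LinkedHashMap<>();
        totals.put("BATOO", 1_056_864L);
        totals.put("ELINK", 1_110_915L);
        totals.put("HBRNT", 1_032_627L);

        long fastest = Collections.min(totals.values());
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            System.out.printf("%s: +%.2f%% over the fastest total%n",
                    e.getKey(), percentSlower(e.getValue(), fastest));
        }
    }
}
```

With these inputs all three totals sit within 10% of each other, which is the point being made: measured at the root method, the spread is a few percent, not a multiple.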

Last but not least, I could easily enable second level caching on Hibernate (which you can't do for now), rerun the benchmark and claim that hibernate is 1500 times faster than batoo, I won't do it.

October 15, 2012 | Unregistered CommenterAnthony Patricio

Dear Anthony,

First of all, I am very pleased to see such a comprehensive and constructive comment after the rather insulting, nonobjective response from the current Hibernate Lead Developer, who brought up my career and called Batoo JPA 'garbage'. I think what makes open source what it is, is constructive people like yourself. Obviously you took some time to really study what is out there, so thank you.

Based on my understanding of the spec, fetch joins are only for eagerly fetching relations. The query you rewrote is therefore, again according to my understanding of the spec, illegal. So Batoo JPA in that sense does what it is supposed to do. If you also look at the Criteria API, the Fetch interface does not extend Expression, therefore you cannot create a restriction based on it. If you rewrite the query as below, you'll see that it works and handling is still as fast.
"select a from Address a left join fetch a.country left join fetch a.person p where p = :person" ->
"select a from Address a left join fetch a.country left join a.person p where p = :person"

Although in that instance Batoo JPA does not have a bug, I am not saying that it is bug-free. If you browse the website, you'll see the word 'incubation' all around. Having said that, there are currently 230 unit tests in the project covering the main bullet points of the spec. I know unit tests are not a full reflection of the real world, but they are still useful for building trust in a project. Even Hibernate still has such simple bugs: see http://stackoverflow.com/questions/12795407/jpa-how-to-select-objects-wich-has-no-multiple-attributes/12813716#12813716 .

On the cache issue, if you look at the source repo, the cache implementation is actually under way and almost complete; what is left is invalidation of the query name spaces, which is not a complex task, but I have been caught up with other tasks recently. When the cache is done, Batoo's speed will be more meaningful, because without the cache there is a great deal of DB work that carries a big weight in the total times. Once this is eliminated, Batoo and the others will be alone in the comparison (taking up far more weight in total CPU times, which will boost the ratio on totals), so I am actually looking forward to that one as well. :)

As a separate topic, while the L2 cache creates miraculous results in benchmarks, on fairly write-centric applications (non-website apps, more like business apps) with distributed caching (this is important for my assertion) it also creates the overhead of invalidation and serialization across the network. Unless you have very intelligent cache algorithms (which I believe none of the implementations currently has), on large databases with a flatter data distribution the life of a node is very limited. AFAIK, taking my benchmark model, if a customer gets updated, the query cache for all the queries that involve Customer gets invalidated. This creates a lot of overhead to traverse the cache and invalidate entries, and then a great deal of serialization afterwards, do you agree?

While L2 has this nature, most modern databases actually do have table / index / query caches with intelligent algorithms that invalidate only the *really affected* parts on a write operation. I also have some ideas on L2. I'll give a hint: combining a Parent-Child into a single node if the Child is of removeOrphans type. Don't you think this is great? Say a customer has 3 phones; that alone will fetch the record in a single round trip to the cache, rather than 4 in current Hibernate.

I appreciate your point about applications running across networks. However, while the benchmark cannot capture that, when a statement is waiting for a response from the server, the worker thread is actually sleeping, the context is switched and execution is given to the next runnable thread. Batoo JPA's speed is not about response time but about CPU consumption; therefore, based on your networked MySQL example, in a multithreaded environment I still believe Batoo will handle way more requests than any other implementation. For this very reason, the benchmark was created using an in-memory database, to have the least possible wait on IO. You could also check out the HelloJSF example at https://github.com/BatooOrg/HelloJSF . It contains a JMeter test that, as you run it, will show you that by simply switching to Batoo JPA the whole application performs twice as fast, as the JMeter test runs multithreaded. I think that counts a lot.
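
The distinction between response time and CPU consumption can be illustrated with a minimal sketch. It is not taken from the Batoo benchmark: the class name is hypothetical and `Thread.sleep` merely stands in for a thread waiting on a database round trip. It measures both wall-clock time and the CPU time of the calling thread around the same task.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical demo: wall-clock time vs. CPU time for a task that mostly waits.
class CpuVsWallClock {

    // Returns {wallNanos, cpuNanos} for the task, measured on the calling thread.
    // cpuNanos is -1 when the JVM does not support per-thread CPU time.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();

        long wallStart = System.nanoTime();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;

        task.run();

        long cpu = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        long wall = System.nanoTime() - wallStart;
        return new long[] {wall, cpu};
    }

    public static void main(String[] args) {
        long[] t = measure(() -> {
            try {
                Thread.sleep(200); // stands in for waiting on a database response
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("wall: %d ms, cpu: %d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

A sleeping thread accumulates wall-clock time but almost no CPU time, which is why a timer around the whole test and a CPU-oriented measurement can tell different stories about the same run.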

On top of that, some applications, although certainly not all, share the database with a lot of external applications and services. That's why you almost always have a lot of redundant capacity at the DB layer. When this is the case, most applications' bottleneck is the J2EE layer. If you assume that JPA takes 50% of the J2EE layer, that means by using Batoo JPA you will double the capacity or cut the hardware by half. Again, great saving!
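
The capacity estimate can be worked through with Amdahl's law. The sketch below is only an illustration under stated assumptions: the 50% share comes from the paragraph above, the factor of 15 is taken from the article's title, and the class and method names are hypothetical.

```java
// Hypothetical worked example of Amdahl's law for the capacity estimate.
class CapacityEstimate {

    // Overall speedup when `fraction` of the work becomes `layerSpeedup` times faster.
    static double overallSpeedup(double fraction, double layerSpeedup) {
        return 1.0 / ((1.0 - fraction) + fraction / layerSpeedup);
    }

    public static void main(String[] args) {
        double jpaShare = 0.5;   // assumption from the comment: JPA is 50% of the J2EE layer
        double jpaSpeedup = 15;  // assumption: the headline figure from the article title

        System.out.printf("overall speedup: %.3fx%n", overallSpeedup(jpaShare, jpaSpeedup));
        // Upper bound if the JPA share cost nothing at all.
        System.out.printf("upper bound:     %.3fx%n", 1.0 / (1.0 - jpaShare));
    }
}
```

Under these assumptions the overall gain is 1.875x, with 2x as the limit that would only be reached if the JPA share dropped to zero, so "double the capacity" is best read as a close approximation.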

You asked why Batoo JPA is that fast and referred to byte-code instrumentation. While that is one of many things, it accounts for a small portion of the whole. To prove that, take persist, where there is no byte-code instrumentation, as the entities are external and their references must be preserved. The way I achieved that much speed is maybe material for a different article, and I plan to write that one soon. Every single line of Batoo JPA is written with performance being the priority.

As for the features, as I mentioned to Steve, I have very much respect for Hibernate, which made me fall in love with J2EE 10 years ago. I appreciate that it has a lot of things that don't exist in the JPA Spec. But take batch processing: I implemented this a day ago, and it is totally transparent to the developer / administrator. It intelligently gathers statements up and batches them into a single SQL. As for the size of the batch, 10 or 50 does not matter too much, so I wouldn't like to bother and confuse the developer with that: if the gain is, say, 90% at 10, then making it 20 will bring it to 95%, and 50 should give something like 98%. You cannot go much higher than that, because if you have batches that large then you really should think about using plain JDBC. The numbers are made up; the point here is not to stick to JPA, but to hide the complexity in the framework and not expose it unless it really counts for the developer / administrator.
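
The made-up numbers above happen to match a very simple model, sketched below as an assumption and not as a measurement of Batoo JPA: if the only cost saved is the per-statement round trip, batching n statements into one leaves 1/n of the round trips. The class and method names are hypothetical.

```java
// Hypothetical model: the only cost saved by batching is the per-statement round trip.
class BatchGain {

    // Percentage of round trips saved when batchSize statements share one round trip.
    static double roundTripsSavedPercent(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        return (1.0 - 1.0 / batchSize) * 100.0;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 10, 20, 50, 100}) {
            System.out.printf("batch size %3d -> %.1f%% fewer round trips%n",
                    size, roundTripsSavedPercent(size));
        }
    }
}
```

The curve flattens quickly: going from 10 to 50 recovers only eight more percentage points, which is the argument for not exposing the batch size as a tuning knob.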

Having said that, stuff like Batch Processing, Filters, Envers, @Index etc. are great additions to the JPA spec. I have already implemented the @Index annotation and batch processing in Batoo JPA, @Index being a proprietary API while batch processing is built in. Most of the things are ready for an Envers equivalent; it is a matter of time to implement that. Filters are planned.

Although I haven't checked that with lawyers, if it is legally OK and I get a blessing from the JBoss executives (not the current Hibernate Lead; certainly mature and fair people), I plan on masquerading HibernateSession to enable drop-in replacement for applications that are built on HibernateSession rather than JPA.

Once again, I really appreciate the time you took to inspect and write that response.

Regards,
Hasan Ceylan

October 15, 2012 | Unregistered CommenterHasan Ceylan

Hasan,
true about second level cache, it's a false friend but many people will use it just to boost some benchmark results. Just like using in-memory database (yes I insist).

Please don't be surprised that people might find your title offensive. Reading it, naive people could think response times will be 15 times faster using your implementation, implicitly saying that Hibernate is badly written. This is totally wrong (see the more realistic setup in my previous comment).
For now, you have zero impact on real life setup. Up to you to set up a more elaborate domain model, db size, ... to prove the opposite. I mean it, take a quite regular hardware, run the test, make the response time really different OR the JVM crashes because of CPU load or whatever.
You are trying to make some buzz by attacking hibernate, great, you look smart and I guess that was not the better option.

You should review your argumentation and detail that you *might* have better performance in terms of CPU load (not exactly the same as response time in a full real life architecture). I'm using *might*, because unless you have more or less the same set of features, you can't compare your implementation to Hibernate, true?

People should also understand that during race condition, the db pool, the db itself, the network, the web server will certainly fall BEFORE the AS crashes because of the CPU load.

Last but not least, I was going to test your HelloJSF sample when I noticed that _again_ you were hitting the in-memory H2 ExampleDS, where it is documented that it is not for production use.

I'll check your project again in some weeks, but be careful when saying your young implementation is 15 times faster than 2 projects (Hibernate and EclipseLink) built by dozens of experienced developers over 10+ years.

Regards and good luck
Anthony

October 15, 2012 | Unregistered CommenterAnthony Patricio

Dear Anthony,

Thank you for the response.

Allow me to reiterate that:
- The word "response" is never mentioned in the article, let alone "Response Time".
- The emphasis was clearly on CPU resource usage, and there is a clear indication that the speed is at the JPA layer.
- There is a complete paragraph stating that the overall gain will be only on the JPA layer and a little on the DB layer. The paragraph also declares that "by using Batoo JPA, bring your cluster down to half of its size".
- I never insulted other implementations, nor said they were badly written.

You have a point in saying that Hibernate and EclipseLink have many additional features, and again I never asserted that Batoo JPA has all the bells and whistles the other two have. For those that are content with what JPA provides, Batoo JPA is the product I am offering. But IMHO, just because others have additional features on top of JPA does not necessarily mean that there should be a huge performance penalty (i.e. if Batoo JPA did not have JPQL and Criteria API support, it wouldn't perform better on CRUD operations). Nevertheless, you can make a distinction there: if someone is in need of versioned data, obviously Batoo JPA is out of the choices. While at it, let me also say that Batoo JPA will not compromise on performance to add fantastic features - bold features will only be added if they are doable with a small performance compromise.

I agree that Batoo JPA cannot serve ALL the applications currently running on top of Hibernate; however, those that are OK with the features provided by the JPA Spec will have a huge benefit using Batoo JPA.

Bottom line is, I still believe I am announcing what I have.

Hope this clarifies everything. And once again thank you for taking time to respond.

Regards,
Hasan Ceylan

October 15, 2012 | Unregistered CommenterHasan Ceylan

If you really want the fastest JPA, you need to reconsider your database.
The problems are in the database implementation, not in a Hibernate driver.

http://community.versant.com/JPA.aspx

Cheers,
-Robert

October 17, 2012 | Unregistered CommenterRobert Greene

Hasan,
it's hibernate.order_inserts not hibernate.order_updates as you pushed in git.

Btw, I'm using a real QA environment to take some measurements on a real system, more to come soon.

Regards,
Anthony

October 17, 2012 | Unregistered CommenterAnthony Patricio

Anthony,

Frankly, I do not get your point on using in-memory databases. As long as you are not after the volume but the performance ratios, using an in-memory database has no bearing on a performance benchmark.

You said that you did not test HelloJSF. I do not understand your decision to quit at "I noticed that _again_ you were hitting the in memory H2 ExampleDS where it is documented that it is not for production use". It is fairly easy to point ExampleDS to a MySQL database, and this I cannot provide in the project since JTA is employed.

Nevertheless, a new Benchmark which has Spring REST MVC, Spring Data, JPA & MySQL JDBC layers has been committed to GitHub repository to demonstrate a different scenario.

Test scenario basically assumes,
1) User browses the persons
2) User creates a person
3) User browses the persons
4) User creates a person
5) User updates the person
6) User browses the persons
7) User deletes a person

In this scenario the test scores 2681 Req/s with Batoo JPA; when it is replaced with Hibernate, the throughput drops to 1742 Req/s.

Feel free to modify the test.

Regards,
Hasan Ceylan

October 22, 2012 | Unregistered CommenterHasan Ceylan

Greetings,

I'm no JPA implementation coder so I'm not going to comment on the technical aspects. But from a general developer aspect I have observed that in order for a project to be successful it has to fill a niche or need (such as Hibernate and other ORM tools originally did). Then it has to spread and develop a community and more features. Finally it has to be well-known enough to gather commercial support. All those take time and it doesn't mean anyone shouldn't try - open source is evolution in action, either the dinosaur is killed by the ferret or then the dinosaur ignores the ferret and the world moves on ;-)

When I see a claim like "15x faster" I automatically think "wall clock time" and when a claim is too good to be true (e.g. someone claims to have achieved cold fusion in their basement) it usually is. With so many OS projects available nowadays, I understand that you might have to make extravagant claims to be noticed, but they might also come back and bite you. So I might recommend a bit more of a humble approach for the author and wish him good luck. I'd love to be proven wrong and if you can claim to be a "household name in JPA/persistence", say, 23.10.2014, feel free to quote my post and I'll step forward and admit to being wrong.

October 22, 2012 | Unregistered CommenterNicklas Karlsson

Dear Nicklas,

Even though you are not directly related, I appreciate you stopping by and providing your insights.

A typical application consists of several layers, presentation layer, business layer, persistence layer and finally the database.
The performance of the application depends heavily on the weakest link in the chain. I thought the persistence layer could indeed be faster than what's out there, so Batoo JPA was created.

Take MySQL for example... RDBMSs were invented long before it, so it never filled a niche. But it was high performance, open source and free to use. With its performance and ease of use, it became the top database in a rather short time. Now you can search the web to see how MySQL outperforms several databases. But the benchmark results are always constrained to the database layer. Yet, unlike with today's car engine technology, in computing, when you have a better technology in each stack, it is very well possible to achieve several multiples of your existing performance.

To give you an example, the last project I worked on took around 2 mins to deploy on Weblogic, whereas it took less than 20 secs to deploy on JBoss 7.1. Certainly JBoss did not invent cold fusion during the 7.x rewrite, but careful analysis and implementation yielded that performance. That is what we did when creating Batoo JPA. Not that the are but we will look into other heavy technologies like JSF and Expression Evaluation, and if there is room for considerable improvement there too, in the mid term you will hopefully see way better overall performance gains.

Finally, not to prove you wrong, but to let you know that we made it, I am noting the date to get back. :)

Regards,
Hasan Ceylan

October 23, 2012 | Unregistered CommenterHasan Ceylan
