Petabyte-scale data management

                              Posts about database management for databases with petabytes of user data.

                              April 17, 2017


                              Interana has an interesting story, in technology and business model alike. For starters:

                              And to be clear — if we leave aside any questions of marketing-name sizzle, this really is business intelligence. The closest Interana comes to helping with predictive modeling is giving its ad-hoc users inspiration as to where they should focus their modeling attention.

                              Interana also has an interesting twist in its business model, which I hope can be used successfully by other enterprise software startups as well. Read more

                              November 19, 2015

                              CDH 5.5

                              I talked with Cloudera shortly ahead of today’s announcement of Cloudera 5.5. Much of what we talked about had something or other to do with SQL data management. Highlights include:

                              While I had Cloudera on the phone, I asked a few questions about Impala adoption, specifically focused on concurrency. There was mention of: Read more

                              September 17, 2015

                              Rocana’s world

                              For starters:

                              Rocana portrays itself as offering next-generation IT operations monitoring software. As you might expect, this has two main use cases:

                              Rocana’s differentiation claims boil down to fast and accurate anomaly detection on large amounts of log data, including but not limited to:

                              Read more

                              September 14, 2015

                              DataStax and Cassandra update

                              MongoDB isn’t the only company I reached out to recently for an update. Another is DataStax. I chatted mainly with Patrick McFadin, somebody with whom I’ve had strong consulting relationships at both a user and a vendor. But Rachel Pedreschi contributed the marvelous phrase “twinkling dashboard”.

                              It seems fair to say that in most cases:

                              Those generalities, in my opinion, make good technical sense. Even so, there are some edge cases or counterexamples, such as:

                              *And so a gas company is doing lightweight analysis on boiler temperatures, which it regards as hot data.

                              While most of the specifics are different, I’d say similar things about MongoDB, Cassandra, or any other NoSQL DBMS that comes to mind: Read more

                              June 8, 2015

                              Teradata will support Presto

                              At the highest level:

                              Now let’s make that all a little more precise.

                              Regarding Presto (and I got most of this from Teradata):

                              Daniel Abadi said that Presto satisfies what he sees as some core architectural requirements for a modern parallel analytic RDBMS project: Read more

                              March 17, 2015

                              More notes on HBase

                              1. Continuing from last week’s HBase post, the Cloudera folks were fairly proud of HBase’s features for performance and scalability. Indeed, they suggested that use cases which were a good technical match for HBase were those that required fast random reads and writes with high concurrency and strict consistency. Some of the HBase architecture for query performance seems to be:

                              Notwithstanding that a couple of those features sound like they might help with analytic queries, the base expectation is that you’ll periodically massage your HBase data into a more analytically-oriented form. For example — I was talking with Cloudera after all — you could put it into Parquet.

                              2. The discussion of which kinds of data are originally put into HBase was a bit confusing.

                              OpenTSDB, by the way, likes to store detailed data and aggregates side-by-side, which resembles a pattern I discussed in my recent BI for NoSQL post.

                              3. HBase supports caching, tiered storage, and so on. Cloudera is pretty sure that it is publicly known (I presume from blog posts or conference talks) that: Read more

                              February 28, 2015

                              Databricks and Spark update

                              I chatted last night with Ion Stoica, CEO of my client Databricks, for an update both on his company and Spark. Databricks’ actual business is Databricks Cloud, about which I can say:

                              I do not expect all of the above to remain true as Databricks Cloud matures.

                              Ion also said that Databricks is over 50 people, and has moved its office from Berkeley to San Francisco. He also offered some Spark numbers, such as: Read more

                              January 19, 2015

                              Where the innovation is

                              I hoped to write a reasonable overview of current- to medium-term future IT innovation. Yeah, right. But if we abandon any hope that this post could be comprehensive, I can at least say:

                              1. Back in 2011, I ranted against the term Big Data, but expressed more fondness for the V words — Volume, Velocity, Variety and Variability. That said, when it comes to data management and movement, solutions to the V problems have generally been sketched out.

                              2. Even so, there’s much room for innovation around data movement and management. I’d start with:

                              3. As I suggested last year, data transformation is an important area for innovation. Read more

                              April 17, 2014

                              MongoDB is growing up

                              I caught up with my clients at MongoDB to discuss the recent MongoDB 2.6, along with some new statements of direction. The biggest takeaway is that the MongoDB product, along with the associated MMS (MongoDB Management Service), is growing up. Aspects include:

                              Read more

                              December 8, 2013

                              DataStax/Cassandra update

                              Cassandra’s reputation in many quarters is:

                              This has led competitors to use, and get away with, sales claims along the lines of “Well, if you really need geo-distribution and can’t wait for us to catch up — which we soon will! — you should use Cassandra. But otherwise, there are better choices.”

                              My friends at DataStax, naturally, don’t think that’s quite fair. And so I invited them — specifically Billy Bosworth and Patrick McFadin — to educate me. Here are some highlights of that exercise.

                              DataStax and Cassandra have some very impressive accounts, which don’t necessarily revolve around geo-distribution. Netflix, probably the flagship Cassandra user — since Cassandra inventor Facebook adopted HBase instead — actually hasn’t been using the geo-distribution feature. Confidential accounts include:

                              DataStax and Cassandra won’t necessarily win customer-brag wars versus MongoDB, Couchbase, or even HBase, but at least they’re strongly in the competition.

                              DataStax claims that simplicity is now a strength. There are two main parts to that surprising assertion. Read more

