Databricks, Spark and BDAS

                              Discussion of BDAS (Berkeley Data Analytics Systems), especially Spark and related projects, and also of Databricks, the company commercializing Spark.

                              August 17, 2017

                              More notes on the transition to the cloud

                              Last year I posted observations about the transition to the cloud. Here are some further thoughts.

                              0. In case any doubt remained, the big questions about transitioning to the cloud are “When?” and “How?”. “Whether”, by way of contrast, is pretty much settled.

                              1. The answer to “When?” is generally “Over many years”. In particular, at most enterprises the cloud transition will span the tenures of multiple CIOs.

                              Few enterprises will ever execute on simple, consistent, unchanging “cloud strategies”.

                              2. The SaaS (Software as a Service) vs. on-premises tradeoffs are being reargued, except that proponents now spell SaaS C-L-O-U-D. (Ali Ghodsi of Databricks made a particularly energetic version of that case in a recent meeting.)

                              3. In most countries (at least in the US and the rest of the West), the cloud vendors deemed to matter are Amazon, followed by Microsoft, followed by Google. And so, when it comes to the public cloud, Microsoft is much, much more enterprise-savvy than its key competitors.

                              Read more

                              August 10, 2017

                              Notes on data security

                              1. In June I wrote about burgeoning interest in data security. I’d now like to add:

                              We can reconcile these anecdata pretty well if we postulate that:

                              2. My current impressions of the legal privacy vs. surveillance tradeoffs are basically: Read more

                              June 30, 2017

                              Analytics on the edge?

                              There’s a theory going around to the effect that:

                              There’s enough truth to all that to make it worth discussing. But the strong forms of the claims seem overblown.

                              1. This story doesn’t even make sense except for certain new classes of application. Traditional business applications run all over the world, in dedicated or SaaSy modes as the case may be. E-commerce is huge. So is content delivery. Architectures for all those things will continue to evolve, but what we have now basically works.

                              2. When it comes to real-world appliances, this story is partially accurate. An automobile is a rolling network of custom Linux systems, each running hand-crafted real-time apps, a few of which also have minor requirements for remote connectivity. That’s OK as far as it goes, but there could be better support for real-time operational analytics. If something as flexible as Spark were capable of unattended operation, I think many engineers of real-world appliances would find great ways to use it.

                              3. There’s a case to be made for something better yet. I think the argument is premature, but it’s worth at least a little consideration. Read more

                              June 16, 2017

                              Generally available Kudu

                              I talked with Cloudera about Kudu in early May. Besides giving me a lot of information about Kudu, Cloudera also helped confirm some trends I’m seeing elsewhere, including:

                              Now let’s talk about Kudu itself. As I discussed at length in September 2015, Kudu is:

                              Kudu’s adoption and roll-out story starts: Read more

                              June 14, 2017

                              Cloudera Altus

                              I talked with Cloudera before the recent release of Altus. In simplest terms, Cloudera’s cloud strategy aspires to:

                              In other words, Cloudera is porting its software to an important new platform.* And this port isn’t complete yet, in that Altus is geared only for certain workloads. Specifically, Altus is focused on “data pipelines”, aka data transformation, aka “data processing”, aka new-age ETL (Extract/Transform/Load). (Other kinds of workload are on the roadmap, including several different styles of Impala use.) So what about that is particularly interesting? Well, let’s drill down.

                              *Or, if you prefer, improving on early versions of the port.

                              Read more

                              April 13, 2017

                              Analyzing the right data

                              0. A huge fraction of what’s important in analytics amounts to making sure that you are analyzing the right data. To a large extent, “the right data” means “the right subset of your data”.

                              1. In line with that theme:

                              2. Business intelligence interfaces today don’t look that different from what we had in the 1980s or 1990s. The biggest visible* changes, in my opinion, have been in the realm of better drilldown, à la QlikView and then Tableau. Drilldown, of course, is the main UI for business analysts and end users to subset data themselves.

                              *I used the word “visible” on purpose. The advances at the back end have been enormous, and much of that redounds to the benefit of BI.

                              3. I wrote 2 1/2 years ago that sophisticated predictive modeling commonly fit the template:

                              That continues to be tough work. Attempts to productize shortcuts have not caught fire.

                              Read more

                              March 12, 2017

                              Introduction to SequoiaDB and SequoiaCM

                              For starters, let me say:


                              Unfortunately, SequoiaDB has not captured a lot of detailed information about unpaid open source production usage.

                              Read more

                              December 18, 2016

                              Introduction to Crate.io and CrateDB

                              Crate.io and CrateDB basics include:

                              In essence, CrateDB is an open source and less mature alternative to MemSQL. The opportunity for MemSQL and CrateDB alike exists in part because analytic RDBMS vendors didn’t close it off.

                              CrateDB’s not-just-relational story starts:

                              Read more

                              November 23, 2016

                              DBAs of the future

                              After a July visit to DataStax, I wrote:

                              The idea that NoSQL does away with DBAs (DataBase Administrators) is common. It also turns out to be wrong. DBAs basically do two things.

                              • Handle the database design part of application development. In NoSQL environments, this part of the job is indeed largely refactored away. More precisely, it is integrated into the general app developer/architect role.
                              • Manage production databases. This part of the DBA job is, if anything, a bigger deal in the NoSQL world than in more mature and automated relational environments. It’s likely to be called part of “devops” rather than “DBA”, but by whatever name it’s very much a thing.

                              That turns out to understate the core point, which is that DBAs still matter in non-RDBMS environments. Specifically, it’s too narrow in two ways.

                              My wake-up call for that latter bit was a recent MongoDB 3.4 briefing. MongoDB certainly has various efforts in administrative tools, which I won’t recapitulate here. But to my surprise, MongoDB also found a role for something resembling relational database design. The idea is simple: A database administrator defines a view against a MongoDB database, where views: Read more

                              October 21, 2016

                              Rapid analytics

                              “Real-time” technology excites people, and has for decades. Yet the actual, useful technology to meet “real-time” requirements remains immature, especially in cases which call for rapid human decision-making. Here are some notes on that conundrum.

                              1. I recently posted that “real-time” is getting real. But there are multiple technology challenges involved, including:

                              2. In early 2011, I coined the phrase investigative analytics, about which I said three main things: Read more

