Is a Star-Schema design essential to a data warehouse? Or can you do data warehousing with another design pattern?
Star-Schema Design
data-warehousedatabasedesign-patternsdimensional-modelingstar-schema
Related Solutions
The blog post you quoted overstates its claim a bit. FP doesn't eliminate the need for design patterns. The term "design patterns" just isn't widely used to describe the same thing in FP languages. But they exist. Functional languages have plenty of best practice rules of the form "when you encounter problem X, use code that looks like Y", which is basically what a design pattern is.
However, it's correct that most OOP-specific design patterns are pretty much irrelevant in functional languages.
I don't think it should be particularly controversial to say that design patterns in general only exist to patch up shortcomings in the language. And if another language can solve the same problem trivially, that other language won't have need of a design pattern for it. Users of that language may not even be aware that the problem exists, because, well, it's not a problem in that language.
Here is what the Gang of Four has to say about this issue:
The choice of programming language is important because it influences one's point of view. Our patterns assume Smalltalk/C++-level language features, and that choice determines what can and cannot be implemented easily. If we assumed procedural languages, we might have included design patterns called "Inheritance", "Encapsulation," and "Polymorphism". Similarly, some of our patterns are supported directly by the less common object-oriented languages. CLOS has multi-methods, for example, which lessen the need for a pattern such as Visitor. In fact, there are enough differences between Smalltalk and C++ to mean that some patterns can be expressed more easily in one language than the other. (See Iterator for example.)
(The above is a quote from the Introduction to the Design Patterns book, page 4, paragraph 3)
The main features of functional programming include functions as first-class values, currying, immutable values, etc. It doesn't seem obvious to me that OO design patterns are approximating any of those features.
What is the command pattern, if not an approximation of first-class functions? :) In an FP language, you'd simply pass a function as the argument to another function. In an OOP language, you have to wrap up the function in a class, which you can instantiate and then pass that object to the other function. The effect is the same, but in OOP it's called a design pattern, and it takes a whole lot more code. And what is the abstract factory pattern, if not currying? Pass parameters to a function a bit at a time, to configure what kind of value it spits out when you finally call it.
So yes, several GoF design patterns are rendered redundant in FP languages, because more powerful and easier to use alternatives exist.
But of course there are still design patterns which are not solved by FP languages. What is the FP equivalent of a singleton? (Disregarding for a moment that singletons are generally a terrible pattern to use.)
And it works both ways too. As I said, FP has its design patterns too; people just don't usually think of them as such.
But you may have run across monads. What are they, if not a design pattern for "dealing with global state"? That's a problem that's so simple in OOP languages that no equivalent design pattern exists there.
We don't need a design pattern for "increment a static variable", or "read from that socket", because it's just what you do.
Saying a monad is a design pattern is as absurd as saying the Integers with their usual operations and zero element is a design pattern. No, a monad is a mathematical pattern, not a design pattern.
In (pure) functional languages, side effects and mutable state are impossible, unless you work around it with the monad "design pattern", or any of the other methods for allowing the same thing.
Additionally, in functional languages which support OOP (such as F# and OCaml), it seems obvious to me that programmers using these languages would use the same design patterns found available to every other OOP language. In fact, right now I use F# and OCaml everyday, and there are no striking differences between the patterns I use in these languages vs the patterns I use when I write in Java.
Perhaps because you're still thinking imperatively? A lot of people, after dealing with imperative languages all their lives, have a hard time giving up on that habit when they try a functional language. (I've seen some pretty funny attempts at F#, where literally every function was just a string of 'let' statements, basically as if you'd taken a C program, and replaced all semicolons with 'let'. :))
But another possibility might be that you just haven't realized that you're solving problems trivially which would require design patterns in an OOP language.
When you use currying, or pass a function as an argument to another, stop and think about how you'd do that in an OOP language.
Is there any truth to the claim that functional programming eliminates the need for OOP design patterns?
Yep. :) When you work in a FP language, you no longer need the OOP-specific design patterns. But you still need some general design patterns, like MVC or other non-OOP specific stuff, and you need a couple of new FP-specific "design patterns" instead. All languages have their shortcomings, and design patterns are usually how we work around them.
Anyway, you may find it interesting to try your hand at "cleaner" FP languages, like ML (my personal favorite, at least for learning purposes), or Haskell, where you don't have the OOP crutch to fall back on when you're faced with something new.
As expected, a few people objected to my definition of design patterns as "patching up shortcomings in a language", so here's my justification:
As already said, most design patterns are specific to one programming paradigm, or sometimes even one specific language. Often, they solve problems that only exist in that paradigm (see monads for FP, or abstract factories for OOP).
Why doesn't the abstract factory pattern exist in FP? Because the problem it tries to solve does not exist there.
So, if a problem exists in OOP languages, which does not exist in FP languages, then clearly that is a shortcoming of OOP languages. The problem can be solved, but your language does not do so, but requires a bunch of boilerplate code from you to work around it. Ideally, we'd like our programming language to magically make all problems go away. Any problem that is still there is in principle a shortcoming of the language. ;)
Use
__new__
when you need to control the creation of a new instance.
Use
__init__
when you need to control initialization of a new instance.
__new__
is the first step of instance creation. It's called first, and is responsible for returning a new instance of your class.
In contrast,
__init__
doesn't return anything; it's only responsible for initializing the instance after it's been created.In general, you shouldn't need to override
__new__
unless you're subclassing an immutable type like str, int, unicode or tuple.
From April 2008 post: When to use __new__
vs. __init__
? on mail.python.org.
You should consider that what you are trying to do is usually done with a Factory and that's the best way to do it. Using __new__
is not a good clean solution so please consider the usage of a factory. Here's a good example: ActiveState Fᴀᴄᴛᴏʀʏ ᴘᴀᴛᴛᴇʀɴ Recipe.
Best Solution
Using star schemas for a data warehouse system gets you several benefits and in most cases it is appropriate to use them for the top layer. You may also have an operational data store (ODS) - a normalised structure that holds 'current state' and facilitates operations such as data conformation. However there are reasonable situations where this is not desirable. I've had occasion to build systems with and without ODS layers, and had specific reasons for the choice of architecture in each case.
Without going into the subtlties of data warehouse architecture or starting a Kimball vs. Inmon flame war the main benefits of a star schema are:
Most database management systems have facilities in the query optimiser to do 'Star Transformations' that use Bitmap Index structures or Index Intersection for fast predicate resolution. This means that selection from a star schema can be done without hitting the fact table (which is usually much bigger than the indexes) until the selection is resolved.
Partitioning a star schema is relatively straightforward as only the fact table needs to be partitioned (unless you have some biblically large dimensions). Partition elimination means that the query optimiser can ignore patitions that could not possibly participate in the query results, which saves on I/O.
Slowly changing dimensions are much easier to implement on a star schema than a snowflake.
The schema is easier to understand and tends to involve less joins than a snowflake or E-R schema. Your reporting team will love you for this
Star schemas are much easier to use and (more importantly) make perform well with ad-hoc query tools such as Business Objects or Report Builder. As a developer you have very little control over the SQL generated by these tools so you need to give the query optimiser as much help as possible. Star schemas give the query optimiser relatively little opportunity to get it wrong.
Typically your reporting layer would use star schemas unless you have a specific reason not to. If you have multiple source systems you may want to implement an Operational Data Store with a normalised or snowflake schema to accumulate the data. This is easier because an ODS typically does not do history. Historical state is tracked in star schemas where this is much easier to do than with normalised structures. A normalised or snowflaked Operational Data Store reflects 'current' state and does not hold a historical view over and above any that is inherent in the data.
ODS load processes are concerned with data scrubbing and conforming, which is easier to do with a normalised structure. Once you have clean data in an ODS, dimension and fact loads can track history (changes over time) with generic or relatively simple mechanisms relatively simply; this is much easier to do with a star schema, Many ETL tools (for example) provide built-in facilities for slowly changing dimensions and implementing a generic mechanism is relatively straightforward.
Layering the system in this way providies a separation of responsibilities - business and data cleansing logic is dealt with in the ODS and the star schema loads deal with historical state.