But I can imagine a system roughly like the following. First, find the verb and it's frame and true literal meaning (maybe it actually does have more than one). This verb frame will impose some restrictions on its arguments (for instance, drive might say that both the agent and theme have to be animate). If you encounter something where this is not true (eg., a "car" as a theme or "passion" as an agent), you know that this must be a metaphorical usage. At this point, you have to deduce what it must mean. That is, if we have some semantics associated with the literal interpretation, we have to figure out how to munge it to work in the metaphorical case. For instance, for drive, we might say that the semantics are roughly "E = theme moves & E' = theme executes E & agent causes E'" If the patient cannot actually execute things (it's a nail), then we have to figure that something else (eg., in this case, the agent) did the actual executing. Etc.
Sounds like a job for FrameNet (if FrameNet were better ... and the page actually loaded, that is, you may have to settle for the Wikipedia entry). My own review of a sense disambiguation hypothesis here.