Python – simplifying data structures and condition statements in python code

conditionaldata-structurespython

I was wondering if there are any ways to simplify the following piece of Code. As you can see, there are numerous dicts being used as well as condition statements to weed out bad input data. Note that the trip rate values are not all inputed yet, the dicts are just copied and pasted for now

EDIT

In any of the rates, (x,y):z . x and y are correct, the z values are not as they're just copy/pasted

this code works in case you want to copy, paste, and test it

import math


# step 1.4 return trip rates
def trip_rates( population_stratification, analysis_type, low_income, medium_income, high_income ):
  ''' this function returns the proper trip rate tuple to be used based on input 
    data 
    ADPT = Average Daily Person Trips per Household
    pph = person per household
    veh_hh = vehicles per household
    (param_1, param_2): ADPT
  '''
  li = low_income
  mi = medium_income
  hi = high_income
  # table 5 -
  if analysis_type == 1:
    if population_stratification == 1:
      rates = {( li, 1 ):3.6, ( li, 2 ):6.5, ( li, 3 ):9.1, ( li, 4 ):11.5, ( li, 5 ): 13.8,
               ( mi, 1 ):3.9, ( mi, 2 ):7.3, ( mi, 3 ):10.0, ( mi, 4 ):13.1, ( mi, 5 ): 15.9,
               ( hi, 1 ):4.5, ( mi, 2 ):9.2, ( mi, 3 ):12.2, ( mi, 4 ):14.8, ( mi, 5 ): 18.2}
      return rates
    if population_stratification == 2:
      rates = {
               ( li, 1 ):3.1, ( li, 2 ):6.3, ( li, 3 ):9.4, ( li, 4 ):12.5, ( li, 5 ): 14.7,
               ( mi, 1 ):4.8, ( mi, 2 ):7.2, ( mi, 3 ):10.1, ( mi, 4 ):13.3, ( mi, 5 ): 15.5,
               ( hi, 1 ):4.9, ( mi, 2 ):7.7, ( mi, 3 ):12.5, ( mi, 4 ):13.8, ( mi, 5 ): 16.7
              }
      return rates
    if population_stratification == 3: #TODO: input actual rate
      rates = {
               ( li, 1 ):3.6, ( li, 2 ):6.5, ( li, 3 ):9.1, ( li, 4 ):11.5, ( li, 5 ): 13.8,
               ( mi, 1 ):3.9, ( mi, 2 ):7.3, ( mi, 3 ):10.0, ( mi, 4 ):13.1, ( mi, 5 ): 15.9,
               ( hi, 1 ):4.5, ( mi, 2 ):9.2, ( mi, 3 ):12.2, ( mi, 4 ):14.8, ( mi, 5 ): 18.2
              }
      return rates
    if population_stratification == 4: #TODO: input actual rate
      rates = {
               ( li, 1 ):3.1, ( li, 2 ):6.3, ( li, 3 ):9.4, ( li, 4 ):12.5, ( li, 5 ): 14.7,
               ( mi, 1 ):4.8, ( mi, 2 ):7.2, ( mi, 3 ):10.1, ( mi, 4 ):13.3, ( mi, 5 ): 15.5,
               ( hi, 1 ):4.9, ( mi, 2 ):7.7, ( mi, 3 ):12.5, ( mi, 4 ):13.8, ( mi, 5 ): 16.7
              }
      return rates
  #table 6
  elif analysis_type == 2:
    if population_stratification == 1: #TODO: Change rates
      rates = {
               ( 0, 1 ):3.6, ( 0, 2 ):6.5, ( 0, 3 ):9.1, ( 0, 4 ):11.5, ( 0, 5 ): 13.8,
               ( 1, 1 ):3.9, ( 1, 2 ):7.3, ( 1, 3 ):10.0, ( 1, 4 ):13.1, ( 1, 5 ): 15.9,
               ( 2, 1 ):4.5, ( 2, 2 ):9.2, ( 2, 3 ):12.2, ( 2, 4 ):14.8, ( 2, 5 ): 18.2,
               ( 3, 1 ):4.5, ( 3, 2 ):9.2, ( 3, 3 ):12.2, ( 3, 4 ):14.8, ( 3, 5 ): 18.2
              }
      return rates
    if population_stratification == 2: #TODO: Change rates
      rates = {
               ( 0, 1 ):3.6, ( 0, 2 ):6.5, ( 0, 3 ):9.1, ( 0, 4 ):11.5, ( 0, 5 ): 13.8,
               ( 1, 1 ):3.9, ( 1, 2 ):7.3, ( 1, 3 ):10.0, ( 1, 4 ):13.1, ( 1, 5 ): 15.9,
               ( 2, 1 ):4.5, ( 2, 2 ):9.2, ( 2, 3 ):12.2, ( 2, 4 ):14.8, ( 2, 5 ): 18.2,
               ( 3, 1 ):4.5, ( 3, 2 ):9.2, ( 3, 3 ):12.2, ( 3, 4 ):14.8, ( 3, 5 ): 18.2
              }
      return rates
    if population_stratification == 3: #TODO: Change rates
      rates = {
               ( 0, 1 ):3.6, ( 0, 2 ):6.5, ( 0, 3 ):9.1, ( 0, 4 ):11.5, ( 0, 5 ): 13.8,
               ( 1, 1 ):3.9, ( 1, 2 ):7.3, ( 1, 3 ):10.0, ( 1, 4 ):13.1, ( 1, 5 ): 15.9,
               ( 2, 1 ):4.5, ( 2, 2 ):9.2, ( 2, 3 ):12.2, ( 2, 4 ):14.8, ( 2, 5 ): 18.2,
               ( 3, 1 ):4.5, ( 3, 2 ):9.2, ( 3, 3 ):12.2, ( 3, 4 ):14.8, ( 3, 5 ): 18.2
              }
      return rates
    if population_stratification == 4: #TODO: Change rates
      rates = {
               ( 0, 1 ):3.6, ( 0, 2 ):6.5, ( 0, 3 ):9.1, ( 0, 4 ):11.5, ( 0, 5 ): 13.8,
               ( 1, 1 ):3.9, ( 1, 2 ):7.3, ( 1, 3 ):10.0, ( 1, 4 ):13.1, ( 1, 5 ): 15.9,
               ( 2, 1 ):4.5, ( 2, 2 ):9.2, ( 2, 3 ):12.2, ( 2, 4 ):14.8, ( 2, 5 ): 18.2,
               ( 3, 1 ):4.5, ( 3, 2 ):9.2, ( 3, 3 ):12.2, ( 3, 4 ):14.8, ( 3, 5 ): 18.2
              }
      return rates
  # table 7
  elif analysis_type == 3:
    if population_stratification == 1: #TODO: input actual rate
      rates = {
               ( li, 0 ):3.6, ( li, 1 ):6.5, ( li, 2 ):9.1, ( li, 3 ):11.5,
               ( mi, 0 ):3.9, ( mi, 1 ):7.3, ( mi, 2 ):10.0, ( mi, 3 ):13.1,
               ( hi, 0 ):4.5, ( mi, 1 ):9.2, ( mi, 2 ):12.2, ( mi, 3 ):14.8
              }
      return rates
    if population_stratification == 2: #TODO: input actual rate
      rates = {
               ( li, 0 ):3.6, ( li, 1 ):6.5, ( li, 2 ):9.1, ( li, 3 ):11.5,
               ( mi, 0 ):3.9, ( mi, 1 ):7.3, ( mi, 2 ):10.0, ( mi, 3 ):13.1,
               ( hi, 0 ):4.5, ( mi, 1 ):9.2, ( mi, 2 ):12.2, ( mi, 3 ):14.8
              }
      return rates
    if population_stratification == 3: #TODO: input actual rate
      rates = {
               ( li, 0 ):3.6, ( li, 1 ):6.5, ( li, 2 ):9.1, ( li, 3 ):11.5,
               ( mi, 0 ):3.9, ( mi, 1 ):7.3, ( mi, 2 ):10.0, ( mi, 3 ):13.1,
               ( hi, 0 ):4.5, ( mi, 1 ):9.2, ( mi, 2 ):12.2, ( mi, 3 ):14.8
              }
      return rates
    if population_stratification == 4: #TODO: input actual rate
      rates = {
               ( li, 0 ):3.6, ( li, 1 ):6.5, ( li, 2 ):9.1, ( li, 3 ):11.5,
               ( mi, 0 ):3.9, ( mi, 1 ):7.3, ( mi, 2 ):10.0, ( mi, 3 ):13.1,
               ( hi, 0 ):4.5, ( mi, 1 ):9.2, ( mi, 2 ):12.2, ( mi, 3 ):14.8
              }
      return rates

def interpolate( population_stratification, analysis_type, low_income, medium_income, high_income, x, y ):
  #get rates dict
  rates = trip_rates( population_stratification, analysis_type, low_income, medium_income, high_income )


  # dealing with x parameters
  #when using income levels, x_1 and x_2 are li, mi, or hi
  if analysis_type == 1 or analysis_type == 2 or analsis_type == 4:
    if x < high_income and x >= medium_income:
      x_1 = medium_income
      x_2 = high_income
    elif x < medium_income:
      x_1 = low_income
      x_2 = medium_income
    else:
      x_1 = high_income
      x_2 = high_income
  if analysis_type == 3:
    if x >= 3:
      x_1 = 3
      x_2 = 3
    else:
      x_1 = int( math.floor( x ) )
      x_2 = int( math.ceil( x ) )

  # dealing with y parametrs
  #when using persons per household, max number y = 5
  if analysis_type == 1 or analysis_type == 4:
    if y >= 5:
      y_1 = 5
      y_2 = 5
    else:
      y_1 = int( math.floor( y ) )
      y_2 = int( math.ceil( y ) )
  elif analysis_type == 2 or analysis_type == 3:
    if y >= 5:
      y_1 = 5
      y_2 = 5
    else:
      y_1 = int( math.floor( y ) )
      y_2 = int( math.ceil( y ) )

  # denominator
  z = ( ( x_2 - x_1 ) * ( y_2 - y_1 ) )

  result = ( ( ( rates[( x_1, y_1 )] ) * ( ( x_2 - x ) * ( y_2 - y ) ) / ( z ) ) +
             ( ( rates[( x_2, y_1 )] ) * ( ( x - x_1 ) * ( y_2 - y ) ) / ( z ) ) +
             ( ( rates[( x_1, y_2 )] ) * ( ( x_2 - x ) * ( y - y_1 ) ) / ( z ) ) +
             ( ( rates[( x_2, y_2 )] ) * ( ( x - x_1 ) * ( y - y_1 ) ) / ( z ) ) )

  return result

#test
low_income = 20000 #this is calculated using exchange rates
medium_income = 40000 # this is calculated using exchange rates
high_income = 60000 # this is calculated using exchange rates
population_stratification = 1 #inputed by user
analysis_type = 1 #inputed by user
x = 35234.34 #test income
y = 3.5 # test pph

print interpolate( population_stratification, analysis_type, low_income, medium_income, high_income, x, y )

Best Solution

Well, where to start? Here is just a first observation:

You have a lot of data there, and it seems code and data are mixed into each other.

Data and Code should be separate. Data is an external source, something you modify or read in. You could probably adapt your code to quickly parse Data from a good editable representation to a representation useful for your algorithms. I suspect your code will be shorter, clearer, and less error prone (did you notice all of the 'rates' dictionaries have multiple keys, and you miss a lot of 'hi' keys?).

If you need better abstractions such as matrices and arrays of data, look into numpy


Edit 1

Did you count your number of dimensions? You have a many-dimensional matrix here with X dimensions: analysis_type, population_stratification, income_level, index

If I see right this is a 3x4x3x3 (= 108 entries) "matrix" or "lookup table". If this is the data your model builds on, fine. But can't you put those numbers in a file, or table that you read in? Your code would be next to trivial.


Edit 2

Ok, I'll bite for some minor python style: Testing for values in a Set or a Range.

Instead of:

if analysis_type == 1 or analysis_type == 2 or analsis_type == 4:

you can use

if analysis_type in (1, 2, 4):

or even using readable names as (CUBIC, ..) as suggested.

Instead of:

if x < high_income and x >= medium_income:

you can used chained conditions; Python is one of the few programming languages where conditions chain to make nautral if statements:

if medium_income <= x < high_income:

Edit 3

More important than small code figures is of course code design and refactoring. Edit 2 can only give you some polish.

You should learn to loathe duplicate code.

Also, you have quite a lot of branches in one function. That is a good sign you should break it up into multiple functions. It can also reduce duplication. For example, when one variable like analysis_type can totally change what the function does, why have two different behaviors in one function? You shouldn't have the whole program in one function. Perhaps analysis_type == 3 is better expressed in its own function (as an example)?

Do you understand that your function trip_rates basically does an array lookup, where the array lookup is hardcoded as if ..: return .. if : return .., and the array is written out in full in the function? What if trip_rates could be implemented like this? Would it be possible?

data_model = compute_table(low_income, ...)
return data_model[analysis_type][population_stratification]