Unverified commit 1ed04784, authored by Alexandre ANDORRA, committed by GitHub

Expand pm.Data capacities (#3925)

* Initial changes to allow pymc3.Data() to support both int and float input data (previously all input data was coerced to float)
WIP for #3813

* added exception for invalid dtype input to pandas_to_array

* Refined implementation

* Finished dtype conversion handling

* Added SharedVariable option to getattr_value

* Added dtype handling to set_data function

* Added tests for pm.Data used for index variables

* Added tests for using pm.data as RV input

* Ran Black on data tests files

* Added release note

* Updated release notes

* Updated code in light of Luciano's comments

* Fixed implementation of integer checking

* Simplified implementation of type checking

* Corrected implementation for other uses of pandas_to_array
Co-authored-by: hottwaj <jonathan.a.clarke@gmail.com>
parent 7f307b91
@@ -3,7 +3,7 @@
 ## PyMC3 3.9 (On deck)
 ### New features
-- use [fastprogress](https://github.com/fastai/fastprogress) instead of tqdm [#3693](https://github.com/pymc-devs/pymc3/pull/3693)
+- Use [fastprogress](https://github.com/fastai/fastprogress) instead of tqdm [#3693](https://github.com/pymc-devs/pymc3/pull/3693).
 - `DEMetropolis` can now tune both `lambda` and `scaling` parameters, but by default neither of them are tuned. See [#3743](https://github.com/pymc-devs/pymc3/pull/3743) for more info.
 - `DEMetropolisZ`, an improved variant of `DEMetropolis` brings better parallelization and higher efficiency with fewer chains with a slower initial convergence. This implementation is experimental. See [#3784](https://github.com/pymc-devs/pymc3/pull/3784) for more info.
 - Notebooks that give insight into `DEMetropolis`, `DEMetropolisZ` and the `DifferentialEquation` interface are now located in the [Tutorials/Deep Dive](https://docs.pymc.io/nb_tutorials/index.html) section.
@@ -14,6 +14,8 @@
 - `pm.sample` now has support for adapting dense mass matrix using `QuadPotentialFullAdapt` (see [#3596](https://github.com/pymc-devs/pymc3/pull/3596), [#3705](https://github.com/pymc-devs/pymc3/pull/3705), [#3858](https://github.com/pymc-devs/pymc3/pull/3858), and [#3893](https://github.com/pymc-devs/pymc3/pull/3893)). Use `init="adapt_full"` or `init="jitter+adapt_full"` to use.
 - `Moyal` distribution added (see [#3870](https://github.com/pymc-devs/pymc3/pull/3870)).
 - `pm.LKJCholeskyCov` now automatically computes and returns the unpacked Cholesky decomposition, the correlations and the standard deviations of the covariance matrix (see [#3881](https://github.com/pymc-devs/pymc3/pull/3881)).
+- `pm.Data` container can now be used for index variables, i.e with integer data and not only floats (issue [#3813](https://github.com/pymc-devs/pymc3/issues/3813), fixed by [#3925](https://github.com/pymc-devs/pymc3/pull/3925)).
+- `pm.Data` container can now be used as input for other random variables (issue [#3842](https://github.com/pymc-devs/pymc3/issues/3842), fixed by [#3925](https://github.com/pymc-devs/pymc3/pull/3925)).
 ### Maintenance
 - Tuning results no longer leak into sequentially sampled `Metropolis` chains (see #3733 and #3796).
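A note on the first `pm.Data` release-note entry above: an index variable picks entries out of another array, and that only works with integer data. The NumPy-only sketch below illustrates the problem that float coercion causes. It does not use PyMC3 or Theano, and the names `group_means` and `group_idx` are made up for illustration.

```python
import numpy as np

# Hypothetical per-group parameters and one group index per observation.
group_means = np.array([10.0, 20.0, 30.0])
group_idx = np.array([0, 2, 1, 2])

# Integer indices select one entry per observation.
picked = group_means[group_idx]

# The same indices coerced to float can no longer be used for indexing.
float_idx = group_idx.astype("float64")
try:
    group_means[float_idx]
    float_indexing_failed = False
except IndexError:
    float_indexing_failed = True
```

This is only an analogy for what happens inside a model graph, but it shows why a data container that always casts to float cannot hold indices.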
@@ -153,9 +153,9 @@ class Minibatch(tt.TensorVariable):
Consider we have `data` as follows:
>>> data = np.random.rand(100, 100)
if we want a 1d slice of size 10 we do
>>> x = Minibatch(data, batch_size=10)
@@ -182,7 +182,7 @@ class Minibatch(tt.TensorVariable):
>>> assert x.eval().shape == (10, 10)
You can pass the Minibatch `x` to your desired model:
>>> with pm.Model() as model:
@@ -192,7 +192,7 @@ class Minibatch(tt.TensorVariable):
Then you can perform regular Variational Inference out of the box
>>> with model:
... approx = pm.fit()
@@ -478,16 +478,19 @@ class Data:
     For more information, take a look at this example notebook
     def __new__(self, name, value):
+        if isinstance(value, list):
+            value = np.array(value)
         # Add data container to the named variables of the model.
         try:
             model = pm.Model.get_context()
         except TypeError:
-            raise TypeError("No model on context stack, which is needed to "
-                            "instantiate a data container. Add variable "
-                            "inside a 'with model:' block.")
+            raise TypeError(
+                "No model on context stack, which is needed to instantiate a data container. "
+                "Add variable inside a 'with model:' block."
+            )
         name = model.name_for(name)
         # `pm.model.pandas_to_array` takes care of parameter `value` and
@@ -498,7 +501,6 @@ class Data:
# its shape.
shared_object.dshape = tuple(shared_object.shape.eval())
return shared_object
@@ -111,6 +111,9 @@ class Distribution:
         if isinstance(val, tt.TensorVariable):
             return val.tag.test_value
+        if isinstance(val, tt.sharedvar.TensorSharedVariable):
+            return val.get_value()
         if isinstance(val, theano_constant):
             return val.value
@@ -1244,7 +1244,7 @@ def set_data(new_data, model=None):
     new_data: dict
         New values for the data containers. The keys of the dictionary are
-        the variables names in the model and the values are the objects
+        the variables' names in the model and the values are the objects
         with which to update.
     model: Model (optional if in `with` context)
@@ -1266,7 +1266,7 @@ def set_data(new_data, model=None):
     .. code:: ipython
         >>> with model:
-        ...     pm.set_data({'x': [5,6,9]})
+        ...     pm.set_data({'x': [5., 6., 9.]})
         ...     y_test = pm.sample_posterior_predictive(trace)
         >>> y_test['obs'].mean(axis=0)
         array([4.6088569 , 5.54128318, 8.32953844])
@@ -1275,6 +1275,8 @@ def set_data(new_data, model=None):
     for variable_name, new_value in new_data.items():
         if isinstance(model[variable_name], SharedVariable):
+            if isinstance(new_value, list):
+                new_value = np.array(new_value)
message = 'The variable `{}` must be defined as `pymc3.' \
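The list check added to `set_data` (and to `Data.__new__`) matters because a plain Python list has no `dtype` attribute, whereas the array built from it carries a dtype inferred from its elements. The sketch below shows this with NumPy alone; the helper name `to_array_if_list` is hypothetical and only mirrors the two added lines.

```python
import numpy as np


def to_array_if_list(value):
    """Mirror the added check: wrap plain lists in a NumPy array."""
    if isinstance(value, list):
        value = np.array(value)
    return value


int_values = to_array_if_list([5, 6, 9])
float_values = to_array_if_list([5.0, 6.0, 9.0])

# A plain list has no dtype attribute; the arrays built from it do.
list_has_dtype = hasattr([5, 6, 9], "dtype")
int_is_integer = np.issubdtype(int_values.dtype, np.integer)
float_is_floating = np.issubdtype(float_values.dtype, np.floating)
```

With dtype now taken into account, `[5,6,9]` and `[5., 6., 9.]` are no longer interchangeable inputs, which is presumably why the docstring example was switched to float literals.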
@@ -1501,7 +1503,17 @@ def pandas_to_array(data):
         ret = generator(data)
     else:
         ret = np.asarray(data)
-    return pm.floatX(ret)
+    # type handling to enable index variables when data is int:
+    if hasattr(data, "dtype"):
+        if "int" in str(data.dtype):
+            return pm.intX(ret)
+        # otherwise, assume float:
+        else:
+            return pm.floatX(ret)
+    # needed for uses of this function other than with pm.Data:
+    else:
+        return pm.floatX(ret)
 def as_tensor(data, name, model, distribution):
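To make the new dtype branch in `pandas_to_array` easier to reason about, here is a minimal NumPy-only sketch of the same substring test. It is an illustration under assumptions, not the library code: the function name `cast_like_pandas_to_array` is hypothetical, and the `int_dtype` and `float_dtype` parameters stand in for whatever `pm.intX` and `pm.floatX` cast to in a given configuration. The two float branches of the diff are collapsed into one here, since they return the same thing.

```python
import numpy as np


def cast_like_pandas_to_array(data, int_dtype="int32", float_dtype="float64"):
    """Sketch of the dtype branch; the two dtypes are stand-ins, not PyMC3's."""
    ret = np.asarray(data)
    # Same test as in the diff: substring match on the dtype's name.
    if hasattr(data, "dtype") and "int" in str(data.dtype):
        return ret.astype(int_dtype)
    # Inputs without a dtype, and all non-"int" dtypes, are cast to float.
    return ret.astype(float_dtype)


from_int_array = cast_like_pandas_to_array(np.array([0, 1, 2]))
from_uint_array = cast_like_pandas_to_array(np.array([0, 1, 2], dtype="uint8"))
from_bool_array = cast_like_pandas_to_array(np.array([True, False]))
from_list = cast_like_pandas_to_array([0, 1, 2])
```

Two consequences of the substring test are worth noting: unsigned dtypes such as `uint8` also take the integer branch, because their name contains `int`, while boolean arrays and plain lists of integers fall through to the float cast.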