In ML.NET, what are the counterparts of the NumPy/Pandas Python libraries?
2 Answers
Here are all the available .NET counterparts that I know of:
NumPy
There are a few Tensor type proposals in dotnet/corefx:
There is also an implementation of NumPy made by the SciSharp org.
Pandas
On dotnet/corefx there is a DataFrame Discussion issue, which has spawned a dotnet/corefxlab project to implement a C# DataFrame library similar to Pandas.
There are also other DataFrame implementations:
ML.NET
In ML.NET, IDataView is an interface that abstracts the underlying storage for tabular data (for example, a DataFrame). It doesn't offer the rich API surface of a Pandas DataFrame; instead, it supports reading data from any underlying source, such as a text file, a SQL table, or in-memory objects.
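As a rough illustration of the in-memory case, here is a minimal sketch that wraps a list of objects as an IDataView using LoadFromEnumerable. It assumes the Microsoft.ML NuGet package is referenced; the HouseData class and its values are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical row type, used only for illustration.
public class HouseData
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample rows held in memory.
        var rows = new List<HouseData>
        {
            new HouseData { Size = 1.1f, Price = 1.2f },
            new HouseData { Size = 1.9f, Price = 2.3f },
        };

        // Wrap the in-memory collection as an IDataView.
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // The schema is inferred from the public properties of HouseData.
        foreach (var column in data.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }
    }
}
```

The same IDataView type comes back from the text-file and database loaders, so downstream code does not need to know where the data came from.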
There currently isn't a "data exploration" API in ML.NET v1.0, like you would have with a Pandas DataFrame. The current plan is for the corefxlab DataFrame class to implement IDataView, and then you can use DataFrame to do the data exploration, and feed it directly into ML.NET.
UPDATE: For a "data exploration" API similar to Pandas, check out the Microsoft.Data.Analysis package, which is currently in preview. It implements IDataView and can be fed directly into ML.NET to train or make predictions.
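A minimal sketch of what that can look like, assuming both the Microsoft.Data.Analysis and Microsoft.ML NuGet packages are referenced. The column names and values are made up for illustration, and because the package is in preview the API may change.

```csharp
using System;
using Microsoft.Data.Analysis;
using Microsoft.ML;

public static class Program
{
    public static void Main()
    {
        // Build a small DataFrame from in-memory columns (hypothetical data).
        var size = new PrimitiveDataFrameColumn<float>("Size", new float[] { 1.1f, 1.9f, 2.8f });
        var price = new PrimitiveDataFrameColumn<float>("Price", new float[] { 1.2f, 2.3f, 3.0f });
        var df = new DataFrame(size, price);

        // Pandas-style exploration: row count and summary statistics.
        Console.WriteLine(df.Rows.Count);
        DataFrame summary = df.Description();
        Console.WriteLine(summary);

        // DataFrame implements IDataView, so it can be handed to ML.NET as-is.
        IDataView data = df;
        Console.WriteLine(data.Schema.Count);
    }
}
```

From here, `data` can be passed to anything in ML.NET that accepts an IDataView, such as a pipeline's Fit method.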
Comments
It is mostly the regular .NET types + the IDataView types. The document is a bit out of date.