When we assess a product's performance, we tend to pick a few of its attributes. It can be resolution for displays or cameras, and bit rate or battery capacity for phones. This is a very robust way to assess performance. Because of that robustness, companies have tried to improve these numbers in order to sell their products: a number is so clear-cut that the sales department can easily explain the product's strengths.
In business terms, however, the value of a product is not necessarily correlated with its performance. People don't judge the value of a product rationally. They intuitively feel "like" or "dislike" toward it.
Take resolution, for instance. There is a limit to how much detail we can perceive. I bet that if you saw a phone whose display had 10 times the resolution, you wouldn't pay 10 times more for it. Even great performance can be in vain when people cannot perceive it.
With AI products, it gets a little more complex. The performance of an AI product is usually assessed by "properness", that is, whether it returns a proper answer for "us". For the task of adding captions to photos, properness could mean "does the caption explain the photo properly?". But if I asked several people to caption one particular photo, I'm sure their answers would not all be the same. Here is the problem.
A common way to assess performance on a machine learning task is to first prepare two datasets, a training dataset and a validation dataset, each consisting of inputs and expected outputs. After training the model on the training dataset, you calculate the percentage of validation inputs for which the model's output matches the expected output. This is common, but it still has problems when it comes to assessing value:
- The performance can't exceed 100%. This easily turns "improve the number" into the goal, even though improvements beyond a certain level would not be recognized as value by people.
- Even if it reaches 100%, what counts as proper differs from person to person. The score depends on the data. You cannot collect the opinions of everyone in the world, so the argument "the number is worthless because I don't think this is the answer" still stands.
- Even if it reaches 100%, that doesn't mean it has value for people, because the number assesses only the model, not the service. The way this performance is delivered is also an important factor in making people feel value in the service.
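To make the second point concrete, here is a minimal sketch of the validation step described above, scored against labels from different people. Everything in it is made up for illustration: the `accuracy` helper, the annotator names, the captions, and the model predictions are all hypothetical, and real caption evaluation would rarely use exact string matching.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that exactly match the labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical validation labels: three people captioning the same four photos.
annotator_labels = {
    "annotator_a": ["dog", "beach", "sunset", "cat"],
    "annotator_b": ["dog", "sea", "sunset", "kitten"],
    "annotator_c": ["puppy", "beach", "evening sky", "cat"],
}

# Hypothetical model output for the same four photos.
model_predictions = ["dog", "beach", "sunset", "cat"]

# The same model gets a different score depending on whose labels we trust.
scores = {
    name: accuracy(model_predictions, labels)
    for name, labels in annotator_labels.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.0%}")
```

In this toy setup the model is "perfect" for one annotator and only half right for the other two, even though nothing about the model changed. The number says as much about who labeled the data as it does about the model.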
Given those problems, I think it's important to:
- Verify what people actually perceive, to avoid pursuing performance too far.
- Put resources not only into improving the AI's performance but also into designing the service that delivers its value.