Hi, my name is Takuma. I am a software and machine learning engineer at Mercari. Artificial Intelligence (AI) is a buzzword nowadays, and we also often see terms such as 'Deep Learning' and 'Deep Neural Networks', which are subsets of AI and machine learning. In this post, I would like to share our experiment on image classification using deep learning. Deep learning is a variation of neural network techniques, and neural networks have not always been popular. At the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), held in Edinburgh, UK, Simard et al. (Microsoft Research) wrote the following in their paper 'Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis':

"After being extremely popular in the early 1990s, neural networks have fallen out of favor in research in the last 5 years. In 2000, it was even pointed out by the organizers of the Neural Information Processing System (NIPS) conference that the term 'neural networks' in the submission title was negatively correlated with acceptance."




(I actually attended that conference as a student and gave a presentation on digit detection and recognition.) At that time, many researchers were turning to other algorithms such as Support Vector Machines (SVMs). Some researchers, such as Yann LeCun, Geoff Hinton, Yoshua Bengio and Andrew Ng, continued to study neural networks. Thanks to their achievements, deep learning algorithms have outperformed other approaches on many tasks and in competitions such as ILSVRC2012. I like the talk in which they describe their struggles during that 'winter': Deep Learning Gurus Talk about History and Future of Machine Learning. After their breakthroughs, many researchers started using deep learning techniques again. Deep learning requires large amounts of data and huge computational resources, so it was algorithmic breakthroughs combined with recent hardware improvements that made it practical. I also like the TED talk by Fei-Fei Li (director of Stanford's Artificial Intelligence Lab and Vision Lab), in which she explains why AI needs large amounts of data.




(I visited Stanford University last year, but unfortunately I couldn't meet her...) One of the practical applications of deep learning is image classification / object recognition. We prepared an image data set from Mercari that contains 1 million item images from about 1,000 categories (1,000 images per category). We used 90% of the images for training and the remaining 10% for evaluation. We conducted our image classification experiment using TensorFlow, Google's open source machine learning library, which supports more than just neural networks. We used the Inception-v3 model, a powerful image classification model based on deep neural networks. To train the model from scratch, the data must be in TFRecord format; a script for converting images to TFRecord format is included in the repository. ImageNet is a common academic image data set for image classification. It is the data set described in the TED talk above, and the Inception-v3 training example in the repository also uses it.
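The 90/10 split can be done per category, so that every category keeps the same ratio in both sets. Below is a minimal sketch in plain Python, assuming the images are already grouped by category as lists of file paths; the function name split_per_category is hypothetical and is not part of TensorFlow or the inception repository.

```python
import random
from typing import Dict, List, Tuple


# Hypothetical helper: not part of TensorFlow or tensorflow/models.
def split_per_category(
    images_by_category: Dict[str, List[str]],
    train_fraction: float = 0.9,
    seed: int = 0,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split each category separately and return (train, evaluation).

    Both lists hold (image_path, category) pairs, so every category
    keeps the same train/evaluation ratio.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: List[Tuple[str, str]] = []
    evaluation: List[Tuple[str, str]] = []
    for category in sorted(images_by_category):
        paths = list(images_by_category[category])
        rng.shuffle(paths)
        n_train = round(len(paths) * train_fraction)
        train.extend((p, category) for p in paths[:n_train])
        evaluation.extend((p, category) for p in paths[n_train:])
    return train, evaluation


# Toy example: 3 categories x 1,000 file names (no real images needed).
toy = {f"category_{c}": [f"category_{c}/{i}.jpg" for i in range(1000)]
       for c in range(3)}
train, evaluation = split_per_category(toy)
print(len(train), len(evaluation))  # 2700 300
```

With 1,000 images per category, each category contributes 900 training and 100 evaluation images. Splitting per category rather than over the whole pool avoids a category ending up under-represented in the evaluation set by chance.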




Although we did not use ImageNet itself, we used imagenet_train.py as described in the README. It works with any image data set without changes, as long as there are at most 1,000 categories and each category has around 1,000 images. If your data has more categories, you only have to change the number of categories and the number of training images in imagenet_data.py; then you can run the training script. GPUs have become common for machine learning in recent years. It is possible to train the model without GPUs, but it may take several months or more to obtain practical results. Even with a GPU, memory is a constraint: in our environment (an AWS EC2 p2.xlarge instance with a single Tesla K80 GPU), the batch size for training the Inception-v3 model had to be 32 or less. In general, a larger batch size leads to better results. One of the comments in inception_train.py describes a reference setup that uses 8 Tesla K40 GPUs with a batch size of 256 and takes about 100 hours for 100K steps.
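To our understanding, imagenet_data.py exposes the category count and per-subset image counts through methods named num_classes and num_examples_per_epoch, with subsets named "train" and "validation"; please confirm these names against your own checkout. The standalone class below is a hypothetical stand-in (MercariDataSpec does not exist in the repository) that derives those two values from our data set size instead of hard-coding ImageNet's numbers.

```python
class MercariDataSpec:
    """Hypothetical stand-in for the values edited in imagenet_data.py.

    Method and subset names mirror the ones we assume the repository uses
    (num_classes, num_examples_per_epoch, "train"/"validation").
    """

    def __init__(self, subset: str, num_categories: int = 1000,
                 images_per_category: int = 1000,
                 train_fraction: float = 0.9) -> None:
        if subset not in ("train", "validation"):
            raise ValueError(f"unknown subset: {subset!r}")
        self.subset = subset
        self._num_categories = num_categories
        total = num_categories * images_per_category
        self._num_train = round(total * train_fraction)
        self._num_validation = total - self._num_train

    def num_classes(self) -> int:
        return self._num_categories

    def num_examples_per_epoch(self) -> int:
        if self.subset == "train":
            return self._num_train
        return self._num_validation


print(MercariDataSpec("train").num_examples_per_epoch())       # 900000
print(MercariDataSpec("validation").num_examples_per_epoch())  # 100000
```

Deriving the counts from the category count and the split ratio keeps them consistent with how the data was actually split, so a change to either number does not leave a stale hard-coded value behind.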




Following the comment, we used a p2.8xlarge instance, which has 8 Tesla K80 GPUs, and set the batch size to 256. One of the features of TensorFlow is TensorBoard, which lets us monitor the training status and inspect models through a web browser. We stopped the training at 90K steps. Had we continued, the training loss would probably have kept improving little by little and we might have obtained better results, but training is time consuming. Training took around 2 days for 90K steps with K80s, compared with 100 hours for 100K steps with K40s, so K80s may be roughly 2x faster than K40s.

The accuracy was lower than we had expected... Here are some possible reasons for this result. Some categories are very similar, such as 'Men > Shoes > Sneakers' and 'Women > Shoes > Sneakers', and many kinds of men's and women's clothing look alike. Moreover, some categories such as 'Tickets' cannot be distinguished without OCR (Optical Character Recognition). OCR would be needed, for example, to tell apart tickets for events featuring Japanese artists, tickets for events featuring foreign artists, and bus and train tickets.
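The training figures quoted above can be checked with a little arithmetic. The sketch below assumes the global batch of 256 is split evenly across the 8 GPUs (our understanding of how the multi-GPU training script works), takes "around 2 days" as 48 hours, and assumes 900,000 training images (90% of 1 million). It is a rough comparison only: the K40 figure comes from a different data set, and all helper names are hypothetical.

```python
def per_gpu_batch(global_batch: int, num_gpus: int) -> int:
    """Images each GPU processes per step, assuming an even split."""
    if global_batch % num_gpus:
        raise ValueError("batch size must be divisible by the number of GPUs")
    return global_batch // num_gpus


def steps_per_hour(steps: int, hours: float) -> float:
    return steps / hours


def epochs_seen(steps: int, batch_size: int, num_train_images: int) -> float:
    """How many passes over the training set `steps` steps amount to."""
    return steps * batch_size / num_train_images


print(per_gpu_batch(256, 8))  # 32, the same as our single-K80 limit

k80_rate = steps_per_hour(90_000, 48)    # "around 2 days", taken as 48 h
k40_rate = steps_per_hour(100_000, 100)  # 100K steps in 100 h on 8 K40s
print(k80_rate, k40_rate, k80_rate / k40_rate)  # 1875.0 1000.0 1.875

# Assumes 900,000 training images (90% of 1 million).
print(epochs_seen(90_000, 256, 900_000))  # 25.6
```

Under these assumptions, each K80 handles 32 images per step, which is exactly the single-GPU limit we hit on p2.xlarge; the speed ratio comes out at about 1.9x, in line with "roughly 2x"; and 90K steps correspond to roughly 25 passes over our training data.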




The scripts in tensorflow/models/inception are designed for batch training and evaluation. To use a trained model for non-batch image classification, we need to write about 20 lines of code. imagenet_eval does not output classification scores. Since it is helpful to know the confidence of each classification, we added the line scores = tf.nn.softmax(logits). The softmax is actually computed inside the inception model, but it is not returned. With this change, we get the classification results, with scores, for an input image. We applied the trained model to some images.

In this post, I shared the results of our image classification experiment with deep learning. Thanks to OSS such as TensorFlow, we need to write very little code, and knowledge of machine learning is not always required for a simple image classification task. Nowadays, what image classification mainly requires is large amounts of labeled data and huge computational resources. Beyond image classification, other machine learning applications such as speech recognition and natural language processing are also being improved dramatically by deep learning approaches.
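The line scores = tf.nn.softmax(logits) turns the model's raw logits into scores between 0 and 1 that sum to 1, which is what lets us read them as confidences. The sketch below reproduces that computation in plain Python (so it runs without TensorFlow) and adds a hypothetical top_k helper for listing the most likely categories. It is not the roughly 20 lines of TensorFlow code we actually wrote, and the logits are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores: Sequence[float], labels: Sequence[str],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (label, score) pairs with the highest scores."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Illustrative logits only, not output from our trained model.
labels = ["Men > Shoes > Sneakers", "Women > Shoes > Sneakers", "Tickets"]
logits = [2.0, 1.0, 0.1]
for label, score in top_k(softmax(logits), labels, k=2):
    print(f"{label}: {score:.3f}")
# Men > Shoes > Sneakers: 0.659
# Women > Shoes > Sneakers: 0.242
```

Subtracting the largest logit before exponentiating does not change the result, because the common factor cancels in the division, but it keeps exp() from overflowing when logits are large.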