Author: emma
0 Preface
Every morning I open my eyes to the same dilemma: what should I wear today? Plenty of options come to mind, but none of them satisfies me. Often I fall back asleep in the middle of agonizing over it. Twenty minutes later I wake up again, grab a T-shirt and shorts, put them on, and leave, sometimes with toothpaste dripped on my clothes in the rush.
So in my colleagues' eyes I have always been a sloppy female programmer, and I have not disappointed them: I usually show up in a promotional T-shirt and slippers.
But I never give up easily. Every morning I still ask myself: what should I wear today? It seems to be a problem I can never solve, yet I am not willing to sidestep it.
How to solve it
I have done a fair amount of data analysis and recommendation work, and whenever I see a problem I get the urge to collect data to solve it. So I came up with this wild idea: use data analysis to solve the thing that bothers me every morning when I wake up, so that I can go to work happy and confident.
I wrote down the overall process I use at work to solve problems with data:
1. Clearly define the problem that needs to be solved.
2. Collect and clean the data.
3. Define metrics and compute statistics.
4. Segment the metrics, drill down and compare, and observe the data to draw conclusions.
5. Pick some typical cases for specific analysis.
6. Use the conclusions from steps 4 and 5 to optimize the strategy.
7. Apply the optimized strategy and keep observing the metrics defined in step 3.
There are also many details along the way, such as checking whether the metrics meet expectations, and forming and verifying hypotheses about the problems that come up.
Write it down, stick it on the wall, and get going. Every time I start a new project like this, I feel a mix of excitement and nervousness.
Data analysis is that exciting, and a lot of ideas come to mind along the way. They need to be organized, otherwise it is easy to drift off course halfway.
Until you see the data, you never know what conclusions you will draw. Does the data match your expectations? If not, what could be the reason, and what hypotheses should you form and verify?
The results are sometimes exciting and often, unavoidably, disappointing. What I fear most is not a conclusion that contradicts my expectations, but finding no useful conclusion at all after a long search. Then I can only accept that there is no conclusion for now and keep the data in mind; maybe one day it will spark an idea.
It really is work that takes both logical reasoning and inspiration!
1. Clearly define the problem to be solved
In fact, I am not short of clothes. I do not own a huge number, but they fill half of the wardrobe. When I first started earning my own money, I also "splurged" on a lot of Taobao best-sellers. But the feeling of having nothing to wear never seemed to go away.
To sort it out:
- I am often dissatisfied with the outfit I end up choosing
- I don't know what to buy: I keep buying, and it still never feels like enough
From the perspective of a recommendation strategy, the wardrobe is the candidate pool. The various occasions and seasons in life represent the needs of users with different characteristics (really just me, changing with the circumstances!).
For example: (weekday, going to work, spring, planning to exercise after work, want something simple and bright, the items I wore in the last few days (xxxxx), the items that are dirty and in the wash (xxxxx)) or (weekend, taking the children to the park, summer, will be running, jumping, and taking photos, want something that is easy to move in and looks good in photos, …).
Recommendation quality: judged by personal feeling. I agonize for a long time or feel that I do not have enough clothes, so the quality needs to improve.
The strategy for selecting clothes and the evaluation metric (whether I personally feel satisfied) are both subjective and hard to quantify. After all, women are so complicated that I don't even understand myself.
And every time I am dissatisfied with an outfit, I feel it is because I have nothing to wear, that is, the pool (the clothes) is insufficient.
So the problem I hope to solve is: how to optimize the pool to improve the results while the distribution strategy and the evaluation metric stay fixed.
Of course, since the pool was itself built from my own purchase decisions, the problem becomes: how to optimize the strategy for building the pool (buying clothes). After all, I often spend longer hesitating over buying clothes than over wearing them.
If I had a clear understanding of what kind of clothes I need, it would save a lot of effort.
2. Data collection and cleaning
This step covers building and cleaning the base data. Clean data is always the most important thing.
2.1 Basic data construction
Basic data: each piece of clothing and its attributes. The attributes make later statistics and drill-downs easier. Each piece is also photographed for case-by-case analysis.
If this whole analysis took me an entire weekend, this step accounted for 80% of the workload.
I smoothed out every piece of clothing in the closet, photographed it, tagged it, and recorded everything in an Excel sheet.
In line with the goals of the analysis, the tags are based on the factors I consider when buying clothes, the factors that decide what I wear, and whether a piece actually ends up being worn. I used the following tags:
- Type (short-sleeved top, vest, pajamas, sweater, jumpsuit, etc.)
- Season (spring and autumn, summer, winter)
- Purchase time (student days, after starting work, within the past year)
- Purchase channel (shopping mall, Taobao, gift from others)
- Color (floral, gray, striped, ...)
- Special degree (special, somewhat distinctive, conventional)
- Wearing frequency (high, medium, low, declining, never want to wear again)
In fact, I wanted to tag more, such as who I was shopping with, what the main purpose of the purchase was, and whether I tried the item on before buying. But I really ran out of energy; recalling the history of every piece of clothing is very tiring.
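The tagged sheet described above can be thought of as one record per garment. The following is only a sketch of such a record, not the author's actual spreadsheet: the class name `Garment`, the field names, the `photo` field, and the exact spellings of the allowed values are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict

# Allowed values for the closed-vocabulary tags. The vocabularies follow the
# tag list in the article; the exact spellings are an assumption.
ALLOWED = {
    "season": {"spring/autumn", "summer", "winter"},
    "purchase_time": {"student days", "after starting work", "within a year"},
    "channel": {"shopping mall", "Taobao", "gift"},
    "special_degree": {"special", "somewhat distinctive", "conventional"},
    "frequency": {"high", "medium", "low", "declining", "never again"},
}


@dataclass(frozen=True)
class Garment:
    """One row of the wardrobe sheet (hypothetical field names)."""
    photo: str           # file name of the photo taken for case analysis
    type: str            # free text: "sweater", "jumpsuit", ...
    season: str
    purchase_time: str
    channel: str
    color: str           # free text: "gray", "striped", ...
    special_degree: str
    frequency: str


def invalid_fields(garment):
    """Return the names of tags whose value is outside the allowed vocabulary."""
    row = asdict(garment)
    return [name for name, allowed in ALLOWED.items() if row[name] not in allowed]


# Two made-up rows: one valid, one with a typo in the season tag.
good = Garment("img_001.jpg", "sweater", "winter", "within a year",
               "Taobao", "gray", "conventional", "high")
bad = Garment("img_002.jpg", "jumpsuit", "autumn", "student days",
              "shopping mall", "floral", "special", "low")

print(invalid_fields(good))
print(invalid_fields(bad))
```

Checking each row against a fixed vocabulary while entering it is one cheap way to keep the later group-by statistics from splitting on typos.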
2.2 Dirty data processing
If you do not sample the data in advance or run some simple checks, it is easy to be misled by dirty data. A very small number of extreme outliers can skew metrics such as the mean.
I removed some clothes, mainly two kinds: items that my elders decided suited me and insisted on giving me, and items bought for a special event that cannot be worn a second time, such as performance costumes. I did not choose these clothes on my own initiative, so they are excluded from the analysis for now.
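Both points in this section can be sketched in a few lines. The example below is illustrative only: the flags `chosen_by_me` and `single_use` are hypothetical columns (the article does not name them), and the yearly wear counts are invented to show how a single outlier moves the mean but not the median.

```python
from statistics import mean, median

# Made-up rows; "chosen_by_me" and "single_use" are hypothetical flags.
rows = [
    {"item": "gray sweater", "chosen_by_me": True, "single_use": False},
    {"item": "coat from an elder", "chosen_by_me": False, "single_use": False},
    {"item": "performance costume", "chosen_by_me": True, "single_use": True},
    {"item": "striped T-shirt", "chosen_by_me": True, "single_use": False},
]


def in_scope(row):
    """Keep only clothes that were chosen actively and can be worn again."""
    return row["chosen_by_me"] and not row["single_use"]


clean = [row for row in rows if in_scope(row)]
print([row["item"] for row in clean])

# Invented wear counts per year: one extreme value dominates the mean.
wears = [3, 4, 5, 4, 200]
print("mean:", mean(wears), "median:", median(wears))
```

Comparing the mean with the median on a sample like this is a quick sanity check before trusting any averaged metric.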
3. Define metrics and compute statistics
3.1 Quantity
Quantity is simple and intuitive, and it is also the most important metric for a recommendation pool. After all, the complaint that "there are never enough clothes" is about quantity.
Comparison and segmentation are the main tools here. The total is certainly large, so if it still feels insufficient, the shortage must be concentrated in certain segments. Segmenting and comparing is all about finding those tags.
Look at the total first.
I don't know whether this number is large or small. This is a common problem in data analysis: many numbers need an overall average or a point of comparison before you can judge their size. After watching one type of business data for a long time, you develop a good sense of its mean and distribution and can judge a number at a glance. For example, the click-through rate of mobile feed ads is generally 1%+, and figures such as the penetration rate of each tab in Cloud Music are known in advance.
But I have no data on how many clothes other people own, or on the average distribution, so I can only make a rough estimate. The total is 99 pieces, counting tops, pants, outerwear, and innerwear.
That is roughly 30 pieces for each of the three seasons. Split evenly between upper and lower body, that makes about 15 outfits per season. Fifteen outfits for 4 months is not too many (scratching my head with a guilty conscience), or at least not outrageous.
Simple drill-down and comparison of the quantity metric: an easy way to reach conclusions
Summer clothes are the most numerous and winter clothes the fewest, which matches the southern climate.
Before looking at any piece of data, we usually have a rough prediction in mind. For the seasonal data, for example, the climate already suggests that summer should have the most. When the data matches our expectations, that also helps verify its accuracy.
When the data does not match our expectations, it deserves attention and further checks.
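A single-dimension drill-down with an expectation check might look like the sketch below. The rows are invented (ten items instead of the real 99) and the helper name `share_by` is an assumption; only the idea of stating the expectation before looking at the numbers comes from the text.

```python
from collections import Counter


def share_by(rows, dimension):
    """Return {tag value: share of all rows}, largest share first."""
    counts = Counter(row[dimension] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}


# Invented sample, not the author's wardrobe.
rows = [{"season": s}
        for s in ["summer"] * 5 + ["spring/autumn"] * 3 + ["winter"] * 2]

shares = share_by(rows, "season")
for season, share in shares.items():
    print(f"{season:14} {share:.0%}")

# State the expectation first, then compare it with the data.
expected_top = "summer"
matches_expectation = next(iter(shares)) == expected_top
print("matches expectation:", matches_expectation)
```

If `matches_expectation` came out false, that would be the cue to go back and check the tagging before interpreting the result.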
By purchase time, clothes bought within the past 10 years account for the vast majority. New clothes make up 33%, and 22% date from 7 years ago. There are also a few pieces that I bought as an undergraduate more than 10 years ago. Apparently I have not gained much weight.
With wearing frequency ordered from low to high, the distribution is concentrated on the left (low-frequency) side. There really are many clothes that I rarely wear (not my first choice). This confirms the feeling of "never having anything suitable to wear", and the goal is to shift the distribution to the right.
Most of the clothes were bought in shopping malls, where it is refreshing to simply take something away if you like it.
There are few formal clothes, which fits my personal style; I have no occasions that require them. This is in line with expectations.
Crossing a few dimensions gives some further conclusions
Low wearing frequency is most serious among spring clothes, where I have few favorites. The winter clothes I currently own are worn more often.
Crossing occasion with season shows that summer really is a romantic season, with more holiday styles. Formal clothes are sufficient for each of the three seasons, so the next time I see formal clothes I do not need to spend time considering them.
Crossing occasion with special degree: there are more special clothes for holidays and more ordinary clothes for weekdays, which is reasonable.
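Crossing two dimensions amounts to counting rows per pair of tag values. The sketch below uses invented rows and a hypothetical helper `crosstab`; the `occasion` values are assumptions, since the article does not list them.

```python
from collections import Counter


def crosstab(rows, row_dim, col_dim):
    """Count rows for every (row_dim value, col_dim value) pair."""
    return Counter((row[row_dim], row[col_dim]) for row in rows)


# Invented sample rows.
rows = [
    {"occasion": "holiday", "season": "summer"},
    {"occasion": "holiday", "season": "summer"},
    {"occasion": "weekday", "season": "summer"},
    {"occasion": "weekday", "season": "winter"},
    {"occasion": "formal", "season": "winter"},
]

table = crosstab(rows, "occasion", "season")
for (occasion, season), n in sorted(table.items()):
    print(f"{occasion:8} {season:8} {n}")

# A Counter returns 0 for combinations that never occur,
# which makes empty cells (gaps in the wardrobe) easy to spot.
print(table[("formal", "summer")])
```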
Clothes also have an attribute that cannot be ignored: matching. Not knowing how to combine pieces is also a big headache when choosing.
I analyzed the ratio of tops to bottoms, leaving out dresses and jumpsuits, which need no matching.
The ratio of tops to bottoms is out of balance in these places:
- Spring: 11.5 tops for each pair of pants
- All-match bottoms such as jeans are very scarce and need targeted replenishment
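The tops-to-bottoms ratio can be computed per season as in this sketch. The `part` field and the counts (23 tops, 2 bottoms) are assumptions chosen only so that the result reproduces the 11.5 quoted above; the article does not give the underlying counts.

```python
def tops_per_bottom(rows, season):
    """Number of tops per bottom in one season; None when there are no bottoms."""
    tops = sum(1 for r in rows if r["season"] == season and r["part"] == "top")
    bottoms = sum(1 for r in rows if r["season"] == season and r["part"] == "bottom")
    if bottoms == 0:
        return None
    return tops / bottoms


# Invented counts. One-piece items (dresses, jumpsuits) are ignored by the
# function because they need no matching.
rows = (
    [{"season": "spring", "part": "top"}] * 23
    + [{"season": "spring", "part": "bottom"}] * 2
    + [{"season": "spring", "part": "one-piece"}] * 3
)

print(tops_per_bottom(rows, "spring"))
print(tops_per_bottom(rows, "winter"))
```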
The analysis of the quantity metric has given me a better understanding of my wardrobe: I know which categories need replenishing and which are well stocked.
Besides quantity, quality is very important. Most girls buy clothes more or less constantly, so why does it still feel like there are never enough?
The focus is on analyzing the clothes I never want to wear again and what they look like. Learn from failures.
3.2 Elimination rate
Define: elimination rate = clothes I never want to wear again / all clothes
"Clothes I bought and barely wore" are the biggest pain in my heart. They take up space, go unworn, cost money, and earn me the comment, "Look at all those clothes in the closet, how can you say you have nothing to wear!"
Analyzing the characteristics of clothes with a high elimination rate helps me avoid the same pitfalls and gives me some guidance when shopping for clothes in the future.
Likewise, segmentation by dimension and comparison are the main tools.
The overall elimination rate is 30%. Nearly one-third of the clothes are dead weight, which is a fairly high proportion.
By season, winter is particularly high. Although winter clothes are worn more frequently, there are also more that I no longer want to wear. Some need to be retired.
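The elimination rate defined above is a simple ratio. A minimal sketch, assuming each row carries a `frequency` tag whose value `"never again"` marks clothes the owner no longer wants to wear (the spelling is an assumption), with invented rows sized to give the 30% mentioned in the text:

```python
def elimination_rate(rows):
    """Share of rows tagged as 'never want to wear again'; None for no rows."""
    if not rows:
        return None
    eliminated = sum(1 for row in rows if row["frequency"] == "never again")
    return eliminated / len(rows)


# Invented sample: 3 of 10 items are tagged "never again".
rows = (
    [{"frequency": "never again"}] * 3
    + [{"frequency": "high"}] * 4
    + [{"frequency": "low"}] * 3
)

print(f"{elimination_rate(rows):.0%}")
```

The same function can be applied to any subset of rows (one season, one channel) to get the segmented rates discussed in the rest of this section.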
I want to discuss one issue here: with so many dimensions, how do we choose which to drill down into?
For large-scale, high-dimensional data, we can use machine learning: take the elimination rate as the target and compute the contribution of each feature.
But in data analysis, interpretability is very important, and much of the data serves to test our hypotheses. There is no need to make accurate predictions or train models. (Of course, even if you use a model, you will generally still look at whether the high-contribution features match expectations and whether they offer any inspiration.)
Therefore, the preferred drill-down dimensions in data analysis are those most likely to be discriminative, those that can verify a hypothesis, and those that have special meaning in the scenario.
For example, many of the quantity drill-downs are broken out by the "season" dimension, because season has a special meaning: clothes for different seasons cannot be worn interchangeably. Drilling down on this dimension first therefore makes problems easier to find.
For the elimination rate, the first dimension to drill into is purchase time: it is the most likely to be discriminative and can also verify a hypothesis.
Is not wanting to wear a piece directly related to how old it is? If I don't want to wear something simply because I bought it long ago, then the purchase decision itself was not at fault.
From high to low, the elimination rate is: bought during graduate school or after starting work > bought as an undergraduate > bought within the past year.
So the elimination rate does not simply fall as clothes get newer: clothes bought as an undergraduate have a lower rate than those bought after starting work. Does this mean my taste was better back then? Note that only 5% of my wardrobe was bought as an undergraduate.
The reason is easy to imagine: the undergraduate clothes were bought ten years ago, and the ones I have kept until now are probably my favorites. If I had kept all of my undergraduate clothes, their elimination rate would certainly be much higher.
The elimination rate of clothes bought within the past year is the lowest. I have made relatively few aesthetic missteps recently.
So the elimination rate metric has a built-in unfairness: clothes bought in the past year naturally have a low rate.
Consequently, if a type of clothing has a low elimination rate, it is not necessarily because I chose wisely; it may be because I bought many of them recently, so clothes from the past year make up a large share.
So is the low elimination rate of summer clothes seen earlier simply due to more summer clothes having been bought in the past year?
Look at season crossed with purchase time.
Both for clothes bought within the past year and for those bought more than a year ago, the elimination rate in summer is lower than in spring and autumn, and it is exceptionally low within the past year.
Most summer clothes are short-sleeved, which are hard to get wrong.
Winter clothing deserves attention. The elimination rate for purchases within the past year is higher than for those from more than a year ago. Although some winter clothes in active use are worn very frequently, recently bought ones are also more likely to end up as things I don't want to wear at all. I need to shop more sensibly in the near future.
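The fairness concern raised above (a segment's overall rate can look low merely because it contains many recent purchases) can be checked by computing the rate inside buckets. The sketch below uses invented numbers, deliberately constructed so that the two seasons have identical rates within each purchase-time bucket yet very different overall rates. They are not the author's data, whose summer rate stays lower even after bucketing. The helper names `rate_by` and `make` and the field names are assumptions.

```python
from collections import defaultdict


def rate_by(rows, *dims):
    """Elimination rate for every combination of values of the given dimensions."""
    totals = defaultdict(int)
    eliminated = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += 1
        eliminated[key] += row["eliminated"]  # True counts as 1, False as 0
    return {key: eliminated[key] / totals[key] for key in totals}


def make(season, bought, n, n_eliminated):
    """Build n invented rows, the first n_eliminated of them eliminated."""
    return [{"season": season, "bought": bought, "eliminated": i < n_eliminated}
            for i in range(n)]


rows = (
    make("summer", "within a year", 8, 0)
    + make("summer", "over a year ago", 2, 1)
    + make("winter", "within a year", 2, 0)
    + make("winter", "over a year ago", 8, 4)
)

overall = rate_by(rows, "season")
bucketed = rate_by(rows, "season", "bought")

print(overall)   # summer looks far better than winter
print(bucketed)  # within each purchase-time bucket the seasons are equal
```

In this constructed case the gap between seasons comes entirely from the mix of old and new clothes, which is exactly the situation the bucketed comparison is meant to detect.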
Purchase channel is also an important dimension. Recently, the share of online shopping has been increasing.
More worrying is that the elimination rate of clothes bought online is actually higher than that of the other channels.
In terms of style
More unconventional clothes are more likely to be eliminated. Conventional clothes are relatively safe, which matches common sense.
Special styles for spring in particular call for caution: their elimination rate is sky-high. Having more variety in summer is not a problem.
4. Analysis of typical cases
By now I have a general understanding of which dimensions have a relatively high failure rate. To imprint the bad cases more deeply in my mind, I want to learn from each mistake.
I tagged each piece I no longer want to wear with the reason it failed, tracing each problem back to its source, and listed a solution for each.
5. Output conclusion: a strategy for buying clothes
In summary, this weekend produced the following strategies:
- Jeans are badly needed;
- Go to the mall and try on winter clothes. I have been relying on a few older winter pieces, and if they wear out I will be in trouble;
- Summer clothes are plentiful and I am highly satisfied with them. Purchases can be postponed; an occasional online purchase can be the icing on the cake;
- Don't buy fancy spring clothes. I buy them and then basically never wear them;
- Return online purchases that don't suit me right away. Not looking good after an online purchase is the number one reason for elimination;
6. Keep watching the data as decisions change
Do not produce scattered one-off numbers; build an analysis system. This is a very important point.
The metrics that revealed problems during the analysis should be kept as standing metrics. They become key to observing the state of the business and the changes that result from strategy.
Once the measures from the optimized strategy have been carried out, update the base data and observe how the metrics change. Adjusting direction in time is the key to maintaining the "ecological health" of the wardrobe.
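Observing how the metrics change can be as simple as comparing two snapshots of the same metrics. In this sketch the "before" values (99 pieces, 30% elimination rate) come from the article, while the "after" values, the metric names, and the helper `metric_changes` are invented for illustration.

```python
def metric_changes(before, after):
    """Return after - before for every metric present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# "before" uses figures from the article; "after" is a made-up later snapshot.
before = {"total_pieces": 99, "elimination_rate_pct": 30}
after = {"total_pieces": 92, "elimination_rate_pct": 24}

changes = metric_changes(before, after)
for name, delta in changes.items():
    print(f"{name}: {delta:+d}")
```

Keeping each snapshot alongside its date would turn this into a small time series that can be reviewed after every shopping or clean-out session.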
But time is limited, and collecting and entering the base data nearly broke me. I hope I can stick with it.
Finally
To summarize the data analysis methods and key points covered in this article:
- Sort out and define the problem.
- Set key metrics.
- Clean base data is critical.
- Drilling down into key metrics and comparing them is a simple method, but it yields many conclusions.
- Set up hypotheses and verify them.
- Check whether the metrics are fair. If a metric has a built-in bias, remember to analyze it in buckets.
- Analyzing bad cases is a powerful tool for developing strategies.
- Avoid one-off work; long-term observation is what makes an analysis system.
Thank you for reading this far. I am off to pack up more than 100 pieces of clothing.
This article is published by the NetEase Cloud Music technical team. Reprinting in any form without authorization is prohibited. We recruit for all kinds of technical positions year-round. If you are ready to change jobs and happen to like Cloud Music, join us at staff.musicrecruit@service.netease.com .