Original post: Sales Data Analysis

Introduction

A data analysis assignment: read CSV files and analyse the sales data.

Sale

Case Description

This business case is mainly concerned with the forecasting of sales in different stores in the retail industry. The task involves the analysis of historical sales data collected from a nationwide retailer in the U.S. The aim is to expose you to a realistic business case and to gain understanding and insight about some of the ways in which data analytics can be used to support business decision making.

Description of the business case

Accurately forecasting sales is one of the most difficult challenges faced by retailers worldwide, especially when limited historical data is available. In this coursework project, you are provided with historical sales data for 45 stores located in different regions in the U.S. Each store contains a number of departments, and you are asked to predict the sales for each department at each store. In addition, the retailer runs several promotional marketing activities during holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the forecasting accuracy evaluation than non-holiday weeks. A challenge in this sales forecasting problem is to take into account the effects of promotional activities on sales given the fact that part of the promotion related data is absent from historical records. The available data are briefly introduced below.

stores.csv

This file contains the anonymised number, type and size of the 45 stores.

Column        Description
Store         the anonymised store number
Type          store type (A: supercentre, B: superstore, C: supermarket)
Size          store size (in square feet)

features.csv

This file contains additional data related to the store, department, and regional activity for the given dates. It includes the following fields:

Column        Description
Store         the anonymised store number
Date          the week, identified by the date of its Friday
Temperature   average temperature in the region
Fuel_Price    cost of fuel in the region
Promotions    anonymised data related to promotions, mainly price reductions that the retailer is running. Promotion data is only available after Nov. 2011, and is not available for all stores all the time. Any missing value is marked with an NA.
CPI           the consumer price index
Unemployment  the unemployment rate
IsHoliday     whether the week is a special holiday week
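The NA marker in the Promotions column needs explicit handling when the file is read. The following is a minimal sketch using only the Python standard library. It assumes the file is comma-separated, that the header names match the table above, and that dates are written as DD/MM/YYYY like the holiday dates in this brief. The sample rows are invented purely for illustration and are not taken from the real data.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the layout described above; the values are made up.
SAMPLE = """Store,Date,Temperature,Fuel_Price,Promotions,CPI,Unemployment,IsHoliday
1,05/02/2010,42.31,2.572,NA,211.10,8.106,FALSE
1,11/11/2011,59.11,3.297,10382.9,217.99,7.866,FALSE
"""


def to_float(text):
    """Convert a numeric field to float, mapping the NA marker to None."""
    cleaned = text.strip()
    return None if cleaned in ("", "NA") else float(cleaned)


def read_features(handle):
    """Parse feature rows into dictionaries with typed values."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "Store": int(raw["Store"]),
            # Assumed date format: DD/MM/YYYY.
            "Date": datetime.strptime(raw["Date"], "%d/%m/%Y").date(),
            "Temperature": to_float(raw["Temperature"]),
            "Fuel_Price": to_float(raw["Fuel_Price"]),
            "Promotions": to_float(raw["Promotions"]),
            "CPI": to_float(raw["CPI"]),
            "Unemployment": to_float(raw["Unemployment"]),
            "IsHoliday": raw["IsHoliday"].strip().upper() == "TRUE",
        })
    return rows


features = read_features(io.StringIO(SAMPLE))
missing = sum(1 for row in features if row["Promotions"] is None)
print(f"{missing} of {len(features)} rows have no promotion data")
```

Keeping missing promotions as None, instead of silently replacing them with zero, leaves the decision about imputation to the pre-processing step.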

The four public holidays included in the data fall within the following weeks.


Super Bowl: 12/02/2010, 11/02/2011, 10/02/2012, 08/02/2013
Labor Day: 10/09/2010, 09/09/2011, 07/09/2012, 06/09/2013
Thanksgiving: 26/11/2010, 25/11/2011, 23/11/2012, 29/11/2013
Christmas: 31/12/2010, 30/12/2011, 28/12/2012, 27/12/2013
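The holiday weeks listed above can be turned into a lookup table, which is handy both for building features and for weighting errors. This is a sketch only: the names `HOLIDAY_WEEKS`, `parse_date` and `holiday_name` are illustrative, and the dates are read as DD/MM/YYYY.

```python
from datetime import date, datetime

# Holiday week dates exactly as listed above (DD/MM/YYYY).
HOLIDAY_WEEKS = {
    "Super Bowl": ["12/02/2010", "11/02/2011", "10/02/2012", "08/02/2013"],
    "Labor Day": ["10/09/2010", "09/09/2011", "07/09/2012", "06/09/2013"],
    "Thanksgiving": ["26/11/2010", "25/11/2011", "23/11/2012", "29/11/2013"],
    "Christmas": ["31/12/2010", "30/12/2011", "28/12/2012", "27/12/2013"],
}


def parse_date(text):
    """Parse a DD/MM/YYYY string into a date object."""
    return datetime.strptime(text, "%d/%m/%Y").date()


# Map each weekly date to the holiday that falls in that week.
HOLIDAY_BY_DATE = {
    parse_date(text): name
    for name, texts in HOLIDAY_WEEKS.items()
    for text in texts
}


def holiday_name(week):
    """Return the holiday for a weekly date, or None for an ordinary week."""
    return HOLIDAY_BY_DATE.get(week)


# Holiday weeks that fall inside the forecasting window of test.csv.
test_start, test_end = date(2012, 11, 2), date(2013, 7, 26)
holidays_in_test = sorted(d for d in HOLIDAY_BY_DATE if test_start <= d <= test_end)
for d in holidays_in_test:
    print(d.strftime("%d/%m/%Y"), holiday_name(d))
```

Distinguishing the four holidays by name, instead of relying on the single IsHoliday flag, allows a model to treat, say, Thanksgiving and Labor Day differently.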

train.csv

This file contains the historical training data, which covers sales from 05/02/2010 to 26/10/2012. It includes the following fields:

Column        Description
Store         the anonymised store number
Department    the anonymised department number
Date          the week, identified by the date of its Friday
Weekly_Sales  sales for the given department in the given store
IsHoliday     whether the week is a special holiday week

test.csv

This file has the same structure as train.csv, except that you need to predict the weekly sales for each triplet of store, department, and date from 02/11/2012 to 26/07/2013.
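To get a feel for how much history is available relative to the forecasting horizon, the weekly dates in both periods can be enumerated. The sketch below assumes the records are strictly weekly with no gaps, which the brief does not state explicitly; `weekly_dates` is an illustrative helper name.

```python
from datetime import date, timedelta


def weekly_dates(start, end):
    """Return every date from start to end (inclusive) in 7-day steps."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=7)
    return dates


# Periods quoted in the descriptions of train.csv and test.csv.
train_weeks = weekly_dates(date(2010, 2, 5), date(2012, 10, 26))
test_weeks = weekly_dates(date(2012, 11, 2), date(2013, 7, 26))

print(len(train_weeks), "training weeks;", len(test_weeks), "weeks to forecast")
```

Under that assumption the training period spans 143 weekly dates and the forecasting period 39, so each seasonal holiday appears at most three times in the history.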

Evaluation of forecasting accuracy

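In the coursework report, the weighted mean absolute error (WMAE) or another appropriate error measure should be used to evaluate forecasting accuracy. Given the holiday weighting described above, WMAE = (1 / Σ w_i) × Σ w_i |y_i − ŷ_i|, where y_i is the actual weekly sales, ŷ_i is the forecast, and w_i is 5 for holiday weeks and 1 otherwise.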
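A minimal sketch of the WMAE computation, assuming a weight of 5 for holiday weeks and 1 for all other weeks, as implied by the case description. The function name and the `holiday_weight` parameter are illustrative, and the numbers in the example are invented.

```python
def wmae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error with heavier weights on holiday weeks."""
    if not (len(actual) == len(predicted) == len(is_holiday)):
        raise ValueError("all inputs must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    # Assumed weighting: holiday weeks count holiday_weight times, others once.
    weights = [holiday_weight if flag else 1.0 for flag in is_holiday]
    total = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return total / sum(weights)


# Invented example: two ordinary weeks and one holiday week.
example = wmae(
    actual=[100.0, 200.0, 300.0],
    predicted=[110.0, 190.0, 250.0],
    is_holiday=[False, False, True],
)
print(round(example, 3))
```

In this example the holiday-week miss of 50 dominates the score: (10 + 10 + 5 × 50) / (1 + 1 + 5) = 270 / 7.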

Understanding Data and their Environment: Assessed Work

For your assessment for this course you will need to complete two tasks: an essay and a report on a data analytical project. The report should be submitted by 15 March 2018 via Turnitin.

Report

Length: fewer than 3000 words (excluding tables, references, and appendices, if needed).

You will find a set of datasets on Blackboard in the assessed folder. Your task is to describe, pre-process and analyse the datasets so as to lead to the development of accurate predictive models.

Your work should cover (but not be limited to) the following:

  1. Review the available data and describe it in terms of its variables, quality, and relevance to the sales prediction.
  2. Link data sets together as appropriate.
  3. Pre-process the data as appropriate for further analysis.
  4. Identify the key factors affecting sales. For example, you may want to check whether fuel price and CPI have an impact on sales, and how public holidays cause sales fluctuations.
  5. Build at least one predictive model using the variables you identified.
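Steps 2 and 5 above can be illustrated with a deliberately naive baseline: link each sales row to its store record, then predict the historical mean for the same store, department and holiday flag. This is a sketch of one possible starting point, not the method the brief requires. All rows below are invented, and `link`, `fit_baseline` and `predict` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean

# Invented rows in the layout of stores.csv and train.csv.
stores = [
    {"Store": 1, "Type": "A", "Size": 150000},
    {"Store": 2, "Type": "B", "Size": 90000},
]
train = [
    {"Store": 1, "Department": 1, "Date": "05/02/2010", "Weekly_Sales": 20000.0, "IsHoliday": False},
    {"Store": 1, "Department": 1, "Date": "12/02/2010", "Weekly_Sales": 30000.0, "IsHoliday": True},
    {"Store": 1, "Department": 1, "Date": "19/02/2010", "Weekly_Sales": 22000.0, "IsHoliday": False},
    {"Store": 2, "Department": 1, "Date": "05/02/2010", "Weekly_Sales": 10000.0, "IsHoliday": False},
    {"Store": 2, "Department": 1, "Date": "19/02/2010", "Weekly_Sales": 12000.0, "IsHoliday": False},
]


def link(sales_rows, store_rows):
    """Attach store type and size to every sales row (join on Store)."""
    by_store = {s["Store"]: s for s in store_rows}
    return [{**row, **by_store[row["Store"]]} for row in sales_rows]


def fit_baseline(rows):
    """Average sales per (store, department, holiday flag) and per (store, department)."""
    by_flag = defaultdict(list)
    by_dept = defaultdict(list)
    for row in rows:
        key = (row["Store"], row["Department"])
        by_flag[key + (row["IsHoliday"],)].append(row["Weekly_Sales"])
        by_dept[key].append(row["Weekly_Sales"])
    return (
        {k: mean(v) for k, v in by_flag.items()},
        {k: mean(v) for k, v in by_dept.items()},
    )


def predict(model, store, department, is_holiday):
    """Use the holiday-specific mean if seen in training, else the overall mean.

    Raises KeyError for a store/department pair that never appears in training.
    """
    by_flag, by_dept = model
    key = (store, department)
    return by_flag.get(key + (is_holiday,), by_dept[key])


linked = link(train, stores)
model = fit_baseline(linked)
print(predict(model, 1, 1, True), predict(model, 2, 1, True))
```

A baseline like this gives a reference WMAE that any richer model, for instance one using fuel price, CPI or promotions, should be expected to beat.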

Produce a report that describes the process you went through and presents your analytical solution and any relevant exploratory or supporting analyses.

You can use whatever software you wish to carry out the task.

Some tips on producing the report

  1. Imagine that you are writing the report for someone to read, not simply to pass the course!

    a. A report should include an introduction and a conclusion. Marks are available for these two sections.
    b. A good report is a narrative, not simply a reporting of what you did.
    c. Your goal is to communicate your findings, not simply to churn out the analyses.
  2. The steps above are components that should be included in the analysis and reporting; how you include them is up to you. Reports that simply use the task descriptions above as headings will lose marks.
  3. Distinction-level reports tend to go beyond the specification, adding extra ideas, connections, analyses, or ways of presenting the data that are not specified above. I view these favourably (as long as they are well done!) but they are not essential.
  4. Put some effort into the layout and presentation - these are easy marks.
  5. Exploratory analysis should be included in the main report where appropriate and where it adds to the narrative. Assumption test output can be included in the appendices, as can any exploratory analysis which adds to the story you are trying to tell but would clutter up the main body of text.
  6. Strike the right balance between too few and too many charts and tables. One to two per page (depending on size) is a good rule of thumb.
  7. You should, in the conclusions, report on the limitations of the data you have used or on what future studies of the same topic might need to look for.
  8. You should label and number figures and tables fully and appropriately. A general rule of thumb is that figures and diagrams should be understandable on their own without having to refer to the main text. Refer to them in the main text as "Figure n" or "Table n", where n is the number of the table or figure in the sequence through the paper. Note that the words "Table" and "Figure" have a capital first letter (as "Table 1" is a proper noun).

(This article is from csprojectedu.com; please credit the source when reposting.)


csprojectedu

Microsoft, ACMer, currently a full-stack engineer at BAT.